In this episode of Alter Everything, we chat with Patrick Leung, CTO of Faro Health, about the innovative ways his company is using generative AI to revolutionize the clinical trial process. We discuss handling unstructured data, insights generated by AI to optimize trials, and significant cost savings in the pharmaceutical space! Not sure what a clinical trial is? Fear not! Patrick explains the structure of clinical trials, Faro Health's AI-driven enhancements in medical writing, and the importance of data extraction and analysis.
[00:00:00] Introduction to the Podcast and Guest
[00:00:00] Megan Bowers: Welcome to Alter Everything, a podcast about data science and analytics culture. I'm Megan Bowers, and today I am talking with Patrick Leung, CTO at Faro Health. In this episode, we chat about how his company is using generative AI to solve challenges in the clinical trial process, how they deal with unstructured data and what it looks like to get fresh insights and cost savings from AI in the pharmaceutical space.
Let's get started.
Hey Patrick, it's great to have you on our show today. Could you give a quick introduction to yourself for our listeners?
[00:00:39] Patrick Leung: Sure. I'm Patrick Leung. I'm the CTO of Faro Health, and I've been in the tech business for a number of decades now, including stints at Google and Two Sigma Investments, and a couple of companies that I co-founded.
[00:00:52] Megan Bowers: Awesome.
[00:00:52] Mission and Work of Faro Health
[00:00:52] Megan Bowers: I'm super excited to chat today, and I think a good place to start would just be to hear a little bit more about the mission of Faro Health and what you guys are doing.
[00:01:01] Patrick Leung: Sure. So Faro Health basically is an AI based platform for designing clinical trials and making them better for patients, for sponsors, and for sites.
We do that by providing a really full featured design tool where you can really fully model out the trial and receive insights as far as how that trial's gonna fare with respect to things like cost and patient burden, and all sorts of other metrics.
[00:01:24] Understanding Clinical Trials
[00:01:24] Megan Bowers: So before we jump in, let's just talk about what is a clinical trial and how are they structured just for those listeners who aren't in the space.
[00:01:33] Patrick Leung: Sure. So a clinical trial is basically a process and a plan to test the safety and efficacy of a new drug, or of an existing drug that maybe is being applied or tested. And the clinical trials that we focus on at Faro are human trials. So once you pass your animal trials and the drug is at least proved to be somewhat safe, then we have these three different phases of clinical trials. Initially, the phase one trials focus a lot on the safety of the drug.
And so, for instance, establishing what is a safe dosing. Many times you get this kind of varying dosage where you start really low and then you gradually increase it and figure out at what point the side effects kick in, and where we need to set a limit on dosage of the drug for safety reasons.
And then as you progress to phase two and phase three, the emphasis becomes more on efficacy, which is how well does this drug actually treat this condition relative to a placebo. So there are a lot of complexities associated with clinical trials. But one way of thinking about it is that it's like a giant schedule. Imagine a spreadsheet where the rows are basically procedures to be performed on each patient, and the columns are basically time.
So you have weeks and days and so on. And so you have this spreadsheet of Xs, where each X means that this assessment, like an electrocardiogram, or maybe it's administering the drug or testing vital signs, gets performed on a patient on this particular day or on this recurrence of days.
Maybe it's every second day of the week, or weeks one, five and seven, or whatever the case may be. So that's a mental model for thinking about the trial. It's like a giant table of events that get performed on the patients. And of course you can go down a rabbit hole of lots of different details there, which is what Faro has done with this product.
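The spreadsheet picture Patrick describes (procedures as rows, time as columns, an X where a procedure is performed) can be sketched in a few lines of Python. This is only an illustration of the mental model, not Faro Health's data model; the procedure names, study days, and the helper functions `render_grid` and `procedures_on_day` are invented for the example.

```python
# Hypothetical schedule of activities: procedure -> study days on which it is performed.
# All names and days are made up for illustration.
schedule = {
    "Electrocardiogram": [1, 29, 57],
    "Administer study drug": [1, 8, 15, 22, 29],
    "Vital signs": [1, 8, 15, 22, 29, 57],
}


def render_grid(schedule):
    """Render the schedule as a text grid: rows are procedures, columns are days."""
    days = sorted({day for day_list in schedule.values() for day in day_list})
    width = max(len(name) for name in schedule)
    header = " " * width + " | " + " ".join(f"D{day:<3}" for day in days)
    lines = [header.rstrip()]
    for name, day_list in schedule.items():
        marked = set(day_list)
        # "X" means the procedure is performed that day, "." means it is not.
        cells = " ".join(("X" if day in marked else ".").ljust(4) for day in days)
        lines.append(f"{name:<{width}} | {cells}".rstrip())
    return "\n".join(lines)


def procedures_on_day(schedule, day):
    """List the procedures a patient undergoes on a given study day."""
    return [name for name, day_list in schedule.items() if day in day_list]


print(render_grid(schedule))
print(procedures_on_day(schedule, 57))
```

Reading the grid by column answers "what happens to the patient at this visit?", while reading it by row answers "how often is this assessment performed?".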
[00:03:17] Megan Bowers: Gotcha. So then what are some of the main problems that you guys are solving with AI?
[00:03:23] AI Solutions in Clinical Trials
[00:03:23] Patrick Leung: Yeah, so we're using AI in a couple of different ways. One notable one is medical writing, so we've built a system that can essentially take the data from our study designer tool and use large language models like GPT to generate the clinical protocol document and, in the future, other documents as well.
And this is typically a really time consuming process, and we can basically automate it with the help of human review using a proprietary kind of multi-model agent system that we've been building over the past year and a half. And this has a lot of benefits to the sponsor, including saving a lot of time and coming up with trials that are better optimized because of the insights we offer.
And then really eliminating duplication. In many cases, there are lots of different sections in a typical protocol document that might have similar information in them, so we can eliminate that duplicated data entry and writing, and also across documents. So when we expand beyond clinical trial protocol documents, we'll be able to leverage the same data to generate content across multiple documents.
So it's really making that whole process of medical writing much more efficient.
[00:04:30] Challenges and Innovations in Medical Writing
[00:04:30] Megan Bowers: So then before tools like this, like it was just super manual or there were like tons of Word documents stored in lots of different places, or what did the process look like before this?
[00:04:42] Patrick Leung: Yeah, this is one of the things that really surprised me coming into this space. As you can imagine, a clinical trial takes many years and spans multiple sites and involves potentially hundreds of people receiving all sorts of different really detailed assessments. So this is quite a complex plan, right? It involves quite a lot of detailed design, and previously these plans essentially were modeled in Microsoft Word.
So for me, coming out of years of tech, it was just really surprising, like, wow, how could you even do that? You know, whose idea was this? I would've thought it would at least be Excel, at least a spreadsheet, right? And this, of course, is the standard tool for financial analysis.
So all the Wall Street people use Excel, but not even Excel. And so Faro jumped in there some number of years ago, maybe five years ago. The company was founded with the intent of building a really sophisticated, modern SaaS-based modeling system for actually really properly modeling these trials.
And as a result, we can produce insights, metrics, analytics, and really fuel these AI systems in ways that were not possible before because you're stuck with this very unstructured format in the form of a Microsoft Word document.
[00:05:51] Megan Bowers: I can imagine, and for people listening in all sorts of fields, dealing with data, having your data in a Word document presents all kinds of challenges.
And then even extracting it out of that Word document, if the documents have passed hands, if there's hundreds of people working on it, if not every document has the same sections, I feel like that would be really challenging to even start getting that data out of there.
[00:06:18] Patrick Leung: Yeah, exactly.
[00:06:19] Data Extraction and Analytics
[00:06:19] Patrick Leung: Megan, this is actually one area that we've invested quite a lot of time and effort into as well. It's pretty much the opposite of generating documents, which is what I mentioned earlier: we actually take a completed document, such as a PDF (you can find a ton of these clinical trial PDFs on ClinicalTrials.gov and other sites), essentially using AI and other methods to parse those documents and reconstruct or reverse engineer a really structured representation of that trial. And so we can essentially take a completed document and kinda reverse engineer it into our system and then provide analytics and all sorts of insights that were previously not possible.
So that's really exciting because a lot of pharmaceutical companies are sitting on quite a large trove of these documents, and so we can go in there and say, Hey, we can offer you benchmarking and analytics and metrics and even some kind of AI insights on these existing trials by passing them through this trial extraction pipeline that we've built.
So that's also like a real area of interest and investment for us on the AI front.
[00:07:20] Megan Bowers: Yeah, it's super cool to hear about a use case where it's unlocking things with AI that just previously there was no solution for.
[00:07:27] Patrick Leung: Well, it's super interesting because at the outset, Scott, who's our CEO and founder, and I sat down and started brainstorming what we could do with AI to enhance our product.
And we played with a bunch of things, including just using GPT: just fire up a GPT instance and ask it to go generate some critical sections from a clinical trial. And we quickly realized that on the face of it, sometimes it looked like it didn't know what it was talking about, but you know how these large language models are, right?
Like sometimes they just make things up and they won't tell you they're not sure. They'll just go and blithely generate the text that they think is the most likely response to your prompt. And so we really had to build a lot of safeguards and checks and balances into the system to get it up to the level of quality required to make medical writing teams happy and to make the FDA happy.
That's where a lot of our investment on the document generation side has gone: ensuring that the quality is there, that there are no omissions, that the style and consistency is there. And that takes a lot of additional AI-based agents to go and examine the generated output and ensure that it's gonna be okay across many different dimensions of okay.
So that was really a big learning experience for us: taking a lot of the expertise in our team, the clinical expertise, and embodying it into this evaluation system for ensuring that the document quality is good. And we've learned a lot, and this is what I speak about at conferences and in blog posts and podcasts like this: the ins and outs of essentially generating this clinical documentation.
[00:08:58] Megan Bowers: And what you were saying about evaluating those outputs, were you saying that you use AI to basically evaluate the AI outputs, or did I understand that correctly?
[00:09:08] Patrick Leung: Yeah, exactly, your understanding is correct in the sense that you have one set of models that generates subsections within the protocol.
Each subsection can be quite different, and so it requires different data, different context, in some cases even external documents to be uploaded, to be able to generate those sections correctly. And then we have a whole other class of AI-based agents that go in there and examine the output and ask questions like, Hey, did you talk about contacting family members or following up?
Or, you know, all these kind of details that need to be pretty much prescribed on a per subsection basis. So you have specific agents to generate each subsection, and then you have very specific agents to evaluate the quality of the output of each subsection.
[00:09:50] Megan Bowers: Okay. Wow. Like an army of agents. Sounds like a lot of agents.
[00:09:54] Patrick Leung: Yeah. It's a divide and conquer kind of approach where you have to be hyper specific, because the documents are highly technical and need to be correct. A one size fits all approach that looks at the whole document works for certain things, like checking for overall style and consistency and tone.
But in terms of technical details, it has to be real divide and conquer.
[00:10:12] Megan Bowers: That's super interesting. I'm not sure that I've heard of an AI use case yet where you're training like different models fit for purpose for each kind of section of just one project. So that's really interesting. So then how do you evaluate the success? You talked about getting things approved by the FDA. Obviously it has to meet certain standards, but how do you evaluate success for your models?
[00:10:36] Patrick Leung: Well, it really comes down to looking at the evaluation of the output, because with these large language models, sometimes the output can be quite different depending on the inputs that you provide.
And so you really just have to look at whatever's generated and have some kind of notion of how complete this section of text is. If we're talking about, say, an electrocardiogram section within the assessments and measurements section of the protocol document, that has some very specific things that need to be there. We also need to safeguard the correctness of the content that's generated, so we go back and ensure that what's generated is correct and complete and has the right tone and all this kind of thing. So it really comes down to having a separate assessment system, quite independent from the generation system, that can come in and say how complete this is.
And it is definitely a checklist type of system. There's no quality metric, at least none that we primarily pay attention to, other than what percentage of these tests actually passed. There are some objective measures like BLEU and ROUGE and so on that are traditionally used for natural language processing, but we don't tend to use those so much because of the non-deterministic nature of these large language models, and the fact that there's no one correct answer because of the variability based on the inputs that we provide.
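The checklist idea described here, where a section is scored by the percentage of section-specific checks that pass, can be sketched as follows. This is a simplified stand-in under stated assumptions: in the system Patrick describes the checks are AI-based agents, whereas this sketch uses plain keyword matching so that it runs on its own. The `Check` class, the `mentions` helper, the checklist items, and the draft text are all hypothetical and are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """One checklist item for a protocol subsection (hypothetical structure)."""
    name: str
    passes: Callable[[str], bool]


def mentions(*terms):
    """Build a check that passes if the text mentions any of the given terms.

    Keyword matching is a placeholder for an LLM-based evaluator agent.
    """
    def check(text):
        lowered = text.lower()
        return any(term in lowered for term in terms)
    return check


# Invented checklist for an electrocardiogram subsection, for illustration only.
ECG_CHECKS = [
    Check("states when the ECG is performed", mentions("screening", "day ", "week ")),
    Check("states patient position or rest period", mentions("supine", "rest")),
    Check("names the intervals recorded", mentions("qt", "pr interval", "qrs")),
]


def evaluate(text, checks):
    """Run every check and return per-check results plus the overall pass rate."""
    results = {check.name: check.passes(text) for check in checks}
    pass_rate = sum(results.values()) / len(results)
    return results, pass_rate


draft = (
    "A 12-lead ECG will be obtained at screening after the patient "
    "has rested supine for 5 minutes."
)
results, rate = evaluate(draft, ECG_CHECKS)
for name, ok in results.items():
    print("PASS" if ok else "FAIL", name)
print(f"pass rate: {rate:.0%}")
```

Because each check is independent of how the text was generated, the same checklist can score very different drafts, which is one reason a pass rate can be more practical here than a reference-based metric such as BLEU or ROUGE.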
[00:11:55] Megan Bowers: Super interesting.
[00:11:56] Optimizing Clinical Trials with AI
[00:11:56] Megan Bowers: And then I wanted to touch again on something you mentioned earlier about being able to provide more insights into clinical trials, or the AI suggesting improvements.
What has that looked like, and have you been surprised by any of the suggestions or consolidation or anything that's really come out of some of these models?
[00:12:16] Patrick Leung: Yeah, that's a really big topic. It's super interesting because this kind of emerged out of our whole data extraction process that I mentioned before.
So we started downloading a bunch of trials from ClinicalTrials.gov and analyzing them and putting them into this kind of data warehouse type of structure with our knowledge graph, all these really good data engineering techniques. And then we found that when we started querying, we could actually make queries using the large language model to ask things like, Hey, given this trial that you're working on, what other recent trials out there are similar to this trial?
Normally this would be quite a difficult and time consuming query to respond to, because it might involve downloading a whole bunch of candidate trials and reading through them and figuring out how similar they really are.
But with a suitably informed large language model that has access to a structured trial repository, you can actually answer that pretty quickly. And then you can go on from that and say, how did they end up doing? Did they run into issues with the FDA? Were there amendments required? That can cost tens of billions of dollars.
And so you can then start essentially performing inferences, like what would it take to actually reduce the chance of incurring an amendment? What would it take to make this trial better for patients and reduce the patient burden? So we just started trying out these queries and it started coming up with some pretty good suggestions on, for instance, removing certain assessments that it didn't consider were actually critical to the trial, and that potentially could reduce the patient burden by reducing the amount of blood being drawn or the amount of time spent in the site because of fewer assessments being performed. So we found that we were actually on this kind of path to really being able to optimize clinical trials in ways that probably had never been done before.
And there is this problem sometimes where trials become bloated because of the whole "we're designing this in Microsoft Word, let's just copy and paste from a previous trial and then add more stuff" approach. Let's add more assessments. We need to measure this, we need to measure that, right, in the trial. Before too long, if you keep doing that, each successive trial gets more and more bloated.
And so there's this whole lean trial methodology out there, but we found that we could actually do this very targeted, surgical sort of lean trial process where we could actually make trials more optimal very quickly.
So it's really exciting. It's definitely the cutting edge as far as we're concerned. And so it's an area of active work for us to really delve into the ways in which we can use AI to optimize the design of these trials and potentially save money. We have a paper that we published with Merck where we demonstrated we could save $120 million from a trial through basically avoiding amendments and getting the trial launched earlier, so that the new drug revenue could come in earlier.
So there are a lot of factors that contribute to really potentially saving a lot of money for the sponsor and creating a better experience for patients that participate in the trial, which is also really important.
[00:14:57] Megan Bowers: Yeah, that's a huge win-win, and it makes me think of just other generative AI use cases I've been hearing about where it's so good for those things where you're trying to find that needle in a haystack.
You're trying to find that one similar trial in this huge database of trials and zero in on it, and generative AI can just do it so much faster.
[00:15:18] Patrick Leung: Yeah, absolutely. Because even once you find the trial, you still gotta read it, you know, and these trials are, obviously, hundreds of pages long, right?
And very dense, very technical. And the LLM can easily read the whole thing and then just give you insights. It's just that our view is that it's really critical to model the trial thoroughly. And so if all we're doing is reading the text of the protocol document, it's actually not enough to fully understand the implications of the trial.
Like how long is it, how many assessments are there, how burdensome are they on the patient? The protocol document that you might scrape from ClinicalTrials.gov doesn't give you the whole story. So what we do is we mix in all these other data sets that relate to the cost and complexity of each procedure being performed, and we also unpack the schedule of activities so we have a full view on how many times the assessment is performed during the course of the trial.
These are things you can't just rip out of the PDF in a straightforward way. You have to really think about this and model it. And that's what we built this kind of automated trial extraction pipeline to do.
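The combination described here, an unpacked schedule of activities joined with per-procedure cost and complexity data, can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the procedure names, the days, the dollar and minute figures, and the `summarize` and `totals` helpers are invented and do not come from Faro Health's pipeline or data sets.

```python
# Hypothetical unpacked schedule: procedure -> study days on which it is performed.
schedule = {
    "Blood draw": [1, 8, 15, 29],
    "Electrocardiogram": [1, 29],
    "Vital signs": [1, 8, 15, 22, 29],
}

# Hypothetical reference data per single occurrence of each procedure.
per_procedure = {
    "Blood draw": {"cost_usd": 40.0, "minutes": 10},
    "Electrocardiogram": {"cost_usd": 120.0, "minutes": 20},
    "Vital signs": {"cost_usd": 15.0, "minutes": 5},
}


def summarize(schedule, per_procedure):
    """Combine occurrence counts with per-procedure data into per-patient totals."""
    rows = []
    for name, days in schedule.items():
        occurrences = len(days)
        info = per_procedure[name]
        rows.append({
            "procedure": name,
            "occurrences": occurrences,
            "total_cost_usd": occurrences * info["cost_usd"],
            "total_minutes": occurrences * info["minutes"],
        })
    return rows


def totals(rows):
    """Sum cost and patient time across all procedures."""
    return (
        sum(row["total_cost_usd"] for row in rows),
        sum(row["total_minutes"] for row in rows),
    )


rows = summarize(schedule, per_procedure)
cost, minutes = totals(rows)
print(f"per-patient cost: ${cost:.2f}, patient time: {minutes} minutes")

# What-if: drop one assessment and compare, mirroring the idea of removing
# assessments that are not critical to the trial.
leaner = {name: days for name, days in schedule.items() if name != "Electrocardiogram"}
lean_cost, lean_minutes = totals(summarize(leaner, per_procedure))
print(f"without ECG: ${lean_cost:.2f}, {lean_minutes} minutes")
```

The point of the sketch is that neither table alone answers the burden question: the protocol text gives the schedule, the reference data gives the unit cost, and only the join yields totals that can be compared across candidate designs.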
[00:16:18] Megan Bowers: Gotcha. That's super cool. Definitely dealing with a lot of data and a lot of data sources there.
This is obviously very cost saving, like you said, and it helps patient outcomes, but you're dealing with real people in these trials. So do you face privacy concerns when it comes to pulling in all this kind of data, working with this kind of data, using AI?
[00:16:42] Patrick Leung: We don't yet pull any sort of patient records or anything that would identify individual patients.
We're operating at the trial design level where the patient information is in the form of, Hey, in this cohort of patients going through this arm, there's a hundred people. And I think as time goes on, we're gonna have to really be concerned about this more and more, because if we go downstream into doing things like trying to predict, Hey, what's the likelihood of actually enrolling all the patients that are required for this study, that's gonna require more data about medical outcomes.
And of course that data can be anonymized. We don't have to know the person's name or any information like that, and so we would definitely scrub that out. What we're interested in is what characteristics of patients did and didn't make it into the trial. And if we're analyzing historical trials, how long did that trial take to enroll patients?
And that can be done without any personally identifiable information at all. So I think it's something that we have to be very careful about, and that our customers care a lot about as well. And obviously patient advocacy groups and so on also care about this. And so it's a constant concern, especially when you're dealing with these large language models where, if you are not careful about what you're doing, your data could end up being commingled with other customers or other companies or what have you.
And that's completely a no-no, at least for all of our customers, as you could imagine.
[00:17:59] Megan Bowers: Definitely.
[00:18:00] Future of Generative AI in Biotech
[00:18:00] Megan Bowers: So then, what makes you excited about the future for generative AI applications in your space?
[00:18:07] Patrick Leung: Ultimately, it all comes down to the mission, like how might we get groundbreaking new medications into people's hands cheaper and with less time?
And how might we help pharmaceutical companies to get as many of these medications through the process as they can? So that's exciting. I've always been really interested in design tools, simply because it's rarely the case that there's one optimal solution. There's usually some sort of trade off, and so that makes for a really interesting problem to solve.
There are multiple factors, right? There's time to market, there's the patient experience in terms of burden and blood draw and so on. There's the site experience. So this is very much a multi-party problem, and each party has their own utility function of what works for them, and so coming up with an optimal clinical trial ultimately is quite a challenging technical problem.
So that's exciting. And there's the idea that we could really do things that just would be unthinkable before, like these kinds of research questions of what recent trials are similar to this one and how did they fare. All these kinds of operations that would normally take a lot of trial and error and research and just time consuming work, and therefore were never done, we can now automate using these large language models. And ultimately, I think the whole process of designing a clinical trial will be much more informed, both by the proposed design and how to optimize it, as well as the future prospects of that trial.
Like being able to view the trade-offs between enrolling patients, enrolling sites, getting through the FDA and having no amendments, this kind of thing. And just making a really informed choice as far as what design you want for your trial. And ultimately being able to really, over time, expand the pipeline of trials going through the process.
And I think there has been this effect of a real slowdown. I talk a lot about Eroom's Law when I talk at conferences, which is this slightly dark humor around the fact that we have Moore's Law, which is the doubling of silicon chip density, transistor density, every 18 months. Eroom's Law is Moore spelled backwards, and basically it means every nine years the cost of developing a new drug doubles, inflation adjusted, and that's held over the last 15 years. It's just shocking. And it's not just one smoking gun. There are a number of factors conspiring to make this the case consistently. And so the idea is that maybe AI can help bend that curve and essentially result in a whole bunch more innovation in the biotech space because the prospects of getting those drugs through trials are enhanced.
So that's my dream, and I think it's gonna be really interesting, especially with all these advances in life sciences, and I think that there's just a lot of room for innovation here, which is super exciting to me as a technologist.
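The two doubling rules mentioned here can be written as simple exponentials. This is only a restatement of the figures quoted in the conversation (cost doubling every nine years, transistor density doubling every 18 months); the symbols $C_0$, $N_0$, and $t$ (in years) are notation introduced for the sketch.

```latex
% Eroom's Law as stated above: inflation-adjusted cost doubles every 9 years.
C(t) = C_0 \cdot 2^{\,t/9}

% Moore's Law as stated above: transistor density doubles every 18 months (1.5 years).
N(t) = N_0 \cdot 2^{\,t/1.5}

% Over the 15-year span mentioned above, the cost multiplier would be
\frac{C(15)}{C_0} = 2^{15/9} \approx 3.17
```

Under these stated rates, the same 15 years would correspond to ten doublings of transistor density, a factor of $2^{10} = 1024$, which is the contrast the "Moore spelled backwards" joke relies on.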
[00:20:46] Megan Bowers: Definitely, it sounds like a much improved future in the biotech space, and it's super exciting to just unlock things that couldn't be done before, that required so many man hours that it was just not feasible or too costly.
[00:21:00] Conclusion and Farewell
[00:21:00] Megan Bowers: It's been really nice to chat with you. Thanks for sharing about your company and what you guys are doing. Really cool use cases, really cool use of AI, so thanks for joining.
[00:21:10] Patrick Leung: Thank you, Megan. This has been a real pleasure.
[00:21:13] Megan Bowers: Thanks for listening. To learn more about Patrick and Faro Health, head over to our show notes on alteryx.com/podcast. And if you like this episode, leave us a review. See you next time.