
AI is revolutionizing drug discovery with its ability to analyze vast datasets. Particularly beneficial in the early drug discovery phase, AI can help identify potential drug targets more effectively than traditional methods, accelerating the initial stages of drug development by quickly sifting through large volumes of data.
According to a GlobalData survey published in April 2025, AI is considered the most disruptive technology among businesses today, including those in the healthcare industry, with 73% of respondents saying that AI would either significantly or slightly disrupt their industry[i]. Indeed, the 2024 Nobel Prize in Chemistry recognized ‘computational protein design’ and ‘protein structure prediction’[ii]: one half went to David Baker, and the other jointly to Demis Hassabis and John Jumper of Google DeepMind for their work on AlphaFold, an AI system that accurately predicts protein structures. Heiner Linke, Chair of the Nobel Committee for Chemistry, said that “predicting protein structures from their amino acid sequences…opens up vast possibilities.”
Once targets are identified, AI continues to play a crucial role in lead generation and optimization. AI models can predict molecular interactions and assist in designing novel compounds tailored to specific therapeutic goals. Generative AI, for instance, is being used to create small molecules or proteins that meet specific criteria, enhancing the drug design process[iii].
AI’s role also extends into clinical trials, where it is already being used to enhance trial design, feasibility assessment and site selection, patient recruitment and retention, data analysis, and regulatory submission and review. A 2023 study found that a clinical trial patient-matching tool that leverages large language models (LLMs) reduced physicians’ pre-screening time by 90%[iv].
This integration of AI into the drug discovery process has the potential to significantly lower costs by improving efficiency. Some experts suggest that the cost of reaching a Phase I readout could decrease from over US$100 million to around US$70 million[v], making the pursuit of innovative therapies more cost-effective. AI could also condense a typical four- to five-year exploratory research phase into less than a year, a significant improvement for indications with few treatment options[vi].
However, AI’s full potential is constrained by challenges such as the need for high-quality datasets, regulatory hurdles and a shortage of human expertise. While regulators are responding by establishing guidelines and promoting collaboration with industry stakeholders, challenges remain, particularly regarding the ethical implications of AI technologies in healthcare. Widespread adoption is also hindered by the fact that the biological data required to train AI models is often expensive and time-consuming to generate.
Early-stage drug discovery
Machine learning (ML) models significantly enhance efficiency in early-stage drug discovery, particularly in hit identification and lead optimization, when compared to traditional approaches. One of the most notable advantages of ML is its ability to rapidly process vast amounts of data. In traditional drug discovery, identifying a single candidate can take years, requiring the synthesis and testing of thousands of molecules over an extended period. Company filings, collated by GlobalData[vii], show that traditional methods can require synthesizing around 5,000 molecules over four to six years to identify a promising candidate.
In contrast, ML models can screen billions of molecules virtually, reducing the number of physical tests to only a few hundred. This capability allows researchers to prioritize one billion molecules in as little as one day, compared to one million days with conventional methods (an implied conventional rate of roughly 1,000 molecules per day).
GlobalData’s Drug Database shows that there are currently 40 regenerative medicine therapies that have been discovered or are being developed using AI[viii]. Of these, three are undergoing Phase II clinical trials, including Aspen Neuroscience’s ANPD-001 for the treatment of Parkinson’s disease. AI and ML play an important role in the development of such therapies, as cells are tested to ensure proper function, including with ML-based genetic tests that evaluate cell quality.
ML models can also employ active learning techniques, which allow for more accurate predictions of molecule properties. By prioritizing which molecules are sent for physics-based analyses, researchers can catch potential issues earlier in the drug discovery process, further enhancing efficiency and reducing costs.
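To make the idea of active learning concrete, here is a minimal Python sketch. It is illustrative only and not drawn from any company's pipeline: `physics_score` is a hypothetical stand-in for an expensive physics-based calculation, each molecule is reduced to a single made-up descriptor value, and the surrogate is a simple nearest-neighbour lookup that uses distance to the closest labeled molecule as a crude proxy for uncertainty.

```python
def physics_score(x: float) -> float:
    """Hypothetical stand-in for an expensive physics-based calculation."""
    return -(x - 0.7) ** 2


def predict(x: float, labeled: dict) -> tuple:
    """1-nearest-neighbour surrogate model.

    Returns (predicted score, uncertainty), where the uncertainty is simply
    the distance to the closest molecule that has already been labeled.
    """
    nearest = min(labeled, key=lambda seen: abs(seen - x))
    return labeled[nearest], abs(nearest - x)


def mean_abs_error(pool, labeled) -> float:
    """Average surrogate error over the pool (only possible in a toy setup)."""
    return sum(abs(predict(x, labeled)[0] - physics_score(x)) for x in pool) / len(pool)


def active_learning(pool, labeled, rounds: int) -> dict:
    """Repeatedly label the molecule the surrogate is least certain about."""
    labeled = dict(labeled)
    for _ in range(rounds):
        candidates = [x for x in pool if x not in labeled]
        query = max(candidates, key=lambda x: predict(x, labeled)[1])
        labeled[query] = physics_score(query)  # the costly step
    return labeled


pool = [i / 100 for i in range(101)]  # toy one-dimensional "descriptor" per molecule
seed = {x: physics_score(x) for x in (0.0, 1.0)}
trained = active_learning(pool, seed, rounds=8)

print(f"surrogate error before: {mean_abs_error(pool, seed):.3f}")
print(f"surrogate error after:  {mean_abs_error(pool, trained):.3f}")
```

Only eight extra expensive evaluations are spent, each on the molecule where the cheap model knows least. Real systems would replace the distance heuristic with a proper uncertainty estimate from the trained model.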
Data interrogation tools
The integration of generative AI and data analytics with data interrogation tools is significantly enhancing the efficiency and effectiveness of drug discovery by expediting compound and target screening, improving assessments of biological activity and safety, and facilitating the identification of new uses for existing drugs.
AI-supported screening methods, such as AI-guided high-throughput screening (HTS), allow researchers to rapidly evaluate large libraries of compounds. This significantly accelerates the identification of lead compounds by examining multiple candidates simultaneously. Tools for virtual screening, structure-based drug design and ligand-based drug design, meanwhile, facilitate the identification of potential drug candidates by predicting how compounds will interact with biological targets.
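As a simple illustration of ligand-based virtual screening, the Python sketch below ranks a small library by Tanimoto similarity to a known active compound. The compound names and fingerprint bits are invented for the example, and the 0.5 cut-off is an arbitrary assumption; in practice the fingerprints would come from a cheminformatics toolkit and the threshold would be tuned to the target.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity: shared fingerprint bits over total distinct bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_library(query: frozenset, library: dict, threshold: float = 0.5) -> list:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each integer marks a structural feature that is present.
known_active = frozenset({1, 4, 7, 9, 12})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9, 13}),
    "cmpd_B": frozenset({2, 3, 5}),
    "cmpd_C": frozenset({1, 4, 7, 9, 12}),
    "cmpd_D": frozenset({1, 4, 20, 21}),
}

hits = rank_library(known_active, library)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

Compounds that share most features with the known active rise to the top of the list, while dissimilar ones are filtered out before any laboratory work is done.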
Tools that compile extensive data from various databases enable researchers to assess the safety profiles of compounds more effectively. By analyzing historical data on toxicity and pharmacokinetics, researchers can identify potential adverse effects and optimize drug candidates accordingly. Leveraging existing data on drug candidates that have previously been tested allows researchers to explore new indications. This dual approach of developing new candidates while also ‘rescuing’ historical candidates minimizes risk and enhances the potential for successful outcomes in clinical trials.
Training AI models
Training AI models in drug discovery relies heavily on several types of biological and clinical data, with an emphasis on diversity to mitigate biases. The most critical data types include biological data, clinical trial data, digital biomarkers and diverse demographic data.
Biological data is essential for understanding biological processes and drug interactions, but generating such data is often slow and resource-intensive. Clinical trial data, including patient demographics, treatment responses and outcomes, is vital for predicting how future patients will respond and for developing effective drugs. Digital biomarkers, derived from digital behavioral interventions and patient monitoring systems, provide further insights into treatment responses and enhance the predictive capabilities of AI models.
Addressing data gaps and biases in AI training is crucial for drug development. Strategies include data augmentation, open-source data sharing, fine-tuning techniques, human-in-the-loop approaches and regulatory frameworks. Data augmentation increases the size of the training dataset by creating variations of existing data, while open-source data sharing allows for broader access to diverse data sources. Fine-tuning techniques prioritize the use of representative training data to correct biases, while human-in-the-loop approaches involve expert feedback during the AI training process to identify and correct biases.
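One simple way to make training data behave as if it were more representative is inverse-frequency reweighting, sketched below in Python. This is an illustrative assumption rather than a method described by any of the sources cited here: the group labels are invented, and the function name `balancing_weights` is hypothetical.

```python
from collections import Counter


def balancing_weights(groups: list) -> list:
    """Give each record a weight so every group contributes equally in total.

    A record's weight is total / (number of groups * size of its group), so
    records from under-represented groups count for more during training.
    """
    counts = Counter(groups)
    total = len(groups)
    return [total / (len(counts) * counts[g]) for g in groups]


# Hypothetical demographic labels for ten training records.
groups = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
weights = balancing_weights(groups)

for group in sorted(set(groups)):
    share = sum(w for g, w in zip(groups, weights) if g == group)
    print(f"group {group}: total weight {share:.2f}")
```

After reweighting, each group carries the same total weight even though group C has only one record. Reweighting cannot create information that was never collected, which is why the other strategies above, such as broader data sharing and expert review, still matter.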
How electronic health records (EHRs) can provide crucial insights
Veradigm, a leading provider of healthcare data and technology solutions, is capturing structured data across diverse patient populations and geographies by using AI to analyze de-identified EHR data within the Veradigm Network. Veradigm’s AI-driven approach also enables scalable extraction of information from unstructured data, offering life science organizations deeper, real-time insights into patient experiences and outcomes.
Veradigm has recently developed* an AI-enabled, GLP-1–focused real-world database designed to support life sciences companies in understanding and optimizing treatment outcomes. Through advanced AI-driven data curation, the platform extracts real-world insights from clinician notes, including reasons for GLP-1 therapy discontinuation such as adverse events and perceived lack of efficacy. It also identifies off-label usage patterns and relevant comorbidities that may influence therapeutic decisions. Combined with clinical validation to ensure data accuracy and reliability, Veradigm’s solution provides fit-for-purpose evidence to accelerate research and improve patient care strategies.
[i] GlobalData: Tech Sentiment Polls Q1 2025, April 2025. https://www.globaldata.com/store/report/tech-sentiment-polls-quarterly-analysis/
[ii] https://www.nobelprize.org/prizes/chemistry/2024/press-release/
[iii] Zhao L, Wang J, Pang L, Liu Y, Zhang J. GANsDTA: Predicting Drug-Target Binding Affinity Using GANs. Front Genet. 2020 Jan 9;10:1243. doi: 10.3389/fgene.2019.01243. PMID: 31993067; PMCID: PMC6962343. https://pmc.ncbi.nlm.nih.gov/articles/PMC6962343/
[iv] https://www.researchgate.net/publication/370071234_Improving_Patient_Pre-screening_for_Clinical_Trials_Assisting_Physicians_with_Large_Language_Models
[v] https://www.pharmaceutical-technology.com/features/the-ai-advantage-in-discovering-new-medicines/
[vi] GlobalData: Artificial Intelligence in Healthcare, September 2024
[vii] QuantumPharm Inc 2024, prospectus. https://ir.xtalpi.com/media/uopbz44q/2024060400059.pdf
[viii] GlobalData: February 20, 2025 Analyst Briefing. https://pharma.globaldata.com/Analysis/details/Using-Artificial-Intelligence-to-Enhance-Regenerative-Medicine
* https://investor.veradigm.com/news-releases/news-release-details/veradigm-advances-glp-1-real-world-evidence-generation-ai-driven