Enchant whitepaper: Breaking the data wall between lab and clinic

Enchant is a breakthrough multi-modal transformer trained on dozens of data modalities and sources from across the full span of drug discovery and development.
October 29, 2024


The discovery and development of new medicines is inefficient and expensive, leaving many patients' medical needs unmet. Sector-wide, low efficiency is driven by late-stage clinical failure.

The full process involves two distinct stages: a discovery stage, where many (for example thousands of) molecules are under investigation in the laboratory; and a clinical stage, where a single drug candidate is tested in human volunteers or patients.

Innovations in artificial intelligence have improved and accelerated many aspects of these stages individually. For example, impressive progress has been achieved in target identification, molecular discovery, and clinical trial design.

Data from the discovery stage are abundant, and are becoming even more so through laboratory automation. But economic and ethical barriers prevent the generation of clinical data on any but a vanishingly small number of distinct molecules. This creates a data wall between the laboratory and the clinic, which is the fundamental obstacle to the efficient creation of medicines.

At Iambic Therapeutics we have developed Enchant, a cutting-edge AI model that breaks through this wall.

Figure 1. Schematic illustration of Enchant. Data are abundant during the discovery stage, and scarce at the clinical stage. Enchant allows data from the discovery stage – oftentimes on unrelated molecules – to be used for the prediction of clinical outcomes.

The purpose of Enchant is to make predictions for data-scarce clinical outcomes by leveraging abundant discovery data. It builds on our previous work, which addressed two challenges: different stages create data of different types, and public data sources are messy and of mixed quality. Here we use Enchant to predict clinical properties by training on increasing amounts of discovery data.

As a first demonstration, we predict human pharmacokinetics (PK) of candidate drugs. We organize the discovery data in the order in which they are typically gathered during the discovery process. Enchant's predictions of clinical PK properties become more reliable as it is trained on more preclinical laboratory data, as shown in Figure 2. We see that training Enchant with clinical data on just 5 distinct molecules (less than 1% of the dataset) yields strong predictive power. Competing approaches trained on so few clinical examples do not yield predictive models.

Figure 2. Enchant predictions of human PK can be improved by training on preclinical data. Left: predictive power (measured by Spearman correlation) of Enchant for human PK half-life (Obach human PK dataset). Models trained on very limited amounts of human data (1% and 5% of Obach) nevertheless become strongly predictive of human PK on unseen molecules, through transfer learning from more readily available preclinical data. Right: Similar behavior is observed for other human PK properties. The table compares Spearman correlation for all of the PK parameters reported by Obach, for the combination of preclinical data with three different fractions of human clinical training data. In each case the value in parentheses is the performance of the current state-of-the-art literature model.
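The low-data setting described in Figure 2 can be pictured with a short sketch. The Python below is illustrative only and is not Iambic's pipeline: the function name `low_data_split`, the molecule identifiers, and the 500-molecule example are all assumptions made for this illustration. It keeps clinical labels for only a small fraction of molecules for training and holds out the rest as unseen test molecules.

```python
import random


def low_data_split(molecule_ids, train_fraction, seed=0):
    """Split molecules into a small clinical training set and a held-out test set.

    Hypothetical helper for illustration; not taken from the Enchant codebase.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # De-duplicate so the same molecule never appears in both train and test.
    ids = sorted(set(molecule_ids))
    if len(ids) < 2:
        raise ValueError("need at least two distinct molecules to split")
    # A seeded generator keeps the split reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Always keep at least one training molecule and one test molecule.
    n_train = min(len(ids) - 1, max(1, round(train_fraction * len(ids))))
    return ids[:n_train], ids[n_train:]


# Made-up identifiers standing in for a clinical PK dataset.
all_ids = [f"mol_{i}" for i in range(500)]
train_ids, test_ids = low_data_split(all_ids, train_fraction=0.01)
print(len(train_ids), len(test_ids))
```

Splitting by molecule rather than by measurement is what makes the test molecules genuinely unseen: no compound contributes clinical labels to both sides of the split.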

Enchant also surpasses the state of the art when all clinical PK training data are used. Trained on all of the available preclinical data together with the full Obach dataset, Enchant achieves a Spearman R of 0.74 for human PK half-life, outperforming the previous state-of-the-art performance of 0.58. Similar performance is observed for other human clinical PK properties.
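For readers unfamiliar with the metric, Spearman correlation measures how well the ranking of predicted values matches the ranking of observed values, regardless of scale. The following is a minimal pure-Python sketch of the standard definition (Pearson correlation of average ranks). The function names and the example half-life values are made up for illustration and are not drawn from the Obach dataset or from Enchant.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical observed vs. predicted half-lives (hours) for five molecules.
observed = [2.0, 5.5, 12.0, 30.0, 1.2]
predicted = [4.0, 3.1, 15.0, 22.0, 0.9]
print(round(spearman(observed, predicted), 3))
```

Because only ranks matter, a model can score well on this metric even if its absolute predictions are offset or mis-scaled, which is why it suits comparisons of how well models prioritize molecules.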

While Enchant represents a paradigm shift in breaking the data wall between laboratory and clinical data, it is important to place its performance in the context of existing models (Figure 3). Several groups have attempted to develop large language models (LLMs) for drug discovery, but none approach the predictive power of Enchant. Enchant's ability to leverage abundant preclinical data to make accurate clinical predictions is a unique and fundamental breakthrough in the field.

Figure 3. Comparison of Enchant to other LLM-based models in drug discovery.

Drug development is inefficient and costly, mainly due to the lack of clinical data, which leads to late-stage failures. Enchant, our new AI model, breaks the data wall by using abundant discovery data to predict clinical outcomes. By additionally incorporating even small amounts of clinical data, it outperforms existing models trained on full clinical datasets.

Enchant has enabled Iambic to reduce clinical risk in our programs at the discovery stage. Even in the absence of clinical data, it provides valuable insights to avoid potential pitfalls. Combined with our other AI and platform technologies, Iambic works to efficiently deliver safer, more effective medicines.

Contact: partnerships@iambic.ai

1 We’re primarily addressing small molecules, but similar considerations apply to other therapeutic modalities.

In a prior version of this blog we gave an incorrect name for the developer of the nach0 model. This had no impact on the reported outcomes, and we have updated the reference.
