Machine learning for in-silico lead identification

Discover how AI can help you select promising lead compounds by predicting their ADMET properties and off-target side effects

It is estimated that over 80% of new drug compounds fail during drug development because of poor ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. Companies typically spend tens to hundreds of millions of U.S. dollars on drug discovery and development, a process that takes 10-15 years on average and has a success rate of about 10%. In this analysis, we present how Kantify, with its novel in silico AI drug discovery pipeline, can rapidly predict two essential elements of lead identification: (1) ADMET properties and (2) an off-target binding score. Discover below how to perform lead identification rapidly and efficiently, even on unknown compounds.

The traditional drug discovery process

The typical drug discovery process starts with target identification, continues with hit identification, and then proceeds to lead identification. In short, once the target protein or proteins involved in a specific disease have been identified, the next stage is to find compounds that interact with the target in the desired way (for example, activation or inhibition). In traditional drug discovery, these compounds are usually found through an in vitro laboratory process called high-throughput screening (HTS), in which automated equipment tests large libraries of compounds for their ability to interact with the desired target. The compounds that show the desired activity against the target protein in screening assays are selected and categorized as hit compounds. However, not all hit compounds are good drug candidates. Therefore, in the following stage (known as the hit-to-lead stage), the most promising hits are selected for further development and designated as lead compounds. A lead compound is generally defined as a new chemical compound that could potentially be developed into a new drug by optimizing its beneficial effects and minimizing its adverse effects. To achieve this, lead compounds may undergo further screening for off-target effects, as well as testing for physicochemical and ADMET properties. In pharmacology and pharmacokinetics, ADMET is an abbreviation for the absorption, distribution, metabolism, excretion, and toxicity of a compound. A successful lead compound should be absorbed into the bloodstream, distributed to its site of action in the body, metabolized efficiently, excreted effectively from the body, and, most importantly, pass safety studies on toxicity (e.g. liver toxicity). Once the lead compound is optimized, in vivo testing such as animal trials can start.

Figure 1: Overview of the traditional preclinical drug discovery process

The challenges with traditional lead compound selection

Selecting a lead compound has proven to be a high-risk research and development investment, with unexpected failures at different stages of drug development. One main reason lead compounds fail as drugs is insufficient efficacy or safety, both of which are largely related to the compound’s absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. Traditionally, researchers use in vivo and in vitro experiments to assess a specific ADMET property of a hit compound. An example of such an experiment is the in vivo measurement of the capacity of cytochrome P450 liver enzymes to metabolize the hit compound. Lead compound properties are usually studied using cell-based assays and animal testing. These traditional methods for lead identification are expensive and time-consuming, and they raise ethical concerns, as animal trials are often associated with animal cruelty. According to the FDA, only 5 in 5,000 drug compounds that enter preclinical testing reach human clinical trials. When all drug compounds fail during their physicochemical testing, the drug discovery team is left with no option but to start over by screening a new library of compounds against the selected target protein.

The challenges of traditional ADMET properties computation techniques

ADMET properties play a crucial role in the discovery and optimization of promising lead compounds. To minimize failure and select the best lead compound, medicinal chemists are developing computational strategies to predict how each potential drug would behave in an organism and to assess the risk of toxicity to organs and biological pathways. Traditional computational ADMET analysis is used to virtually screen hit compounds and provide preliminary results before their properties are validated with in vitro studies.

There are several publicly available computational tools and web platforms to predict ADMET properties; however, these tools have been shown to be incomplete and not very accurate. Moreover, most of them are individual models that focus on specific ADMET properties, and only a few can evaluate several ADMET properties simultaneously. The underlying computational methods also lack comprehensiveness and accuracy. For example, molecular modeling plays a major role in the in silico analysis of a drug compound’s metabolism. However, it can only assess the possible interactions between compounds and metabolic enzymes; it cannot explicitly evaluate the ADMET risks of drug candidates. Physiologically based pharmacokinetic (PBPK) modeling can predict multiple drug compound properties, but it provides only general information about the biological behavior of organs or tissues. Quantitative structure-activity relationship (QSAR) modeling is often used in drug discovery to analyze relationships between the structural properties of chemical compounds and their biological activities. Despite its frequent use, QSAR has notable constraints: in certain cases it is limited in scope, inaccurate, or yields false correlations.
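To make the QSAR idea concrete, here is a minimal sketch of a one-descriptor linear model fitted by ordinary least squares. It is only an illustration of the general technique: the descriptor (logP), the activity values, and the function names are all hypothetical and do not come from any real QSAR tool or dataset.

```python
# Minimal QSAR sketch: fit activity = slope * descriptor + intercept by
# ordinary least squares. All numbers below are hypothetical toy values.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predict the activity of a compound from its descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training set: one structural descriptor per compound
# (for example logP) and one measured activity (for example pIC50).
log_p = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]

model = fit_line(log_p, activity)
print(predict(model, 2.5))
```

With only four compounds and a single descriptor, a model like this can easily capture a correlation that does not generalize, which is one way the false correlations mentioned above can arise.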

Leveraging Machine learning for in silico lead identification

In order to help drug discovery teams save time and resources in the selection of leads, Kantify has developed an in silico drug discovery solution based on artificial intelligence. While the Kantify solution covers the whole drug discovery pipeline, we focus in this article on how it performs ADMET prediction and off-target hit prediction, even on unknown compounds.

Kantify’s Machine Learning Solution for ADMET properties prediction

To help drug discovery teams bring new and better drugs to market, Kantify has developed Zepto.Ward, a novel in silico machine learning solution that can predict 88 ADMET property endpoints for any compound. It uses deep learning, a subset of machine learning based on artificial neural networks in which multiple layers of processing are used to extract many different features from the data. In particular, our machine learning algorithm for lead identification can help identify the best lead compound and calculate the following categories of drug properties:

  • Absorption (e.g. Caco-2 permeability, PGP-inhibitor, HIA, etc.)
  • Distribution (e.g. BBB penetration, PPB, VD, etc.)
  • Metabolism (e.g. CYP inhibitor and substrate)
  • Excretion (e.g. drug half-life, clearance of drug)
  • Toxicity (e.g. hepatotoxicity, carcinogenicity, respiratory toxicity, etc.)
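The five categories above can be pictured as a simple lookup structure. The sketch below is not the Zepto.Ward output format, which the article does not describe. The endpoint names come from the list above, while the helper `group_by_category` and the example values are hypothetical.

```python
# Endpoint names are taken from the list above. The grouping helper and the
# example probabilities are hypothetical illustrations.
ADMET_CATEGORIES = {
    "absorption": ["Caco-2 permeability", "PGP-inhibitor", "HIA"],
    "distribution": ["BBB penetration", "PPB", "VD"],
    "metabolism": ["CYP inhibitor", "CYP substrate"],
    "excretion": ["half-life", "clearance"],
    "toxicity": ["hepatotoxicity", "carcinogenicity", "respiratory toxicity"],
}


def group_by_category(predictions):
    """Arrange a flat {endpoint: value} dict into {category: {endpoint: value}}."""
    grouped = {category: {} for category in ADMET_CATEGORIES}
    for category, endpoints in ADMET_CATEGORIES.items():
        for endpoint in endpoints:
            if endpoint in predictions:
                grouped[category][endpoint] = predictions[endpoint]
    return grouped


# Hypothetical predictions for a single compound.
example = {"HIA": 0.93, "hepatotoxicity": 0.04, "PPB": 0.71}
report = group_by_category(example)
print(report["toxicity"])
```

Grouping endpoints this way makes it easy to see at a glance which part of the ADMET profile a compound is weak in.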

Discover more on how we help advance drug discovery with machine learning ADMET properties prediction in our Platform section.

Kantify’s Machine Learning Solution for off-target side effect prediction

Determining the side effects of drugs that have not yet entered clinical trials is an expensive, difficult and crucial task. Kantify uses its hit prediction technology, Zepto.Hit, to assess the degree of interaction a drug compound has with numerous proteins across the human body, to understand the selectivity of a compound, and to predict potential side effects.

When a drug compound interacts with proteins other than those it was intended to bind (i.e. off-target interactions), unexpected side effects can occur, which may be harmful to the human body. The off-target effects of a drug compound can be a substantial bottleneck in the development of new therapeutics, so reducing them is desirable in drug development. The best way to diminish them is to uncover where and when they occur, and then design a way to avoid them while preserving on-target efficacy. Kantify uses its machine learning solution, Zepto.Hit, to compute the probability of any drug compound interacting with human off-target proteins. The solution can learn how a drug compound interacts with any human protein, and its predictions thereby serve as a proxy for the side effects to be expected in clinical trials. A low degree of interaction means that the compound interacts with only a few proteins. A high degree of interaction, on the other hand, means that the drug displays activity with many proteins within the body. We rely on this approach as a proxy for the expected number of side effects caused by a drug.
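One simple way to picture a "degree of interaction" is as a count of the proteins a compound is predicted to interact with. The sketch below assumes a 0.5 probability cut-off and uses made-up protein names and probabilities. It illustrates the concept only and is not how Zepto.Hit computes its output.

```python
def degree_of_interaction(probabilities, threshold=0.5):
    """Count proteins whose predicted interaction probability meets the threshold.

    The 0.5 default threshold is an assumption made for this illustration.
    """
    return sum(1 for p in probabilities.values() if p >= threshold)


# Hypothetical predicted interaction probabilities for two compounds.
selective = {"protein_A": 0.92, "protein_B": 0.08, "protein_C": 0.03}
promiscuous = {"protein_A": 0.88, "protein_B": 0.74, "protein_C": 0.61}

print(degree_of_interaction(selective))
print(degree_of_interaction(promiscuous))
```

Under this toy definition, the first compound is selective (one predicted interaction) and the second is promiscuous (three), which is the distinction the paragraph above draws.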

Case Study: Lead Analysis over Known Drugs

To showcase the capabilities of our in silico lead identification solution, we explored a set of drugs from ChEMBL, a widely used curated database of bioactive molecules with drug-like properties. Our dataset contains 10,000 drugs, half of which have entered a phase 1 clinical trial or higher.

For the case study we use two components of the Kantify in silico Drug Discovery solution:

  • 1 - Zepto.Ward, which predicts the ADMET properties of a compound

  • 2 - Zepto.Hit, which predicts the off-target compound interaction

Let us first describe the three drugs chosen for this analysis.


Diclofenamide has been commercialized under the name Daranide for several decades in the United States. It is primarily used to treat acute angle-closure glaucoma. Glaucoma is a group of eye diseases that damage the eye’s optic nerve and can cause vision loss. Diclofenamide deals with glaucoma by draining fluids out of the eye. It has recently been designated by the European Medicines Agency as an orphan drug to treat primary periodic paralysis. For this indication, it is sold in the USA under the name Keveyis.

Figure 2: Chemical structure of Diclofenamide


Vinblastine is a chemotherapy medication used to treat several cancers, such as Hodgkin’s lymphoma, lung cancer, bladder cancer, melanoma and testicular cancer. It is prescribed alongside other chemotherapy drugs and is administered by intravenous injection.

Figure 3: Chemical structure of Vinblastine


Amlexanox, marketed under the name Aphthasol, is an anti-inflammatory drug used to treat recurrent aphthous ulcers (canker sores). In Japan, it is also prescribed for several other inflammatory conditions.

Figure 4: Chemical structure of Amlexanox

Overall results of the Lead Identification Analysis

The objective of the lead identification analysis is to predict 2 scores:

  • An ADMET score that reflects the drug-likeness and safety of the compound; the higher the score, the higher the chances that the human body can safely process the drug.

  • An off-target binding score that quantifies how much a drug compound binds to human proteins other than its intended target; a high score means a low degree of interaction, so the compound is expected to have few side effects on the body.

The graph below presents the results of the analysis. We observe that:

  • Diclofenamide (glaucoma) shows both a great ADMET score and off-target binding score
  • Vinblastine (chemotherapy) shows both a poor ADMET score and off-target binding score
  • Amlexanox (anti-inflammatory) shows a great ADMET score but a poor off-target binding score

Figure 5: Comparison of the 3 drugs through lead scoring

Using Zepto.Ward to predict the ADMET score

Thanks to Zepto.Ward, Kantify can rapidly determine the expected ADMET properties of a specific compound. The ADMET score quantifies the safety of a drug: the higher the score, the safer the molecule is expected to be. For instance, we see that Diclofenamide (glaucoma) exhibits the best possible score (100%) while Vinblastine (chemotherapy) shows a low ADMET score of 8%.

The estimated ADMET scores reflect the effects that the three studied drugs have on the human body. Diclofenamide (glaucoma) has been prescribed for several decades and is also being considered for drug repurposing. It is therefore viewed as a substantially safe drug. Vinblastine, on the other hand, is a chemotherapy medication meant to fight cancer. Such molecules are designed to kill certain cells or prevent them from growing. Hence, the collateral damage is much heavier than that caused by Diclofenamide (glaucoma). Finally, we see that Amlexanox (anti-inflammatory) shows a good ADMET score of 86%. Such a high score is expected for the class of anti-inflammatory drugs.

One criterion composing the ADMET score relates to hepatotoxicity. Hepatotoxicity is known to be a major driver of drug withdrawals. In the case of Diclofenamide (glaucoma), the probability of inducing hepatotoxicity is estimated to be lower than 1% according to our AI algorithm. This output partially explains the high ADMET score of Diclofenamide (glaucoma). In contrast, Vinblastine (chemotherapy) turns out to have a much higher probability of inducing hepatotoxicity, which lowers its ADMET score.
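The article does not say how individual endpoints such as hepatotoxicity are combined into a single ADMET score. Purely as an illustration of how one high-risk endpoint can drag an aggregate score down, the sketch below assumes a weighted mean of (1 - risk probability). The formula, the function name, and all values are assumptions, not Kantify's method.

```python
def admet_score(risk_probabilities, weights=None):
    """Weighted mean of (1 - risk) over endpoints, in the range [0, 1].

    This aggregation formula is an assumption made for illustration only.
    """
    if weights is None:
        weights = {name: 1.0 for name in risk_probabilities}
    total_weight = sum(weights[name] for name in risk_probabilities)
    weighted_sum = sum(
        weights[name] * (1.0 - risk) for name, risk in risk_probabilities.items()
    )
    return weighted_sum / total_weight


# Two hypothetical compounds that differ only in predicted hepatotoxicity risk.
low_risk = {"hepatotoxicity": 0.01, "carcinogenicity": 0.05}
high_risk = {"hepatotoxicity": 0.80, "carcinogenicity": 0.05}

print(admet_score(low_risk))
print(admet_score(high_risk))
```

In this toy setup, raising the hepatotoxicity probability alone is enough to lower the overall score substantially, mirroring the qualitative effect described above.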

Using Zepto.Hit to predict the Off-target binding score

Thanks to Zepto.Hit, Kantify can estimate the off-target binding probability of small molecules across a large set of proteins, i.e. whether a compound will bind to other targets beyond the target of interest. Such data is particularly useful to discriminate between compounds and anticipate unexpected side effects. All other things being equal, a drug that binds to a few proteins should be favored over drugs that bind to a large set of proteins. The rationale behind this approach is straightforward: as long as a medication interacts with a narrow set of components, few adverse effects are to be expected.

We start by predicting the off-target binding probabilities. Then we predict an off-target binding score. The higher the score, the safer the drug is expected to be.
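How the off-target binding score is obtained from the binding probabilities is not specified here. As a rough sketch of the two-step idea, the example below assumes the score is one minus the mean predicted binding probability over all proteins except the intended target. The formula, names, and numbers are hypothetical.

```python
def off_target_score(binding_probabilities, intended_target):
    """Return 1 minus the mean binding probability over non-intended proteins.

    A higher value means less predicted off-target binding. This formula is an
    assumption made for illustration, not the Zepto.Hit scoring method.
    """
    off_target = [
        p for protein, p in binding_probabilities.items()
        if protein != intended_target
    ]
    if not off_target:
        return 1.0  # nothing but the intended target was screened
    return 1.0 - sum(off_target) / len(off_target)


# Hypothetical predicted binding probabilities for one compound.
predicted = {"intended_target": 0.95, "protein_1": 0.1, "protein_2": 0.3}
print(off_target_score(predicted, "intended_target"))
```

Excluding the intended target matters: strong on-target binding is desired and should not be counted against the compound.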

Diclofenamide (glaucoma) has an off-target binding score above 99%, while Vinblastine (chemotherapy) reaches a low 33%. As Vinblastine is a much more aggressive medication towards the human body than Diclofenamide, it is not surprising that Zepto.Hit yields off-target scores so far apart: we assume that the chemotherapy drug attacks a wider and more diverse set of cells than Diclofenamide. Finally, let us have a closer look at the third compound, Amlexanox (anti-inflammatory). While the other two molecules clearly stand apart, with a large difference in their off-target scores, Amlexanox shows more nuanced properties. Even though it has a far better ADMET score than the chemotherapy medication (Vinblastine), its off-target binding score is slightly lower than that of Vinblastine. Amlexanox likely has a complex interaction profile, which leads it to interact with many other components in the human body. The mechanism of action of Amlexanox is not well understood and is still under scrutiny by the scientific community (see e.g. Han et al. (2020)).
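The two scores can be read together as a simple four-quadrant classification. The sketch below uses the scores reported in this case study. The 0.5 threshold, the function name, and the quadrant labels are assumptions. Where the text gives only a bound (above 99% for Diclofenamide, slightly below the 33% of Vinblastine for Amlexanox), the bound itself is used as a stand-in, which does not change the quadrant.

```python
def classify(admet, off_target, threshold=0.5):
    """Place a compound in one of four quadrants. The threshold is an assumption."""
    admet_ok = admet >= threshold
    selective = off_target >= threshold
    if admet_ok and selective:
        return "good ADMET, good selectivity"
    if admet_ok:
        return "good ADMET, poor selectivity"
    if selective:
        return "poor ADMET, good selectivity"
    return "poor ADMET, poor selectivity"


# (ADMET score, off-target binding score) as fractions of 1.
scores = {
    "Diclofenamide": (1.00, 0.99),  # off-target reported as above 99%
    "Vinblastine": (0.08, 0.33),
    "Amlexanox": (0.86, 0.33),      # off-target reported as slightly below 33%
}

labels = {name: classify(admet, off) for name, (admet, off) in scores.items()}
for name, label in labels.items():
    print(name, "->", label)
```

This reproduces the qualitative picture of the case study: Diclofenamide does well on both axes, Vinblastine on neither, and Amlexanox only on ADMET.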

Benefits and next steps

The Kantify in silico technology enables research teams to quickly filter out compounds that would fail at a later stage of the pipeline, and to focus on promising compounds or even optimize them. This unique technology gives our partners the best chance of succeeding in the clinical validation stages and helps de-risk their drug development efforts.

While the lead analysis conducted in this article focuses on commercialized drugs with known effects, our AI for Drug Discovery platform Zeptomics can be used on any small compound, whether already known or not.

Contact us to learn more about how to use Artificial Intelligence in drug discovery: target identification, hit prediction, lead identification.
