This repo contains a baseline model for CYP inhibition: a multitask CheMeleon model trained on pIC50 data curated from ChEMBL for the targets CYP1A2, CYP2D6, CYP3A4, and CYP2C9.

It is a no-split model: all available data was used for training (a training set fraction of 1.0), with nothing held out for validation or test sets.

Getting Started

Pre-requisites

We highly recommend you have the Anvil framework from openadmet-models installed in an environment (called openadmet-models) for ease of use and full utilization of OpenADMET's models. The installation instructions can be found here.

Alternatively, you can also use Docker to spin up a containerized pre-installed environment to run openadmet-models. Just be sure you are mounting the correct folder (./cyp1a2-cyp2d6-cyp3a4-cyp3c9-chemeleon-baseline) where you've downloaded the model.

If you're using a GPU, run:

docker run -it --user=root --rm \
    -v ./cyp1a2-cyp2d6-cyp3a4-cyp3c9-chemeleon-baseline:/home/mambauser/model:rw \
    --runtime=nvidia --gpus all \
    ghcr.io/openadmet/openadmet-models:main

Otherwise, for CPU only:

docker run -it --user=root --rm \
    -v ./cyp1a2-cyp2d6-cyp3a4-cyp3c9-chemeleon-baseline:/home/mambauser/model:rw \
    ghcr.io/openadmet/openadmet-models:main

You will also need git lfs installed.

Downloading the model

  1. After installing Anvil, clone the model repo:
git clone https://huggingface.co/openadmet/cyp1a2-cyp2d6-cyp3a4-cyp3c9-chemeleon-baseline/
  2. Change into the repo directory, ensure Git LFS is set up for the repo, and fetch the large model files:
git lfs install
git lfs pull
  3. You are now ready to use the model!

Using the model

We will use this model for inference, i.e., to predict the pIC50s of a set of molecular compounds unseen by the model. For demonstration purposes, we have provided a small subset of compounds from a ZINC deck in the file compounds_for_inference.csv.

The generic command to run our inference pipeline is:

    openadmet predict \
        --input-path <the path to the data to predict on> \
        --input-col <the column of the data to predict on, often SMILES> \
        --model-dir <the anvil_training directory of the model to predict with> \
        --output-csv <the path to an output CSV to save the predictions to> \
        --accelerator <whether to use gpu or cpu, defaults to gpu>

You can run this directly in your command line, OR you can use the bash script we've provided, run_model_inference.sh.

For our working example, this command becomes:

openadmet predict \
    --input-path compounds_for_inference.csv \
    --input-col OPENADMET_CANONICAL_SMILES \
    --model-dir anvil_training/ \
    --output-csv predictions.csv \
    --accelerator cpu

You can easily substitute your own set of compounds: simply modify the --input-path and --input-col arguments for your specific dataset.
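As a minimal sketch of what such an input file looks like: a CSV with a single column of SMILES strings. The column name below matches the --input-col value from the example command; the molecules themselves (and the file name my_compounds.csv) are illustrative placeholders.

```shell
# Write a minimal input CSV for inference. The header matches the
# --input-col value from the example above; the SMILES are placeholders.
cat > my_compounds.csv <<'EOF'
OPENADMET_CANONICAL_SMILES
CCO
c1ccccc1O
CC(=O)Nc1ccc(O)cc1
EOF
```

You would then point --input-path at my_compounds.csv and keep --input-col as OPENADMET_CANONICAL_SMILES.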

In our example, this outputs a file called predictions.csv containing predicted pIC50 values (the OADMET_PRED columns) for each compound for each CYP target:

'OADMET_PRED_openadmet-AC50_OPENADMET_LOGAC50_cyp3a4',
'OADMET_STD_openadmet-AC50_OPENADMET_LOGAC50_cyp3a4',
'OADMET_PRED_openadmet-AC50_OPENADMET_LOGAC50_cyp2d6',
'OADMET_STD_openadmet-AC50_OPENADMET_LOGAC50_cyp2d6',
'OADMET_PRED_openadmet-AC50_OPENADMET_LOGAC50_cyp2c9',
'OADMET_STD_openadmet-AC50_OPENADMET_LOGAC50_cyp2c9',
'OADMET_PRED_openadmet-AC50_OPENADMET_LOGAC50_cyp1a2',
'OADMET_STD_openadmet-AC50_OPENADMET_LOGAC50_cyp1a2'

NOTE: In this example, the standard deviation (OADMET_STD) columns are empty because uncertainty can only be estimated when an ensemble of models is trained. For further details, visit our docs.
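To pull a single target's predictions out of the output, a small shell sketch like the following works for a plain (unquoted) CSV. The example_predictions.csv written here is a dummy standing in for real model output: the column name follows the listing above, but the values and the presence of a SMILES column are invented for illustration.

```shell
# Build a dummy file standing in for real output (values are invented),
# then select one target's prediction column by its header name.
cat > example_predictions.csv <<'EOF'
OPENADMET_CANONICAL_SMILES,OADMET_PRED_openadmet-AC50_OPENADMET_LOGAC50_cyp3a4
CCO,4.2
c1ccccc1O,5.1
EOF

col='OADMET_PRED_openadmet-AC50_OPENADMET_LOGAC50_cyp3a4'
awk -F',' -v c="$col" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == c) idx = i; next }
    idx     { print $idx }
' example_predictions.csv
```

Looking columns up by header name rather than position keeps the snippet working if the pipeline reorders its output columns.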

Using Docker (advanced)

Alternatively, you can use Docker to quickly spin up a containerized pre-installed environment for running openadmet-models, with the model mounted.

Once you have cloned the repo and pulled the relevant files like the above, start an interactive container instance with openadmet-models ready to go.

docker run -it  --user=root  --rm -v ./cyp1a2-cyp2d6-cyp3a4-cyp3c9-chemeleon-baseline:/home/mambauser/model:rw --runtime=nvidia --gpus all ghcr.io/openadmet/openadmet-models:main 

You can then run inference interactively as above, with the model directory mounted as model.

Or, to run the example inference script (run_model_inference.sh) in its entirety (non-interactively):

docker run --user=root  --rm -v ./cyp1a2-cyp2d6-cyp3a4-cyp3c9-chemeleon-baseline:/home/mambauser/model:rw --runtime=nvidia --gpus all ghcr.io/openadmet/openadmet-models:main /bin/bash -c "cd /home/mambauser/model && ./run_model_inference.sh"