In-silico screening of drug-like molecules.
Potencyscreen is a small library for predicting molecular properties from string descriptors such as SMILES or SELFIES. It provides classical ML models based on hand-crafted fingerprint functions as well as deep learning models, and builds on the molfeat library.
Getting Started
The following code shows how to optimize the hyperparameters of a random forest model and evaluate its performance on the test set.
import datamol as dm
from molfeat.trans.fp import FPVecTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from potencyscreen import trainers

df = dm.data.freesolv()  # FreeSolv example dataset

# define ML model and hyperparameters
param_grid = {
    'feat__kind': ['fcfp:6', 'ecfp:6'],
    'feat__length': [1024],
    'rf__n_estimators': [100, 500]
}
pipe = Pipeline([('feat', FPVecTransformer(kind='rdkit')),
                 ('scaler', StandardScaler()),
                 ('rf', RandomForestRegressor())])
grid_search = GridSearchCV(pipe, param_grid=param_grid)

# optimize hyperparameters and evaluate performance metrics on test set
sk_trainer = trainers.SklearnTrainer(
    grid_search, df['smiles'], df['expt'], thresh=0.)
_ = sk_trainer.train()
_, _ = sk_trainer.test_metrics()
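For readers unfamiliar with the scikit-learn side of this workflow, the following is a minimal sketch of a grid search over a pipeline followed by an evaluation on a held-out test set. It uses only scikit-learn and NumPy and does not show how SklearnTrainer is implemented. The synthetic data, the 80/20 split, the reduced parameter grid and the choice of metrics (MAE and R²) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT real molecular features): 200 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Hold out 20% of the samples as a test set (assumed split, for illustration).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("rf", RandomForestRegressor(random_state=0))])

# Grid keys follow scikit-learn's "<step name>__<parameter>" convention.
param_grid = {"rf__n_estimators": [10, 50]}

# Cross-validated hyperparameter search on the training data only.
grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
grid_search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
y_pred = grid_search.predict(X_test)
metrics = {"mae": mean_absolute_error(y_test, y_pred),
           "r2": r2_score(y_test, y_pred)}
print(grid_search.best_params_, metrics)
```

The same "<step name>__<parameter>" naming is what lets the example above tune 'feat__kind' and 'feat__length' on the fingerprint step and 'rf__n_estimators' on the random forest step.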
Installation
To use potencyscreen and run all examples, install the package in editable mode from the repository root:
pip install -e .
Run example
To run the Jupyter notebook, execute
jupyter notebook examples/EGFR/
and open the file ‘training_inference.ipynb’ in the browser window.
Requirements
To run the example, the following packages are required:
"datamol",
"rdkit",
"molfeat",
"python-dotenv",
"pandas",
"numpy",
"scikit-learn",
"torch",
"torch-geometric",
"tqdm",
"jupyter"
The code was tested with Python 3.8.
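To check which of these packages are present in the current environment, a short script along the following lines can help. It is a sketch that relies only on the standard library (importlib.metadata, available from Python 3.8); the helper name installed_versions is hypothetical and not part of potencyscreen. It assumes the packages were installed under the distribution names listed above, which may not hold for every installation method (for example, some conda builds).

```python
from importlib import metadata

# Distribution names copied from the requirements list above.
REQUIRED = [
    "datamol", "rdkit", "molfeat", "python-dotenv", "pandas", "numpy",
    "scikit-learn", "torch", "torch-geometric", "tqdm", "jupyter",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'MISSING'}")
```

Any package reported as MISSING can then be installed before running the example notebook.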
For Developers
To install all packages required to compile the documentation and to run pytest, install potencyscreen with the docs and test extras (the quotes keep shells such as zsh from interpreting the brackets):
pip install -e ".[docs,test]"
To run the tests:
pytest tests
To compile the documentation:
make -C docs html
Contact
For questions, please contact stephan.thaler@tum.de.