Trainers

A collection of trainers for molecular property prediction from SMILES.

Trainer interface

class potencyscreen.trainers.TrainerTemplate(thresh)

Abstract class providing shared evaluation functionality and a common interface for all trainers.

thresh: Threshold above which a molecule is classified as belonging to the True class.

predict(smiles)

Classifies an unseen molecule and predicts its pIC50 value.

Parameters: smiles (str) – SMILES of an unseen molecule.
Return type: Tuple[float, bool]
Returns: (value, acceptance), where value is the predicted value and acceptance is a bool indicating whether the predicted value exceeded the threshold.
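
How the two returned elements relate can be illustrated with a minimal sketch. The helper classify_prediction is hypothetical and not part of potencyscreen; it assumes that "exceeded" means strictly greater than thresh.

```python
from typing import Tuple


def classify_prediction(value: float, thresh: float) -> Tuple[float, bool]:
    """Pair a predicted value with its acceptance flag (hypothetical helper)."""
    # Assumption: a value equal to the threshold is NOT accepted.
    return value, value > thresh


# A prediction above the threshold is accepted, one at the threshold is not.
print(classify_prediction(7.2, thresh=6.5))
print(classify_prediction(6.5, thresh=6.5))
```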

test_metrics(verbose=True)

Evaluates the best obtained model on the test dataset and computes regression and classification metrics.

Parameters: verbose (bool) – If true, prints the obtained metrics.
Return type: Tuple[Tuple[ndarray, ndarray], Tuple[Any, …]]
Returns: (test_eval, metrics), where ‘test_eval’ is a tuple (prediction, target) containing all predicted and target values of the test set and ‘metrics’ is a tuple (r2, mae, rmse, acc, bac, sens, spec, conf_matrix) containing all evaluation metrics.
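
The metrics named in the returned tuple can be sketched in plain Python as follows. This is an illustration only, not the library's implementation: the function evaluation_metrics is hypothetical, and it assumes that class labels are obtained by thresholding both predictions and targets with a strict comparison, and that the confusion matrix uses rows for true classes and columns for predicted classes.

```python
import math
from typing import Dict, Sequence


def evaluation_metrics(pred: Sequence[float], target: Sequence[float],
                       thresh: float) -> Dict[str, object]:
    """Compute regression and classification metrics (hypothetical helper)."""
    n = len(target)
    mean_t = sum(target) / n

    # Regression metrics on the raw values.
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for p, t in zip(pred, target)) / n
    rmse = math.sqrt(ss_res / n)

    # Assumption: labels come from thresholding predictions and targets alike.
    pred_cls = [p > thresh for p in pred]
    true_cls = [t > thresh for t in target]
    tp = sum(p and t for p, t in zip(pred_cls, true_cls))
    tn = sum((not p) and (not t) for p, t in zip(pred_cls, true_cls))
    fp = sum(p and (not t) for p, t in zip(pred_cls, true_cls))
    fn = sum((not p) and t for p, t in zip(pred_cls, true_cls))

    # Note: these divisions fail if the test set lacks one of the classes.
    sens = tp / (tp + fn)      # sensitivity (true positive rate)
    spec = tn / (tn + fp)      # specificity (true negative rate)
    acc = (tp + tn) / n        # accuracy
    bac = (sens + spec) / 2    # balanced accuracy
    conf_matrix = [[tn, fp], [fn, tp]]

    return {"r2": r2, "mae": mae, "rmse": rmse, "acc": acc, "bac": bac,
            "sens": sens, "spec": spec, "conf_matrix": conf_matrix}


metrics = evaluation_metrics(
    pred=[5.0, 7.0, 6.0, 8.0], target=[5.5, 6.8, 7.0, 7.5], thresh=6.5
)
print(metrics)
```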

Classical ML

class potencyscreen.trainers.SklearnTrainer(search, smiles, targets, thresh, test_size=0.2, seed=0)

Trainer for scikit-learn ML regression models.

Inspired by the molfeat scikit-learn tutorial.

search

scikit-learn RandomizedSearchCV or GridSearchCV used to optimize hyperparameters and fit the ML model.

Type: model_selection.RandomizedSearchCV

__init__(search, smiles, targets, thresh, test_size=0.2, seed=0)

scikit-learn trainer.

Parameters:

search (Union[RandomizedSearchCV, GridSearchCV]) – scikit-learn RandomizedSearchCV or GridSearchCV used to optimize hyperparameters and fit the ML model.
smiles (Series) – SMILES input data.
targets (Series) – Corresponding target property for each SMILES.
thresh (float) – Threshold above which a molecule is classified as belonging to the True class.
test_size (float) – Portion of dataset to reserve for test data.
seed (int) – Random seed for the cross-validation search.
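
As a rough illustration of what a search object looks like, the sketch below builds a RandomizedSearchCV with scikit-learn and fits it directly. Everything about the data is an assumption made for the example: random numeric features stand in for featurized molecules, and Ridge with an alpha grid is an arbitrary choice of estimator, not one prescribed by potencyscreen.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption): 60 samples with 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=60)

# Randomized search over the regularization strength of a ridge regressor.
search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": [0.01, 0.1, 1.0, 10.0]},
    n_iter=4,
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=0,
)

# Fitting runs the cross-validation and refits the best configuration.
search.fit(X, y)
print(search.best_params_, search.best_score_)
print(sorted(search.cv_results_.keys())[:3])
```

With the trainer, the search would instead be passed unfitted as SklearnTrainer(search, smiles, targets, thresh), and train() is documented to fit it. Because the inputs are SMILES strings, the estimator presumably needs to be a pipeline that featurizes them first; the sketch calls search.fit itself only to stay self-contained.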

train()

Searches for optimal hyperparameters by fitting the CV search.

Return type: float
Returns: Best obtained CV score.

property train_results: DataFrame

Returns the cross-validation search results.

Return type: DataFrame

Deep Learning

class potencyscreen.trainers.PyTorchTrainer(model, dataset, optimizer, loss_fn, thresh, lr_scheduler=None, batch_size=32, test_size=0.2, val_size=0.1, seed=0, device='cpu')

Trainer for deep learning models defined in PyTorch.

Trains a PyG graph neural network while tracking performance on validation data. Enables predictions using the best obtained model.

Inspired by the molfeat PyG tutorial.

train_losses

Average train loss in each epoch.

Type: List

val_losses

Average validation loss in each epoch.

Type: List

best_model

Model with the best validation loss.

Type: torch_geometric.nn.models.basic_gnn.BasicGNN

best_val_loss

Best validation loss.

Type: float

__init__(model, dataset, optimizer, loss_fn, thresh, lr_scheduler=None, batch_size=32, test_size=0.2, val_size=0.1, seed=0, device='cpu')

PyTorch trainer.

Parameters:

model (BasicGNN) – PyTorch GNN model.
dataset (TorchDataset) – Dataset.
optimizer (Optimizer) – PyTorch Optimizer.
loss_fn (Callable) – PyTorch loss function.
thresh (float) – Threshold above which a molecule is classified as belonging to the True class.
lr_scheduler (Optional[ReduceLROnPlateau]) – ReduceLROnPlateau learning rate scheduler. Other schedulers are not guaranteed to work.
batch_size (int) – Training and inference batch size.
test_size (float) – Portion of dataset to reserve for test data.
val_size (float) – Portion of dataset to reserve for validation data.
seed (int) – Random seed for PyTorch random number generator.
device (str) – Device on which to run the model, for example 'cpu'.
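
The effect of test_size, val_size, and seed on the data split can be sketched as below. The function split_indices is hypothetical, and the sketch assumes that both fractions refer to the full dataset and that the remainder is used for training; the trainer's actual splitting logic may differ.

```python
import random
from typing import List, Tuple


def split_indices(n: int, test_size: float = 0.2, val_size: float = 0.1,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into train, val, and test."""
    indices = list(range(n))
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(indices)

    # Assumption: both fractions are taken from the full dataset.
    n_test = round(n * test_size)
    n_val = round(n * val_size)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

With the default fractions, 100 samples would be divided into 70 training, 10 validation, and 20 test samples under these assumptions.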

train(epochs=10)

Trains the deep learning model.

Parameters: epochs (int) – Number of training epochs.
Return type: float
Returns: Validation loss of best obtained model.
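
The bookkeeping behind train_losses, val_losses, best_model, and best_val_loss can be sketched without PyTorch. The function run_epochs is hypothetical: it takes precomputed per-epoch results instead of running real training, and a plain dict stands in for the model state.

```python
import copy
import math
from typing import Dict, List, Optional, Tuple

EpochResult = Tuple[Dict[str, float], float, float]


def run_epochs(epoch_results: List[EpochResult]):
    """Track per-epoch losses and keep a snapshot of the best state.

    Each entry is (model_state, avg_train_loss, avg_val_loss) for one epoch.
    """
    train_losses: List[float] = []
    val_losses: List[float] = []
    best_state: Optional[Dict[str, float]] = None
    best_val_loss = math.inf

    for state, train_loss, val_loss in epoch_results:
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        # Keep a copy only when the validation loss improves.
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_state = copy.deepcopy(state)

    return train_losses, val_losses, best_state, best_val_loss


# Toy results (assumption): validation loss is lowest in the second epoch.
results = [({"w": 0.1}, 1.0, 0.9), ({"w": 0.2}, 0.7, 0.6), ({"w": 0.3}, 0.5, 0.8)]
train_losses, val_losses, best_state, best_val_loss = run_epochs(results)
print(best_state, best_val_loss)
```

Selecting on validation loss rather than training loss means the retained state is the second epoch's, even though the training loss keeps falling in the third.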