Trainers

A collection of trainers for molecular property prediction from SMILES.

Trainer interface

class potencyscreen.trainers.TrainerTemplate(thresh)

Abstract class providing shared evaluation functionality and a common interface for all trainers.

thresh

Threshold above which a molecule is classified as belonging to the True class.

predict(smiles)

Predicts the pIC50 value of an unseen molecule and classifies the molecule against the threshold.

Parameters:

smiles (str) – SMILES of an unseen molecule.

Return type:

Tuple[float, bool]

Returns:

(value, acceptance), where ‘value’ is the predicted value and ‘acceptance’ is a bool indicating whether the predicted value exceeded the threshold.
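The shape of this return value can be sketched as follows. This is a minimal illustration, not the library's implementation: `classify_prediction` is a hypothetical helper, and it reads "exceeded" as strictly greater than the threshold, which the documentation does not spell out.

```python
from typing import Tuple


def classify_prediction(value: float, thresh: float) -> Tuple[float, bool]:
    """Pair a predicted value with its acceptance flag (hypothetical helper)."""
    # Assumption: "exceeded" means strictly greater than the threshold.
    return value, value > thresh


# A prediction above the threshold is accepted.
result = classify_prediction(6.8, thresh=6.0)

# A prediction equal to the threshold is not accepted under this reading.
borderline = classify_prediction(6.0, thresh=6.0)
```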

test_metrics(verbose=True)

Evaluates the best obtained model on the test dataset and computes regression and classification metrics.

Parameters:

verbose (bool) – If true, prints the obtained metrics.

Return type:

Tuple[Tuple[ndarray, ndarray], Tuple[Any, …]]

Returns:

(test_eval, metrics), where ‘test_eval’ is a Tuple (prediction, target) containing all predicted and target values of the test set and ‘metrics’ is a Tuple (r2, mae, rmse, acc, bac, sens, spec, conf_matrix) containing all evaluation metrics.
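One plausible way to compute such a metrics tuple is sketched below with NumPy. The function name `evaluation_metrics`, the strict `>` comparison, the reading of ‘bac’, ‘sens’ and ‘spec’ as balanced accuracy, sensitivity and specificity, and the confusion matrix layout are all assumptions made for illustration; the library may compute these differently. The input values are made-up toy numbers.

```python
import numpy as np


def evaluation_metrics(prediction, target, thresh):
    """Return (r2, mae, rmse, acc, bac, sens, spec, conf_matrix) for two value arrays."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)

    # Regression metrics on the raw values.
    residual = target - prediction
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((target - target.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    mae = float(np.mean(np.abs(residual)))
    rmse = float(np.sqrt(np.mean(residual ** 2)))

    # Classification metrics after thresholding predictions and targets.
    # Assumes both classes occur in the targets; otherwise sens or spec divides by zero.
    pred_cls = prediction > thresh
    true_cls = target > thresh
    tp = int(np.sum(pred_cls & true_cls))
    tn = int(np.sum(~pred_cls & ~true_cls))
    fp = int(np.sum(pred_cls & ~true_cls))
    fn = int(np.sum(~pred_cls & true_cls))
    acc = (tp + tn) / len(target)
    sens = tp / (tp + fn)  # true positive rate
    spec = tn / (tn + fp)  # true negative rate
    bac = (sens + spec) / 2
    # Rows are true classes, columns are predicted classes (False, True).
    conf_matrix = np.array([[tn, fp], [fn, tp]])
    return r2, mae, rmse, acc, bac, sens, spec, conf_matrix


# Toy values, chosen only to exercise the function.
target = [5.0, 6.5, 7.0, 5.5]
prediction = [5.5, 6.0, 7.5, 5.0]
r2, mae, rmse, acc, bac, sens, spec, conf_matrix = evaluation_metrics(
    prediction, target, thresh=6.0
)
```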

Classical ML

class potencyscreen.trainers.SklearnTrainer(search, smiles, targets, thresh, test_size=0.2, seed=0)

Trainer for scikit-learn ML regression models.

Inspired by the molfeat scikit-learn tutorial.

search

scikit-learn RandomizedSearchCV or GridSearchCV used to optimize hyperparameters and fit the ML model.

Type:

Union[model_selection.RandomizedSearchCV, model_selection.GridSearchCV]

__init__(search, smiles, targets, thresh, test_size=0.2, seed=0)

scikit-learn trainer.

Parameters:
  • search (Union[RandomizedSearchCV, GridSearchCV]) – scikit-learn RandomizedSearchCV or GridSearchCV used to optimize hyperparameters and fit the ML model.

  • smiles (Series) – SMILES input data.

  • targets (Series) – Corresponding target property for each SMILES.

  • thresh (float) – Threshold above which a molecule is classified as belonging to the True class.

  • test_size (float) – Portion of dataset to reserve for test data.

  • seed (int) – Random seed for cross validation search.
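A search object of the expected type might be built as in the sketch below. The estimator, the parameter grid and the scoring choice are illustrative assumptions. Whether the estimator passed to the trainer must also contain a featurization step for SMILES is not stated in this reference, so the trainer call is left as a comment with placeholder arguments.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Illustrative hyperparameter space; these values are assumptions, not recommendations.
param_distributions = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                            # number of sampled parameter settings
    cv=3,                                # 3-fold cross validation
    scoring="neg_mean_absolute_error",   # higher (closer to zero) is better
    random_state=0,
)

# Hypothetical usage with the trainer documented above; `smiles` and `targets`
# would be pandas Series supplied by the caller:
# trainer = SklearnTrainer(search, smiles, targets, thresh=6.0, test_size=0.2, seed=0)
# best_cv_score = trainer.train()
```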

train()

Searches for optimal hyperparameters by fitting the CV search.

Return type:

float

Returns:

Best obtained CV score.

property train_results: DataFrame

Returns the cross validation search results.

Return type:

DataFrame
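If the returned DataFrame mirrors the `cv_results_` attribute of a fitted scikit-learn search (an assumption, since the exact columns are not listed here), it can be ranked as in the sketch below. The dictionary stands in for real search output, and its numbers are made up for illustration.

```python
import pandas as pd

# Stand-in for the `cv_results_` of a fitted search; values are illustrative only.
cv_results = {
    "params": [{"max_depth": 5}, {"max_depth": 10}, {"max_depth": None}],
    "mean_test_score": [-0.62, -0.55, -0.58],
    "std_test_score": [0.04, 0.03, 0.05],
    "rank_test_score": [3, 1, 2],
}
results = pd.DataFrame(cv_results)

# Rank 1 is the best parameter setting according to the search's scoring function.
ranked = results.sort_values("rank_test_score")
best_params = ranked.iloc[0]["params"]
```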

Deep Learning

class potencyscreen.trainers.PyTorchTrainer(model, dataset, optimizer, loss_fn, thresh, lr_scheduler=None, batch_size=32, test_size=0.2, val_size=0.1, seed=0, device='cpu')

Trainer for deep learning models defined in PyTorch.

Trains a PyG graph neural network while tracking performance on validation data. Enables predictions using the best obtained model.

Inspired by the molfeat PyG tutorial.

train_losses

Average train loss in each epoch.

Type:

List

val_losses

Average validation loss in each epoch.

Type:

List

best_model

Model with the best validation loss.

Type:

torch_geometric.nn.models.basic_gnn.BasicGNN

best_val_loss

Best validation loss.

Type:

float

__init__(model, dataset, optimizer, loss_fn, thresh, lr_scheduler=None, batch_size=32, test_size=0.2, val_size=0.1, seed=0, device='cpu')

PyTorch Trainer.

Parameters:
  • model (BasicGNN) – PyTorch GNN model.

  • dataset (TorchDataset) – Dataset.

  • optimizer (Optimizer) – PyTorch Optimizer.

  • loss_fn (Callable) – PyTorch loss function.

  • thresh (float) – Threshold above which a molecule is classified as belonging to the True class.

  • lr_scheduler (Optional[ReduceLROnPlateau]) – ReduceLROnPlateau learning rate scheduler. Other schedulers are not guaranteed to work.

  • batch_size (int) – Training and inference batch size.

  • test_size (float) – Portion of dataset to reserve for test data.

  • val_size (float) – Portion of dataset to reserve for validation data.

  • seed (int) – Random seed for PyTorch random number generator.

  • device (str) – Device on which to run the model, e.g. 'cpu'.
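The interplay of `test_size`, `val_size` and `seed` can be sketched as a seeded three-way split. This is one plausible reading, assuming both sizes are fractions of the full dataset; `split_dataset` is a hypothetical helper and the trainer's actual splitting logic may differ.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def split_dataset(dataset, test_size=0.2, val_size=0.1, seed=0):
    """Split a dataset into train, validation and test subsets (hypothetical helper)."""
    n_total = len(dataset)
    n_test = int(round(test_size * n_total))
    n_val = int(round(val_size * n_total))
    n_train = n_total - n_test - n_val
    # A dedicated, seeded generator makes the split reproducible.
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)


# Toy dataset of 100 items standing in for a molecular dataset.
toy = TensorDataset(torch.arange(100, dtype=torch.float32).unsqueeze(1))
train_set, val_set, test_set = split_dataset(toy)

# Repeating the split with the same seed yields the same partition.
train_again, _, _ = split_dataset(toy)
```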

train(epochs=10)

Trains the deep learning model.

Parameters:

epochs (int) – Number of training epochs.

Return type:

float

Returns:

Validation loss of the best obtained model.
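A training loop that tracks per-epoch losses and keeps the model with the best validation loss might look like the sketch below. It is not the trainer's actual code: `fit` is a hypothetical function, a plain `nn.Linear` stands in for a PyG GNN, and the batches are lists of tensor pairs rather than graph batches.

```python
import copy

import torch
from torch import nn


def fit(model, optimizer, loss_fn, train_batches, val_batches, epochs=10, lr_scheduler=None):
    """Train `model` and return the best copy by validation loss (hypothetical sketch)."""
    train_losses, val_losses = [], []
    best_model, best_val_loss = None, float("inf")
    for _ in range(epochs):
        # Training pass: accumulate the loss to report an epoch average.
        model.train()
        total = 0.0
        for x, y in train_batches:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item()
        train_losses.append(total / len(train_batches))

        # Validation pass without gradient tracking.
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_batches) / len(val_batches)
        val_losses.append(val_loss)

        # ReduceLROnPlateau expects the monitored metric in step().
        if lr_scheduler is not None:
            lr_scheduler.step(val_loss)

        # Keep a snapshot of the model whenever validation improves.
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_model = copy.deepcopy(model)
    return best_model, best_val_loss, train_losses, val_losses


# Toy regression problem: the target is the sum of three input features.
torch.manual_seed(0)
x = torch.randn(64, 3)
y = x.sum(dim=1, keepdim=True)
batches = [(x[i:i + 16], y[i:i + 16]) for i in range(0, 64, 16)]

model = nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=2)
best_model, best_val_loss, train_losses, val_losses = fit(
    model, optimizer, nn.MSELoss(), batches[:3], batches[3:], epochs=5, lr_scheduler=scheduler
)
```

Taking a deep copy at each improvement means later epochs that overfit cannot overwrite the retained model, which matches the documented behaviour of returning the validation loss of the best model rather than the last one.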