Trainers
A collection of trainers for molecular property prediction from SMILES.
Trainer interface
- class potencyscreen.trainers.TrainerTemplate(thresh)[source]
Abstract class providing shared evaluation functionality and a common interface for all trainers.
- thresh
Threshold above which a molecule is classified as belonging to the True class.
- test_metrics(verbose=True)[source]
Evaluates the best obtained model on the test dataset and computes key model evaluation metrics.
- Parameters:
verbose (bool) – If true, prints the obtained metrics.
- Returns:
(test_eval, metrics), where ‘test_eval’ is a Tuple (prediction, target) containing all predicted and target values of the test set and ‘metrics’ is a Tuple (r2, mae, rmse, acc, bac, sens, spec, conf_matrix) containing all evaluation metrics.
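The metrics tuple mixes regression scores (r2, mae, rmse) with classification scores (acc, bac, sens, spec, conf_matrix), which suggests that predictions and targets are binarized with thresh before the classification scores are computed. The pure-Python sketch below illustrates that reading. The function name `threshold_metrics`, the strict `>` comparison, and the confusion matrix layout are assumptions made for illustration, not the library's implementation.

```python
import math


def threshold_metrics(pred, target, thresh):
    """Hypothetical helper: regression metrics plus thresholded class metrics."""
    n = len(target)
    mean_t = sum(target) / n
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)

    # Regression metrics on the raw values.
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for p, t in zip(pred, target)) / n
    rmse = math.sqrt(ss_res / n)

    # Assumption: a value strictly above thresh belongs to the True class.
    pred_cls = [p > thresh for p in pred]
    true_cls = [t > thresh for t in target]
    tp = sum(1 for p, t in zip(pred_cls, true_cls) if p and t)
    tn = sum(1 for p, t in zip(pred_cls, true_cls) if not p and not t)
    fp = sum(1 for p, t in zip(pred_cls, true_cls) if p and not t)
    fn = sum(1 for p, t in zip(pred_cls, true_cls) if not p and t)

    acc = (tp + tn) / n
    # Guard against a test set that lacks one of the two classes.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    bac = (sens + spec) / 2

    # Assumed layout: rows are true classes, columns are predicted classes.
    conf_matrix = [[tn, fp], [fn, tp]]
    return r2, mae, rmse, acc, bac, sens, spec, conf_matrix


# Made-up values for illustration only.
example_pred = [5.0, 6.5, 7.2, 4.8]
example_target = [5.5, 6.0, 7.0, 6.2]
example_metrics = threshold_metrics(example_pred, example_target, thresh=6.0)
```

Under this reading, a target of exactly 6.0 with thresh=6.0 falls in the False class, so the choice between `>` and `>=` matters for values that sit on the threshold.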
Classical ML
- class potencyscreen.trainers.SklearnTrainer(search, smiles, targets, thresh, test_size=0.2, seed=0)[source]
Trainer for scikit-learn ML regression models.
Inspired by the molfeat scikit-learn tutorial.
- search
scikit-learn RandomizedSearchCV or GridSearchCV used to optimize hyperparameters and fit the ML model.
- Type:
model_selection.RandomizedSearchCV
- __init__(search, smiles, targets, thresh, test_size=0.2, seed=0)[source]
scikit-learn trainer.
- Parameters:
search (Union[RandomizedSearchCV, GridSearchCV]) – scikit-learn RandomizedSearchCV or GridSearchCV used to optimize hyperparameters and fit the ML model.
smiles (Series) – SMILES input data.
targets (Series) – Corresponding target property for each SMILES.
thresh (float) – Threshold above which a molecule is classified as belonging to the True class.
test_size (float) – Portion of the dataset to reserve for test data.
seed (int) – Random seed for the cross-validation search.
- train()[source]
Searches for optimal hyperparameters by fitting the CV search.
- Returns:
Best obtained CV score.
- property train_results: DataFrame
Returns the cross validation search results.
- Return type:
DataFrame
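The search argument is an ordinary scikit-learn search object. The sketch below uses only scikit-learn and pandas, without SklearnTrainer, to show what such an object looks like and what a "best CV score" and a results DataFrame mean for it. The synthetic numeric features stand in for featurized SMILES, and the estimator, parameter grid, and scoring are illustrative assumptions. How SklearnTrainer featurizes SMILES, and whether train() and train_results map directly onto `best_score_` and `cv_results_`, is not stated in this reference.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Assumption: synthetic numeric features in place of featurized SMILES.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

# A randomized hyperparameter search over a small, illustrative grid.
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [10, 20, 40],
        "max_depth": [2, 4, None],
    },
    n_iter=4,
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=0,
)

# Fitting runs the cross-validated search and refits the best estimator.
search.fit(X, y)

best_cv_score = search.best_score_            # mean CV score of the best candidate
cv_table = pd.DataFrame(search.cv_results_)   # one row per sampled candidate
```

With `neg_mean_absolute_error` the score is negative and higher is better, so the best candidate is the row with the largest `mean_test_score`.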
Deep Learning
- class potencyscreen.trainers.PyTorchTrainer(model, dataset, optimizer, loss_fn, thresh, lr_scheduler=None, batch_size=32, test_size=0.2, val_size=0.1, seed=0, device='cpu')[source]
Trainer for deep learning models defined in PyTorch.
Trains a PyG graph neural network while tracking performance on validation data. Enables predictions using the best obtained model.
Inspired by the molfeat PyG tutorial.
- train_losses
Average train loss in each epoch.
- Type:
List
- val_losses
Average validation loss in each epoch.
- Type:
List
- best_model
Model with the best validation loss.
- Type:
torch_geometric.nn.models.basic_gnn.BasicGNN
- __init__(model, dataset, optimizer, loss_fn, thresh, lr_scheduler=None, batch_size=32, test_size=0.2, val_size=0.1, seed=0, device='cpu')[source]
PyTorch Trainer.
- Parameters:
model (BasicGNN) – PyTorch GNN model.
dataset (TorchDataset) – Dataset.
optimizer (Optimizer) – PyTorch Optimizer.
loss_fn (Callable) – PyTorch loss function.
thresh (float) – Threshold above which a molecule is classified as belonging to the True class.
lr_scheduler (Optional[ReduceLROnPlateau]) – ReduceLROnPlateau learning rate scheduler. Other schedulers are not guaranteed to work.
batch_size (int) – Training and inference batch size.
test_size (float) – Portion of dataset to reserve for test data.
val_size (float) – Portion of dataset to reserve for validation data.
seed (int) – Random seed for PyTorch random number generator.
device (str) – Device to run model.
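The train_losses, val_losses, and best_model attributes describe a common pattern: record the average loss per epoch and keep a snapshot of the model whenever the validation loss improves. The sketch below shows that pattern in pure Python on a toy one-parameter model. It is not PyTorchTrainer's implementation; the function name `train_with_best_snapshot`, the toy model, and the data are hypothetical.

```python
import copy


def train_with_best_snapshot(train_data, val_data, epochs=20, lr=0.05):
    """Hypothetical sketch: per-epoch loss tracking with a best-model snapshot."""
    model = {"w": 0.0}  # toy model y = w * x, standing in for a GNN
    train_losses, val_losses = [], []
    best_model, best_val = None, float("inf")

    for _ in range(epochs):
        # One pass over the training data with plain gradient descent.
        epoch_loss = 0.0
        for x, y in train_data:
            err = model["w"] * x - y
            epoch_loss += err ** 2
            model["w"] -= lr * 2 * err * x  # gradient of the squared error
        train_losses.append(epoch_loss / len(train_data))

        # Average validation loss for this epoch.
        val_loss = sum((model["w"] * x - y) ** 2 for x, y in val_data) / len(val_data)
        val_losses.append(val_loss)

        # Keep an independent copy of the model when validation improves.
        if val_loss < best_val:
            best_val = val_loss
            best_model = copy.deepcopy(model)

    return train_losses, val_losses, best_model


# Made-up data following y = 2 * x.
toy_train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
toy_val = [(4.0, 8.0)]
train_losses, val_losses, best_model = train_with_best_snapshot(toy_train, toy_val)
```

Copying the model matters: without `copy.deepcopy`, the snapshot would keep changing as training continues and would no longer correspond to the epoch with the lowest validation loss.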