Data

Functions for preprocessing and storing SMILES data.

Preprocessing

potencyscreen.data.standardize_smiles(dataframe, smiles_column)

Adds a column ‘standard_smiles’ to the input dataframe by sanitizing and standardizing the SMILES strings.

Parameters:

dataframe (DataFrame) – A pandas dataframe containing the SMILES strings.
smiles_column (str) – Name of the dataframe column containing the SMILES strings.

Return type:

DataFrame

Returns:

The input dataframe with standardized SMILES added in the column ‘standard_smiles’.
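
The contract described above (a dataframe goes in, and the same data comes back with a ‘standard_smiles’ column added) can be sketched as follows. This is a minimal illustration, not the potencyscreen implementation. The function add_standard_smiles and its standardize argument are hypothetical names. The placeholder only strips whitespace and does no chemical sanitization. Returning a copy rather than modifying the input in place is also an assumption.

```python
import pandas as pd


def add_standard_smiles(dataframe, smiles_column, standardize=str.strip):
    """Hypothetical stand-in for a SMILES standardization step.

    `standardize` is a placeholder callable (whitespace stripping only);
    a real pipeline would sanitize and standardize with a chemistry toolkit.
    """
    out = dataframe.copy()  # assumption: leave the caller's dataframe untouched
    out["standard_smiles"] = out[smiles_column].map(standardize)
    return out


# Toy input: the untidy strings stand in for non-standard SMILES.
df = pd.DataFrame({"smiles": [" CCO", "c1ccccc1 "], "pIC50": [5.2, 6.8]})
result = add_standard_smiles(df, "smiles")
```

The original columns are kept unchanged, so downstream code can use either the raw or the standardized strings.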

class potencyscreen.data.TorchDataset(smiles, y, featurizer)

PyTorch Dataset class to store and load SMILES data for deep learning in potencyscreen.trainers.PyTorchTrainer.

This class is adapted from the molfeat PyG tutorial.

featurizer

Featurizer function applied during data loading to extract graph features from SMILES.

y

Target property data.

transformed_mols

The extracted features for all molecules in the dataset.

__init__(smiles, y, featurizer)

Initializes the dataset from SMILES strings, target values, and a featurizer.

Parameters:

smiles (Series) – SMILES input data.
y (Series) – Corresponding target property for each SMILES.
featurizer (PYGGraphTransformer) – Molfeat featurizer to apply to SMILES during data loading.

collate_fn(**kwargs): Collate function for use with torch.utils.data.DataLoader.
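
As a rough illustration of how the pieces above fit together, here is a dependency-free sketch of a map-style dataset and a collate step. It is not the TorchDataset implementation. ToyDataset, toy_featurizer, and toy_collate are hypothetical names, and the featurizer is a character-counting placeholder rather than a molfeat graph featurizer. Computing all features once in the constructor is an assumption. A real version would subclass torch.utils.data.Dataset and hand its collate function to a DataLoader.

```python
class ToyDataset:
    """Map-style dataset: only __len__ and __getitem__ are required."""

    def __init__(self, smiles, y, featurizer):
        if len(smiles) != len(y):
            raise ValueError("smiles and y must have the same length")
        self.featurizer = featurizer
        self.y = list(y)
        # Assumption: features are computed once up front and cached.
        self.transformed_mols = [featurizer(s) for s in smiles]

    def __len__(self):
        return len(self.transformed_mols)

    def __getitem__(self, index):
        return self.transformed_mols[index], self.y[index]


def toy_featurizer(smiles):
    # Placeholder features: carbon-character count and string length.
    return [smiles.count("C") + smiles.count("c"), len(smiles)]


def toy_collate(batch):
    # Turn a list of (features, target) samples into two parallel lists.
    features, targets = zip(*batch)
    return list(features), list(targets)


dataset = ToyDataset(["CCO", "c1ccccc1"], [5.2, 6.8], toy_featurizer)
batch = toy_collate([dataset[i] for i in range(len(dataset))])
```

With graph data the collate step matters more than it does here, because graphs of different sizes cannot simply be stacked into one tensor.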

property degree: Returns the histogram of node in-degrees, for use with Principal Neighbourhood Aggregation (PNA).
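
To make the in-degree histogram concrete, the sketch below computes one in plain Python. It is an illustration under assumptions, not the library's code: in_degree_histogram and the (num_nodes, edges) input format are hypothetical, and edges are assumed to be directed (source, target) pairs with each bond stored in both directions. Entry d of the result is the number of nodes with in-degree d. PyTorch Geometric's PNAConv layer takes a histogram of this kind as its deg argument.

```python
from collections import Counter


def in_degree_histogram(graphs):
    """Return hist where hist[d] is the number of nodes with in-degree d.

    `graphs` is an iterable of (num_nodes, edges) tuples; `edges` holds
    directed (source, target) index pairs.
    """
    counts = Counter()
    for num_nodes, edges in graphs:
        in_degree = [0] * num_nodes
        for _source, target in edges:
            in_degree[target] += 1
        counts.update(in_degree)  # tally how often each degree value occurs
    max_degree = max(counts, default=0)
    return [counts.get(d, 0) for d in range(max_degree + 1)]


# Heavy atoms of ethanol (C-C-O), each bond listed in both directions.
ethanol = (3, [(0, 1), (1, 0), (1, 2), (2, 1)])
histogram = in_degree_histogram([ethanol])
```

For ethanol the two terminal atoms have in-degree 1 and the central carbon has in-degree 2, so the histogram is [0, 2, 1].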

property num_atom_features: Returns the dimension of the atom features extracted by the featurizer.

property num_bond_features: Returns the dimension of the bond features extracted by the featurizer.