Data
Functions for preprocessing and storing SMILES data.
Preprocessing
- potencyscreen.data.standardize_smiles(dataframe, smiles_column)
Adds a column ‘standard_smiles’ to the input dataframe by sanitizing and standardizing the SMILES strings.
- Parameters:
dataframe (DataFrame) – A pandas dataframe containing the SMILES strings.
smiles_column (str) – A string defining the dataframe column that contains the SMILES strings.
- Return type:
DataFrame
- Returns:
The input dataframe with standardized SMILES added in the column ‘standard_smiles’.
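The input/output contract described above can be pictured with a minimal sketch. This is not the potencyscreen implementation: `add_standard_smiles` and `toy_standardizer` are hypothetical names, and the toy standardizer only trims whitespace, where a real one would sanitize and canonicalize each molecule with a cheminformatics toolkit. The sketch also assumes that working on a copy is acceptable; whether the library modifies the input in place is not stated here.

```python
import pandas as pd


def add_standard_smiles(dataframe, smiles_column, standardizer):
    """Return a copy of `dataframe` with a 'standard_smiles' column added.

    Hypothetical helper: `standardizer` is any callable mapping one SMILES
    string to its standardized form (or None if it cannot be processed).
    """
    out = dataframe.copy()
    # Apply the standardizer to every entry of the chosen SMILES column.
    out["standard_smiles"] = out[smiles_column].map(standardizer)
    return out


def toy_standardizer(smiles):
    """Stand-in for real sanitization: trims whitespace, rejects empty strings."""
    cleaned = smiles.strip()
    return cleaned if cleaned else None


df = pd.DataFrame({"smiles": [" CCO", "c1ccccc1 ", "  "]})
result = add_standard_smiles(df, "smiles", toy_standardizer)
print(result)
```

The original column is kept next to the new one, so rows whose SMILES could not be standardized remain easy to inspect and filter.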
Dataloading
- class potencyscreen.data.TorchDataset(smiles, y, featurizer)
PyTorch Dataset class to store and load SMILES data for deep learning in potencyscreen.trainers.PyTorchTrainer. This class is adapted from the molfeat PyG tutorial.
- featurizer
Featurizer function applied during dataloading to extract graph features from SMILES.
- Type:
PYGGraphTransformer
- y
Target property data.
- Type:
- transformed_mols
The extracted features for all molecules in the dataset.
- Type:
List
- __init__(smiles, y, featurizer)
Torch Dataset.
- Parameters:
smiles (Series) – SMILES input data.
y (Series) – Corresponding target property for each SMILES.
featurizer (PYGGraphTransformer) – Molfeat featurizer to apply to SMILES during data loading.
- collate_fn(**kwargs)
Collate function for torch.utils.data.DataLoader, used to merge individual samples into a batch.
- property degree
Returns the histogram of node in-degrees for use in PNA (Principal Neighbourhood Aggregation).
- property num_atom_features
Returns the dimension of the atom features extracted by the featurizer.
- property num_bond_features
Returns the dimension of the bond features extracted by the featurizer.
- property num_output
Returns the dimension of the target property.
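The ideas behind collate_fn and the degree property can be illustrated in plain Python. This is a sketch under stated assumptions and not the class's actual code: `collate_graphs` and `in_degree_histogram` are hypothetical names, and each graph is represented here as a `(num_nodes, edges, target)` tuple instead of a PyG data object. Batching follows the usual graph convention of merging several graphs into one disjoint graph by offsetting node indices, and the histogram counts how many nodes have each in-degree.

```python
from collections import Counter


def collate_graphs(samples):
    """Merge (num_nodes, edges, target) samples into one disjoint batch graph."""
    batch_edges, batch_index, targets = [], [], []
    offset = 0
    for graph_id, (num_nodes, edges, target) in enumerate(samples):
        # Shift node ids so every graph occupies its own id range.
        batch_edges.extend((src + offset, dst + offset) for src, dst in edges)
        # Record which graph each node belongs to.
        batch_index.extend([graph_id] * num_nodes)
        targets.append(target)
        offset += num_nodes
    return {
        "num_nodes": offset,
        "edges": batch_edges,
        "batch": batch_index,
        "y": targets,
    }


def in_degree_histogram(num_nodes, edges):
    """Return hist where hist[d] is the number of nodes with in-degree d."""
    in_degree = Counter(dst for _, dst in edges)
    degrees = [in_degree.get(node, 0) for node in range(num_nodes)]
    hist = [0] * (max(degrees, default=0) + 1)
    for d in degrees:
        hist[d] += 1
    return hist


# Two toy molecules; each bond is listed as two directed edges.
# Graph 0 is a 3-atom chain 0-1-2, graph 1 is a 2-atom molecule 0-1.
chain = (3, [(0, 1), (1, 0), (1, 2), (2, 1)], 6.5)
pair = (2, [(0, 1), (1, 0)], 7.1)

batch = collate_graphs([chain, pair])
hist = in_degree_histogram(batch["num_nodes"], batch["edges"])
print(batch["edges"])
print(hist)
```

In this toy batch, four atoms have one incoming edge and one atom has two, so the histogram has a zero count for degree 0, four for degree 1, and one for degree 2.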