rnaglib.data_loading
- class rnaglib.data_loading.RNADataset(data_path=None, version='1.0.0', download_dir=None, redundancy='nr', all_graphs=None, representations=(), rna_features=None, nt_features=None, bp_features=None, rna_targets=None, nt_targets=None, bp_targets=None, annotated=False, verbose=False)[source]
This class is the main object for holding the core RNA data annotations. The
RNADataset.all_rnas
attribute is a generator of networkx objects holding all the annotations for each RNA in the dataset. You can also access individual RNAs on disk with RNADataset()[idx]
or RNADataset().get_pdbid('1b23').
- Parameters:
representations – List of rnaglib.Representation objects to apply to each item.
data_path – The path to the folder containing the graphs. If node_sim is not None, this data should be annotated.
version – Version of the dataset to use (default='1.0.0')
redundancy – Whether to use all graphs or just the non-redundant set (default='nr').
all_graphs – In the given directory, one can choose to provide a list of graphs to use
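The access pattern described above (indexing with `RNADataset()[idx]` and lookup with `get_pdbid`) can be sketched with a self-contained stand-in. The class name `ToyRNADataset` and the in-memory dict are invented for this sketch; the real class loads annotated networkx graphs from disk.

```python
# Minimal stand-in sketching the RNADataset access pattern.
# The toy PDB ids and node counts below are invented for illustration.
class ToyRNADataset:
    def __init__(self, all_graphs=None):
        # maps PDB id -> "graph" (a plain dict standing in for networkx)
        self._store = {"1b23": {"nodes": 80}, "4nlf": {"nodes": 25}}
        if all_graphs is not None:
            # mirror the all_graphs parameter: restrict to a chosen subset
            self._store = {k: v for k, v in self._store.items() if k in all_graphs}
        self._ids = sorted(self._store)

    def __len__(self):
        return len(self._ids)

    def __getitem__(self, idx):
        # mirrors RNADataset()[idx]
        return self._store[self._ids[idx]]

    def get_pdbid(self, pdbid):
        # mirrors RNADataset().get_pdbid('1b23')
        return self._store[pdbid.lower()]

dataset = ToyRNADataset()
first = dataset[0]
same = dataset.get_pdbid("1b23")
```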
- subset(list_of_graphs)[source]
Create another dataset with only the specified graphs
- Parameters:
list_of_graphs – a list of graph names
- Returns:
A new RNADataset containing only the specified graphs
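The subset behavior above (build a second dataset from a list of graph names, leaving the original untouched) can be sketched as follows. `ToyDataset` and its `all_graphs` list are assumptions made for this self-contained sketch, not the library's internals.

```python
class ToyDataset:
    """Toy dataset holding only a list of graph names."""
    def __init__(self, graphs):
        self.all_graphs = list(graphs)

    def subset(self, list_of_graphs):
        # Keep only the named graphs and return them as a NEW dataset;
        # the original dataset is left unchanged.
        keep = set(list_of_graphs)
        return ToyDataset(g for g in self.all_graphs if g in keep)

full = ToyDataset(["1b23", "4nlf", "5swe"])
small = full.subset(["1b23", "5swe"])
```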
- get_nt_encoding(g, encode_feature=True)[source]
For every node of graph g, get the attribute specified by self.node_target and output a mapping of nodes to their targets.
- Parameters:
g – a networkx graph
encode_feature – If True, encode the features; otherwise encode the targets.
- Returns:
A dict that maps nodes to encodings
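The node-to-encoding mapping can be sketched with plain dicts. The nucleotide vocabulary and the one-hot scheme below are assumptions for the sketch; the real method reads the attribute named by self.node_target from a networkx graph.

```python
# Sketch: map every node to a one-hot encoding of one of its attributes
# (here, the nucleotide type). NT_TYPES is an assumed vocabulary.
NT_TYPES = ["A", "U", "G", "C"]

def encode_nodes(node_attrs):
    """node_attrs: dict mapping node id -> nucleotide letter.
    Returns a dict mapping node id -> one-hot list."""
    encodings = {}
    for node, nt in node_attrs.items():
        one_hot = [0] * len(NT_TYPES)
        one_hot[NT_TYPES.index(nt)] = 1
        encodings[node] = one_hot
    return encodings

enc = encode_nodes({("1b23", 1): "A", ("1b23", 2): "G"})
```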
- rnaglib.data_loading.get_loader(dataset, batch_size=5, num_workers=0, split=True, split_train=0.7, split_valid=0.85, verbose=False, framework='dgl')[source]
Fetch a loader object for a given dataset.
- Parameters:
dataset (rnaglib.data_loading.RNADataset) – Dataset for loading.
batch_size (int) – number of items in batch
split (bool) – whether to compute splits
split_train (float) – proportion of dataset to keep for training
split_valid (float) – proportion of dataset to keep for validation
verbose (bool) – print updates
framework (str) – learning framework to use ('dgl')
- Returns:
torch.utils.data.DataLoader
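The defaults split_train=0.7 and split_valid=0.85 read like cumulative fractions (train up to 70%, validation up to 85%, test the remainder); that interpretation is our assumption, not stated on this page. A sketch of the index arithmetic under that assumption:

```python
def split_indices(n, split_train=0.7, split_valid=0.85):
    # Assumed interpretation: the two floats are cumulative cut points,
    # so the defaults yield a 70/15/15 train/validation/test split.
    train_end = int(n * split_train)
    valid_end = int(n * split_valid)
    idx = list(range(n))
    return idx[:train_end], idx[train_end:valid_end], idx[valid_end:]

train, valid, test = split_indices(100)
```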
- class rnaglib.data_loading.Collater(dataset)[source]
Wrapper for the collate function, so we can use different node similarities. We cannot use functools.partial because it is not picklable and therefore incompatible with PyTorch data loading.
Initialize a Collater object.
- Parameters:
node_simfunc – A node comparison function as defined in kernels, to optionally return a pairwise comparison of the nodes in the batch.
max_size_kernel – If the node comparison is not None, optionally only return a pairwise comparison between a subset of all nodes, of size max_size_kernel.
hstack – If True, hstack point cloud return.
- Returns:
a picklable python function that can be called on a batch by Pytorch loaders
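The design point above (a callable class rather than a closure, so the collate function survives pickling for multi-worker loading) can be sketched as follows. `ToyCollater` and its padding behavior are invented for the sketch; only the picklable-callable pattern is the point.

```python
import pickle

class ToyCollater:
    """Callable class: unlike a closure or a partial over a lambda, an
    instance of a module-level class pickles cleanly, which multi-process
    PyTorch DataLoader workers require."""
    def __init__(self, pad_value=0):
        self.pad_value = pad_value

    def __call__(self, batch):
        # toy collate step: pad variable-length lists to the batch maximum
        width = max(len(item) for item in batch)
        return [item + [self.pad_value] * (width - len(item)) for item in batch]

collater = ToyCollater()
restored = pickle.loads(pickle.dumps(collater))  # round-trips fine
batch = restored([[1, 2], [3]])
```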