rnaglib.utils
General utilities for handling RNA structures and graphs.
- rnaglib.utils.download_graphs(redundancy='nr', version='1.0.0', annotated=False, chop=False, overwrite=False, data_root=None, verbose=False)[source]
Based on the options, fetch the matching data from the latest release and save it under data_root.
- Parameters:
redundancy – Whether to include all RNAs or just a non-redundant set as defined by BGSU
annotated – Whether to include graphlet annotations in the graphs. This will also create a hashing directory and table.
overwrite – Whether to overwrite existing data
data_root – Where to save this data. Defaults to ~/.rnaglib/
- Returns:
the path of the data along with its hashing.
- rnaglib.utils.get_rna_list(nr_only=False)[source]
Fetch a list of PDB entries containing RNA from the RCSB API.
- rnaglib.utils.graph_from_pdbid(pdbid, graph_dir=None, version='1.0.0', annotated=False, chop=False, redundancy='nr', graph_format='json')[source]
Fetch an annotated graph with a PDBID.
- Parameters:
pdbid – PDB id to fetch
graph_dir – path containing annotated graphs
graph_format – which format to load (JSON, or networkx)
- rnaglib.utils.load_graph(filename)[source]
Utility function that loads a graph from a JSON or pickle file. Sometimes the pickle also contains rings in the form of a node dict, in which case the rings are added to the graph.
- Parameters:
filename – json or pickle filename
- Returns:
networkx DiGraph object
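The extension-based dispatch described above can be sketched as follows. This is a minimal illustration using only the standard library, not rnaglib's implementation: `load_graph_sketch` is a hypothetical name, graphs are plain dicts rather than networkx objects, and the pickle layout (`{"graph": ..., "rings": ...}`) is an assumption made for the example.

```python
import json
import pickle
from pathlib import Path


def load_graph_sketch(filename):
    """Load a graph stored as JSON or pickle, chosen by file extension.

    Hypothetical stand-in: the graph is a plain dict of the form
    {"nodes": {node_id: attrs}, "edges": [...]}, not a networkx DiGraph.
    """
    path = Path(filename)
    if path.suffix == ".json":
        with path.open("r") as f:
            return json.load(f)
    if path.suffix in (".p", ".pkl", ".pickle"):
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with path.open("rb") as f:
            data = pickle.load(f)
        # Assumed layout: the pickle may bundle the graph with a
        # {node: rings} dict, which is merged into the node attributes.
        if isinstance(data, dict) and "graph" in data and "rings" in data:
            graph = data["graph"]
            for node, rings in data["rings"].items():
                graph["nodes"][node]["rings"] = rings
            return graph
        return data
    raise ValueError(f"Unsupported graph format: {path.suffix!r}")
```

Dispatching on the suffix keeps the caller's code identical for both storage formats; the ring merge only runs when the assumed bundle layout is present.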
- rnaglib.utils.dump_json(filename, graph)[source]
Just a shortcut to dump a json graph more compactly
- Parameters:
filename – The dump name
graph – The graph to dump
- rnaglib.utils.load_json(filename)[source]
Just a shortcut to load a json graph more compactly
- Parameters:
filename – The dump name
- Returns:
The loaded graph
- rnaglib.utils.reorder_nodes(g)[source]
Reorder nodes in graph
- Parameters:
g (networkx.DiGraph) – Pass a graph for node reordering.
- Return h:
(nx DiGraph)
- rnaglib.utils.update_RNApdb(pdir, nr_only=True)[source]
Download a list of RNA-containing structures from the PDB, overwriting existing files.
- Parameters:
pdir – path containing downloaded PDBs
- Returns rna:
list of PDBIDs that were fetched.
- rnaglib.utils.fix_buggy_edges(graph, label='LW', strategy='remove', edge_map={'B35': 19, 'B53': 0, 'cHH': 1, 'cHS': 2, 'cHW': 3, 'cSH': 4, 'cSS': 5, 'cSW': 6, 'cWH': 7, 'cWS': 8, 'cWW': 9, 'tHH': 10, 'tHS': 11, 'tHW': 12, 'tSH': 13, 'tSS': 14, 'tSW': 15, 'tWH': 16, 'tWS': 17, 'tWW': 18})[source]
Some edges carry unusual labels such as t.W, which denote a fuzzy annotation. These edges are simply removed, since they do not carry reliable information.
- Parameters:
graph –
strategy – How to deal with such edges. For now, the only option is to remove them; in the future, an edge type may be added to the edge map instead.
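The 'remove' strategy can be illustrated on plain Python data. The sketch below is not rnaglib's code: `drop_unknown_edges` is a hypothetical helper, edges are represented as `(source, target, attrs)` tuples instead of a networkx graph, and `EDGE_MAP` is a small subset of the default `edge_map` shown in the signature.

```python
# Subset of the default edge_map from the signature above.
EDGE_MAP = {"B53": 0, "cWW": 9, "tWW": 18, "B35": 19}


def drop_unknown_edges(edges, label="LW", edge_map=EDGE_MAP):
    """Keep only the edges whose `label` attribute is a key of edge_map.

    `edges` is a list of (source, target, attrs) tuples. Edges with a
    missing or unrecognised label (for example 't.W') are dropped.
    """
    return [(u, v, attrs) for u, v, attrs in edges if attrs.get(label) in edge_map]


edges = [
    ("A.1", "A.2", {"LW": "B53"}),
    ("A.1", "A.9", {"LW": "cWW"}),
    ("A.2", "A.8", {"LW": "t.W"}),  # fuzzy label, absent from the edge map
]
clean = drop_unknown_edges(edges)
```

Filtering against the keys of the edge map means any label without an integer code is discarded, which is what a downstream encoder that indexes by `edge_map[label]` would need.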
- rnaglib.utils.dangle_trim(graph)[source]
Recursively remove dangling nodes from graph, with in place modification
- Parameters:
graph – Nx graph
- Returns:
trimmed graph
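A minimal sketch of the trimming idea, assuming that a "dangling" node is one with fewer than two neighbours in the undirected view of the graph. The name `dangle_trim_sketch` and the adjacency-dict representation are illustrative choices, and the loop is written iteratively rather than recursively.

```python
def dangle_trim_sketch(adj):
    """Remove, in place, nodes with fewer than two neighbours until none remain.

    `adj` maps each node to the set of its neighbours (undirected view).
    Removing a node can turn its neighbour into a new dangle, hence the loop.
    """
    while True:
        dangles = [n for n, nbrs in adj.items() if len(nbrs) < 2]
        if not dangles:
            return adj
        for n in dangles:
            for nbr in adj.pop(n):
                # The neighbour may already have been removed in this pass.
                if nbr in adj:
                    adj[nbr].discard(n)


# Triangle a-b-c with a two-node tail c-d-e: the tail is trimmed away.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
dangle_trim_sketch(graph)
```

After the call, only the triangle remains, since `e` is removed first and `d` becomes a dangle in the next pass.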
- rnaglib.utils.gap_fill(original_graph, graph_to_expand)[source]
After extracting a subgraph, get rid of all degree-1 nodes by expanding each of them one more hop into the original graph.
- Parameters:
original_graph – nx graph
graph_to_expand – nx graph that needs to be expanded to fix dangles
- Returns:
the expanded graph
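One way to picture the expansion, as a sketch under stated assumptions: graphs are adjacency dicts, the subgraph is given by its node set, and a single expansion pass is performed. `gap_fill_sketch` is a hypothetical name; whether rnaglib repeats the expansion until no dangles remain is not stated here.

```python
def gap_fill_sketch(original_adj, sub_nodes):
    """Return the node set of the subgraph after one hop of gap filling.

    `original_adj` maps each node of the full graph to its set of neighbours.
    A node is a dangle if exactly one of its neighbours lies inside the
    subgraph; each dangle pulls in all of its neighbours from the full graph.
    """
    nodes = set(sub_nodes)
    dangles = [n for n in nodes if len(original_adj[n] & nodes) == 1]
    expanded = set(nodes)
    for n in dangles:
        expanded |= original_adj[n]
    return expanded


# 4-cycle a-b-c-d-a; the subgraph {a, b, c} leaves a and c with degree 1.
cycle = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}
filled = gap_fill_sketch(cycle, {"a", "b", "c"})
```

Here the missing node `d` is added, which closes the cycle and leaves no degree-1 node.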
- rnaglib.utils.extract_graphlet(graph, n, size=1, label='LW')[source]
Small util to extract a graphlet around a node
- Parameters:
graph – Nx graph
n – a node in the graph
size – The depth to consider
- Returns:
The graphlet as a copy
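The extraction can be sketched as a breadth-first expansion to depth `size`, followed by a copy of the induced subgraph. This is an illustration on an adjacency dict, not the library's implementation; `extract_graphlet_sketch` is a hypothetical name and edge labels are omitted.

```python
def extract_graphlet_sketch(adj, n, size=1):
    """Return a copy of the subgraph induced by the nodes within `size` hops of n.

    `adj` maps each node to the set of its neighbours.
    """
    visited = {n}
    frontier = {n}
    for _ in range(size):
        # Next ring of neighbours that has not been reached yet.
        frontier = {nbr for node in frontier for nbr in adj[node]} - visited
        visited |= frontier
    # Set intersection builds new sets, so the result is independent of `adj`.
    return {node: adj[node] & visited for node in visited}


# Path a-b-c-d: the 1-hop graphlet around b contains a, b and c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
graphlet = extract_graphlet_sketch(path, "b", size=1)
```

Because the result is built from fresh sets, modifying the graphlet does not affect the original graph.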
- rnaglib.utils.build_node_feature_parser(asked_features=None, node_feature_map={'C5prime_xyz': <rnaglib.utils.feature_maps.ListEncoder object>, 'Dp': <rnaglib.utils.feature_maps.FloatEncoder object>, 'P_xyz': <rnaglib.utils.feature_maps.ListEncoder object>, 'alpha': <rnaglib.utils.feature_maps.FloatEncoder object>, 'amplitude': <rnaglib.utils.feature_maps.FloatEncoder object>, 'bb_type': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'beta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'bin': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'binding_ion': <rnaglib.utils.feature_maps.BoolEncoder object>, 'binding_protein': <rnaglib.utils.feature_maps.BoolEncoder object>, 'binding_protein_Rdst': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Rx': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Ry': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Rz': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Tdst': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Tx': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Ty': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Tz': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_aa': None, 'binding_protein_id': None, 'binding_protein_nt': None, 'binding_protein_nt-aa': None, 'binding_small-molecule': <rnaglib.utils.feature_maps.BoolEncoder object>, 'chain_name': None, 'chi': <rnaglib.utils.feature_maps.FloatEncoder object>, 'cluster': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'dbn': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'delta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'epsilon': <rnaglib.utils.feature_maps.FloatEncoder object>, 'epsilon_zeta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'eta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'eta_base': <rnaglib.utils.feature_maps.FloatEncoder object>, 'eta_prime': 
<rnaglib.utils.feature_maps.FloatEncoder object>, 'filter_rmsd': <rnaglib.utils.feature_maps.FloatEncoder object>, 'form': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'frame_origin': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_quaternion': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_rmsd': <rnaglib.utils.feature_maps.FloatEncoder object>, 'frame_x_axis': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_y_axis': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_z_axis': <rnaglib.utils.feature_maps.ListEncoder object>, 'gamma': <rnaglib.utils.feature_maps.FloatEncoder object>, 'glyco_bond': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'index': None, 'index_chain': None, 'is_broken': <rnaglib.utils.feature_maps.BoolEncoder object>, 'is_modified': <rnaglib.utils.feature_maps.BoolEncoder object>, 'nt_code': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'nt_id': None, 'nt_name': None, 'nt_resnum': None, 'nt_type': None, 'phase_angle': <rnaglib.utils.feature_maps.FloatEncoder object>, 'puckering': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'splay_angle': <rnaglib.utils.feature_maps.FloatEncoder object>, 'splay_distance': <rnaglib.utils.feature_maps.FloatEncoder object>, 'splay_ratio': <rnaglib.utils.feature_maps.FloatEncoder object>, 'ssZp': <rnaglib.utils.feature_maps.FloatEncoder object>, 'sse_sse': None, 'sugar_class': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'suiteness': <rnaglib.utils.feature_maps.FloatEncoder object>, 'summary': None, 'theta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'theta_base': <rnaglib.utils.feature_maps.FloatEncoder object>, 'theta_prime': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v0': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v1': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v2': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v3': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v4': <rnaglib.utils.feature_maps.FloatEncoder object>, 
'zeta': <rnaglib.utils.feature_maps.FloatEncoder object>})[source]
Load the predefined feature maps available globally and, for each feature in asked_features, return an encoder object, in the form of a dict {asked_feature : EncoderObject}.
If some keys don’t exist, an error is raised. If some keys are present but problematic, the problematic keys are only printed.
- Parameters:
asked_features – A list of string keys that are present in the encoder
- Returns:
A dict {asked_feature : EncoderObject}
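The selection logic described above can be sketched as follows. Everything here is illustrative: the encoder classes and `build_parser_sketch` are hypothetical stand-ins, the choice of `KeyError` is arbitrary, and treating features mapped to `None` as the "problematic" ones (and leaving them out of the result) is an assumption rather than documented behaviour.

```python
class FloatEncoderSketch:
    """Hypothetical encoder: turns a scalar into a one-element float list."""

    def encode(self, value):
        return [float(value)]


class BoolEncoderSketch:
    """Hypothetical encoder: turns a truthy/falsy value into [1.0] or [0.0]."""

    def encode(self, value):
        return [1.0 if value else 0.0]


# Tiny stand-in for the global node feature map; None means "no encoder".
FEATURE_MAP = {
    "alpha": FloatEncoderSketch(),
    "is_modified": BoolEncoderSketch(),
    "nt_id": None,
}


def build_parser_sketch(asked_features=None, feature_map=FEATURE_MAP):
    """Return {feature: encoder} for the requested features."""
    asked = [] if asked_features is None else list(asked_features)
    missing = [key for key in asked if key not in feature_map]
    if missing:
        # Unknown keys are a hard error.
        raise KeyError(f"Unknown features: {missing}")
    problematic = [key for key in asked if feature_map[key] is None]
    if problematic:
        # Known keys without an encoder are only reported (assumed meaning).
        print(f"Features without an encoder were skipped: {problematic}")
    return {key: feature_map[key] for key in asked if feature_map[key] is not None}


parser = build_parser_sketch(["alpha", "nt_id"])
```

With this input, the returned dict contains only `alpha`, and a message about `nt_id` is printed.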
- rnaglib.utils.build_hash_table(graph_dir, hasher, graphlets=True, max_graphs=0, graphlet_size=1, mode='count', label='LW', directed=True)[source]
Iterate over the nodes of the graphs in graph_dir and fill a hash table with their graphlet hashes.
- Parameters:
graph_dir –
hasher –
graphlets –
max_graphs –
graphlet_size –
mode –
label –
- Returns:
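To make the idea of a graphlet hash table concrete, here is a minimal counting sketch. It is not rnaglib's hasher: `graphlet_key` is a hypothetical, deliberately simple hash (the sorted labels of the edges touching a node), graphs are lists of `(source, target, attrs)` tuples held in memory rather than files in a directory, and only a counting mode is shown.

```python
from collections import Counter


def graphlet_key(edges, node, label="LW"):
    """Hypothetical hasher: sorted tuple of the labels on edges touching `node`."""
    return tuple(sorted(attrs[label] for u, v, attrs in edges if node in (u, v)))


def build_hash_table_sketch(graphs, hasher=graphlet_key, label="LW"):
    """Count how often each graphlet hash occurs over all nodes of all graphs."""
    table = Counter()
    for edges in graphs:
        nodes = {x for u, v, _ in edges for x in (u, v)}
        for node in nodes:
            table[hasher(edges, node, label)] += 1
    return table


g = [("a", "b", {"LW": "cWW"}), ("b", "c", {"LW": "B53"})]
table = build_hash_table_sketch([g, g])
```

Nodes with the same local neighbourhood collapse to the same key, so the table records how frequent each neighbourhood type is across the dataset.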