rnaglib.utils

General utilities for handling RNA structures and graphs.

rnaglib.utils.download_graphs(redundancy='nr', version='1.0.0', annotated=False, chop=False, overwrite=False, data_root=None, verbose=False)[source]

Fetches the data matching the given options from the latest release and saves it in download_dir.

Parameters:
  • redundancy – Whether to include all RNAs or just a non-redundant set as defined by BGSU

  • annotated – Whether to include graphlet annotations in the graphs. This will also create a hashing directory and table

  • overwrite – Whether to overwrite existing data

  • download_dir – Where to save this data. Defaults to ~/.rnaglib/

Returns:

the path of the data along with its hashing.

rnaglib.utils.get_rna_list(nr_only=False)[source]

Fetch a list of PDBs containing RNA from the RCSB API.

rnaglib.utils.graph_from_pdbid(pdbid, graph_dir=None, version='1.0.0', annotated=False, chop=False, redundancy='nr', graph_format='json')[source]

Fetch an annotated graph with a PDBID.

Parameters:
  • pdbid – PDB id to fetch

  • graph_dir – path containing annotated graphs

  • graph_format – which format to load (JSON or networkx)

rnaglib.utils.load_graph(filename)[source]

Utility function that loads a graph from a JSON or pickle file. Sometimes the pickle also contains rings in the form of a node dict, in which case the rings are added to the graph.

Parameters:

filename – json or pickle filename

Returns:

networkx DiGraph object
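
The choice between the two file types can be sketched as follows. This is a minimal illustration using only the standard library, not rnaglib's implementation: `load_graph_sketch` is a hypothetical name, the payload is returned as stored rather than converted to a networkx DiGraph, the rule "`.json` is JSON, anything else is a pickle" is an assumption, and ring handling is left out.

```python
import json
import pickle
from pathlib import Path


def load_graph_sketch(filename):
    """Load a graph payload from a JSON or pickle file (hypothetical helper)."""
    path = Path(filename)
    if path.suffix == ".json":
        # JSON is a text format, so the file is opened in text mode.
        with path.open("r") as handle:
            return json.load(handle)
    # Assumption: every other extension is a pickle, which is binary.
    # Only unpickle files from a trusted source.
    with path.open("rb") as handle:
        return pickle.load(handle)
```

Dispatching on the extension keeps the caller free of format-specific code, at the cost of trusting the file name.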

rnaglib.utils.dump_json(filename, graph)[source]

Just a shortcut to dump a json graph more compactly

Parameters:
  • filename – The dump name

  • graph – The graph to dump

rnaglib.utils.load_json(filename)[source]

Just a shortcut to load a json graph more compactly

Parameters:

filename – The dump name

Returns:

The loaded graph

rnaglib.utils.reorder_nodes(g)[source]

Reorder nodes in graph

Parameters:

g (networkx.DiGraph) – Pass a graph for node reordering.

Returns:

h (networkx.DiGraph) – the graph with reordered nodes

rnaglib.utils.update_RNApdb(pdir, nr_only=True)[source]

Download a list of RNA-containing structures from the PDB, overwriting existing files.

Parameters:

pdir – path containing downloaded PDBs

Returns:

list of PDBIDs that were fetched.

rnaglib.utils.fix_buggy_edges(graph, label='LW', strategy='remove', edge_map={'B35': 19, 'B53': 0, 'cHH': 1, 'cHS': 2, 'cHW': 3, 'cSH': 4, 'cSS': 5, 'cSW': 6, 'cWH': 7, 'cWS': 8, 'cWW': 9, 'tHH': 10, 'tHS': 11, 'tHW': 12, 'tSH': 13, 'tSS': 14, 'tSW': 15, 'tWH': 16, 'tWS': 17, 'tWW': 18})[source]

Some edges carry non-standard labels such as t.W, which denote a fuzzy annotation. These edges are removed because they do not carry reliable information.

Parameters:
  • graph

  • strategy – How to handle these edges; for now, they are simply removed.

In the future, an edge type for them may be added to the edge map.
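
The 'remove' strategy can be illustrated with a short filter. This is a sketch under stated assumptions, not the library's code: `remove_unknown_edges` is a hypothetical name, edges are plain `(source, target, attributes)` tuples instead of networkx edges, and a new list is returned instead of modifying a graph. `EDGE_MAP` is a subset of the default `edge_map` shown in the signature.

```python
# Subset of the default edge_map from the signature above.
EDGE_MAP = {"B53": 0, "cWW": 9, "tHS": 11, "B35": 19}


def remove_unknown_edges(edges, label="LW", edge_map=EDGE_MAP):
    """Return only the edges whose label is a key of edge_map.

    `edges` is a list of (source, target, attributes) tuples, standing in
    for the edge data of a networkx graph.
    """
    return [
        (source, target, attrs)
        for source, target, attrs in edges
        if attrs.get(label) in edge_map
    ]


edges = [
    ("A.1", "A.2", {"LW": "B53"}),
    ("A.1", "A.8", {"LW": "cWW"}),
    ("A.2", "A.7", {"LW": "t.W"}),  # fuzzy annotation, absent from the map
]
kept = remove_unknown_edges(edges)
```

Here `kept` holds the two edges with known labels, and the `t.W` edge is dropped.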

rnaglib.utils.dangle_trim(graph)[source]

Recursively remove dangling nodes from the graph, modifying it in place.

Parameters:

graph – Nx graph

Returns:

trimmed graph
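
Trimming has to be repeated because removing one dangling node can turn its neighbour into a new one. The sketch below shows that loop on a plain adjacency dict. It is not rnaglib's implementation: `dangle_trim_sketch` is a hypothetical name, the graph is treated as undirected, and "dangling" is assumed to mean a degree below two.

```python
def dangle_trim_sketch(adjacency):
    """Remove nodes of degree < 2 in place until none remain.

    `adjacency` maps each node to the set of its neighbours (undirected).
    """
    while True:
        dangles = [node for node, nbrs in adjacency.items() if len(nbrs) < 2]
        if not dangles:
            return adjacency
        for node in dangles:
            for nbr in adjacency.pop(node):
                # The neighbour may be a dangle that was already removed.
                if nbr in adjacency:
                    adjacency[nbr].discard(node)


# A triangle a-b-c with a two-node tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
dangle_trim_sketch(graph)
```

The first pass removes `e`, which leaves `d` with a single neighbour, so the second pass removes `d` and only the triangle remains.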

rnaglib.utils.gap_fill(original_graph, graph_to_expand)[source]

After subgraphing, get rid of all degree-1 nodes by extending each of them with one more hop in the original graph.

Parameters:
  • original_graph – nx graph

  • graph_to_expand – nx graph that needs to be expanded to fix dangles

Returns:

the expanded graph
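
The one-hop completion can be sketched on adjacency dicts. The names here are hypothetical and the code is an illustration, not the library's method: the subgraph is given as a set of nodes of the original graph, the graph is treated as undirected, and the function returns the expanded node set rather than a networkx graph.

```python
def gap_fill_sketch(original, subgraph_nodes):
    """Expand a node subset by one hop around its degree-1 nodes.

    `original` maps each node to the set of its neighbours (undirected).
    Returns the expanded node set; the induced subgraph follows from it.
    """
    nodes = set(subgraph_nodes)
    expanded = set(nodes)
    for node in nodes:
        # Degree of the node inside the subgraph, not in the original graph.
        inner_degree = len(original[node] & nodes)
        if inner_degree == 1:
            expanded |= original[node]
    return expanded


# A 4-cycle a-b-c-d-a; the subgraph {a, b, c} cuts it open at d.
cycle = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}
filled = gap_fill_sketch(cycle, {"a", "b", "c"})
```

In the subgraph, `a` and `c` each have a single neighbour, so their missing neighbour `d` is added and the cycle is closed again. A single extra hop is not guaranteed to remove every dangle in general.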

rnaglib.utils.extract_graphlet(graph, n, size=1, label='LW')[source]

Small util to extract a graphlet around a node

Parameters:
  • graph – Nx graph

  • n – a node in the graph

  • size – The depth to consider

Returns:

The graphlet as a copy
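
Extracting a graphlet amounts to a depth-limited breadth-first search followed by a copy of the induced subgraph. The sketch below assumes a plain adjacency dict and ignores the `label` argument; `extract_graphlet_sketch` is a hypothetical name, not rnaglib's implementation.

```python
from collections import deque


def extract_graphlet_sketch(adjacency, root, size=1):
    """Return a copy of the subgraph within `size` hops of `root`.

    `adjacency` maps each node to the set of its neighbours.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == size:
            continue  # do not expand beyond the requested depth
        for nbr in adjacency[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    kept = set(depth)
    # Induced subgraph: only edges between kept nodes, stored in new sets.
    return {node: adjacency[node] & kept for node in depth}


# A path a-b-c-d; the graphlet of depth 1 around b is a-b-c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
graphlet = extract_graphlet_sketch(path, "b", size=1)
```

Because the neighbour sets are rebuilt, changing the returned graphlet leaves the original graph untouched.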

rnaglib.utils.build_node_feature_parser(asked_features=None, node_feature_map={'C5prime_xyz': <rnaglib.utils.feature_maps.ListEncoder object>, 'Dp': <rnaglib.utils.feature_maps.FloatEncoder object>, 'P_xyz': <rnaglib.utils.feature_maps.ListEncoder object>, 'alpha': <rnaglib.utils.feature_maps.FloatEncoder object>, 'amplitude': <rnaglib.utils.feature_maps.FloatEncoder object>, 'bb_type': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'beta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'bin': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'binding_ion': <rnaglib.utils.feature_maps.BoolEncoder object>, 'binding_protein': <rnaglib.utils.feature_maps.BoolEncoder object>, 'binding_protein_Rdst': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Rx': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Ry': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Rz': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Tdst': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Tx': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Ty': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_Tz': <rnaglib.utils.feature_maps.FloatEncoder object>, 'binding_protein_aa': None, 'binding_protein_id': None, 'binding_protein_nt': None, 'binding_protein_nt-aa': None, 'binding_small-molecule': <rnaglib.utils.feature_maps.BoolEncoder object>, 'chain_name': None, 'chi': <rnaglib.utils.feature_maps.FloatEncoder object>, 'cluster': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'dbn': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'delta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'epsilon': <rnaglib.utils.feature_maps.FloatEncoder object>, 'epsilon_zeta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'eta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'eta_base': <rnaglib.utils.feature_maps.FloatEncoder object>, 'eta_prime': 
<rnaglib.utils.feature_maps.FloatEncoder object>, 'filter_rmsd': <rnaglib.utils.feature_maps.FloatEncoder object>, 'form': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'frame_origin': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_quaternion': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_rmsd': <rnaglib.utils.feature_maps.FloatEncoder object>, 'frame_x_axis': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_y_axis': <rnaglib.utils.feature_maps.ListEncoder object>, 'frame_z_axis': <rnaglib.utils.feature_maps.ListEncoder object>, 'gamma': <rnaglib.utils.feature_maps.FloatEncoder object>, 'glyco_bond': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'index': None, 'index_chain': None, 'is_broken': <rnaglib.utils.feature_maps.BoolEncoder object>, 'is_modified': <rnaglib.utils.feature_maps.BoolEncoder object>, 'nt_code': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'nt_id': None, 'nt_name': None, 'nt_resnum': None, 'nt_type': None, 'phase_angle': <rnaglib.utils.feature_maps.FloatEncoder object>, 'puckering': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'splay_angle': <rnaglib.utils.feature_maps.FloatEncoder object>, 'splay_distance': <rnaglib.utils.feature_maps.FloatEncoder object>, 'splay_ratio': <rnaglib.utils.feature_maps.FloatEncoder object>, 'ssZp': <rnaglib.utils.feature_maps.FloatEncoder object>, 'sse_sse': None, 'sugar_class': <rnaglib.utils.feature_maps.OneHotEncoder object>, 'suiteness': <rnaglib.utils.feature_maps.FloatEncoder object>, 'summary': None, 'theta': <rnaglib.utils.feature_maps.FloatEncoder object>, 'theta_base': <rnaglib.utils.feature_maps.FloatEncoder object>, 'theta_prime': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v0': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v1': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v2': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v3': <rnaglib.utils.feature_maps.FloatEncoder object>, 'v4': <rnaglib.utils.feature_maps.FloatEncoder object>, 
'zeta': <rnaglib.utils.feature_maps.FloatEncoder object>})[source]

Loads the predefined, globally available feature maps and returns an encoder object for each feature in asked_features, as a dict {asked_feature : EncoderObject}.

If some keys do not exist, an error is raised. If some keys are present but problematic, the problematic keys are only printed.

Parameters:

asked_features – A list of string keys that are present in the encoder

Returns:

A dict {asked_feature : EncoderObject}
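
The lookup behaviour can be sketched with a small dictionary. Everything named here is hypothetical: `FloatEncoderStub` and its `encode` method stand in for the real encoder classes, `FEATURE_MAP` is a three-entry excerpt, and the sketch assumes that "problematic" means a key mapped to `None` and that such keys are skipped, which the documentation does not confirm.

```python
class FloatEncoderStub:
    """Hypothetical stand-in for an encoder object."""

    def encode(self, value):
        return [float(value)]


# Hypothetical excerpt of a feature map; None marks a missing encoder.
FEATURE_MAP = {
    "alpha": FloatEncoderStub(),
    "chi": FloatEncoderStub(),
    "nt_id": None,
}


def build_parser_sketch(asked_features=None, feature_map=FEATURE_MAP):
    """Return {feature: encoder} for the asked features."""
    if asked_features is None:
        return {}  # assumption: nothing asked means nothing returned
    # Keys that do not exist at all are an error.
    missing = [key for key in asked_features if key not in feature_map]
    if missing:
        raise KeyError(f"Unknown features: {missing}")
    # Keys that exist but have no encoder are only reported.
    problematic = [key for key in asked_features if feature_map[key] is None]
    if problematic:
        print(f"Features without an encoder: {problematic}")
    return {
        key: feature_map[key]
        for key in asked_features
        if feature_map[key] is not None
    }


parser = build_parser_sketch(["alpha", "nt_id"])
```

With this input, `parser` contains only `alpha`, and `nt_id` is printed as problematic. Asking for a key that is not in the map raises `KeyError`.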

rnaglib.utils.build_hash_table(graph_dir, hasher, graphlets=True, max_graphs=0, graphlet_size=1, mode='count', label='LW', directed=True)[source]

Iterates over the nodes of the graphs in graph_dir and fills a hash table with their graphlet hashes.
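
The counting loop can be sketched as follows. This is a toy illustration, not the library's code: graphs are in-memory dicts rather than files in a directory, `crude_hash` is a deliberately simple hypothetical hasher that looks only at the labels of a node's edges, and `max_graphs=0` is assumed to mean "no limit".

```python
from collections import Counter


def crude_hash(graph, node):
    """Toy hasher: the sorted labels of the edges leaving `node`."""
    return tuple(sorted(label for _, label in graph[node]))


def build_hash_table_sketch(graphs, hasher=crude_hash, max_graphs=0):
    """Count how often each graphlet hash occurs over all nodes.

    Each graph maps a node to a list of (neighbour, label) pairs.
    """
    table = Counter()
    for index, graph in enumerate(graphs):
        if max_graphs and index >= max_graphs:
            break
        for node in graph:
            table[hasher(graph, node)] += 1
    return table


g1 = {
    "A.1": [("A.2", "B53"), ("A.8", "cWW")],
    "A.2": [("A.1", "B35")],
    "A.8": [("A.1", "cWW")],
}
g2 = {"B.1": [("B.2", "cWW")], "B.2": [("B.1", "cWW")]}
table = build_hash_table_sketch([g1, g2])
```

Nodes whose neighbourhoods hash to the same value share one table entry, so `table` records how frequent each local pattern is across the dataset.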

Parameters:
  • graph_dir

  • hasher

  • graphlets

  • max_graphs

  • graphlet_size

  • mode

  • label

Returns: