rnaglib.prepare_data

Functions to build data releases from raw PDBs.

rnaglib.prepare_data.filter_dot_edges(graph)

Remove edges with a ‘.’ in their LW (Leontis-Westhof) annotation. The graph is modified in place.

Parameters:

graph – networkx graph
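The in-place filtering described above can be sketched as follows. This is not rnaglib's code. It assumes each edge stores its Leontis-Westhof label under an 'LW' key, and it models the graph as a plain dict from (u, v) pairs to edge-attribute dicts so that it has no dependencies. With a networkx graph, the same logic would iterate over graph.edges(data='LW') and call graph.remove_edges_from(...).

```python
def filter_dot_edges_sketch(edges: dict) -> None:
    """Remove, in place, every edge whose 'LW' label contains a '.'.

    The dict-of-edges representation is an assumption standing in for a
    networkx graph: keys are (u, v) node pairs, values are attribute dicts.
    """
    # Collect keys first: deleting from a dict while iterating over it raises RuntimeError.
    dotted = [pair for pair, attrs in edges.items() if "." in attrs.get("LW", "")]
    for pair in dotted:
        del edges[pair]


edges = {
    (1, 2): {"LW": "cWW"},
    (1, 20): {"LW": "tHS"},
    (5, 9): {"LW": "c.W"},  # hypothetical dotted annotation
}
filter_dot_edges_sketch(edges)
print(sorted(edges))  # [(1, 2), (1, 20)]
```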

rnaglib.prepare_data.filter_all(graph_dir, output_dir, filters=['NR'], min_nodes=20)

Apply filters to a graph dataset.

Parameters:
  • graph_dir – directory to read graphs from

  • output_dir – directory where the filtered graphs are written

  • filters – list of filters to apply (‘NR’, ‘Ribo’, ‘NonRibo’)

  • min_nodes – skip graphs with fewer than min_nodes nodes (default=20)
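The control flow of filter_all can be sketched roughly as follows. This is not rnaglib's implementation, and it rests on several assumptions. Graphs are plain dicts holding a node list. The predicates standing in for ‘Ribo’ and ‘NonRibo’ are hypothetical placeholders keyed on a made-up "ribosomal" flag, and ‘NR’ is omitted because its nonredundancy criterion is not described here. Each filter is also assumed to produce its own output subset rather than being combined with the others, since ‘Ribo’ and ‘NonRibo’ are mutually exclusive.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the real filters: a graph counts as
# "ribosomal" if its (made-up) metadata flag says so.
FILTERS: Dict[str, Callable[[dict], bool]] = {
    "Ribo": lambda g: g.get("ribosomal", False),
    "NonRibo": lambda g: not g.get("ribosomal", False),
}


def filter_all_sketch(graphs: Dict[str, dict], filters: List[str],
                      min_nodes: int = 20) -> Dict[str, List[str]]:
    """Return, for each filter name, the ids of graphs that pass it.

    Assumption: every filter yields its own subset of the dataset.
    """
    kept: Dict[str, List[str]] = {name: [] for name in filters}
    for graph_id, g in graphs.items():
        # Skip graphs with fewer than min_nodes nodes.
        if len(g["nodes"]) < min_nodes:
            continue
        for name in filters:
            if FILTERS[name](g):
                kept[name].append(graph_id)
    return kept


graphs = {
    "small": {"nodes": list(range(5)), "ribosomal": False},
    "ribo": {"nodes": list(range(50)), "ribosomal": True},
    "riboswitch": {"nodes": list(range(30)), "ribosomal": False},
}
print(filter_all_sketch(graphs, ["Ribo", "NonRibo"]))
# {'Ribo': ['ribo'], 'NonRibo': ['riboswitch']}
```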

rnaglib.prepare_data.one_rna_from_cif(cif)

Build a 2.5D graph for one mmCIF file using DSSR.

Parameters:

cif – path to mmCIF

Returns:

the 2.5D graph

rnaglib.prepare_data.cif_to_graph(cif, output_dir=None, min_nodes=20, return_graph=False)

Build DSSR graphs for one mmCIF file. Requires x3dna-dssr to be in PATH.

Parameters:
  • cif – path to CIF

  • output_dir – directory where the graph is dumped

  • min_nodes – minimum RNA size, in residue nodes (default=20)

  • return_graph – whether to include the graph in the output

Returns:

networkx graph of the structure.
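Since cif_to_graph depends on the x3dna-dssr executable, a caller may want to fail fast when it is missing. The sketch below uses the standard library's shutil.which. The helper name require_executable is hypothetical and not part of rnaglib.

```python
import shutil


def require_executable(name: str = "x3dna-dssr") -> str:
    """Return the full path of `name`, or raise if it cannot be found.

    Hypothetical helper, not part of rnaglib. shutil.which searches PATH
    for bare names; a name that already contains a directory is checked
    directly.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name!r} not found in PATH; install DSSR first")
    return path


try:
    print("DSSR found at", require_executable())
except RuntimeError as err:
    print(err)
```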

rnaglib.prepare_data.add_graph_annotations(g, cif)

Adds graph-level information and annotations on the small-molecule partners of an RNA molecule.

Parameters:
  • g – the nx graph created from DSSR output

  • cif – the path to a .mmcif file

Returns:

the annotated graph (the input graph is also mutated in place)
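The "returns and also mutates" behaviour means a caller can ignore the return value. The sketch below illustrates this using a tiny stand-in for a networkx graph. networkx keeps graph-level attributes in the G.graph dict. The class TinyGraph, the function name, and the keys 'cif_path' and 'pdb_id' are hypothetical and are not the attributes rnaglib actually sets.

```python
import os


class TinyGraph:
    """Hypothetical minimal stand-in exposing a networkx-style `.graph` dict."""

    def __init__(self):
        self.graph = {}


def add_graph_annotations_sketch(g, cif):
    # Write graph-level metadata into g.graph, so the object itself is mutated.
    # The key names below are hypothetical, not rnaglib's actual attributes.
    g.graph["cif_path"] = cif
    g.graph["pdb_id"] = os.path.basename(cif).split(".")[0].lower()
    return g  # the very same object the caller passed in


g = TinyGraph()
out = add_graph_annotations_sketch(g, "data/1EHZ.cif")
print(out is g, g.graph["pdb_id"])  # True 1ehz
```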

rnaglib.prepare_data.hariboss_filter(lig, cif_dict, mass_lower_limit=160, mass_upper_limit=1000)
Sorts ligands into ion / ligand / None

Classifies the residue as an ion if it belongs to a specific list of ions, as a ligand if the HETATM has the right atoms and mass, and returns None otherwise.

Parameters:
  • lig – A Biopython ligand residue object

  • cif_dict – The output of Biopython's MMCIF2Dict

  • mass_lower_limit – lower bound on the ligand mass (default=160)

  • mass_upper_limit – upper bound on the ligand mass (default=1000)
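The ion / ligand / None decision can be sketched as follows. This sketch rests on several stated assumptions:

* Plain values (residue name, formula weight, element set) replace the Biopython residue and the MMCIF2Dict lookup.
* The IONS and ALLOWED_ELEMENTS sets are hypothetical placeholders, not rnaglib's actual lists.
* "Right atoms" is interpreted as "contains carbon and uses only allowed elements".
* The string labels 'ion' and 'ligand' are illustrative.

```python
from typing import Optional, Set

# Hypothetical placeholder lists; the real ones live in rnaglib.
IONS = {"MG", "NA", "K", "ZN", "MN"}
ALLOWED_ELEMENTS = {"C", "H", "N", "O", "P", "S", "F", "CL", "BR", "I"}


def classify_hetatm(name: str, mass: float, elements: Set[str],
                    mass_lower_limit: float = 160,
                    mass_upper_limit: float = 1000) -> Optional[str]:
    """Return 'ion', 'ligand' or None for one HETATM residue (sketch)."""
    if name in IONS:
        return "ion"
    in_mass_range = mass_lower_limit <= mass <= mass_upper_limit
    # Assumed reading of "right atoms": has carbon, only allowed elements.
    right_atoms = "C" in elements and elements <= ALLOWED_ELEMENTS
    if in_mass_range and right_atoms:
        return "ligand"
    return None


print([
    classify_hetatm("MG", 24.3, {"MG"}),
    classify_hetatm("GTP", 523.18, {"C", "H", "N", "O", "P"}),
    classify_hetatm("HOH", 18.0, {"O"}),
])  # ['ion', 'ligand', None]
```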

rnaglib.prepare_data.chop_all(graph_path, dest, n_jobs=4, parallel=True)

Chop and dump all the rnaglib graphs in the dataset.

Parameters:
  • graph_path – path to graphs for chopping

  • dest – path where chopped graphs will be dumped

  • n_jobs – number of workers to use

  • parallel – whether to use multiprocessing
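The n_jobs / parallel pair follows a common pattern. A per-graph worker is mapped over all inputs, either through a multiprocessing.Pool or in a plain loop. The sketch below illustrates only that dispatch. Here chop_one is a hypothetical placeholder that splits a node count into chunk sizes and does not chop a real graph. The worker must be a module-level function so that it can be pickled and sent to the pool's processes.

```python
from multiprocessing import Pool
from typing import List


def chop_one(n_nodes: int, max_size: int = 50) -> List[int]:
    """Hypothetical chopper: split n_nodes into chunk sizes of at most max_size."""
    full, rest = divmod(n_nodes, max_size)
    return [max_size] * full + ([rest] if rest else [])


def chop_all_sketch(sizes: List[int], n_jobs: int = 4,
                    parallel: bool = True) -> List[List[int]]:
    if parallel:
        # One worker process per job; pool.map preserves input order.
        with Pool(processes=n_jobs) as pool:
            return pool.map(chop_one, sizes)
    # Sequential fallback, handy for debugging.
    return [chop_one(s) for s in sizes]


if __name__ == "__main__":  # guard required when processes are spawned
    print(chop_all_sketch([120, 30], parallel=False))  # [[50, 50, 20], [30]]
```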

rnaglib.prepare_data.annotate_all(dump_path='../data/annotated/sample_v2', graph_path='../data/chunks_nx', parallel=True, do_hash=True, wl_hops=3, graphlet_size=1, re_annotate=False)

Run the annotation routine on all graph files in a folder.

Parameters:
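  • dump_path – where to dump the annotated graphs

  • graph_path – where to read the graphs to annotate from

  • parallel – whether to use multiprocessing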

Returns: