`rnaglib.prepare_data`

Functions to build data releases from raw PDBs.

rnaglib.prepare_data.filter_dot_edges(graph)

Remove, in place, every edge whose LW (Leontis-Westhof) annotation contains a ‘.’.

Parameters: graph – networkx graph
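
The in-place removal can be pictured with a minimal sketch. This is not the library's implementation: `drop_dot_edges` is a hypothetical name, and the sketch assumes the annotation lives under an edge attribute called `LW`. It relies only on `edges(data=True)` and `remove_edges_from`, both of which networkx graphs provide.

```python
def drop_dot_edges(graph, key="LW"):
    """Remove, in place, every edge whose `key` annotation contains a '.'.

    `key="LW"` is an assumed attribute name, not confirmed by the docs.
    Returns the number of edges removed.
    """
    # Collect the edges first: removing edges while iterating over
    # graph.edges() would mutate the container being iterated.
    to_remove = [
        (u, v)
        for u, v, data in graph.edges(data=True)
        if "." in str(data.get(key, ""))
    ]
    graph.remove_edges_from(to_remove)
    return len(to_remove)
```

Because the graph is modified in place, callers that still need the unfiltered graph should copy it first (for a networkx graph, `graph.copy()`).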

rnaglib.prepare_data.filter_all(graph_dir, output_dir, filters=['NR'], min_nodes=20)

Apply filters to a graph dataset.

Parameters:

graph_dir – where to read graphs from
output_dir – where to dump the graphs
filters – list of which filters to apply (‘NR’, ‘Ribo’, ‘NonRibo’)
min_nodes – skip graphs with fewer than min_nodes nodes (default=20)
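
As an illustration of the `min_nodes` cutoff only, here is a minimal sketch. It assumes each graph is stored as a node-link JSON file with a top-level `"nodes"` list, which is an assumption about the on-disk format; `copy_large_graphs` is a hypothetical helper, and the ‘NR’, ‘Ribo’ and ‘NonRibo’ filters are not reproduced.

```python
import json
import shutil
from pathlib import Path


def copy_large_graphs(graph_dir, output_dir, min_nodes=20):
    """Copy graphs with at least `min_nodes` nodes from graph_dir to output_dir.

    Assumes node-link JSON files (a top-level "nodes" list); this layout is
    an assumption made for the sketch. Returns the names of the copied files.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    kept = []
    for path in sorted(Path(graph_dir).glob("*.json")):
        with path.open() as handle:
            n_nodes = len(json.load(handle)["nodes"])
        if n_nodes < min_nodes:
            continue  # skip graphs with fewer than min_nodes nodes
        shutil.copy(path, out / path.name)
        kept.append(path.name)
    return kept
```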

rnaglib.prepare_data.one_rna_from_cif(cif)

Build the 2.5D graph for one mmCIF file using DSSR.

Parameters: cif – path to the mmCIF file
Returns: the 2.5D graph

rnaglib.prepare_data.cif_to_graph(cif, output_dir=None, min_nodes=20, return_graph=False)

Build DSSR graphs for one mmCIF. Requires x3dna-dssr to be in PATH.

Parameters:

cif – path to the mmCIF file
output_dir – where to dump the graph
min_nodes – smallest RNA to keep (number of residue nodes)
return_graph – Boolean to include the graph in the output

Returns:

networkx graph of the structure.
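
Since the function depends on an external executable, it can help to check for it before processing any files. The sketch below uses only the standard library; `dssr_available` is a hypothetical helper, not part of rnaglib.

```python
import shutil


def dssr_available(executable="x3dna-dssr"):
    """Return True if `executable` can be found on the current PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which(executable) is not None


if not dssr_available():
    print("x3dna-dssr was not found in PATH; graph building would fail.")
```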

rnaglib.prepare_data.add_graph_annotations(g, cif)

Add graph-level information and annotations on the small-molecule partners of an RNA molecule.

Parameters:

g – the nx graph created from dssr output
cif – the path to a .mmcif file

Returns:

the annotated graph (the input graph is mutated in place)

rnaglib.prepare_data.hariboss_filter(lig, cif_dict, mass_lower_limit=160, mass_upper_limit=1000)

Classify a hetero residue as ion, ligand, or None: it is an ion if it belongs to a specific list of ions, a ligand if the HETATM has the right atoms and mass, and None otherwise.

Parameters:

lig – a Biopython ligand residue object
cif_dict – the output of Biopython's MMCIF2Dict
mass_lower_limit – lower mass bound for a ligand (default=160)
mass_upper_limit – upper mass bound for a ligand (default=1000)
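
The three-way decision can be sketched as follows. Everything here is illustrative: `classify_hetatm` and `EXAMPLE_IONS` are hypothetical, the ion list is a short stand-in for whatever list the library uses, the returned strings and the inclusive bounds are assumptions, masses are assumed to be in daltons, and the atom-composition check mentioned above is omitted.

```python
# Hypothetical, deliberately short ion list; the real filter uses its own.
EXAMPLE_IONS = frozenset({"MG", "NA", "K", "ZN", "CA"})


def classify_hetatm(resname, mass, mass_lower_limit=160, mass_upper_limit=1000):
    """Return "ion", "ligand", or None for a hetero residue.

    The return values and inclusive bounds are assumptions for this sketch;
    the atom-composition test of the real filter is not reproduced.
    """
    if resname.strip().upper() in EXAMPLE_IONS:
        return "ion"
    if mass_lower_limit <= mass <= mass_upper_limit:
        return "ligand"
    return None
```

With the default limits, a residue in the example ion list is labelled an ion regardless of mass, and anything else outside the 160 to 1000 range is rejected.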

rnaglib.prepare_data.chop_all(graph_path, dest, n_jobs=4, parallel=True)

Chop and dump all the rnaglib graphs in the dataset.

Parameters:

graph_path – path to graphs for chopping
dest – path where chopped graphs will be dumped
n_jobs – number of workers to use
parallel – whether to use multiprocessing
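
How `n_jobs` and `parallel` typically interact can be shown with a generic sketch. It is not the library's code: `run_all` is a hypothetical helper, and `worker` stands in for whatever per-graph chopping routine is applied.

```python
import multiprocessing as mp


def run_all(paths, worker, n_jobs=4, parallel=True):
    """Apply `worker` to every path, serially or with a pool of n_jobs processes."""
    paths = list(paths)
    if not parallel:
        # The serial path is easier to debug and needs no pickling.
        return [worker(p) for p in paths]
    with mp.Pool(processes=n_jobs) as pool:
        # pool.map preserves the input order of the results.
        return pool.map(worker, paths)
```

For the parallel branch, `worker` has to be a picklable, module-level function, and on platforms that spawn new processes the call should sit under an `if __name__ == "__main__":` guard.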

rnaglib.prepare_data.annotate_all(dump_path='../data/annotated/sample_v2', graph_path='../data/chunks_nx', parallel=True, do_hash=True, wl_hops=3, graphlet_size=1, re_annotate=False)

Annotate all the graph files in a folder.

Parameters:

dump_path – where to dump the annotated graphs
graph_path – where to read the graphs from
parallel – whether to use multiprocessing

Returns:
