rnaglib.prepare_data
Functions to build data releases from raw PDBs.
- rnaglib.prepare_data.filter_dot_edges(graph)[source]
Remove edges with a ‘.’ in the LW annotation. This happens in place.
- Parameters:
graph – networkx graph
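As a rough illustration of the filtering rule, the sketch below selects edges whose LW annotation contains a '.'. It works on plain (u, v, data) triples, such as those yielded by networkx's graph.edges(data=True). It assumes the annotation is stored under an edge attribute named "LW". The helper name dot_edges and the toy node names are hypothetical and not part of rnaglib.

```python
def dot_edges(edges, key="LW"):
    """Return (u, v) pairs for edges whose `key` annotation contains a '.'.

    `edges` is an iterable of (u, v, data) triples, e.g. graph.edges(data=True)
    on a networkx graph. The attribute name "LW" is an assumption.
    """
    return [(u, v) for u, v, data in edges if "." in data.get(key, "")]


# Toy edge list standing in for a 2.5D graph (node names are made up).
edges = [
    ("A.1", "A.2", {"LW": "B53"}),
    ("A.1", "A.9", {"LW": "cWW"}),
    ("A.3", "A.7", {"LW": "t.H"}),
]
to_remove = dot_edges(edges)
```

With a networkx graph, the same selection could be applied in place with graph.remove_edges_from(dot_edges(graph.edges(data=True))); the list is fully built before any edge is removed, so the graph is not modified while it is being iterated.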
- rnaglib.prepare_data.filter_all(graph_dir, output_dir, filters=['NR'], min_nodes=20)[source]
Apply filters to a graph dataset.
- Parameters:
graph_dir – where to read graphs from
output_dir – where to dump the graphs
filters – list of which filters to apply (‘NR’, ‘Ribo’, ‘NonRibo’)
min_nodes – skip graphs with fewer than min_nodes nodes (default=20)
- rnaglib.prepare_data.one_rna_from_cif(cif)[source]
Build a 2.5D graph for one mmCIF file using DSSR.
- Parameters:
cif – path to mmCIF
- Returns:
2.5d graph
- rnaglib.prepare_data.cif_to_graph(cif, output_dir=None, min_nodes=20, return_graph=False)[source]
Build DSSR graphs for one mmCIF. Requires x3dna-dssr to be in PATH.
- Parameters:
cif – path to CIF
output_dir – where to dump
min_nodes – smallest RNA (number of residue nodes)
return_graph – Boolean to include the graph in the output
- Returns:
networkx graph of structure.
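Because cif_to_graph requires x3dna-dssr to be in PATH, it can be useful to check for the executable before processing a batch of files. The following is a minimal sketch using only the standard library; the helper name dssr_available is hypothetical and not part of rnaglib.

```python
import shutil


def dssr_available(executable="x3dna-dssr"):
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


if not dssr_available():
    print("x3dna-dssr not found on PATH (required by cif_to_graph)")
```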
- rnaglib.prepare_data.add_graph_annotations(g, cif)[source]
Add graph-level information and annotations on the small-molecule partners of an RNA molecule.
- Parameters:
g – the nx graph created from dssr output
cif – path to the mmCIF file
- Returns:
the annotated graph (the input graph is mutated in place)
- rnaglib.prepare_data.hariboss_filter(lig, cif_dict, mass_lower_limit=160, mass_upper_limit=1000)[source]
Sort ligands into ion / ligand / None.
Classifies a residue as an ion if it belongs to a specific list of ions, as a ligand if the HETATM has the right atoms and mass, and as None otherwise.
- Parameters:
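lig – a Biopython ligand residue object
cif_dict – the output of Biopython's MMCIF2Dict
mass_lower_limit – lower bound on the mass for a residue to count as a ligand (default=160)
mass_upper_limit – upper bound on the mass for a residue to count as a ligand (default=1000)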
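The ion / ligand / None decision can be pictured with the toy sketch below. It is not the rnaglib implementation: it looks only at a residue name and a mass, leaves out the atom-composition check, and takes plain values instead of Biopython objects. The ion set, the string labels "ion" and "ligand", the exclusive mass bounds, and the function name classify_by_mass are all assumptions made for illustration.

```python
# Hypothetical ion names, for illustration only (not the list rnaglib uses).
EXAMPLE_IONS = {"MG", "K", "NA", "ZN"}


def classify_by_mass(name, mass, ions=EXAMPLE_IONS,
                     mass_lower_limit=160, mass_upper_limit=1000):
    """Toy ion / ligand / None decision based on residue name and mass only.

    The check on which atoms the HETATM contains is left out here, and
    treating the mass bounds as exclusive is an assumption.
    """
    if name in ions:
        return "ion"
    if mass_lower_limit < mass < mass_upper_limit:
        return "ligand"
    return None


# Approximate masses: magnesium, S-adenosylmethionine, water.
labels = [classify_by_mass(n, m)
          for n, m in [("MG", 24.3), ("SAM", 398.4), ("HOH", 18.0)]]
```

With the default limits, the magnesium ion is matched by name, SAM falls inside the mass window, and water is rejected as too light.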
- rnaglib.prepare_data.chop_all(graph_path, dest, n_jobs=4, parallel=True)[source]
Chop and dump all the rglib graphs in the dataset.
- Parameters:
graph_path – path to graphs for chopping
dest – path where chopped graphs will be dumped
n_jobs – number of workers to use
parallel – whether to use multiprocessing
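The interplay of the worker count and the multiprocessing switch can be sketched as follows. This is a generic pattern under stated assumptions, not the code behind chop_all: chop_one is a hypothetical stand-in for the per-graph work, and run_all is a hypothetical driver that either maps it over a process pool or falls back to a plain loop.

```python
import multiprocessing


def chop_one(name):
    """Hypothetical stand-in for chopping a single graph."""
    return name + ".chopped"


def run_all(names, n_jobs=4, parallel=True):
    """Apply chop_one to every name, serially or with a process pool."""
    if parallel:
        # One pool of n_jobs worker processes; map preserves input order.
        with multiprocessing.Pool(n_jobs) as pool:
            return pool.map(chop_one, names)
    return [chop_one(name) for name in names]


if __name__ == "__main__":
    print(run_all(["1abc", "2xyz"], n_jobs=2, parallel=False))
```

A serial fallback of this kind is convenient for debugging, since exceptions raised inside worker processes are harder to trace than those raised in a plain loop.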