pypath.utils.pyreact.BioPaxReader
- class pypath.utils.pyreact.BioPaxReader(biopax, source, cleanup_period=800, file_from_archive=None, silent=False)
Bases: Logger
This class parses a BioPAX file and makes its content easily accessible for further processing. First it opens the file, extracting it from an archive if necessary. Then it creates an lxml.etree.iterparse object, so iteration is efficient and memory requirements are minimal. It iterates over the iterparse object and, for each tag listed in the BioPaxReader.methods dict, calls the corresponding method. These methods extract information from the BioPAX entity and store it in arbitrary data structures: strings, lists or dicts. These are collected in dicts whose keys are the original IDs of the tags, prefixed with the unique ID of the parser object.
The prefix makes it possible to merge the results of parsing several BioPAX files later. For example, id42 may identify EGFR in one file but AKT1 in another. The parser of the first file has a unique ID consisting of a random 5-character string and the second parser has a different one, so molecules with the same original ID can be distinguished at merging: e.g. EGFR will be ffjh2@id42 and AKT1 will be tr9gy@id42.
The methods and the resulting dicts are named after the BioPAX elements, sometimes abbreviated. For example, BioPaxReader.protein() processes the <bp:Protein> elements and stores the results in BioPaxReader.proteins.
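The ID prefixing described above can be illustrated with a minimal, self-contained sketch. It does not use pypath; the helper names random_parser_id and prefix_ids are hypothetical and only mimic the idea of joining a per-parser ID and the original ID with "@".

```python
import random
import string


def random_parser_id(length=5, rng=random):
    """Return a random lowercase alphanumeric ID (hypothetical helper)."""
    alphabet = string.ascii_lowercase + string.digits
    return ''.join(rng.choice(alphabet) for _ in range(length))


def prefix_ids(parser_id, entities):
    """Prefix every original ID with the parser ID, joined by '@'."""
    return {
        '%s@%s' % (parser_id, key): value
        for key, value in entities.items()
    }


# The same original ID refers to different proteins in the two files.
file_a = {'id42': 'EGFR'}
file_b = {'id42': 'AKT1'}

# Fixed parser IDs taken from the example in the text, for reproducibility;
# in practice each parser would get its own random_parser_id().
merged = {}
merged.update(prefix_ids('ffjh2', file_a))
merged.update(prefix_ids('tr9gy', file_b))

print(merged)  # {'ffjh2@id42': 'EGFR', 'tr9gy@id42': 'AKT1'}
```

Without the prefix, the second update would silently overwrite the first entry, because both files use id42 as the key.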
In its current state, this class does not parse all information or all BioPAX entities. For example, entities and interactions related to nucleic acids are omitted. These can be added easily with minor modifications.
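To make the dispatch idea concrete, here is a hedged sketch of a tag-to-method dict driving an iterparse loop, and of how one more entity type could be registered. It is not the pypath implementation: MiniReader, its _store helper and the dna handler are hypothetical, handlers take the element as an argument for simplicity, and the standard library's xml.etree.ElementTree.iterparse stands in for lxml.etree.iterparse so the example has no dependencies.

```python
import io
import xml.etree.ElementTree as ET

# Namespace prefixes as they appear in ElementTree tag and attribute names.
BP = '{http://www.biopax.org/release/biopax-level3.owl#}'
RDF = '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}'

SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Protein rdf:ID="Protein1"><bp:displayName>EGFR</bp:displayName></bp:Protein>
  <bp:Dna rdf:ID="Dna1"><bp:displayName>EGFR gene</bp:displayName></bp:Dna>
</rdf:RDF>
"""


class MiniReader(object):
    """Hypothetical, heavily simplified reader used for illustration only."""

    def __init__(self, parser_id):
        self.parser_id = parser_id
        self.proteins = {}
        self.dnas = {}
        # Only tags listed here are processed; everything else is skipped.
        self.methods = {BP + 'Protein': self.protein}

    def _store(self, elem, target):
        name = elem.find(BP + 'displayName')
        key = '%s@%s' % (self.parser_id, elem.get(RDF + 'ID'))
        target[key] = name.text if name is not None else None

    def protein(self, elem):
        self._store(elem, self.proteins)

    def dna(self, elem):
        self._store(elem, self.dnas)

    def iterate(self, fileobj):
        # 'end' events guarantee that the element's children are complete.
        for _, elem in ET.iterparse(fileobj, events=('end',)):
            method = self.methods.get(elem.tag)
            if method is not None:
                method(elem)


reader = MiniReader('ffjh2')
reader.iterate(io.BytesIO(SAMPLE))
first_pass_dnas = dict(reader.dnas)  # still empty: no handler for Dna yet

# Supporting a new entity type amounts to registering one more handler.
reader.methods[BP + 'Dna'] = reader.dna
reader.iterate(io.BytesIO(SAMPLE))

print(reader.proteins)  # {'ffjh2@Protein1': 'EGFR'}
print(reader.dnas)      # {'ffjh2@Dna1': 'EGFR gene'}
```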
- __init__(biopax, source, cleanup_period=800, file_from_archive=None, silent=False)
- Parameters:
biopax (str,FileOpener) – either a filename or a FileOpener object; if a string is supplied, the FileOpener will be created internally.
source (str) – the name of the data source, e.g. Reactome.
cleanup_period (int) – the number of most recent elements kept during the iteration of lxml.etree.iterparse; a lower number results in lower memory usage but risks deleting an element before it has been processed. The default is 800, which is a safe option.
file_from_archive (str) – when processing an archive that may contain multiple files (tar.gz or zip), the path of the file to be processed needs to be supplied, e.g. BioPax/Homo_sapiens.owl.
silent (bool) – whether to print status messages and progress bars during processing. If you process a large number of small files, it is better to set this to False; in case of one large file, True. The default is False.
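The trade-off behind cleanup_period can be sketched as follows. This is an illustration under stated assumptions, not pypath's actual cleanup logic: the function group_members is hypothetical, the standard library's xml.etree.ElementTree.iterparse stands in for lxml.etree.iterparse, and the oldest element is cleared one at a time as soon as more than cleanup_period elements have been seen.

```python
import io
from collections import deque
import xml.etree.ElementTree as ET

SAMPLE = (
    b'<root><group>'
    b'<item>a</item><item>b</item><item>c</item>'
    b'</group></root>'
)


def group_members(xml_bytes, cleanup_period):
    """Collect the item texts of each <group>, clearing old elements."""
    recent = deque()  # elements seen so far, oldest on the left
    groups = []
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=('end',)):
        if elem.tag == 'group':
            # The parent is processed only after all its children ended.
            groups.append([child.text for child in elem])
        recent.append(elem)
        if len(recent) > cleanup_period:
            # Free the memory held by the oldest element.
            recent.popleft().clear()
    return groups


safe = group_members(SAMPLE, cleanup_period=800)
risky = group_members(SAMPLE, cleanup_period=1)

print(safe)   # [['a', 'b', 'c']]
print(risky)  # [[None, None, 'c']]
```

With a very small period, the first two items are cleared before their parent is processed, so their text is lost; a generous value avoids this at the cost of keeping more elements in memory.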
Methods
__init__(biopax, source[, cleanup_period, ...]) – biopax (str,FileOpener): either a filename, or a FileOpener
biopax_size() – Gets the uncompressed size of the BioPax XML.
cassembly()
catalysis()
cleanup_hook() – Removes the used elements to free up memory.
close_biopax() – Deletes the iterator and closes the file object.
control()
cplex()
extract() – Extracts the BioPax file from compressed archive.
fragfea()
get_none(something)
init_etree() – Creates the lxml.etree.iterparse object.
interaction()
iterate() – Iterates the BioPax XML and calls the appropriate methods for each element.
modfea()
open_biopax() – Opens the BioPax file.
pathway()
pref()
process([silent]) – This method executes the total workflow of BioPax processing.
protein()
pubref()
pwstep()
reaction()
reload()
rxref()
seqint()
seqmodvoc()
seqsite()
set_progress() – Initializes a progress bar.
stoichiometry()
uxref()
- biopax_size()
Gets the uncompressed size of the BioPax XML. This is needed in order to have a progress bar. This method should not be called directly; BioPaxReader.process() calls it.
- cleanup_hook()
Removes the used elements to free up memory. This method should not be called directly; BioPaxReader.iterate() calls it.
- close_biopax()
Deletes the iterator and closes the file object. This method should not be called directly; BioPaxReader.process() calls it.
- extract()
Extracts the BioPax file from a compressed archive into a temporary file. This is needed to trace the progress of processing, which is useful in case of large files. This method should not be called directly; BioPaxReader.process() calls it.
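As a rough illustration of why extraction to a temporary file helps with progress tracking, the sketch below copies one member of a tar.gz archive to a temporary file and reads its uncompressed size. It is an assumption-laden stand-in, not pypath code: extract_member is a hypothetical helper, only tar.gz is handled (not zip), and the demo archive is created on the fly.

```python
import io
import os
import shutil
import tarfile
import tempfile


def extract_member(archive_path, member_name):
    """Copy one member of a tar.gz archive into a temporary file.

    Returns the path of the temporary file and its size in bytes.
    """
    with tarfile.open(archive_path, 'r:gz') as archive:
        source = archive.extractfile(member_name)
        if source is None:
            raise ValueError('%s is not a regular file' % member_name)
        with tempfile.NamedTemporaryFile(suffix='.owl', delete=False) as target:
            shutil.copyfileobj(source, target)
            path = target.name
    # Once on disk, the uncompressed size is cheap to obtain, which gives
    # a progress bar its total.
    return path, os.path.getsize(path)


# Build a tiny demo archive so that the sketch is runnable as is.
demo_dir = tempfile.mkdtemp()
archive_path = os.path.join(demo_dir, 'biopax.tar.gz')
content = b'<owl>demo</owl>'
with tarfile.open(archive_path, 'w:gz') as archive:
    info = tarfile.TarInfo('BioPax/Homo_sapiens.owl')
    info.size = len(content)
    archive.addfile(info, io.BytesIO(content))

path, size = extract_member(archive_path, 'BioPax/Homo_sapiens.owl')
print(size)  # 15

# Remove the temporary files created by the demo.
os.remove(path)
shutil.rmtree(demo_dir)
```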
- init_etree()
Creates the lxml.etree.iterparse object. This method should not be called directly; BioPaxReader.process() calls it.
- iterate()
Iterates the BioPax XML and calls the appropriate methods for each element. This method should not be called directly; BioPaxReader.process() calls it.
- open_biopax()
Opens the BioPax file. This method should not be called directly; BioPaxReader.process() calls it.
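Since the helper methods above are all driven by BioPaxReader.process(), typical use presumably reduces to constructing the reader and calling process(). The sketch below shows that shape under assumptions: count_proteins is a hypothetical wrapper, the file name is a placeholder, and FakeReader is a stand-in that only imitates the constructor signature and the proteins dict so the example runs without pypath or a BioPAX file.

```python
def count_proteins(biopax, source, reader_cls=None):
    """Run the full workflow and return the number of parsed proteins."""
    if reader_cls is None:
        # Imported lazily so the sketch can be tried without pypath.
        from pypath.utils.pyreact import BioPaxReader as reader_cls
    reader = reader_cls(biopax, source)
    # process() is the single public entry point; it calls the
    # open/extract/iterate/close helpers itself.
    reader.process()
    return len(reader.proteins)


class FakeReader(object):
    """Stand-in with the documented constructor signature (hypothetical)."""

    def __init__(self, biopax, source, cleanup_period=800,
                 file_from_archive=None, silent=False):
        self.proteins = {}

    def process(self, silent=False):
        # A real reader would fill this dict while iterating the XML.
        self.proteins['ffjh2@id42'] = 'EGFR'


print(count_proteins('Homo_sapiens.owl', 'Reactome', reader_cls=FakeReader))
```

With pypath installed and a real BioPAX file at hand, omitting reader_cls would use the actual BioPaxReader class instead of the stand-in.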