pypath.utils.pyreact.BioPaxReader

class pypath.utils.pyreact.BioPaxReader(biopax, source, cleanup_period=800, file_from_archive=None, silent=False)

Bases: Logger

This class parses a BioPAX file and makes its content easily accessible for further processing. First it opens the file, extracting it from an archive if necessary. Then it creates an lxml.etree.iterparse object, so iteration is efficient and memory requirements are minimal. As the iterparse object is iterated, the appropriate method is called for each tag listed in the BioPaxReader.methods dict. These methods extract information from the BioPAX entity and store it in arbitrary data structures: strings, lists or dicts.

The results are kept in dicts whose keys are the original IDs of the tags, prefixed with the unique ID of the parser object. The prefix makes it possible to merge the results of parsing several BioPAX files later. For example, id42 may identify EGFR in one file but AKT1 in another. Each parser has its own unique ID, a random 5-character string, so molecules with the same original ID can be distinguished at merging: EGFR will be ffjh2@id42 and AKT1 will be tr9gy@id42.

The methods and the resulting dicts are named after the BioPAX elements, sometimes abbreviated. For example, BioPaxReader.protein() processes the <bp:Protein> elements and stores the results in BioPaxReader.proteins.
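The ID prefixing described above can be illustrated with a minimal sketch. It is not the pypath implementation: the helper names new_parser_id and prefix_ids are hypothetical, and the alphabet used for the random ID is an assumption.

```python
import random
import string


def new_parser_id(rng=random):
    """Return a random 5-character ID (hypothetical helper)."""
    alphabet = string.ascii_lowercase + string.digits
    return ''.join(rng.choice(alphabet) for _ in range(5))


def prefix_ids(parser_id, entities):
    """Prefix every original ID with the parser ID, joined by '@'."""
    return {f'{parser_id}@{orig_id}': value
            for orig_id, value in entities.items()}


# Two files reuse the same local ID for different proteins.
file_a = {'id42': 'EGFR'}
file_b = {'id42': 'AKT1'}

# Fixed parser IDs are used here so the result matches the example above;
# a real parser would draw its own, e.g. with new_parser_id().
merged = {}
merged.update(prefix_ids('ffjh2', file_a))
merged.update(prefix_ids('tr9gy', file_b))
```

Without the prefix, the second update would overwrite the first entry, because both files use id42 as the key.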

In its current state, this class does not parse all the information or every BioPAX entity. For example, nucleic acid related entities and interactions are omitted, but these can easily be added with minor modifications.

__init__(biopax, source, cleanup_period=800, file_from_archive=None, silent=False)
Parameters:
  • biopax (str, FileOpener) – either a filename or a FileOpener object; if a string is supplied, the FileOpener is created internally.

  • source (str) – the name of the data source, e.g. Reactome.

  • cleanup_period (int) – the number of most recent elements kept during the iteration of lxml.etree.iterparse; a lower number results in lower memory usage, but risks deleting an element before it has been processed. The default is 800, which is a safe option.

  • file_from_archive (str) – when processing an archive that may contain multiple files (tar.gz or zip), the path of the file to be processed must be supplied, e.g. BioPax/Homo_sapiens.owl.

  • silent (bool) – whether to print status messages and progress bars during processing. If you process a large number of small files, it is better to set this to False; for one large file, True. The default is False.

Methods

__init__(biopax, source[, cleanup_period, ...])

biopax (str, FileOpener) – either a filename or a FileOpener.

biopax_size()

Gets the uncompressed size of the BioPax XML.

cassembly()

catalysis()

cleanup_hook()

Removes the used elements to free up memory.

close_biopax()

Deletes the iterator and closes the file object.

control()

cplex()

extract()

Extracts the BioPax file from compressed archive.

fragfea()

get_none(something)

init_etree()

Creates the lxml.etree.iterparse object.

interaction()

iterate()

Iterates the BioPax XML and calls the appropriate methods for each element.

modfea()

open_biopax()

Opens the BioPax file.

pathway()

pref()

process([silent])

This method executes the total workflow of BioPax processing.

protein()

pubref()

pwstep()

reaction()

reload()

rxref()

seqint()

seqmodvoc()

seqsite()

set_progress()

Initializes a progress bar.

stoichiometry()

uxref()

biopax_size()

Gets the uncompressed size of the BioPax XML, which is needed for the progress bar. This method should not be called directly; BioPaxReader.process() calls it.

cleanup_hook()

Removes the used elements to free up memory. This method should not be called directly; BioPaxReader.iterate() calls it.
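The idea behind cleanup_period and cleanup_hook() can be sketched as follows. This is an illustration under assumptions, not the pypath code: it uses the standard library's xml.etree.ElementTree.iterparse in place of lxml.etree.iterparse, and the function name iter_with_cleanup is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import deque


def iter_with_cleanup(source, cleanup_period=800):
    """Yield each finished element, keeping only the most recent
    `cleanup_period` elements intact and clearing older ones."""
    recent = deque()
    for _event, elem in ET.iterparse(source, events=('end',)):
        # The consumer processes the element here.
        yield elem
        recent.append(elem)
        if len(recent) > cleanup_period:
            # Drop the text, attributes and children of the oldest element.
            recent.popleft().clear()


# Small demonstration document with ten items.
xml = '<root>' + ''.join(
    f'<item id="i{n}">v{n}</item>' for n in range(10)
) + '</root>'

ids = [
    elem.get('id')
    for elem in iter_with_cleanup(io.BytesIO(xml.encode()), cleanup_period=3)
    if elem.tag == 'item'
]
```

A small cleanup_period keeps fewer elements alive, but a handler that looks back at an element older than the window would find it already emptied, which is the risk noted for low values.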

close_biopax()

Deletes the iterator and closes the file object. This method should not be called directly; BioPaxReader.process() calls it.

extract()

Extracts the BioPax file from a compressed archive into a temporary file. This is needed to track the progress of processing, which is useful for large files. This method should not be called directly; BioPaxReader.process() calls it.
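Extracting one member into a temporary file and reading its uncompressed size might look like the sketch below. It covers only zip archives with the standard library, and the function name extract_member is hypothetical; how pypath handles this internally is not shown here.

```python
import os
import shutil
import tempfile
import zipfile


def extract_member(archive_path, member):
    """Copy one archive member to a temporary file.

    Returns the path of the temporary file and its size in bytes.
    """
    with zipfile.ZipFile(archive_path) as archive, archive.open(member) as src:
        with tempfile.NamedTemporaryFile(delete=False, suffix='.owl') as tmp:
            shutil.copyfileobj(src, tmp)
    return tmp.name, os.path.getsize(tmp.name)


# Build a tiny archive for the demonstration; the member path follows the
# example given for the file_from_archive parameter.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, 'biopax.zip')
with zipfile.ZipFile(archive_path, 'w') as archive:
    archive.writestr('BioPax/Homo_sapiens.owl', '<rdf:RDF/>')

path, size = extract_member(archive_path, 'BioPax/Homo_sapiens.owl')
```

With the uncompressed size known, a progress bar can compare the bytes read so far against the total.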

init_etree()

Creates the lxml.etree.iterparse object. This method should not be called directly; BioPaxReader.process() calls it.

iterate()

Iterates the BioPax XML and calls the appropriate methods for each element. This method should not be called directly; BioPaxReader.process() calls it.
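The tag-to-method dispatch can be sketched with a toy reader. MiniReader is a hypothetical stand-in, not the pypath class: it uses the standard library's xml.etree.ElementTree instead of lxml, handles only <bp:Protein>, and stores the display name as the value, which is an assumption about what a handler might keep.

```python
import io
import xml.etree.ElementTree as ET

# Namespaces of BioPAX Level 3 and RDF in ElementTree's {uri}tag notation.
BP = '{http://www.biopax.org/release/biopax-level3.owl#}'
RDF = '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}'


class MiniReader:
    """Toy illustration of dispatching on tag names."""

    def __init__(self, parser_id):
        self.parser_id = parser_id
        self.proteins = {}
        # Tag name -> handler, similar in spirit to BioPaxReader.methods.
        self.methods = {BP + 'Protein': self.protein}

    def protein(self, elem):
        # Key is the original ID prefixed with the parser ID.
        key = f"{self.parser_id}@{elem.get(RDF + 'ID')}"
        self.proteins[key] = elem.findtext(BP + 'displayName')

    def iterate(self, source):
        for _event, elem in ET.iterparse(source, events=('end',)):
            handler = self.methods.get(elem.tag)
            if handler is not None:
                handler(elem)


SAMPLE = b'''<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Protein rdf:ID="id42"><bp:displayName>EGFR</bp:displayName></bp:Protein>
  <bp:Pathway rdf:ID="id43"/>
</rdf:RDF>'''

reader = MiniReader('ffjh2')
reader.iterate(io.BytesIO(SAMPLE))
```

Tags without an entry in the methods dict, such as <bp:Pathway> here, are skipped, so supporting a new entity type amounts to adding a handler and registering it.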

open_biopax()

Opens the BioPax file. This method should not be called directly; BioPaxReader.process() calls it.

process(silent=False)

This method executes the complete workflow of BioPax processing.

Parameters:

silent (bool) – whether to print status messages and progress bars.
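A typical call sequence, based only on the signatures documented on this page, might look like the sketch below. The wrapper function read_biopax and the file name are hypothetical, and pypath must be installed for the call to work.

```python
def read_biopax(path, source='Reactome'):
    """Parse one BioPAX file and return the reader (requires pypath)."""
    # Imported lazily so this sketch can be loaded without pypath installed.
    from pypath.utils import pyreact

    reader = pyreact.BioPaxReader(path, source)
    # process() runs the whole workflow; the other methods documented here
    # are not meant to be called directly.
    reader.process()
    return reader


# Hypothetical usage; the file name is a placeholder:
# reader = read_biopax('Homo_sapiens.owl')
# print(len(reader.proteins))
```

After processing, the results are available in attributes named after the BioPAX elements, such as reader.proteins.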

set_progress()

Initializes a progress bar. This method should not be called directly; BioPaxReader.process() calls it.