Welcome to pypath’s documentation!

pypath is a Python module for the analysis of cellular signaling pathways. It is developed together with OmniPath, a high-confidence, literature-curated signaling network.

Source code

pypath is free software, licensed under GPLv3. The code is available on GitHub. The latest package for pip is available at http://pypath.omnipathdb.org/releases/latest/, and archives are available at http://pypath.omnipathdb.org/releases/archive/.

Main features

  • Undirected or directed networks.
  • Easy, often seamless protein ID conversion.
  • Efficient handling of annotations, especially sources, literature references, directions, effect signs (stimulation/inhibition) and enzyme-substrate interactions.
  • Ready integration of dozens of bioinformatics resources, most of them being downloaded on the fly, directly from the original source.
  • Caching: after downloading data, a local copy is saved, so later runs are faster and can work offline.
  • Based on igraph: plenty of graph methods are available with excellent computational performance.
  • Partial support for non-human species.
  • Partial support for molecular species other than proteins.

Examples

Example 1: building the network

Import the module and create an instance:

import pypath
pa = pypath.PyPath()

The tables for ID conversions, and the network of the selected manually curated pathway databases (OmniPath) can be initialized this way:

pa.init_network()

Resources without references for each interaction are available separately. For example, to build a comprehensive phosphonetwork, you might want to load interactions from high-throughput data in PhosphoSite, kinase-substrate relationships from PhosphoNetworks and MIMP. This is the way to do it:

from pypath.data_formats import ptm_misc
pa.load_resources(lst = {'mimp': ptm_misc['mimp']})
pa.load_resources(lst = {'pnetworks': ptm_misc['pnetworks']})
pa.load_resources(lst = {'psite_noref': ptm_misc['psite_noref']})

If you have your own interaction data and wish to merge it into the network, first define its format, then call the load_resource() method. In this simple example, the proteins are identified by Gene Symbols in columns #0 and #1, and an edge attribute named ‘score’ is read from column #2. The file is tab separated, and its name is ‘mylist.sif’. (This won’t work unless you really have a file like this.)

mylist = pypath.input_formats.ReadSettings(name = "mylist",
    separator = "\t", nameColA = 0, nameColB = 1,
    nameTypeA = "genesymbol", nameTypeB = "genesymbol",
    isDirected = False, inFile = 'mylist.sif',
    extraEdgeAttrs = {
        'score': 2
    })
pa.load_resource(mylist)
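
To try the example above without real data, a small file in the expected layout can be generated first. This is a minimal sketch using only the standard library; the gene pairs and scores are made-up placeholders, not real interaction data.

```python
import csv

# Hypothetical toy rows: Gene Symbol A, Gene Symbol B, score.
rows = [
    ('EGFR', 'GRB2', 0.9),
    ('GRB2', 'SOS1', 0.8),
    ('AKT1', 'TP53', 0.4),
]

# Tab separated, no header: columns #0 and #1 hold the names,
# column #2 holds the value to be read into the 'score' attribute.
with open('mylist.sif', 'w') as fp:
    writer = csv.writer(fp, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)
```

The file is written to the current working directory, which is where the inFile = 'mylist.sif' setting above would look for it.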

Although proteins are identified by UniProt IDs, it is useful to have standard gene symbols as labels, to make things more human-readable.

pa.genesymbol_labels()

Now the igraph object is in pa.graph, vertex and edge attributes are accessible the usual way:

pa.graph.vs[0]['name']
pa.graph.vs[0]['label']

Let’s see the default attributes of edges:

pa.graph.es[0]['sources']
pa.graph.es[0]['references']
pa.graph.es[0]['dirs']

The graph is undirected by default, and ‘Direction’ objects can be found in <edge>[‘dirs’] to describe the directions and signs of the edge:

print(pa.graph.es[111]['dirs'])
pa.graph.es[111]['dirs'].is_directed()
pa.graph.es[111]['dirs'].is_inhibition()
pa.graph.es[111]['dirs'].which_dirs()
pa.graph.es[111]['dirs'].get_dir(pa.graph.es[111]['dirs'].which_dirs()[0])

To convert the igraph object to a directed graph, use this function:

pa.get_directed()
pa.dgraph.ecount()
pa.dgraph.vcount()

To replace the graph with the giant component of the directed graph:

pa.graph = pa.get_giant(graph = pa.dgraph)

Get compounds for all the proteins in the network:

pa.compounds_from_chembl((None, 'chembl_ebi'), pchembl = True)

This works only if you have a MySQL server hosting an instance of ChEMBL:

pa.compounds_from_chembl((None, 'chembl'), pchembl = True)

You can find the list of compounds and the requested data (now only the pchembl values) in vertex attributes:

pa.graph.vs[2]['compounds_chembl']
pa.graph.vs[2]['compounds_data']
pa.graph.vs[2]['compounds_data'][0]['pchembl']

Loading PTMs. This function loads PTMs from all the available resources. The first run takes a long time, but once the downloaded data is saved in the ./cache directory, subsequent runs are much faster.

pa.load_ptms()

Individual PTM resources can be loaded the following way:

pa.load_phospho_dmi(source = 'phosphoELM')

In the latter case, this function is needed to merge the identical PTMs loaded from multiple resources:

pa.uniq_ptms()

PTMs are stored in objects. This example shows how to access the type of modification, and the number and name of the modified residue.

pa.graph.es[70]['ptm'][0].ptm.typ
pa.graph.es[70]['ptm'][0].ptm.protein
pa.graph.es[70]['ptm'][0].ptm.residue.name
pa.graph.es[70]['ptm'][0].ptm.residue.number

Example 2: using the Mapper class for translating IDs

The mapping submodule of pypath can be used for ID conversion. Here is a basic example:

from pypath import mapping
m = mapping.Mapper(9606)
result = {}
gene_list = ['EGFR', 'AKT1', 'GABARAPL1', 'TP53']
for g in gene_list:
    result[g] = m.map_name(g, 'genesymbol', 'uniprot')

Example 3: pathway annotations

Pathways are functional annotations of molecules in molecular networks. Currently pathway annotations from 4 sources are available in pypath: from KEGG, NetPath, SignaLink and Signor. NetPath and SignaLink will be loaded automatically with the network data, the other two need to be loaded separately:

pa.kegg_pathways()
pa.signor_pathways()

SignaLink assigns pathway annotations to proteins, while the other resources assign them to interactions (at least, this is how the data is read). In addition, proteins classified as autophagy proteins in the Autophagy Regulatory Network will appear as a SignaLink pathway named Autophagy. To have uniform annotations (from all sources, for both proteins and interactions, with <source>_pathways attribute names), use this method to do the necessary conversion:

pa.pathway_attributes()

After this, you can access the pathway annotations in kegg_pathways, netpath_pathways, signalink_pathways and signor_pathways edge and vertex attributes:

print(pa.graph.vs[333]['signor_pathways'])

You can simply do all the steps above by calling one method:

pa.load_all_pathways()

Example 4: other functional annotations

Pathways as defined above usually follow the well known ways of information flow as it is presented in textbooks and reviews. But they are incomplete and biased. More complete and less biased functional annotations are Gene Ontology and GeneSets from the Molecular Signature Database. Methods are available in pypath, so you can load these annotations into network attributes.

# load only the biological process aspect:
pa.load_go(['P'])
# get the GO BP terms for AKT1:
pa.gs('AKT1')['go']['P']
# get the GO annotation:
pa.go_dict()
# list names instead of IDs:
# (9606 is an NCBI taxonomy ID)
list(map(pa.go[9606].get_name, pa.gs('AKT1')['go']['P']))
# calculate enrichment:
# this is a simple Fisher test
# with multiple testing correction
# for example, get the enriched terms for the Notch pathway:
pa.load_all_pathways()
notch = pa.pathway_members('Notch', 'signalink')
pa.go_dict()
enr = pa.go_enrichment(list(notch.up()))
print(enr)
enr.toplist()
enr.top_terms()
enr.top_ids()
enr.enrichments[enr.top_ids()[0]].pval_adj
enr.enrichments[enr.top_ids()[0]].significant()
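
The kind of calculation behind such an enrichment test can be sketched in plain Python (3.8 or newer, for math.comb). This is not pypath's implementation: the one-sided Fisher test, the Benjamini-Hochberg correction and all names below are assumptions chosen for illustration, and the counts are toy numbers.

```python
from math import comb

def fisher_right_tail(k, n, K, N):
    """One-sided Fisher exact p-value: probability of seeing at least
    k annotated genes in a query of size n, when K of the N
    background genes carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy numbers, not real annotation data: (k, n, K, N) per term.
terms = {'term_a': (4, 5, 5, 20), 'term_b': (2, 5, 10, 20)}
names = sorted(terms)
pvals = [fisher_right_tail(*terms[name]) for name in names]
adjusted = dict(zip(names, benjamini_hochberg(pvals)))
```

Here term_a, with 4 of its 5 annotated genes in a query of 5, gets a small adjusted p-value, while term_b does not stand out from the background.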

Using GeneSets from MSigDB:

from pypath import gsea
# login with an MSigDB user:
g = gsea.GSEA('user@email.org', mapper = pa.mapper)
g.show_collections()
g.load_collection('CGP')
# genesets are loaded into g.sets
enr = gsea.GSEABinaryEnrichmentSet(basic_set = pa.graph.vs['name'], gsea = g)
# or ``basic_set`` could be ``dataio.all_uniprots()``
enr.new_set(list(notch.up()))
# if you see nothing, it means none of the loaded sets
# are enriched in your list
print(enr)
enr.top_genesets()
enr.top_geneset_ids()
# to do the same with methods of the PyPath() object:
pa.init_gsea('user@email.org')
pa.add_genesets(['CP:KEGG'])
enr = pa.geneset_enrichment(list(notch.up()), alpha = 0.2)

Example 5: bypass cache

The cache makes pypath run much faster. A typical session downloads hundreds of MB of data from dozens of sources, which takes minutes and makes pypath sensitive to network connectivity and speed. After the first download, files are saved in the ./cache directory, and files with the same URL are automatically read from there. However, sometimes it is necessary to bypass the cache and download the files again, for example if we suspect that it contains erroneous or outdated files. There is an easy way to disable the cache temporarily while executing data input methods:

# here we use the old cache:
pa.load_signor_ptms()

with pypath.dataio.cache_off():
    # here we do not read from the cache
    # but download again and write the
    # new files into the cache:
    pa.load_signor_ptms()

# here already the new files are used from the cache:
pa.load_signor_ptms()

# similarly, if the cache is turned off by default,
# we can temporarily enable:

# this way we permanently disable the cache:
pypath.dataio.CACHE = False

# and here temporarily enable:
with pypath.dataio.cache_on():
    human_proteome = pypath.dataio.all_uniprots()

# the cache is permanently enabled if this variable is ``None`` or ``True``:
pypath.dataio.CACHE = None

I plan to introduce more methods to give a more flexible control over the files in cache.
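
The pattern of temporarily switching a module-level flag can be sketched with contextlib. The code below is a self-contained illustration, not the code of pypath.dataio: CACHE, cache_off() and use_cache() are stand-ins defined here, and the only behaviour taken from the text above is that None or True means the cache is in use.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a module-level switch;
# None or True means the cache is in use.
CACHE = None

@contextmanager
def cache_off():
    """Disable the cache inside the ``with`` block, then restore
    whatever value the flag had before."""
    global CACHE
    previous = CACHE
    CACHE = False
    try:
        yield
    finally:
        # restored even if the block raises an exception
        CACHE = previous

def use_cache():
    """Tell whether a download function should read from the cache."""
    return CACHE is None or CACHE is True
```

Restoring the previous value in a finally clause means a failed download inside the block cannot leave the cache switched off by accident.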

Example 6: saving and loading a session

The network object with its attributes can be saved into a pickle dump, and loaded from there in subsequent sessions.

# initialize a PyPath() object:
pa = pypath.PyPath()
pa.init_network()
pa.load_all_pathways()

# here we save the loaded network
# with the pathway annotations:
pa.save_network()

# in another session we load the saved network:
pa = pypath.PyPath()
pa.init_network(pfile = True)

# above the network has been saved into
# `cache/default_network.pickle`
# to save/load to/from different file:

pa.save_network('cache/other_network.pickle')
pa = pypath.PyPath()
pa.init_network(pfile = 'cache/other_network.pickle')
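
The underlying mechanism is an ordinary pickle round trip. The following sketch shows it with the standard library only; the dictionary is a hypothetical stand-in for a network object, and the source name in it is a placeholder.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a network object with attributes.
network = {
    'vertices': ['P00533', 'P31749'],
    'edges': [('P00533', 'P31749')],
    'edge_attrs': {('P00533', 'P31749'): {'sources': ['ExampleDB']}},
}

# A temporary directory keeps the sketch from touching ./cache.
path = os.path.join(tempfile.mkdtemp(), 'other_network.pickle')

# First session: dump the object to disk.
with open(path, 'wb') as fp:
    pickle.dump(network, fp)

# Later session: load it back without rebuilding anything.
with open(path, 'rb') as fp:
    restored = pickle.load(fp)
```

As with any pickle file, a dump should only be loaded if it comes from a trusted source, and it may not be readable after the classes it contains have changed.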

How to set up a ChEMBL MySQL instance?

Currently pypath.chembl provides powerful and flexible methods to query compound-target relationships from ChEMBL. These are implemented on top of MySQL, and so far I could not find a way to provide the same features through the webservice. A webservice would be much more convenient for most users, so it is only a matter of time until I implement a webservice-based ChEMBL module. Until then, here is a short guide to loading ChEMBL on your own MySQL server. You will need 25GB of free disk space, because ChEMBL is huge: the compressed database dump is a 1GB download, it takes 8GB once uncompressed, and the database loaded into MySQL takes 16GB. You can delete the first two afterwards, so in the end you sacrifice only 16GB to have your own ChEMBL.

# login to the MySQL shell as administrator:

mysql --user=root --password=foobar [--host=127.0.0.1 --port=3306]
/* create a database and user for ChEMBL: */
CREATE DATABASE chembl;
CREATE USER chembl;
GRANT ALL ON chembl.* TO `chembl`@`%` IDENTIFIED BY 'a-new-password';
FLUSH PRIVILEGES;

/* optionally create a user which can only read, e.g. if you want
to share this database with the whole institute: */
CREATE USER chembl_ro;
GRANT SELECT ON chembl.* TO `chembl_ro`@`%` IDENTIFIED BY 'a-new-password';
FLUSH PRIVILEGES;

EXIT;
# let's download the ChEMBL MySQL dump (warning, it is 1GB!):
curl -O ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_21_mysql.tar.gz

# uncompressing. start this only if you have 8GB free space:
tar -xzvf chembl_21_mysql.tar.gz

# loading into mysql. this will take up 16GB more space:
mysql --user=chembl --password='password-of-chembl' [--host=127.0.0.1 --port=3306] chembl < \
    chembl_21_mysql/chembl_21.mysqldump.sql

At this point you can query ChEMBL on your own laptop/server. Congratulations! Enjoy! You might find it necessary to increase the cache and buffer sizes in /etc/mysql/my.cnf. Loading the mysqldump takes time. If it fails, first check that you have enough free space and look at the MySQL error log, then try again.

Alternatively, you can download and run the myChEMBL virtual machine, and connect to its pre-installed MySQL server via the VirtualBox virtual network.

How to access OmniPath data by bioservices

Bioservices is a Python module for accessing various web services. If you need only the network data and kinase-substrate interactions, and you would like to process the data further in your own way, bioservices offers a convenient way to fetch the data directly from the OmniPath webservice into Python objects.

from bioservices import omnipath
op = omnipath.OmniPath()

# all interactions:
i = op.get_interactions()

# or only those of EGFR (P00533), including the sources and references fields:
i = op.get_interactions('P00533', fields = ['sources', 'references'])

# get kinase-substrate interactions of EGFR (P00533) and PRKCE (Q02156):
ks = op.get_ptms('P00533,Q02156', fields = ['sources'])

Reference

class pypath.main.PyPath(ncbi_tax_id=9606, default_name_type={'protein': 'uniprot', 'drug': 'chembl', 'mirna': 'mirbase'}, copy=None, mysql=(None, 'mapping'), chembl_mysql=(None, 'chembl'), name='unnamed', outdir='results', loglevel='INFO', loops=False)[source]
add_update_vertex(defAttrs, originalName, originalNameType, extraAttrs={}, add=False)[source]

Updates the attributes of one node in the network. Optionally it creates a new node and sets the attributes, but it is not efficient as igraph needs to reindex vertices after this operation, so better to create new nodes and edges in batch.

all_between(nameA, nameB)[source]

Returns all edges between two given vertex names. Similar to straight_between(), but checks both directions, and returns a list of edge ids in [undirected, straight, reversed] format, for both nameA -> nameB and nameB -> nameA edges.

apply_list(name, node_or_edge='node')[source]

Creates vertex or edge attribute based on a list.

attach_network(edgeList=False, regulator=False)[source]

Adds edges to the network from edgeList obtained from file or other input method.

basic_stats(latex=False, caption='', latex_hdr=True, fontsize=8, font='HelveticaNeueLTStd-LtCn', fname=None, header_format='%s', row_order=None, by_category=True, use_cats=['p', 'm', 'i', 'r'], urls=True, annots=False)[source]

Returns basic numbers about the network resources, e.g. edge and node counts.

latex
Return table in a LaTeX document. This can be compiled by PDFLaTeX: latex stats.tex
cancer_gene_census_list()[source]

Loads the list of cancer driver proteins from the COSMIC Cancer Gene Census.

clean_graph()[source]

Removes multiple edges, unknown molecules and those from the wrong taxon. Multiple edges will be combined by the combine_attr() method. Loops will be deleted unless the loops attribute is set to True.

collapse_by_name(graph=None)[source]

Collapses nodes with the same name with copying and merging all edges and attributes.

combine_attr(lst, num_method=<built-in function max>)[source]

Combines multiple attributes into one. This method attempts to find out which is the best way to combine attributes.

  • if there is only one value or one of them is None, then returns the one available
  • lists: concatenates unique values of lists
  • numbers: returns the greater by default or calls num_method() if given.
  • sets: returns the union
  • dicts: calls common.merge_dicts()
  • Direction: calls their special merge() method

Works on more than 2 attributes recursively.

Parameters:
  • lst (list) – List of one or two attribute values.
  • num_method (callable) – Method to merge numeric attributes.
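
The rules listed above can be sketched as a small standalone function. This is an illustration only, not pypath's code: dicts are merged shallowly with dict.update() instead of common.merge_dicts(), Direction objects are not handled, and calling num_method with a list of two numbers is an assumption that happens to work with the default max.

```python
def combine(values, num_method=max):
    """Combine attribute values pairwise, from left to right."""
    # a None value never overrides an available one
    values = [v for v in values if v is not None]
    if not values:
        return None
    result = values[0]
    for other in values[1:]:
        if isinstance(result, list) and isinstance(other, list):
            # concatenate, keeping each value once, in order of appearance
            merged = []
            for item in result + other:
                if item not in merged:
                    merged.append(item)
            result = merged
        elif isinstance(result, set) and isinstance(other, set):
            result = result | other
        elif isinstance(result, dict) and isinstance(other, dict):
            merged = dict(result)
            merged.update(other)
            result = merged
        elif isinstance(result, (int, float)) and isinstance(other, (int, float)):
            result = num_method([result, other])
        else:
            raise TypeError('cannot combine %r and %r' % (result, other))
    return result
```

For instance, combine([['a', 'b'], ['b', 'c']]) gives a list with each of the three values once, and combine([3, 7, 5]) gives the greatest of the numbers.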
copy_edges(sources, target, move=False, graph=None)[source]

Copies edges of one node to another, keeping attributes and directions.

Parameters:
  • sources (list) – Vertex IDs to copy from.
  • target (int) – Vertex ID to copy to.
  • move (bool) – Whether to move instead of copy, i.e. whether to remove the source edges after copying.
count_sol()[source]

Counts nodes with zero degree.

curation_effort(sum_by_source=False)[source]

Returns the total number of reference-interaction pairs.

@sum_by_source
: bool
If True, counts the reference-interaction pairs by sources, and returns the sum of these values.
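
The difference between the two counting modes can be shown with a toy example. The data layout below (a dict of reference sets per source for each edge) and the function body are assumptions made for illustration; they are not pypath's internal representation.

```python
# Hypothetical edges: references grouped by the source reporting them.
edges = [
    {'refs_by_source': {'DB1': {'pmid1', 'pmid2'}, 'DB2': {'pmid2'}}},
    {'refs_by_source': {'DB1': {'pmid3'}}},
]

def count_curation_effort(edges, sum_by_source=False):
    """Count reference-interaction pairs, either once per interaction
    or once per source reporting the pair."""
    total = 0
    for edge in edges:
        by_source = edge['refs_by_source']
        if sum_by_source:
            # the same reference counts once for every source citing it
            total += sum(len(refs) for refs in by_source.values())
        else:
            # each reference counts once per interaction
            total += len(set().union(*by_source.values()))
    return total
```

In this toy data pmid2 is cited by two sources for the same interaction, so counting by source yields one pair more than the plain count.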
delete_by_taxon(tax)[source]

Removes the proteins of all organisms which are not listed.

Parameters: tax (list) – List of NCBI Taxonomy IDs of the organisms. E.g. [7227, 9606]
delete_unknown(tax, typ='protein', defaultNameType=None)[source]

Removes those proteins which are not in the list of all default IDs of the organisms. By default, it means to remove all protein nodes not having a human SwissProt ID.

@tax
: list
List of NCBI Taxonomy IDs of the organisms of interest. E.g. [7227, 9606]
@typ
: str
Molecule type. E.g. ‘protein’ or ‘mirna’
@defaultNameType
: str
The default name type of the given molecular species. For proteins it’s ‘uniprot’ by default.
dgenesymbol(genesymbol)[source]

Returns igraph.Vertex() object if the GeneSymbol can be found in the default directed network, otherwise None.

@genesymbol
: str
GeneSymbol.
dgs(genesymbol)

Returns igraph.Vertex() object if the GeneSymbol can be found in the default directed network, otherwise None.

@genesymbol
: str
GeneSymbol.
disease_genes_list(dataset='curated')[source]

Loads the list of all disease related genes from DisGeNet. This resource is human only.

dp(identifier)

Same as PyPath.protein, just for the directed graph. Returns igraph.Vertex() object if the identifier is a valid vertex index in the default directed graph, or a UniProt ID or GeneSymbol which can be found in the default directed network, otherwise None.

@identifier
: int, str
Vertex index (int) or GeneSymbol (str) or UniProt ID (str).
dprotein(identifier)[source]

Same as PyPath.protein, just for the directed graph. Returns igraph.Vertex() object if the identifier is a valid vertex index in the default directed graph, or a UniProt ID or GeneSymbol which can be found in the default directed network, otherwise None.

@identifier
: int, str
Vertex index (int) or GeneSymbol (str) or UniProt ID (str).
druggability_list()[source]

Loads the list of druggable proteins from DgiDB. This resource is human only.

duniprot(uniprot)[source]

Same as PyPath.uniprot(), just for the directed graph. Returns igraph.Vertex() object if the UniProt ID can be found in the default directed network, otherwise None.

@uniprot
: str
UniProt ID.
duniprots(uniprots)[source]

Returns a list of igraph.Vertex() objects for a list of UniProt IDs, omitting those that could not be found in the default directed graph.

dup(uniprot)

Same as PyPath.uniprot(), just for the directed graph. Returns igraph.Vertex() object if the UniProt ID can be found in the default directed network, otherwise None.

@uniprot
: str
UniProt ID.
dups(uniprots)

Returns a list of igraph.Vertex() objects for a list of UniProt IDs, omitting those that could not be found in the default directed graph.

edge_exists(nameA, nameB)[source]

Returns the edge id if the edge exists, otherwise a tuple of vertex indices. Not sensitive to direction.

edges_expression(func=<function <lambda>>)[source]

Executes function func for each pair of connected proteins in the network, for every expression dataset. By default, func simply gives the product of the (normalized) expression values.

func
: callable
Function to handle 2 vectors (pandas.Series() objects), should return one vector of the same length.
edges_in_comlexes(csources=['corum'], graph=None)[source]

Creates edge attributes complexes and in_complex. These are both dicts where the keys are complex resources. The values in complexes are the list of complex names both the source and the target vertices belong to. The values in_complex are boolean values whether there is at least one complex in the given resources both the source and the target vertex of the edge belong to.

@csources
: list
List of complex resources. Should be already loaded.
@graph
: igraph.Graph()
The graph object to do the calculations on.
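
What the two edge attributes contain can be sketched on toy data. The membership table, the vertex names and the helper function are hypothetical; only the shape of the result (one dict of complex names and one dict of booleans, both keyed by resource) follows the description above.

```python
# Hypothetical complex membership per vertex and resource.
vertex_complexes = {
    'A': {'corum': {'C1', 'C2'}},
    'B': {'corum': {'C2'}},
    'C': {'corum': set()},
}

def shared_complexes(source, target, csources=('corum',)):
    """Return the `complexes` and `in_complex` values for one edge."""
    complexes = {}
    in_complex = {}
    for res in csources:
        # complexes that both endpoints of the edge belong to
        common = (vertex_complexes[source].get(res, set()) &
                  vertex_complexes[target].get(res, set()))
        complexes[res] = sorted(common)
        in_complex[res] = bool(common)
    return complexes, in_complex
```

For the edge between A and B the shared complex is C2, so in_complex is True for corum, while the edge between B and C has no complex in common.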
export_dot(nodes=None, edges=None, directed=True, labels='genesymbol', edges_filter=<function <lambda>>, nodes_filter=<function <lambda>>, edge_sources=None, dir_sources=None, graph=None, return_object=False, save_dot=None, save_graphics=None, prog='neato', format=None, hide=False, font=None, auto_edges=False, hide_nodes=[], defaults={}, **kwargs)[source]

Builds a pygraphviz.AGraph() object, filtering the edges and vertices by arbitrary criteria. Returns the AGraph object if requested, or exports the dot file, or saves the graphics.

@nodes
: list
List of vertex ids to be included.
@edges
: list
List of edge ids to be included.
@directed
: bool
Create a directed or undirected graph.
@labels
: str
Name type to be used as id/label in the dot format.
@edges_filter
: function
Function to filter edges, accepting igraph.Edge as argument.
@nodes_filter
: function
Function to filter vertices, accepting igraph.Vertex as argument.
@edge_sources
: list
Sources to be included.
@dir_sources
: list
Direction and effect sources to be included.
@graph
: igraph.Graph
The graph object to export.
@return_object
: bool
Whether to return the pygraphviz.AGraph object.
@save_dot
: str
Filename to export the dot file to.
@save_graphics
: str
Filename to export the graphics, the extension defines the format.
@prog
: str
The graphviz layout algorithm to use.
@format
: str
The graphics format passed to pygraphviz.AGraph().draw().
@hide
: bool
Hide filtered edges instead of omitting them.
@hide_nodes
: list
Nodes to hide. List of vertex ids.
@auto_edges
: str
Automatic, built-in style for edges. ‘DIRECTIONS’ or ‘RESOURCE_CATEGORIES’ are supported.
@font
: str
Font to use for labels. To use more than one font, refer to graphviz attributes with constant values, or define callbacks or mapping dictionaries.
@defaults
: dict
Default values for graphviz attributes, labeled with the entity, e.g. {‘edge_penwidth’: 0.2}.
@**kwargs
: constant, callable or dict
Graphviz attributes, labeled by the target entity, e.g. edge_penwidth, vertex_shape or graph_label. If the value is constant, this value will be used. If the value is a dict and has _name as key, then for every instance of the given entity the value of the attribute defined by _name will be looked up in the dict, and the corresponding value will be given to the graphviz attribute. If the key _name is missing from the dict, igraph vertex and edge indices will be looked up among the keys. If the value is callable, it will be called with the current instance of the entity and the returned value will be used for the graphviz attribute, e.g. edge_arrowhead(edge) or vertex_fillcolor(vertex).

Example:

import pypath
from pypath import data_formats
net = pypath.PyPath()
net.init_network(pfile = 'cache/default.pickle')
#net.init_network({'arn': data_formats.omnipath['arn']})
tgf = [v.index for v in net.graph.vs if 'TGF' in v['slk_pathways']]
dot = net.export_dot(nodes = tgf, save_graphics = 'tgf_slk.pdf', prog = 'dot',
    main_title = 'TGF-beta pathway', return_object = True,
    label_font = 'HelveticaNeueLTStd Med Cn',
    edge_sources = ['SignaLink3'], dir_sources = ['SignaLink3'], hide = True)
export_edgelist(fname, graph=None, names=['name'], edge_attributes=[], sep='\t')[source]

Writes the edge list to a text file, with attributes.

@fname
The name of the file or a stream to write to.
@graph
The igraph object containing the network.
@names
List with the vertex attribute names to be printed for source and target vertices.
@edge_attributes
List with the edge attribute names to be printed.
@sep
String used to separate columns.

find_all_paths(start, end, mode='OUT', maxlen=2, graph=None)[source]

Finds all paths up to length maxlen between groups of vertices. This function is needed only because igraph’s get_all_shortest_paths() finds only the shortest paths, not every path up to a defined length.

@start
: int or list
Indices of the starting node(s) of the paths.
@end
: int or list
Indices of the target node(s) of the paths.
@mode
: ‘IN’, ‘OUT’, ‘ALL’
Passed to igraph.Graph.neighbors()
@maxlen
: int
Maximum length of paths in steps, i.e. if maxlen = 3, then the longest path may consist of 3 edges and 4 nodes.
@graph
: igraph.Graph object
The graph you want to find paths in. self.graph by default.
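
The idea of enumerating every path up to a length limit, not only the shortest ones, can be sketched as a depth-first search over a plain adjacency dict. This is not the pypath method: the function name and the adjacency representation are made up for the sketch, and it assumes that paths are simple (no vertex is visited twice) and that a path may continue through a target vertex.

```python
def all_paths(adjacency, start, end, maxlen=2):
    """Enumerate simple paths from any node in `start` to any node in
    `end` that consist of at most `maxlen` edges."""
    targets = set(end)
    found = []

    def extend(path):
        # record the path once it has at least one edge and ends at a target
        if len(path) > 1 and path[-1] in targets:
            found.append(list(path))
        # a path of maxlen edges has maxlen + 1 nodes: stop extending
        if len(path) - 1 == maxlen:
            return
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor not in path:
                extend(path + [neighbor])

    for node in start:
        extend([node])
    return found

# Toy directed graph: successors of each vertex index.
toy = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
two_step = all_paths(toy, [0], [3], maxlen=2)
three_step = all_paths(toy, [0], [4], maxlen=3)
```

With maxlen = 2 vertex 4 cannot be reached from vertex 0 in this toy graph, while maxlen = 3 yields two paths of 3 edges and 4 nodes each.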
genesymbol(genesymbol)[source]

Returns igraph.Vertex() object if the GeneSymbol can be found in the default undirected network, otherwise None.

@genesymbol
: str
GeneSymbol.
genesymbol_labels(graph=None, remap_all=False)[source]

Creates the vertex attribute label and fills it with the Gene Symbols of all proteins whose Gene Symbol can be looked up based on the default name of the protein vertex. If the attribute label has already been initialized, updates this attribute, or recreates it if remap_all is True.

get_directed(graph=False, conv_edges=False, mutual=False, ret=False)[source]

Converts an undirected igraph.Graph object to a directed one. By default it converts the graph in PyPath.graph and places the directed instance in PyPath.dgraph.

@graph
: igraph.Graph
Undirected graph object.
@conv_edges
: bool
Whether to convert undirected edges (those without explicit direction information) to an arbitrary direction edge or a pair of opposite edges. Otherwise those will be deleted. Default is False.
@mutual
: bool
If conv_edges is True, whether to convert the undirected edges to a single, arbitrary directed edge, or a pair of opposite directed edges. Default is False.
@ret
: bool
Return the directed graph instance, or return None. Default is False (returns None).
get_edge(nodes)[source]

Returns the edge id only if there is an edge from nodes[0] to nodes[1]. Returns False if the edge exists only in the opposite direction, if no edge exists between the two vertices, or if any of the vertex ids does not exist. To find edges regardless of their direction, see edge_exists().

get_giant(replace=False, graph=None)[source]

Returns the giant component of the graph, or replaces the igraph object with only the giant component.

gs(genesymbol)

Returns igraph.Vertex() object if the GeneSymbol can be found in the default undirected network, otherwise None.

@genesymbol
: str
GeneSymbol.
gs_edge(source, target, directed=True)[source]

Returns igraph.Edge object if an edge exists between the 2 proteins, otherwise None.

@source
: str
GeneSymbol
@target
: str
GeneSymbol
@directed
: bool
To be passed to igraph.Graph.get_eid()
init_edge_attr(attr)[source]

Fills edge attribute with its default values, creates lists if in edgeAttrs the attribute is registered as list.

init_network(lst={'hprd_p': <pypath.input_formats.ReadSettings instance>, 'dbptm': <pypath.input_formats.ReadSettings instance>, 'signalink3': <pypath.input_formats.ReadSettings instance>, 'domino': <pypath.input_formats.ReadSettings instance>, 'spike': <pypath.input_formats.ReadSettings instance>, 'matrixdb': <pypath.input_formats.ReadSettings instance>, 'innatedb': <pypath.input_formats.ReadSettings instance>, 'mppi': <pypath.input_formats.ReadSettings instance>, 'intact': <pypath.input_formats.ReadSettings instance>, 'nrf2': <pypath.input_formats.ReadSettings instance>, 'trip': <pypath.input_formats.ReadSettings instance>, 'arn': <pypath.input_formats.ReadSettings instance>, 'macrophage': <pypath.input_formats.ReadSettings instance>, 'death': <pypath.input_formats.ReadSettings instance>, 'pdz': <pypath.input_formats.ReadSettings instance>, 'elm': <pypath.input_formats.ReadSettings instance>, 'dip': <pypath.input_formats.ReadSettings instance>, 'ca1': <pypath.input_formats.ReadSettings instance>, 'depod': <pypath.input_formats.ReadSettings instance>, 'biogrid': <pypath.input_formats.ReadSettings instance>, 'ccmap': <pypath.input_formats.ReadSettings instance>, 'hprd': <pypath.input_formats.ReadSettings instance>, 'guide2pharma': <pypath.input_formats.ReadSettings instance>, 'phelm': <pypath.input_formats.ReadSettings instance>, 'signor': <pypath.input_formats.ReadSettings instance>, 'lmpid': <pypath.input_formats.ReadSettings instance>, 'psite': <pypath.input_formats.ReadSettings instance>}, exclude=[], cache_files={}, pfile=False, save=False, reread=False, redownload=False)[source]

This is a lazy way to start the module, load data and build the high confidence, literature curated part of the signaling network.

init_vertex_attr(attr)[source]

Fills vertex attribute with its default values, creates lists if in vertexAttrs the attribute is registered as list.

intogen_cancer_drivers_list(intogen_file)[source]

Loads the list of cancer driver proteins from IntOGen data.

kinases_list()[source]

Loads the list of all known kinases in the proteome from kinase.com. This resource is human only.

load_compleat(graph=None)[source]

Loads complexes from Compleat. Loads data into vertex attribute graph.vs[‘complexes’][‘compleat’]. This resource is human only.

load_complexportal(graph=None)[source]

Loads complexes from ComplexPortal. Loads data into vertex attribute graph.vs[‘complexes’][‘complexportal’]. This resource is human only.

load_corum(graph=None)[source]

Loads complexes from CORUM database. Loads data into vertex attribute graph.vs[‘complexes’][‘corum’]. This resource is human only.

load_ddi(ddi)[source]

ddi is either a list of intera.DomainDomain objects, or a function returning this list

load_dmi(dmi)[source]

dmi is either a list of intera.DomainMotif objects, or a function returning this list

load_expression(array=False)[source]

Expression data can be loaded into vertex attributes, or into a pandas DataFrame – the latter offers faster ways to process and use these huge matrices.

load_havugimana(graph=None)[source]

Loads complexes from Havugimana 2012. Loads data into vertex attribute graph.vs[‘complexes’][‘havugimana’]. This resource is human only.

load_mutations(attributes=None, gdsc_datadir=None, mutation_file=None)[source]

Mutations are listed in vertex attributes. Mutation() objects offer methods to identify residues and look them up in Ptm(), Motif() and Domain() objects, to check whether those residues are modified, or lie in a short motif or domain.

load_resources(lst={'hprd_p': <pypath.input_formats.ReadSettings instance>, 'dbptm': <pypath.input_formats.ReadSettings instance>, 'signalink3': <pypath.input_formats.ReadSettings instance>, 'domino': <pypath.input_formats.ReadSettings instance>, 'spike': <pypath.input_formats.ReadSettings instance>, 'matrixdb': <pypath.input_formats.ReadSettings instance>, 'innatedb': <pypath.input_formats.ReadSettings instance>, 'mppi': <pypath.input_formats.ReadSettings instance>, 'intact': <pypath.input_formats.ReadSettings instance>, 'nrf2': <pypath.input_formats.ReadSettings instance>, 'trip': <pypath.input_formats.ReadSettings instance>, 'arn': <pypath.input_formats.ReadSettings instance>, 'macrophage': <pypath.input_formats.ReadSettings instance>, 'death': <pypath.input_formats.ReadSettings instance>, 'pdz': <pypath.input_formats.ReadSettings instance>, 'elm': <pypath.input_formats.ReadSettings instance>, 'dip': <pypath.input_formats.ReadSettings instance>, 'ca1': <pypath.input_formats.ReadSettings instance>, 'depod': <pypath.input_formats.ReadSettings instance>, 'biogrid': <pypath.input_formats.ReadSettings instance>, 'ccmap': <pypath.input_formats.ReadSettings instance>, 'hprd': <pypath.input_formats.ReadSettings instance>, 'guide2pharma': <pypath.input_formats.ReadSettings instance>, 'phelm': <pypath.input_formats.ReadSettings instance>, 'signor': <pypath.input_formats.ReadSettings instance>, 'lmpid': <pypath.input_formats.ReadSettings instance>, 'psite': <pypath.input_formats.ReadSettings instance>}, exclude=[], cache_files={}, reread=False, redownload=False)[source]

Loads multiple resources and cleans up afterwards. Looks up ID types and, if necessary, loads all ID conversion tables from UniProt. This is much faster than loading the ID conversion tables and the resources one by one.

map_edge(edge)[source]

Translates molecule names in dict representing an edge.

map_item(item)[source]

Translates the name in item representing a molecule.

map_list(lst, singleList=False)[source]

Only a wrapper for map_edge()

merge_lists(nameA, nameB, name=None, and_or='and', delete=False, func='max')[source]

Merges two lists in lists.

merge_nodes(nodes, primary=None, graph=None)[source]

Merges all attributes and all edges of selected nodes and assigns them to the primary node (by default the one with lowest ID).

Parameters:
  • nodes (list) – List of vertex IDs.
  • primary (int) – ID of the primary node; if None, the lowest ID is selected.
mutated_edges(sample)[source]

Compares the mutated residues and the modified residues in PTMs. Interactions are marked as mutated if the target residue in the underlying PTM is mutated.
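
The comparison described above can be sketched in plain Python. This is only an illustration of the rule, not pypath's implementation: the ptms dict, the mutations set and all identifiers below are hypothetical placeholders.

```python
# Hypothetical data: each interaction maps to the residues targeted by its PTMs,
# given as (protein, residue number, residue name).
ptms = {
    ("K1", "S1"): [("S1", 15, "S")],
    ("K2", "S2"): [("S2", 204, "Y")],
}

# Hypothetical mutated residues in one sample: (protein, residue number).
mutations = {("S1", 15)}

# An interaction counts as mutated if any PTM target residue is mutated.
mutated_edges = [
    edge
    for edge, residues in ptms.items()
    if any((protein, number) in mutations for protein, number, _ in residues)
]

print(mutated_edges)
```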

network_filter(p=2.0)[source]

This function aims to cut the number of edges in the network without losing nodes, to make the network less connected, less hairball-like and more usable for analysis.

network_stats(outfile=None)[source]

Calculates basic statistics for the whole network and for each of the sources. Writes the results to a tab-separated file.

orthology_translation(target, source=None, mapping_id_type='refseqp', graph=None)[source]

Translates the current object to another organism by orthology. Proteins without a known ortholog will be deleted.

Parameters: target (int) – NCBI Taxonomy ID of the target organism. E.g. 10090 for mouse.
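
A minimal sketch of the idea, assuming a simple ortholog lookup table. The translate_edges helper, the orthologs dict and the identifiers are hypothetical and not part of pypath; how pypath treats one-to-many orthologs is also an assumption here.

```python
def translate_edges(edges, orthologs):
    """Translate (source, target) pairs; drop pairs with an unmapped partner."""
    translated = []
    for a, b in edges:
        # A protein may have several orthologs; a protein missing from the
        # table yields no pairs, so its interactions disappear.
        for ta in orthologs.get(a, ()):
            for tb in orthologs.get(b, ()):
                translated.append((ta, tb))
    return translated


human_edges = [("H1", "H2"), ("H2", "H3")]
orthologs = {"H1": ["M1"], "H2": ["M2a", "M2b"]}  # H3 has no known ortholog

mouse_edges = translate_edges(human_edges, orthologs)
print(mouse_edges)
```

In this toy case the interaction involving H3 is lost, which mirrors the deletion of proteins without a known ortholog.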
p(identifier)

Returns igraph.Vertex() object if the identifier is a valid vertex index in the default undirected graph, or a UniProt ID or GeneSymbol which can be found in the default undirected network, otherwise None.

@identifier
: int, str
Vertex index (int) or GeneSymbol (str) or UniProt ID (str).
process_dmi(source, **kwargs)[source]

This is a universal function for loading domain-motif objects, like load_phospho_dmi() for PTMs. TODO: this will replace load_elm, load_ielm, etc.

protein(identifier)[source]

Returns igraph.Vertex() object if the identifier is a valid vertex index in the default undirected graph, or a UniProt ID or GeneSymbol which can be found in the default undirected network, otherwise None.

@identifier
: int, str
Vertex index (int) or GeneSymbol (str) or UniProt ID (str).
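
The lookup order can be illustrated without igraph. In this sketch the find_vertex helper and the two small lists are hypothetical stand-ins for the network; it returns a vertex index rather than an igraph.Vertex() object.

```python
# Hypothetical vertex tables: position in the list is the vertex index.
uniprots = ["P00533", "P31749"]
genesymbols = ["EGFR", "AKT1"]

# Dicts give constant-time lookup from name to index.
index_by_uniprot = {name: i for i, name in enumerate(uniprots)}
index_by_genesymbol = {name: i for i, name in enumerate(genesymbols)}


def find_vertex(identifier):
    """Return the vertex index for an index, UniProt ID or GeneSymbol, else None."""
    if isinstance(identifier, int):
        return identifier if 0 <= identifier < len(uniprots) else None
    if identifier in index_by_uniprot:
        return index_by_uniprot[identifier]
    return index_by_genesymbol.get(identifier)


print(find_vertex("EGFR"), find_vertex("P31749"), find_vertex("XYZ"))
```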
protein_edge(source, target, directed=True)[source]

Returns an igraph.Edge object if an edge exists between the two proteins, otherwise None.

@source
: int, str
Vertex index or UniProt ID or GeneSymbol
@target
: int, str
Vertex index or UniProt ID or GeneSymbol
@directed
: bool
To be passed to igraph.Graph.get_eid()
proteome_list(swissprot=True)[source]

Loads the whole proteome as a list.

read_data_file(settings, keep_raw=False, cache_files={}, reread=False, redownload=False)[source]

Interaction data with node and edge attributes can be read from simple text based files. This function works not only with files, but with lists as well. Any other function can be written to download and preprocess data, and then pass it to this function to finally attach it to the network.

@settings
: ReadSettings instance
The detailed definition of the input format. Instead of the file name you can give a function name, which will be executed, and the returned data will be used.
@keep_raw
: boolean
Whether to keep the raw data read by this function, for debugging or further use.
receptors_list()[source]

Loads the Human Plasma Membrane Receptome as a list. This resource is human only.

save_session()[source]

Save current state into pickle dump.

separate()[source]

Separates networks from different sources. Returns dict of igraph objects.

separate_by_category()[source]

Separate networks based on categories. Returns dict of igraph objects.

set_chembl_mysql(title, config_file=None)[source]

Sets the ChEMBL MySQL config according to the title section in the ini style config file config_file.

Parameters:
  • title (str) – Section title in the ini file.
  • config_file (str, NoneType) – Config file name; if None, mysql_config/defaults.mysql will be used.
set_druggability()[source]

Creates a vertex attribute dgb with value True if the protein is druggable, otherwise False.

set_drugtargets(pchembl=5.0)[source]

Creates a vertex attribute dtg with value True if the protein has at least one compound binding with affinity higher than pchembl, otherwise False.

set_kinases()[source]

Creates a vertex attribute kin with value True if the protein is a kinase, otherwise False.

set_receptors()[source]

Creates a vertex attribute rec with value True if the protein is a receptor, otherwise False.

set_signaling_proteins()[source]

Creates a vertex attribute with value True if the protein is a signaling protein (see signaling_proteins_list()), otherwise False.

set_transcription_factors(classes=['a', 'b', 'other'])[source]

Creates a vertex attribute tf with value True if the protein is a transcription factor, otherwise False.

shortest_path_dist(graph=None, subset=None, outfile=None, **kwargs)[source]

subset is either a tuple of two lists, to look for shortest paths between the elements of two groups, or a single list, to look for shortest paths within that group.
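
The two forms of subset can be illustrated with a standard-library sketch. This is not pypath's implementation: bfs_dist, path_lengths and the toy adjacency dict are hypothetical, and a plain breadth-first search stands in for the igraph shortest path routines.

```python
from collections import deque
from itertools import combinations, product


def bfs_dist(adj, start):
    """Unweighted shortest path lengths from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def path_lengths(adj, subset):
    """Tuple of two lists: pairs between groups. Single list: pairs within it."""
    if isinstance(subset, tuple):
        pairs = product(subset[0], subset[1])
    else:
        pairs = combinations(subset, 2)
    # None marks pairs with no connecting path.
    return {(a, b): bfs_dist(adj, a).get(b) for a, b in pairs}


adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
between = path_lengths(adj, (["A"], ["C", "D"]))
within = path_lengths(adj, ["A", "B", "D"])
print(between)
print(within)
```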

signaling_proteins_list()[source]

Compiles a list of signaling proteins (as opposed to other proteins like metabolic enzymes, matrix proteins), by looking up a few simple keywords in short description of GO terms.

small_plot(graph, **kwargs)[source]

This method is deprecated, do not use it.

source_network(font='HelveticaNeueLTStd')[source]

For EMBL branding, use Helvetica Neue Linotype Standard light

straight_between(nameA, nameB)[source]

This does the same as get_edge(), but takes names instead of vertex IDs.

sum_in_complex(csources=['corum'], graph=None)[source]

Returns the total number of edges in the network falling between two members of the same complex. The result is a dict keyed by complex resource. Calls edges_in_comlexes() to do the calculations.

@csources
: list
List of complex resources. Should be already loaded.
@graph
: igraph.Graph()
The graph object to do the calculations on.
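
A rough sketch of this kind of count, using plain Python sets instead of an igraph.Graph(). The count_edges_in_complexes helper, the resource names and the complex memberships are hypothetical; counting each edge once per resource is an assumption.

```python
from itertools import combinations


def count_edges_in_complexes(edges, complexes):
    """Count edges whose two endpoints are members of the same complex."""
    edge_set = {frozenset(edge) for edge in edges}
    counted = set()
    for members in complexes.values():
        for a, b in combinations(sorted(members), 2):
            pair = frozenset((a, b))
            if pair in edge_set:
                # A set ensures an edge shared by two complexes counts once.
                counted.add(pair)
    return len(counted)


edges = [("A", "B"), ("B", "C"), ("C", "D")]
complexes_by_resource = {
    "corum": {"cplx1": {"A", "B", "C"}},
    "other": {"cplx2": {"C", "D"}, "cplx3": {"A", "D"}},
}

totals = {
    resource: count_edges_in_complexes(edges, complexes)
    for resource, complexes in complexes_by_resource.items()
}
print(totals)
```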
tfs_list()[source]

Loads the list of all known transcription factors from TF census (Vaquerizas 2009). This resource is human only.

third_source_directions(graph=None)[source]

This method calls a series of methods to get additional direction and effect information from sources which have no literature curated references but give sufficient evidence about directionality, for interactions already supported by literature evidence from other sources.

uniprot(uniprot)[source]

Returns igraph.Vertex() object if the UniProt can be found in the default undirected network, otherwise None.

@uniprot
: str
UniProt ID.
uniprots(uniprots)[source]

Returns a list of igraph.Vertex() objects for a list of UniProt IDs, omitting those which could not be found in the default undirected graph.

up(uniprot)

Returns igraph.Vertex() object if the UniProt can be found in the default undirected network, otherwise None.

@uniprot
: str
UniProt ID.
up_edge(source, target, directed=True)[source]

Returns an igraph.Edge object if an edge exists between the two proteins, otherwise None.

@source
: str
UniProt ID
@target
: str
UniProt ID
@directed
: bool
To be passed to igraph.Graph.get_eid()
update_cats()[source]

Makes sure that the has_cats attribute is an up to date set of all categories in the current network.

update_sources()[source]

Makes sure that the sources attribute is an up to date list of all sources in the current network.

update_vindex()[source]

This is deprecated.

update_vname()[source]

For fast lookup of node names and indexes, these are held in a list and a dict as well. However, every time new nodes are added, these should be updated. This function is automatically called after all operations affecting node indices.

ups(uniprots)

Returns a list of igraph.Vertex() objects for a list of UniProt IDs, omitting those which could not be found in the default undirected graph.

vertex_pathways()[source]

Some resources assign interactions to pathways, others assign proteins. This function converts pathway annotations from edge attributes to vertex attributes.

class pypath.main.Direction(nameA, nameB)[source]
consensus_edges()[source]

Returns list of edges based on majority consensus of directions and signs.

get_dir(direction, sources=False)[source]

Returns boolean or list of sources

get_dirs(src, tgt, sources=False)[source]

Returns all directions with boolean values or list of sources.

majority_dir()[source]

Returns directionality based on majority consensus. Returns None if the number of sources supporting the two opposite directions are the same. Returns ‘undirected’ if there is no directionality information. Returns tuple of IDs if one direction is supported by more sources.
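
The three possible outcomes can be written down as a small function. This is a sketch of the rule as described, not the Direction class itself; the function below and its arguments (sets of source names for each direction) are hypothetical.

```python
def majority_dir(sources_ab, sources_ba, a="A", b="B"):
    """Majority consensus of direction between molecules a and b."""
    n_ab, n_ba = len(sources_ab), len(sources_ba)
    if n_ab == 0 and n_ba == 0:
        return "undirected"  # no directionality information at all
    if n_ab == n_ba:
        return None  # tie between the two opposite directions
    return (a, b) if n_ab > n_ba else (b, a)


print(majority_dir({"src1", "src2"}, {"src3"}))
```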

majority_sign()[source]

Returns signs based on majority consensus. Keys in the returned dict are directions. Values are None if the direction lacks an effect sign. Otherwise they are tuples whose first element is True if the number of sources supporting stimulation in the given direction is greater than or equal to the number supporting inhibition. The second element is the same for inhibition.
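
For a single direction, the value described above can be sketched as follows. The sign_tuple helper and its source counts are hypothetical; note that a tie gives True for both stimulation and inhibition.

```python
def sign_tuple(n_stimulation, n_inhibition):
    """Return None without sign data, else (stimulation wins, inhibition wins)."""
    if n_stimulation == 0 and n_inhibition == 0:
        return None
    return (n_stimulation >= n_inhibition, n_inhibition >= n_stimulation)


for counts in [(2, 1), (1, 1), (0, 3), (0, 0)]:
    print(counts, sign_tuple(*counts))
```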

set_dir(direction, source)[source]

Adds directionality information with the corresponding data source named.

src()[source]

Returns the IDs of effector molecules in this directed interaction. If the interaction is bidirectional, the list will contain 2 IDs. If the interaction is undirected, an empty list will be returned.

tgt()[source]

Returns the IDs of the target molecules in the interaction. Same behaviour as Direction.src().

unset_dir(direction, source=None)[source]

Removes directionality information, or only a single source if source is given.

class pypath.input_formats.ReadSettings(name='unknown', separator=None, nameColA=0, nameColB=1, nameTypeA='uniprot', nameTypeB='uniprot', typeA='protein', typeB='protein', isDirected=False, sign=False, inFile=None, references=False, extraEdgeAttrs={}, extraNodeAttrsA={}, extraNodeAttrsB={}, header=False, taxonA=9606, taxonB=9606, ncbiTaxId=False, interactionType='PPI', positiveFilters=[], negativeFilters=[], inputArgs={}, must_have_references=True, huge=False, resource=None)[source]
class pypath.input_formats.ReadList(name='unknown', separator=None, nameCol=0, nameType='uniprot', typ='protein', inFile=None, extraAttrs={}, header=False)[source]
class pypath.input_formats.UniprotMapping(nameType, bi=False, ncbi_tax_id=9606, swissprot='yes')[source]
class pypath.input_formats.PickleMapping(pickleFile)[source]
class pypath.input_formats.FileMapping(input, oneCol, twoCol, separator=None, header=0, bi=False, ncbi_tax_id=9606, typ='protein')[source]
class pypath.input_formats.MysqlMapping(tableName, fieldOne, fieldTwo, db=None, ncbi_tax_id=None, bi=False, mysql=None, typ='protein')[source]
class pypath.pyreact.BioPaxReader(biopax, source, cleanup_period=800, file_from_archive=None, silent=False)[source]

This class parses a BioPAX file and makes its content easily accessible for further processing. First it opens the file, extracting it from the archive if necessary. Then an lxml.etree.iterparse object is created, so the iteration is efficient and memory requirements are minimal. The iterparse object is then iterated, and for each tag included in the BioPaxReader.methods dict the appropriate method is called. These methods extract information from the BioPAX entity and store it in arbitrary data structures: strings, lists or dicts. These are stored in dicts where the keys are the original IDs of the tags, prefixed with the unique ID of the parser object. This is necessary to make it possible to merge the results of parsing several BioPAX files later. For example, id42 may identify EGFR in one file, but AKT1 in another. The parser of the first file has a 5 letter random string as its unique ID, the second parser has a different one, so molecules with the same ID can be distinguished at merging, e.g. EGFR will be ffjh2@id42 and AKT1 will be tr9gy@id42. The methods and the resulting dicts are named after the BioPAX elements, sometimes abbreviated. For example, BioPaxReader.protein() processes the <bp:Protein> elements and stores the results in BioPaxReader.proteins.

In its current state, this class does not parse all information and all BioPAX entities. For example, nucleic acid related entities and interactions are omitted, but these can easily be added with minor modifications.
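
The ID prefixing scheme can be illustrated in a few lines. This sketch does not use BioPaxReader: parser_id and prefix_keys are hypothetical helpers, and the alphabet used for the random string is an assumption.

```python
import random
import string


def parser_id(length=5):
    """Random string serving as the unique ID of one parser (assumed alphabet)."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


def prefix_keys(parser, records):
    """Prefix every original ID with the parser ID, joined by '@'."""
    return {"%s@%s" % (parser, key): value for key, value in records.items()}


# The same original ID, id42, refers to different molecules in two files.
first = prefix_keys("ffjh2", {"id42": "EGFR"})
second = prefix_keys("tr9gy", {"id42": "AKT1"})

# After prefixing, the two dicts merge without a key collision.
merged = dict(first)
merged.update(second)
print(merged)
```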

biopax_size()[source]

Gets the uncompressed size of the BioPax XML. This is needed in order to have a progress bar. This method should not be called directly, BioPaxReader.process() calls it.

cleanup_hook()[source]

Removes the used elements to free up memory. This method should not be called directly, BioPaxReader.iterate() calls it.

close_biopax()[source]

Deletes the iterator and closes the file object. This method should not be called directly, BioPaxReader.process() calls it.

extract()[source]

Extracts the BioPax file from compressed archive. Creates a temporary file. This is needed to trace the progress of processing, which is useful in case of large files. This method should not be called directly, BioPaxReader.process() calls it.

init_etree()[source]

Creates the lxml.etree.iterparse object. This method should not be called directly, BioPaxReader.process() calls it.

iterate()[source]

Iterates the BioPax XML and calls the appropriate methods for each element. This method should not be called directly, BioPaxReader.process() calls it.

open_biopax()[source]

Opens the BioPax file. This method should not be called directly, BioPaxReader.process() calls it.

process(silent=False)[source]

This method executes the total workflow of BioPax processing.

Parameters:silent (bool) – whether to print status messages and progress bars.
set_progress()[source]

Initializes a progress bar. This method should not be called directly, BioPaxReader.process() calls it.

class pypath.pyreact.PyReact(mapper=None, ncbi_tax_id=9606, default_id_types={}, modifications=True, seq=None, silent=False, max_complex_combinations=100, max_reaction_combinations=100)[source]
gen_cvariations()[source]

Because one key from the BioPax file might represent more complexes, complexvariations are created to give a way to represent sets of combinations. These are created for all complexes, even with only one unambiguous constitution. The keys are the constitutions of all the combinations listed in alphabetic order, separated by |. For example, A,B,C|A,B,D|A,B,E.
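
How such a key could be assembled is sketched below. The cvariation_key helper is hypothetical; sorting the members within each constitution is an assumption, made so the result matches the example key above.

```python
def cvariation_key(constitutions):
    """Join sorted constitutions with '|', members of each joined with ','."""
    as_strings = sorted(",".join(sorted(members)) for members in constitutions)
    return "|".join(as_strings)


key = cvariation_key([["B", "A", "D"], ["A", "C", "B"], ["E", "B", "A"]])
print(key)
```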

in_same_component(by_source=False)[source]

For all complexes connects all members of the complex with each other.

merge_complexes(this_round=None)[source]

Merges complexes from the active BioPaxReader object. Protein families and subcomplexes are expanded, and all combinations are created as separate complexes. The complexes from the same ID are added to sets in the rcomplexes dict.

merge_cvariations()[source]

This processes those complexes which are in fact a set of complex variations. As simple complexes are also always extended to complex variations, because they might have not only simple proteins but protein families as members, here we only add new records to the attributes of already existing complexes. Afterwards merge_complexes will be called again, to process those simple complexes which have any of the complex variations processed here among their subcomplexes.

class pypath.pyreact.Entity(identifier, id_type, sources=[], attrs=None)[source]
expand()[source]

With this method it is possible to iterate Entity objects just like EntitySet objects.

Yields string.

class pypath.pyreact.Protein(protein_id, id_type='uniprot', sources=[], attrs=None)[source]
class pypath.pyreact.EntitySet(members, sources=[], sep=';', parent=None)[source]
class pypath.pyreact.AttributeHandler[source]
class pypath.pyreact.ProteinFamily(members, source, parent=None)[source]
class pypath.pyreact.Complex(members, source, parent=None)[source]
class pypath.pyreact.ComplexVariations(members, source, parent=None)[source]
itermembers()[source]

This is a convenient iterator for the expand methods of higher classes like ReactionSide or Control.

class pypath.pyreact.ReactionSide(members, source=[], parent=None)[source]
expand()[source]

Expands the ReactionSide by iterating over all combinations of all ComplexVariations and ProteinFamily members, yielding ReactionSide objects with only Protein and Complex members. Yields tuples: because ReactionSide is initialized in Reaction, each tuple is suitable to serve as members and attrs.
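
The combinatorial expansion can be sketched with itertools.product. This is an illustration only: expand_side is a hypothetical helper, and each member is represented here as a plain list of alternatives rather than as pyreact objects.

```python
from itertools import product


def expand_side(members):
    """Yield one tuple per combination, picking one alternative per member."""
    for combination in product(*members):
        yield combination


# A plain protein has a single alternative; a family has several.
side = [["EGFR"], ["GRB2", "GRB7"], ["SOS1", "SOS2"]]
expanded = list(expand_side(side))
for combination in expanded:
    print(combination)
```

The number of combinations is the product of the numbers of alternatives, which is presumably why PyReact has the max_complex_combinations and max_reaction_combinations limits.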

class pypath.pyreact.Reaction(left, right, left_attrs, right_attrs, source=[], parent=None)[source]
class pypath.pyreact.Control(er, ed, source=[], parent=None)[source]
pypath.plot.is_opentype_cff_font(filename)[source]

This is necessary to fix a bug in matplotlib: https://github.com/matplotlib/matplotlib/pull/6714 Returns True if the given font is a Postscript Compact Font Format Font embedded in an OpenType wrapper. Used by the PostScript and PDF backends that can not subset these fonts.

pypath.plot.randn(d0, d1, ..., dn)

Return a sample (or samples) from the “standard normal” distribution.

If positive, int_like or int-convertible arguments are provided, randn generates an array of shape (d0, d1, ..., dn), filled with random floats sampled from a univariate “normal” (Gaussian) distribution of mean 0 and variance 1 (if any of the d_i are floats, they are first converted to integers by truncation). A single float randomly sampled from the distribution is returned if no argument is provided.

This is a convenience function. If you want an interface that takes a tuple as the first argument, use numpy.random.standard_normal instead.

d0, d1, ..., dn
: int, optional
The dimensions of the returned array, should be all positive. If no argument is given a single Python float is returned.
Z
: ndarray or float
A (d0, d1, ..., dn)-shaped array of floating-point samples from the standard normal distribution, or a single such float if no parameters were supplied.

random.standard_normal : Similar, but takes a tuple as its argument.

For random samples from N(mu, sigma^2), use:

sigma * np.random.randn(...) + mu

>>> np.random.randn()
2.1923875335537315 #random

Two-by-four array of samples from N(3, 6.25):

>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],  #random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]]) #random
class pypath.plot.Plot(fname=None, font_family='Helvetica Neue LT Std', font_style='normal', font_weight='normal', font_variant='normal', font_stretch='normal', palette=None, context='poster', lab_size=(9, 9), axis_lab_size=10.0, rc={})[source]
finish()[source]

Saves and closes a figure.

class pypath.plot.MultiBarplot(x, y, categories=None, cat_names=None, cat_ordr=None, fname=None, figsize=(12, 4), xlab='', ylab='', title='', lab_angle=90, color='#007b7f', order=False, desc=True, ylog=False, legend=None, fin=True, rc={}, axis_lab_font={}, bar_args={}, ticklabel_font={}, legend_font={}, title_font={}, title_halign='center', title_valign='top', y2=None, color2=None, ylim=None, grouped=False, group_labels=[], summary=False, summary_name='', uniform_ylim=False, do=True, legloc=0, maketitle=True, **kwargs)[source]
by_plot()[source]

Sets list of lists with x and y values and colors by category.

do_plot()[source]

Calls the plotting methods in the correct order.

finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

labels()[source]

Sets properties of axis labels and ticklabels.

make_plots()[source]

Does the actual plotting.

plot()[source]

The total workflow of this class. Calls all methods in the correct order.

plots_order()[source]

Defines the order of the subplots.

post_plot()[source]

Saves the plot into file, and closes the figure.

pre_plot()[source]

Executes all necessary tasks before plotting in the correct order.

reload()[source]

Reloads the module and updates the class instance.

set_categories()[source]

Sets a list of category indices (integers) of the same length as x, and sets dicts to translate between category names and indices.

set_colors(colseries='')[source]

Compiles an array of colors of the same length as x.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

set_grid()[source]

Sets up a grid according to the number of subplots, with proportions according to the number of elements in each subplot.

set_title()[source]

Sets the main title.

sort()[source]

Finds the defined or default order, and sorts the arrays x, y and col accordingly.

class pypath.plot.ScatterPlus(x, y, size=None, color='#114477', labels=None, xlog=False, ylog=False, xlim=None, ylim=None, xtickscale=None, ytickscale=None, legscale=None, fname=None, confi=True, title_font={}, ticklabel_font={}, legend_font={}, axis_lab_font={}, annot_font={}, xlab='', ylab='', axis_lab_size=10.0, min_size=5.0, max_size=30.0, log_size=False, alpha=0.5, size_scaling=0.8, lab_angle=90, order=False, desc=True, legend=True, legtitle='', legstrip=(None, None), color_labels=[], legloc=4, size_to_value=<function <lambda>>, value_to_size=<function <lambda>>, figsize=(10.0, 7.5), title='', title_halign='center', title_valign='top', fin=True, rc={}, **kwargs)[source]
finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

reload()[source]

Reloads the module and updates the class instance.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

set_title()[source]

Sets the main title.

values_to_sizes(values)[source]

Converts size values from the data dimension to the dimension of the size graphical parameter.

class pypath.plot.StackedBarplot(x, y, fname, names, colors, xlab='', ylab='', title='', title_halign='center', title_valign='top', bar_args={}, axis_lab_font={}, ticklabel_font={}, title_font={}, legend_font={}, lab_angle=90, figsize=(9, 6), legend=True, order=False, desc=True)[source]
finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

plot()[source]

The total workflow of this class. Calls all methods in the correct order.

reload()[source]

Reloads the module and updates the class instance.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

set_title()[source]

Sets the main title.

class pypath.plot.SimilarityGraph(pp, fname, similarity, size, layout_method='fruchterman_reingold', layout_param={}, width=1024, height=1024, margin=124, **kwargs)[source]
sizes_edge()[source]

Sets the size according to number of edges for each resource.

class pypath.plot.Histogram(data, labels, fname, font_family='Helvetica Neue LT Std', font_style='normal', font_weight='normal', font_variant='normal', font_stretch='normal', xlab='', ylab='', title='', axis_lab_size=10.0, lab_angle=90, lab_size=(9, 9), color=None, palette=None, rc={}, context='poster', figsize=(5.0, 3.0), bins=None, nbins=None, x_log=False, y_log=False, tone=2, alpha=0.5, legend_size=6, xlim=None, kde_base=0.2, kde_perc=12.0, **kwargs)[source]
class pypath.plot.HistoryTree(fname, **kwargs)[source]
run_latex()[source]

Runs LaTeX to compile the TeX file.

write_tex()[source]

Writes the TeX markup to file.

class pypath.plot.HtpCharacteristics(pp, fname, upper=200, lower=5, axis_lab_font={}, ticklabel_font={}, title_font={}, title='', htdata={}, **kwargs)[source]
finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

set_grid()[source]

Sets up a grid according to the number of subplots, with one additional column of zero width on the left to have aligned y axis labels.

set_title()[source]

Sets the main title.

class pypath.plot.RefsComposite(pp, fname, axis_lab_font={}, ticklabel_font={}, title_font={}, legend_font={}, bar_args={}, title='', color='#88CCEE', hcolor='#88CCEE', all_color='#44AA99', htp_threshold=20, figsize=(12.8, 8.8), all_name='All', curation_plot=False, pubmeds=None, earliest=None, **kwargs)[source]
finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

set_grid()[source]

Sets up a grid according to the number of subplots, with one additional column of zero width on the left to have aligned y axis labels.

set_title()[source]

Sets the main title.

class pypath.plot.CurationPlot(pp, fname, colors, pubmeds=None, earliest=None, axis_lab_font={}, ticklabel_font={}, legend_font={}, **kwargs)[source]
finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

class pypath.plot.BarplotsGrid(pp, x, by, fname, ylab, data=None, color='#77AADD', xlim=None, uniform_xlim=True, uniform_ylim=False, full_range_x=True, sort=False, desc=False, ylog=False, hoffset=0.0, woffset=0.0, axis_lab_font={}, ticklabel_font={}, small_ticklabel_font={}, title_font={}, bar_args={}, htp_threshold=20, xmin=None, **kwargs)[source]
finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

set_grid()[source]

Sets up a grid according to the number of subplots, with one additional column of zero width on the left to have aligned y axis labels.

set_title()[source]

Sets the main title.

class pypath.plot.Dendrogram(fname, data, color='#4477AA', axis_lab_font={}, ticklabel_font={}, **kwargs)[source]
finish()[source]

Applies tight layout, draws the figure, writes the file and closes.

init_fig()[source]

Creates a figure using the object oriented matplotlib interface.

set_figsize()[source]

Converts width and height to a tuple, so it can be used for figsize.

class pypath.analysis.Workflow(name, network_datasets=[], do_main_table=True, do_compile_main_table=True, do_curation_table=True, do_compile_curation_table=True, do_simgraphs=True, do_multi_barplots=True, do_coverage_groups=True, do_htp_char=True, do_ptms_barplot=True, do_scatterplots=True, do_history_tree=True, do_compile_history_tree=True, do_refs_journals_grid=True, do_refs_years_grid=True, do_dirs_stacked=True, do_refs_composite=True, do_curation_plot=True, do_refs_by_j=True, do_refs_by_db=True, do_refs_by_year=True, do_resource_list=True, do_compile_resource_list=True, do_consistency_dedrogram=True, do_consistency_table=True, title=None, outdir=None, htdata={}, inc_raw=None, **kwargs)[source]
class pypath.intera.Residue(number, name, identifier, id_type='uniprot', isoform=1, mutated=False, seq=None)[source]
class pypath.intera.Mutation(original, mutated, sample, properties={})[source]
class pypath.intera.Motif(protein, start, end, id_type='uniprot', regex=None, instance=None, isoform=1, motif_name=None, prob=None, elm=None, description=None, seq=None, source=None)[source]
class pypath.intera.Ptm(protein, id_type='uniprot', typ='unknown', motif=None, residue=None, source=None, isoform=1, seq=None)[source]
class pypath.intera.Domain(protein, id_type='uniprot', domain=None, domain_id_type='pfam', start=None, end=None, isoform=1, chains={})[source]
class pypath.intera.DomainMotif(domain, ptm, sources=None, refs=None, pdbs=None)[source]
class pypath.intera.DomainDomain(domain_a, domain_b, pdbs=None, sources=None, refs=None, contact_residues=None)[source]
class pypath.intera.Interface(id_a, id_b, source, id_type='uniprot', pdb=None, css=None, stab_en=None, solv_en=None, area=None, isoform_a=1, isoform_b=1)[source]
add_residues(res_a, res_b, typ='undefined')[source]

Adds one pair of residues of type typ, where res_a and res_b are tuples of residue number in the sequence and residue type, e.g. (124, 'S') means serine #124. typ can be undefined, hbonds, sbridges, ssbonds or covbonds.

bond_types()[source]

Returns the bond types present in this interface

get_bonds(typ=None, mode=None)[source]

Gives a generator to iterate through the bonds in this interface. If no type is given, bonds of all types are returned.

numof_residues()[source]

Returns the number of residue pairs by bond type.
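
The four methods above suggest a mapping from bond type to residue pairs. The following is a minimal sketch of that idea, assuming a plain dict of lists; `InterfaceSketch` is a hypothetical stand-in written for illustration and may differ from how pypath.intera.Interface stores its data.

```python
from collections import defaultdict


class InterfaceSketch:
    """Hypothetical stand-in for an interface; not pypath's own class."""

    BOND_TYPES = ('undefined', 'hbonds', 'sbridges', 'ssbonds', 'covbonds')

    def __init__(self):
        # bond type -> list of (res_a, res_b) pairs
        self.bonds = defaultdict(list)

    def add_residues(self, res_a, res_b, typ='undefined'):
        # res_a and res_b are (number, residue type) tuples, e.g. (124, 'S')
        if typ not in self.BOND_TYPES:
            raise ValueError('unknown bond type: %s' % typ)
        self.bonds[typ].append((res_a, res_b))

    def bond_types(self):
        # only the types which have at least one residue pair
        return [typ for typ, pairs in self.bonds.items() if pairs]

    def get_bonds(self, typ=None):
        # generator over the pairs of one type, or of all types if typ is None
        types = [typ] if typ is not None else self.bond_types()
        for t in types:
            for pair in self.bonds.get(t, []):
                yield t, pair

    def numof_residues(self):
        # number of residue pairs by bond type
        return {typ: len(pairs) for typ, pairs in self.bonds.items()}


iface = InterfaceSketch()
iface.add_residues((124, 'S'), (87, 'K'), typ='hbonds')
iface.add_residues((130, 'C'), (45, 'C'), typ='ssbonds')
```

Here `iface.numof_residues()` gives one pair for each of the two bond types, and `list(iface.get_bonds('hbonds'))` iterates only the hydrogen bond pair.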

class pypath.pdb.ResidueMapper[source]

This class stores and serves the PDB –> UniProt residue level mapping. Attempts to download the mapping, and stores it for further use. Converts PDB residue numbers to the corresponding UniProt ones.

clean()[source]

Removes cached mappings, freeing up memory.

class pypath.proteomicsdb.ProteomicsDB(username, password, output_format='json')[source]
class pypath.mapping.MappingTable(one, two, typ, source, param, ncbi_tax_id, mysql=None, log=None, cache=False, cachedir='cache', uniprots=None)[source]

When ID conversion tables are initialized for the first time, data is downloaded from UniProt and read into dictionaries. This takes a couple of seconds. The data is saved to pickle dumps, so afterwards the tables load much faster.

read_mapping_uniprot(param, ncbi_tax_id=None)[source]

Downloads ID mappings directly from UniProt. See the names of possible identifiers here: http://www.uniprot.org/help/programmatic_access

@param : UniprotMapping instance

class pypath.mapping.Mapper(ncbi_tax_id=9606, mysql_conf=(None, 'mapping'), log=None, cache=True, cachedir='cache')[source]
load_mappings(maplst=None, ncbi_tax_id=None)[source]

maplst is a list of mappings to load; its elements are dicts containing the ID names, molecule type and preferred source, e.g. {“one”: “uniprot”, “two”: “refseq”, “typ”: “protein”, “src”: “mysql”, “par”: “mysql_param/file_param”}. By default those are loaded from pickle files.

load_uniprot_mapping(filename, ncbi_tax_id=None)[source]

This is a wrapper to load a ... mapping table.

map_name(name, nameType, targetNameType, ncbi_tax_id=None, strict=False, silent=True)[source]

This function should be used to convert individual IDs. It handles the details of the lookup for you. It works as follows: it looks up the dictionaries between the original and the target ID type and, if none is found, attempts to load one from the predefined inputs. If the original name is a genesymbol, it first looks it up among the preferred gene names from UniProt and, if not found, tries the alternative gene names. If the gene symbol still cannot be found and strict = False, a last attempt matches only the first 5 characters of the gene symbol. If the target name type is uniprot, it converts all the ACs to primary. Then, for the Trembl IDs, it looks up the preferred gene names and finds Swissprot IDs with the same preferred gene name.

@name
: str
The original name which shall be converted.
@nameType
: str
The type of the name. Available by default:
  • genesymbol (gene name)
  • entrez (Entrez Gene ID [#])
  • refseqp (NCBI RefSeq Protein ID [NP_*|XP_*])
  • ensp (Ensembl protein ID [ENSP*])
  • enst (Ensembl transcript ID [ENST*])
  • ensg (Ensembl genomic DNA ID [ENSG*])
  • hgnc (HGNC ID [HGNC:#])
  • gi (GI number [#])
  • embl (DDBJ/EMBL/GenBank CDS accession)
  • embl_id (DDBJ/EMBL/GenBank accession)
To use other IDs, you need to define the input method and load the table before calling Mapper.map_name().
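
The gene symbol fallback described for map_name() can be sketched as below. This is only an illustration of the lookup order (preferred names, then alternative names, then the first 5 characters), assuming two plain dicts as lookup tables; `map_genesymbol`, `preferred` and `alternative` are hypothetical names and not part of pypath.

```python
def map_genesymbol(name, preferred, alternative, strict=False):
    """Look up a gene symbol in two hypothetical lookup dicts.

    preferred and alternative map gene symbols to sets of UniProt IDs.
    """
    # 1. preferred gene names
    if name in preferred:
        return set(preferred[name])
    # 2. alternative gene names
    if name in alternative:
        return set(alternative[name])
    # 3. unless strict, match on the first 5 characters only
    hits = set()
    if not strict:
        prefix = name[:5]
        for symbol, ids in preferred.items():
            if symbol[:5] == prefix:
                hits |= set(ids)
    return hits


# toy lookup tables for the example
preferred = {'EGFR': {'P00533'}, 'MAPK1': {'P28482'}}
alternative = {'ERBB1': {'P00533'}}

by_preferred = map_genesymbol('EGFR', preferred, alternative)
by_alternative = map_genesymbol('ERBB1', preferred, alternative)
by_prefix = map_genesymbol('MAPK1X', preferred, alternative)
strict_miss = map_genesymbol('MAPK1X', preferred, alternative, strict=True)
```

With strict=True the last, approximate step is skipped, so an unknown symbol yields an empty set instead of a prefix match.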
primary_uniprot(lst)[source]

For a list of UniProt IDs returns the list of primary ids.

trembl_swissprot(lst, ncbi_tax_id=None)[source]

For a list of Trembl and Swissprot IDs, returns Swissprot IDs wherever possible, by mapping the Trembl IDs to gene names and then back to Swissprot.

which_table(nameType, targetNameType, load=True, ncbi_tax_id=None)[source]

Returns the table which is suitable to convert an ID of nameType to targetNameType. If no such table has been loaded yet, it attempts to load it from UniProt.

class pypath.reflists.ReferenceList(nameType, typ, tax, inFile, **kwargs)[source]
pypath.refs

alias of pypath.refs

class pypath.refs.Reference(pmid)[source]
class pypath.dataio.ResidueMapper[source]

This class stores and serves the PDB –> UniProt residue level mapping. Attempts to download the mapping, and stores it for further use. Converts PDB residue numbers to the corresponding UniProt ones.

clean()[source]

Removes cached mappings, freeing up memory.

pypath.dataio.acsn_ppi(keep_in_complex_interactions=True)[source]

Processes ACSN data from local file. Returns list of interactions.

@keep_in_complex_interactions
: bool
Whether to include interactions from complex expansion.
pypath.dataio.biogrid_interactions(organism=9606, htp_limit=1, ltp=True)[source]

Downloads and processes BioGRID interactions. Keeps only the “low throughput” interactions. Returns list of interactions.

@organism
: int
NCBI Taxonomy ID of organism.
@htp_limit
: int
Exclude interactions supported only by references that are cited for more than this number of interactions.
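
One possible reading of htp_limit is sketched below: count how many interactions each reference is cited for, then keep only interactions whose reference stays within the limit. The record layout (one reference per tuple) and the name `filter_low_throughput` are assumptions made for this example; the actual BioGRID processing in pypath may differ.

```python
from collections import Counter


def filter_low_throughput(interactions, htp_limit=1):
    """Keep interactions whose reference is cited for at most htp_limit interactions.

    interactions: list of (id_a, id_b, reference) tuples (hypothetical layout).
    """
    # number of interactions each reference is cited for
    per_ref = Counter(ref for _, _, ref in interactions)
    return [i for i in interactions if per_ref[i[2]] <= htp_limit]


# toy data: reference '2' supports two interactions, the others one each
data = [
    ('A', 'B', '1'),
    ('A', 'C', '2'),
    ('B', 'C', '2'),
    ('C', 'D', '3'),
]
kept = filter_low_throughput(data, htp_limit=1)
```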
pypath.dataio.dip_login(user, passwd)[source]

This does not work for unknown reasons.

In addition, the binary_data parameter of Curl().__init__() has been changed, so the code below needs to be updated.

pypath.dataio.get_acsn_effects()[source]

Processes ACSN data, returns list of effects.

pypath.dataio.get_ca1()[source]

Downloads and processes the CA1 signaling network (Ma’ayan 2005). Returns list of interactions.

pypath.dataio.get_ccmap(organism=9606)[source]

Downloads and processes CancerCellMap. Returns list of interactions.

@organism
: int
NCBI Taxonomy ID to match column #7 in nodes file.
pypath.dataio.get_complexportal(species=9606, zipped=True)[source]

Complex dataset from IntAct. See more: http://www.ebi.ac.uk/intact/complex/ http://nar.oxfordjournals.org/content/early/2014/10/13/nar.gku975.full.pdf

pypath.dataio.get_dbptm(organism=9606)[source]

Downloads enzyme-substrate interactions from dbPTM. Returns list of dicts.

pypath.dataio.get_dgidb()[source]

Downloads and processes the list of all human druggable proteins. Returns a list of GeneSymbols.

pypath.dataio.get_disgenet(dataset='curated')[source]

Downloads and processes the list of all human disease related proteins from DisGeNet. Returns dict of dicts.

@dataset
: str
Name of DisGeNet dataset to be obtained: curated, literature, befree or all.
pypath.dataio.get_domino_ptms()[source]

The table comes from dataio.get_domino(), having the following fields (the number after each line is the index of its last field):

header = [
    'uniprot-A', 'uniprot-B', 'isoform-A', 'isoform-B',  # 3
    'exp. method', 'references', 'taxon-A', 'taxon-B',  # 7
    'role-A', 'role-B', 'binding-site-range-A', 'binding-site-range-B',  # 11
    'domains-A', 'domains-B', 'ptm-residue-A', 'ptm-residue-B',  # 15
    'ptm-type-mi-A', 'ptm-type-mi-B', 'ptm-type-A', 'ptm-type-B',  # 19
    'ptm-res-name-A', 'ptm-res-name-B', 'mutations-A', 'mutations-B',  # 23
    'mutation-effects-A', 'mutation-effects-B', 'domains-interpro-A',  # 26
    'domains-interpro-B', 'negative',  # 28
]

pypath.dataio.get_elm_interactions()[source]

Downloads manually curated interactions from ELM. This is the gold standard set of ELM.

pypath.dataio.get_graphviz_attrs()[source]

Downloads graphviz attribute list from graphviz.org. Returns 3 dicts of dicts: graph_attrs, vertex_attrs and edge_attrs.

pypath.dataio.get_guide2pharma(organism='human', endogenous=True)[source]

Downloads and processes Guide to Pharmacology data. Returns list of dicts.

@organism
: str
Name of the organism, e.g. human.
@endogenous
: bool
Whether to include only endogenous ligands interactions.
pypath.dataio.get_havugimana()[source]

Downloads data from Supplement Table S3/1 from Havugimana 2012 Cell. 150(5): 1068–1081.

pypath.dataio.get_homologene()[source]

Downloads the recent release of the NCBI HomoloGene database. Returns file pointer.

pypath.dataio.get_hpmr()[source]

Downloads and processes the list of all human receptors from human receptor census (HPMR – Human Plasma Membrane Receptome). Returns list of GeneSymbols.

pypath.dataio.get_hprd(in_vivo=True)[source]

Downloads and preprocesses HPRD data.

pypath.dataio.get_hprd_ptms(in_vivo=True)[source]

Processes HPRD data and extracts PTMs. Returns list of kinase-substrate interactions.

pypath.dataio.get_hsn()[source]

Downloads and processes HumanSignalingNetwork version 6 (published 2014 Jan by Edwin Wang). Returns list of interactions.

pypath.dataio.get_i3d()[source]

Interaction3D contains residue numbers in given chains in given PDB structures, so we need to add an offset to get the residue numbers valid for UniProt sequences. Offsets can be obtained from Instruct, or from the Pfam PDB-chain-UniProt mapping table.

pypath.dataio.get_instruct()[source]

Instruct contains residue numbers in UniProt sequences, which means that no further calculation of offsets in chains of PDB structures is needed. Chains are not given, only a set of PDB structures supporting the domain-domain or protein-protein interaction.

pypath.dataio.get_instruct_offsets()[source]

These offsets should be understood as from UniProt to PDB.
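
As a minimal sketch of how such an offset might be applied, assuming a single constant offset per chain and reading "from UniProt to PDB" as PDB number = UniProt number + offset (both are assumptions of this example, and the function names are hypothetical):

```python
def uniprot_to_pdb(resnum, offset):
    # offset is understood as going from UniProt to PDB numbering
    return resnum + offset


def pdb_to_uniprot(resnum, offset):
    # the inverse direction subtracts the same offset
    return resnum - offset


# toy example: the PDB chain numbering starts 23 residues earlier
offset = -23
pdb_number = uniprot_to_pdb(124, offset)
back = pdb_to_uniprot(pdb_number, offset)
```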

pypath.dataio.get_kegg(mapper=None)[source]

Downloads and processes KEGG Pathways. Returns list of interactions.

pypath.dataio.get_kinases()[source]

Downloads and processes the list of all human kinases. Returns a list of GeneSymbols.

pypath.dataio.get_laudanna_directions()[source]

Downloads and processes the SignalingFlow edge attributes from Laudanna Lab. Returns list of directions.

pypath.dataio.get_laudanna_effects()[source]

Downloads and processes the SignalingDirection edge attributes from Laudanna Lab. Returns list of effects.

pypath.dataio.get_li2012()[source]

Reads supplementary data of Li 2012 from local file. Returns table (list of lists).

pypath.dataio.get_lit_bm_13()[source]

Downloads and processes Lit-BM-13 dataset, the high confidence literature curated interactions from CCSB. Returns list of interactions.

pypath.dataio.get_phosphoelm(organism=9606, ltp_only=True)[source]

Downloads kinase-substrate interactions from phosphoELM. Returns list of dicts.

Parameters:
  • organism (int) – NCBI Taxonomy ID.
  • ltp_only (bool) – Include only low-throughput interactions.
pypath.dataio.get_phosphosite(cache=True)[source]

Downloads curated and HTP data from Phosphosite, from preprocessed cache file if available. Processes BioPAX format. Returns list of interactions.

pypath.dataio.get_phosphosite_curated()[source]

Loads literature curated PhosphoSite data, from preprocessed cache file if available. Returns list of interactions.

pypath.dataio.get_phosphosite_noref()[source]

Loads HTP PhosphoSite data, from preprocessed cache file if available. Returns list of interactions.

pypath.dataio.get_pmid(idList)[source]

For a list of doi or PMC IDs fetches the corresponding PMIDs.

pypath.dataio.get_switches_elm()[source]

switches.elm is a resource containing functional switches in molecular regulation, in domain-motif level resolution, classified into categories according to their mechanism.

pypath.dataio.get_tfcensus(classes=['a', 'b', 'other'])[source]

Downloads and processes the list of all human transcription factors. Returns dict with lists of ENSGene IDs and HGNC Gene Names.

pypath.dataio.get_wang_effects()[source]

Downloads and processes Wang Lab HumanSignalingNetwork. Returns list of effects.

pypath.dataio.homologene_dict(source, target, id_type)[source]

Returns orthology translation table as dict, obtained from NCBI HomoloGene data.

Parameters:
  • source (int) – NCBI Taxonomy ID of the source species (keys).
  • target (int) – NCBI Taxonomy ID of the target species (values).
  • id_type (str) – ID type to be used in the dict. Possible values: ‘RefSeq’, ‘Entrez’, ‘GI’, ‘GeneSymbol’.
pypath.dataio.hprd_interactions(in_vivo=True)[source]

Processes HPRD data and extracts interactions. Returns list of interactions.

pypath.dataio.li2012_dmi(mapper=None)[source]

Converts table read by pypath.dataio.get_li2012() to list of pypath.intera.DomainMotif() objects. Translates GeneSymbols to UniProt IDs.

@mapper
: pypath.mapping.Mapper()
If not provided, a new Mapper() instance will be initialized, reserving more memory.
pypath.dataio.li2012_interactions()[source]

Converts table read by pypath.dataio.get_li2012() to list of interactions.

pypath.dataio.li2012_phospho()[source]

Converts table read by pypath.dataio.get_li2012() to list of dicts of kinase-substrate interactions.

pypath.dataio.lmpid_dmi(fname='LMPID_DATA_pubmed_ref.xml', organism=9606)[source]

Converts list of domain-motif interactions supplied by pypath.dataio.load_lmpid() to list of pypath.intera.DomainMotif() objects.

pypath.dataio.lmpid_interactions(fname='LMPID_DATA_pubmed_ref.xml', organism=9606)[source]

Converts list of domain-motif interactions supplied by pypath.dataio.load_lmpid() to list of interactions.

pypath.dataio.load_lmpid(fname='LMPID_DATA_pubmed_ref.xml', organism=9606)[source]

Reads and processes LMPID data from local file pypath.data/LMPID_DATA_pubmed_ref.xml. The file was provided by LMPID authors and is now redistributed with the module. Returns list of domain-motif interactions.

pypath.dataio.load_macrophage()[source]

Loads Macrophage from local file. Returns list of interactions.

pypath.dataio.load_signor_ptms(organism=9606)[source]

Loads and processes Signor PTMs. Returns dict of dicts.

pypath.dataio.only_pmids(idList, strict=True)[source]

Returns unchanged those elements which comply with the PubMed ID format, and attempts to translate the DOIs and PMC IDs using NCBI E-utils. Returns list containing only PMIDs.

@idList
: list, str
List of IDs or one single ID.
@strict
: bool
Whether to keep in the list those IDs which are neither PMIDs, DOIs, PMC IDs nor NIH manuscript IDs.
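
The format check behind such a filter can be sketched with regular expressions. This is an illustration only: the patterns are simplified assumptions, the names `classify_id` and `keep_pmids` are hypothetical, and the translation of DOIs and PMC IDs (done through NCBI E-utils in pypath) is replaced here by an optional `translate` callable so the example runs offline.

```python
import re

# simplified patterns, assumed for this example
PMID = re.compile(r'^\d+$')
PMCID = re.compile(r'^PMC\d+$')
DOI = re.compile(r'^10\.\d{4,9}/\S+$')


def classify_id(ref_id):
    """Returns 'pmid', 'pmc', 'doi' or 'unknown' for one ID."""
    ref_id = str(ref_id).strip()
    if PMID.match(ref_id):
        return 'pmid'
    if PMCID.match(ref_id):
        return 'pmc'
    if DOI.match(ref_id):
        return 'doi'
    return 'unknown'


def keep_pmids(id_list, translate=None, keep_unknown=False):
    """Keeps PMIDs, translates DOIs and PMC IDs if a translator is given."""
    if not isinstance(id_list, (list, tuple)):
        id_list = [id_list]  # accept one single ID as well
    result = []
    for ref_id in id_list:
        ref_id = str(ref_id).strip()
        kind = classify_id(ref_id)
        if kind == 'pmid':
            result.append(ref_id)
        elif kind in ('pmc', 'doi'):
            pmid = translate(ref_id) if translate else None
            if pmid:
                result.append(pmid)
        elif keep_unknown:
            result.append(ref_id)
    return result


ids = ['12345', 'PMC123', '10.1000/xyz', 'foo']
only = keep_pmids(ids)
# a dict stands in for the online translation service
translated = keep_pmids(ids, translate={'PMC123': '999'}.get)
```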
pypath.dataio.open_pubmed(pmid)[source]

Opens PubMed record in web browser.

@pmid
: str or int
PubMed ID
pypath.dataio.phosphosite_directions(organism='human')[source]

From curated and HTP PhosphoSite data generates a list of directions.

pypath.dataio.reactions_biopax(biopax_file, organism=9606, protein_name_type='UniProt', clean=True)[source]

Processes a BioPAX file and extracts binary interactions.

pypath.dataio.reactome_biopax(organism=9606, cache=True)[source]

Downloads Reactome human reactions in BioPAX format. Returns File object.

pypath.dataio.reactome_interactions(cacheFile=None, **kwargs)[source]

Downloads and processes Reactome BioPAX. Extracts binary interactions. The applied criteria are very stringent, yields very few interactions. Requires large free memory, approx. 2G.

pypath.dataio.reactome_sbml()[source]

Downloads Reactome human reactions in SBML format. Returns gzip.GzipFile object.

pypath.dataio.read_complexes_havugimana()[source]

Supplement Table S3/1 from Havugimana 2012 Cell. 150(5): 1068–1081.

pypath.dataio.read_table(cols, fileObject=None, data=None, sep='\t', sep2=None, rem=[], hdr=None, encoding='ascii')[source]

Generic function to read data tables.

fileObject
: file-like
Any file like object: file opened for read, or StringIO buffer
cols
: dict
Dictionary of columns to read. Keys identifying fields are returned in the result. Values are column numbers.
sep
: str
Field separator of the file.
sep2
: dict
Subfield separators and prefixes. E.g. {2: ‘,’, 3: ‘|’}
hdr
: int
Number of header lines. If None, no headers assumed.
rem
: list
Strings to remove. For each line these elements will be replaced with ‘’.
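
How these parameters could work together is sketched below on an in-memory table. `read_table_sketch` is a hypothetical re-implementation written for this example; in particular the return value (a list of dicts) is an assumption, and pypath.dataio.read_table() may behave differently.

```python
import io


def read_table_sketch(cols, file_object, sep='\t', sep2=None, rem=(), hdr=None):
    """Reads selected columns from a delimited text file-like object."""
    sep2 = sep2 or {}
    rows = []
    for lnum, line in enumerate(file_object):
        if hdr is not None and lnum < hdr:
            continue  # skip the header lines
        for string in rem:
            line = line.replace(string, '')  # strip unwanted strings
        fields = line.rstrip('\n').split(sep)
        row = {}
        for name, idx in cols.items():
            value = fields[idx]
            if idx in sep2:
                # split subfields of this column
                value = value.split(sep2[idx])
            row[name] = value
        rows.append(row)
    return rows


data = io.StringIO('a\tb\tc\nP1\tx,y\t"1"\nP2\tz\t"2"\n')
rows = read_table_sketch(
    {'id': 0, 'tags': 1, 'n': 2},
    data,
    sep2={1: ','},
    hdr=1,
    rem=['"'],
)
```

In this example the first line is skipped as a header, the double quotes are removed, and column 1 is split further at commas.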
pypath.dataio.read_xls(xls_file, sheet='', csv_file=None, return_table=True)[source]

Generic function to read MS Excel XLS file, and convert one sheet to CSV, or return as a list of lists

pypath.dataio.rolland_hi_ii_14()[source]

Loads the HI-II-14 unbiased interactome from the large scale screening of Rolland 2014. Returns list of interactions.

Reads and processes SignaLink3 interactions from local file. Returns list of interactions.

pypath.dataio.signor_interactions(organism=9606)[source]

Downloads the full dataset from Signor. Returns the file contents.

Note: this method was updated in Oct 2016, as Signor updated both their data and webpage.

pypath.dataio.signor_urls()[source]

This function is deprecated.

pypath.dataio.take_a_trip(cachefile='trip.pickle')[source]

Downloads TRIP data from webpage and preprocesses it. Saves preprocessed data into cachefile and next time loads from this file.

@cachefile
: str
Filename, located in ./cache. To disable cache, pass None. To download again, remove file from ./cache.
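
The cache behaviour described above (load from the file if it exists, otherwise compute and save) can be sketched with pickle. The helper `cached` and the toy `slow_download` function are hypothetical and only illustrate the pattern; they do not reproduce the TRIP download itself.

```python
import os
import pickle
import tempfile


def cached(cachefile, compute):
    """Loads data from cachefile if present, else computes and saves it."""
    if cachefile is not None and os.path.exists(cachefile):
        with open(cachefile, 'rb') as fp:
            return pickle.load(fp)
    data = compute()
    if cachefile is not None:  # passing None disables the cache
        with open(cachefile, 'wb') as fp:
            pickle.dump(data, fp)
    return data


calls = []


def slow_download():
    # stands in for the download and preprocessing step
    calls.append(1)
    return {'TRPC1': ['interaction']}


cachefile = os.path.join(tempfile.mkdtemp(), 'trip.pickle')
first = cached(cachefile, slow_download)   # computes and writes the cache
second = cached(cachefile, slow_download)  # reads from the cache
```

Removing the cache file forces the next call to compute the data again.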
pypath.dataio.trip_find_uniprot(soup)[source]

Looks up a UniProt name in table downloaded from TRIP webpage.

@soup
: bs4.BeautifulSoup
The BeautifulSoup instance returned by pypath.dataio.trip_get_uniprot().
pypath.dataio.trip_get_uniprot(syn)[source]

Downloads table from TRIP webpage and attempts to look up the UniProt ID for one synonym.

@syn
: str
The synonym as shown on TRIP webpage.
pypath.dataio.trip_interactions(exclude_methods=['Inference', 'Speculation'], predictions=False, species='Human', strict=False)[source]

Obtains processed TRIP interactions by calling pypath.dataio.trip_process() and returns list of interactions. All arguments are passed to trip_process(), see their definition there.

pypath.dataio.trip_process(exclude_methods=['Inference', 'Speculation'], predictions=False, species='Human', strict=False)[source]

Downloads TRIP data by calling pypath.dataio.take_a_trip() and further processes it. Returns dict of dicts with TRIP data.

@exclude_methods
: list
Interaction detection methods to be discarded.
@predictions
: bool
Whether to include predicted interactions.
@species
: str
Organism name, e.g. Human.
@strict
: bool
Whether to include interactions where the species was not used as a bait or is not specified.
pypath.dataio.trip_process_table(tab, result, intrs, trp_uniprot)[source]

Processes one HTML table downloaded from TRIP webpage.

@tab
: bs4.element.Tag()
One table of interactions from TRIP webpage.
@result
: dict
Dictionary the data should be filled in.
@intrs
: dict
Dictionary of already converted interactor IDs. This serves as a cache, so the same ID does not need to be looked up twice.
@trp_uniprot
: str
UniProt ID of TRP domain containing protein.
pypath.dataio.vidal_hi_iii(fname)[source]

Loads the HI-III unbiased interactome from preliminary data of the next large scale screening of Vidal Lab.

The data is accessible here:
http://interactome.dfci.harvard.edu/H_sapiens/dload_trk.php

You need to register and accept the license terms.

Returns list of interactions.

class pypath.server.RestResource(pypath)[source]
class pypath.server.Rest(pypath, port)[source]
class pypath.curl.Curl(url, silent=True, get=None, post=None, req_headers=None, cache=True, debug=False, outf=None, compr=None, encoding=None, files_needed=None, timeout=300, init_url=None, init_fun='get_jsessionid', follow=True, large=False, override_post=False, init_headers=False, return_headers=False, binary_data=None, write_cache=True, force_quote=False, sftp_user=None, sftp_passwd=None, sftp_passwd_file='.secrets', sftp_port=22, sftp_host=None, sftp_ask=None, setup=True, call=True, process=True, retries=3, cache_dir='cache')[source]

This class is a wrapper around pycurl. You can set a large number of parameters. In addition, it has a caching functionality, so each download is performed only once. It handles HTTP, FTP, cookies, headers, GET and POST params, multipart/form data, URL quoting, redirects, timeouts, retries, encodings and debugging. It returns either the downloaded data, a file pointer, or files extracted from archives (gzip, tar.gz, zip). It is able to show a progress and status indicator on the console.

construct_binary_data()[source]

The binary data content of a form/multipart type request can be constructed from a list of tuples (<field name>, <field value>), where field name and value are both of type bytes.
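
Building a multipart/form-data body from such a list of tuples can be sketched as follows. This is a generic illustration of the multipart layout, not the code of construct_binary_data(); the function name `build_multipart` and its `boundary` argument are hypothetical.

```python
import uuid


def build_multipart(fields, boundary=None):
    """Builds a multipart/form-data body from (name, value) bytes tuples."""
    # a random boundary unless one is given
    boundary = boundary or uuid.uuid4().hex.encode('ascii')
    lines = []
    for name, value in fields:
        lines.append(b'--' + boundary)
        lines.append(b'Content-Disposition: form-data; name="' + name + b'"')
        lines.append(b'')  # empty line separates headers from the value
        lines.append(value)
    lines.append(b'--' + boundary + b'--')  # closing delimiter
    lines.append(b'')
    body = b'\r\n'.join(lines)
    content_type = b'multipart/form-data; boundary=' + boundary
    return content_type, body


content_type, body = build_multipart(
    [(b'query', b'P00533'), (b'format', b'tab')],
    boundary=b'XYZ',
)
```

The returned content type has to be sent as the Content-Type header of the POST request, so the server knows the boundary.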

is_quoted(string)[source]

From http://stackoverflow.com/questions/1637762/test-if-string-is-url-encoded-in-php
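
The idea behind such a test is that decoding an already quoted string changes it. A minimal sketch with the standard library, assuming Python 3; this is a heuristic written for illustration and not necessarily identical to the method above (a literal percent sign followed by two hex digits would give a false positive):

```python
from urllib.parse import quote, unquote


def is_quoted(string):
    # if unquoting changes the string, it contained percent-escapes
    return unquote(string) != string


plain = 'kinase substrate'
quoted = quote(plain)
```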

set_binary_data()[source]

Set binary data to be transmitted attached to POST request.

binary_data is either a bytes string, or a filename, or a list of key-value pairs of a multipart form.

url_fix(charset='utf-8')[source]

From http://stackoverflow.com/a/121017/854988

class pypath.curl.FileOpener(file_param, compr=None, extract=True, _open=True, set_fileobj=True, files_needed=None, large=True)[source]

This class opens a file, extracts it in case it is a gzip, tar.gz, tar.bz2 or zip archive, selects the requested files if you only need certain files from a multifile archive, reads the data from the file, or returns the file pointer, as you request. It examines the file type and size.

class pypath.curl.cache_delete_off[source]

This is a context handler which stops pypath.curl.Curl() from deleting the cache files. This is the default behaviour, so this context won’t change anything by default.

Behind the scenes it sets the value of the pypath.curl.CACHEDEL module level variable to False.

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.cache_delete_off():
    pa.load_resources({'signor': data_formats.pathway['signor']})
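
The context handlers in this module all set a module level variable on entry. A minimal sketch of this pattern is shown below, using a namespace object as a stand-in for the module; the helper `temporary` is hypothetical, and restoring the previous value on exit is an assumption of this example rather than documented behaviour of pypath.curl.

```python
import contextlib
import types

# stand-in for a module with a module level setting
settings = types.SimpleNamespace(CACHEDEL=False)


@contextlib.contextmanager
def temporary(namespace, name, value):
    """Sets namespace.name to value inside the with block."""
    previous = getattr(namespace, name)
    setattr(namespace, name, value)
    try:
        yield
    finally:
        # restore the earlier value even if an error occurred
        setattr(namespace, name, previous)


with temporary(settings, 'CACHEDEL', True):
    inside = settings.CACHEDEL
after = settings.CACHEDEL
```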
class pypath.curl.cache_delete_on[source]

This is a context handler which makes pypath.curl.Curl() delete the cache files instead of reading them. Then it downloads the data again, or does nothing if the DRYRUN context is turned on. Upon deleting cache files, console messages will let you know which files have been deleted.

Behind the scenes it sets the value of the pypath.curl.CACHEDEL module level variable to True (by default it is False).

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.cache_delete_on():
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.cache_off[source]

This is a context handler to turn off pypath.curl.Curl() cache. Data will be downloaded even if it exists in cache.

Behind the scenes it sets the value of the pypath.curl.CACHE module level variable to False (by default it is None).

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

print('`curl.CACHE` is ', curl.CACHE)

with curl.cache_off():
    print('`curl.CACHE` is ', curl.CACHE)
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.cache_on[source]

This is a context handler to turn on pypath.curl.Curl() cache. As most of the methods use cache as their default behaviour, probably it won’t change anything.

Behind the scenes it sets the value of the pypath.curl.CACHE module level variable to True (by default it is None).

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

print('`curl.CACHE` is ', curl.CACHE)

with curl.cache_on():
    print('`curl.CACHE` is ', curl.CACHE)
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.cache_print_off[source]

This is a context handler which stops pypath.curl.Curl() from printing verbose messages about its cache.

Behind the scenes it sets the value of the pypath.curl.CACHEPRINT module level variable to False. As by default it is False, this context won’t modify the default behaviour.

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.cache_print_off():
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.cache_print_on[source]

This is a context handler which makes pypath.curl.Curl() print verbose messages about its cache.

Behind the scenes it sets the value of the pypath.curl.CACHEPRINT module level variable to True (by default it is False).

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.cache_print_on():
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.dryrun_off[source]

This is a context handler which makes pypath.curl.Curl() perform the download or cache read. This is the default behaviour, so applying this context restores the default.

Behind the scenes it sets the value of the pypath.curl.DRYRUN module level variable to False.

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.dryrun_off():
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.dryrun_on[source]

This is a context handler which makes pypath.curl.Curl() do all setup steps, but not perform the download or cache read.

Behind the scenes it sets the value of the pypath.curl.DRYRUN module level variable to True (by default it is False).

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.dryrun_on():
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.preserve_off[source]

This is a context handler which prevents pypath.curl.Curl() from making a reference to itself in the module level variable LASTCURL. By default it does not do this, so this context only restores the default.

Behind the scenes it sets the value of the pypath.curl.PRESERVE module level variable to False.

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.preserve_off():
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.curl.preserve_on[source]

This is a context handler which makes pypath.curl.Curl() keep a reference to itself in the module level variable LASTCURL. This is useful if you have some issue with Curl, and you want to access the instance for debugging.

Behind the scenes it sets the value of the pypath.curl.PRESERVE module level variable to True (by default it is False).

Example:

import pypath
from pypath import curl, data_formats

pa = pypath.PyPath()

with curl.preserve_on():
    pa.load_resources({'signor': data_formats.pathway['signor']})
class pypath.chembl.Chembl(chembl_mysql=(None, 'chembl_ebi'), ncbi_tax_id=9606, mapping_mysql=None, mapper=None)[source]
class pypath.seq.Seq(protein, sequence, isoform=1)[source]
class pypath.mysql.MysqlRunner(param, cursor='serverside', concurrent_queries=4, log=None, silent=False)[source]
get_qid(query)[source]

Returns the 32 byte md5sum of a string: this serves as a unique identifier of queries, referred to as qid in this module.

@query
: str
MySQL query or any other string.
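
An md5 based identifier of this kind can be computed with the standard library, as in the sketch below. It assumes the query is encoded as UTF-8 before hashing, which is a choice made for this example; the function here is a standalone illustration and not the method of MysqlRunner.

```python
import hashlib


def get_qid(query):
    # the hex digest of md5 is always 32 characters long
    return hashlib.md5(query.encode('utf-8')).hexdigest()


qid = get_qid('SELECT 1')
```

The same query string always gives the same qid, so it can serve as a dictionary key for running or finished queries.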
class pypath.mysql_connect.MysqlConnect(config=None, log=None, timeout=12)[source]
class pypath.progress.Progress(total=None, name='Progress', interval=3000, percent=True, status='initializing')[source]
pypath.colorgen.flatten()

chain.from_iterable(iterable) –> chain object

Alternate chain() constructor taking a single iterable argument that evaluates lazily.
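
As the docstring above is that of itertools.chain.from_iterable, a short usage sketch with the standard library may help; the variable names are chosen for this example only.

```python
from itertools import chain

# the same callable the docstring above describes
flatten = chain.from_iterable

nested = [[1, 2], [3], [], [4, 5]]
flat = list(flatten(nested))
```

The result is a lazy iterator over one level of nesting, hence the list() call to materialize it.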

pypath.colorgen.getfracs()[source]
[Fraction(0, 1), Fraction(1, 2), Fraction(1, 4), Fraction(3, 4),
Fraction(1, 8), Fraction(3, 8), Fraction(5, 8), Fraction(7, 8), Fraction(1, 16), Fraction(3, 16), ...]

[0.0, 0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875, 0.0625, 0.1875, ...]
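
A sequence like the one shown above can be produced by walking through the odd numerators over increasing powers of two. The generator below is a sketch written for this example (the name `fracs` is hypothetical) and is not claimed to be the code of pypath.colorgen.

```python
from fractions import Fraction
from itertools import count, islice


def fracs():
    """Yields 0, 1/2, 1/4, 3/4, 1/8, 3/8, ... without repetition."""
    yield Fraction(0, 1)
    for exponent in count(1):
        denominator = 2 ** exponent
        # odd numerators only: the even ones were covered earlier
        for numerator in range(1, denominator, 2):
            yield Fraction(numerator, denominator)


first_ten = list(islice(fracs(), 10))
as_floats = [float(f) for f in first_ten]
```

Each new value falls halfway between two earlier ones, which is useful for picking well separated hues.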

pypath.colorgen.zenos_dichotomy()[source]

http://en.wikipedia.org/wiki/1/2_%2B_1/4_%2B_1/8_%2B_1/16_%2B_%C2%B7_%C2%B7_%C2%B7

pypath.common.uniqList(seq)[source]

Not order preserving. From http://www.peterbe.com/plog/uniqifiers-benchmark

pypath.common.rotate(point, angle, center=(0.0, 0.0))[source]

Rotates a point around center. Angle is in degrees. Rotation is counter-clockwise. From http://stackoverflow.com/a/20024348/854988
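
The rotation itself is the standard two dimensional rotation applied relative to the center. A self-contained sketch under the conventions stated above (degrees, counter-clockwise); this version was written for this example and may differ in detail from pypath.common.rotate().

```python
import math


def rotate(point, angle, center=(0.0, 0.0)):
    """Rotates point around center by angle degrees, counter-clockwise."""
    rad = math.radians(angle)
    # coordinates relative to the center
    x = point[0] - center[0]
    y = point[1] - center[1]
    # standard rotation, then shift back
    x_rot = x * math.cos(rad) - y * math.sin(rad) + center[0]
    y_rot = x * math.sin(rad) + y * math.cos(rad) + center[1]
    return (x_rot, y_rot)


quarter_turn = rotate((1.0, 0.0), 90)
half_turn = rotate((2.0, 1.0), 180, center=(1.0, 1.0))
```

Because of floating point rounding, results should be compared with a tolerance rather than for exact equality.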

pypath.common.cleanDict(dct)[source]

Removes None values from dict and casts everything else to str.

pypath.common.md5(value)[source]

Returns the md5sum of value as string.

class pypath.enrich.Enrichment(set_count, pop_count, set_size, pop_size, data)[source]
class pypath.enrich.EnrichmentSet(data, pop_size, correction_method='hommel', alpha=0.05)[source]
class pypath.gsea.GSEA(user=None, mapper=None)[source]
class pypath.gsea.GSEABinaryEnrichmentSet(basic_set, gsea=None, geneset_ids=None, alpha=0.05, correction_method='hommel', user=None, mapper=None)[source]
pypath.go.load_go(graph, aspect=['C', 'F', 'P'])[source]

@graph : igraph.Graph Any igraph.Graph object with uniprot IDs in its name vertex attribute.

class pypath.go.GOAnnotation(organism=9606)[source]
class pypath.go.GOEnrichmentSet(aspect, organism=9606, annotation=None, basic_set=None, alpha=0.05, correction_method='hommel')[source]
class pypath.drawing.Plot(graph=None, filename=None, graphix_dir='pdf', graphix_format='pdf', name=None, title_text=None, title_font_family=None, title_font_size=None, title_color='#646567', size=None, layout='fruchterman_reingold', layout_param=None, vertex_label=None, vertex_size=None, vertex_label_size='degree_label_size', edge_width=None, vertex_color='#6EA945', vertex_label_color='#007B7F', vertex_alpha='AA', vertex_frame_color='#FFFFFF00', vertex_frame_width=0, edge_label=None, edge_label_size=None, edge_label_color='#007B7F', edge_curved=None, edge_color='#818284', edge_alpha='AA', autocurve=None, vertex_label_font='sans-serif', edge_label_font='sans-serif', edge_arrow_size=1.0, edge_arrow_width=1.0, palettes={}, bbox=None, margin=10, small=None, dimensions=(1280, 1280), grouping=None, **kwargs)[source]