Reference

annot module

class pypath.annot.Adhesome(**kwargs)[source]
class pypath.annot.AnnotationBase(name, ncbi_tax_id=9606, input_method=None, input_args=None, entity_type='protein', swissprot_only=True, proteins=(), complexes=(), reference_set=(), infer_complexes=True, dump=None, **kwargs)[source]
add_complexes_by_inference(complexes=None)[source]

Creates complex annotations by in silico inference and adds them to this annotation set.

all_proteins()[source]

All UniProt IDs annotated in this resource.

annotate_complex(cplex)[source]

Infers annotations for a single complex.

complex_inference(complexes=None)[source]

Annotates all complexes in complexes; by default, those in the default complex database (existing in the complex module or generated on demand according to the module’s current settings).

Parameters

complexes (iterable) – Iterable yielding complexes.

Returns

(dict) – Dict with complexes as keys and sets of annotations as values. Complexes with no valid information in this annotation resource won’t be in the dict.

get_subset(method=None, **kwargs)[source]

Retrieves a subset by filtering based on kwargs. Each argument should be a name and a value or set of values. Elements having the provided values in the annotation will be returned. Returns a set of UniProt IDs.
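The filtering rule described for get_subset can be sketched in plain Python. This is a minimal illustration, not pypath's implementation: the Record type, the annot dictionary, the annotation values and the subset helper are all hypothetical, and it assumes that an entity matches when at least one of its records satisfies every criterion.

```python
from collections import namedtuple

# Hypothetical annotation record; real resources define their own fields.
Record = namedtuple('Record', ['location', 'evidence'])

# Hypothetical annotation store: UniProt ID -> set of records.
annot = {
    'P00533': {Record('membrane', 'literature')},
    'P04637': {
        Record('nucleus', 'literature'),
        Record('cytoplasm', 'prediction'),
    },
    'Q9Y243': {Record('cytoplasm', 'literature')},
}


def subset(annot, **kwargs):
    """Return the IDs having at least one record matching every criterion."""
    # Normalize each criterion to a set of accepted values, so that a
    # single value and a set of values are handled the same way.
    criteria = {
        name: value if isinstance(value, (set, frozenset)) else {value}
        for name, value in kwargs.items()
    }
    return {
        uniprot
        for uniprot, records in annot.items()
        if any(
            all(
                getattr(rec, name) in accepted
                for name, accepted in criteria.items()
            )
            for rec in records
        )
    }


cytoplasmic = subset(annot, location='cytoplasm')
curated = subset(
    annot,
    location={'nucleus', 'membrane'},
    evidence='literature',
)
```

Each keyword is a field name paired with a value or a set of values, and the result is a set of UniProt IDs, as in the description above.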

load_proteins()[source]

Retrieves a set of all UniProt IDs to have a base set of the entire proteome.

reload()[source]

Reloads the object from the module level.

class pypath.annot.CancerGeneCensus(**kwargs)[source]
class pypath.annot.CellPhoneDB(**kwargs)[source]
record

alias of pypath.dataio.CellPhoneDBAnnotation

class pypath.annot.CellPhoneDBComplex(**kwargs)[source]
class pypath.annot.CellSurfaceProteinAtlas(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Comppi(**kwargs)[source]
class pypath.annot.Corum(name, annot_attr, **kwargs)[source]
class pypath.annot.CorumFuncat(**kwargs)[source]
class pypath.annot.CorumGO(**kwargs)[source]
class pypath.annot.Cpad(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Dgidb(**kwargs)[source]
class pypath.annot.Disgenet(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Exocarta(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.GOIntercell(categories=None, go_annot=None, ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.GuideToPharmacology(load_sources=False, **kwargs)[source]
class pypath.annot.Hgnc(**kwargs)[source]
class pypath.annot.HpmrComplex(**kwargs)[source]
class pypath.annot.HumanPlasmaMembraneReceptome(**kwargs)[source]
class pypath.annot.HumanProteinAtlas(**kwargs)[source]
class pypath.annot.Integrins(**kwargs)[source]
class pypath.annot.Intogen(**kwargs)[source]
class pypath.annot.KeggPathways(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Kinases(**kwargs)[source]
class pypath.annot.Kirouac2010(load_sources=False, **kwargs)[source]
class pypath.annot.LigandReceptor(name, ligand_col=None, receptor_col=None, ligand_id_type=None, receptor_id_type=None, record_processor_method=None, record_extra_fields=None, record_defaults=None, extra_fields_methods=None, **kwargs)[source]
class pypath.annot.Locate(ncbi_tax_id=9606, literature=True, external=True, predictions=False, **kwargs)[source]
class pypath.annot.Matrisome(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Matrixdb(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Membranome(**kwargs)[source]
class pypath.annot.NetpathPathways(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Opm(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Phosphatome(**kwargs)[source]
class pypath.annot.Ramilowski2015(load_sources=False, **kwargs)[source]
class pypath.annot.Ramilowski2015Location(**kwargs)[source]
class pypath.annot.SignalinkPathways(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.SignorPathways(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Surfaceome(**kwargs)[source]
class pypath.annot.Tfcensus(**kwargs)[source]
class pypath.annot.Topdb(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Vesiclepedia(ncbi_tax_id=9606, **kwargs)[source]
class pypath.annot.Zhong2015(**kwargs)[source]
pypath.annot.get_db(keep_annotators=True, create_dataframe=False, use_complexes=True, **kwargs)[source]

Retrieves the current database instance and initializes it if it does not exist yet.

pypath.annot.init_db(keep_annotators=True, create_dataframe=False, use_complexes=True, **kwargs)[source]

Initializes or reloads the annotation database. The database will be assigned to the db attribute of this module.
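The relationship between get_db and init_db can be sketched as a lazily initialized module-level instance. This is only an illustration of the pattern under stated assumptions: the module namespace and the Database class below are hypothetical stand-ins, not pypath objects.

```python
import types

# Stand-in for a module such as pypath.annot; only the pattern is shown.
module = types.SimpleNamespace()


class Database:
    """Hypothetical placeholder for the real database class."""

    def __init__(self, **kwargs):
        self.settings = kwargs


def init_db(**kwargs):
    # Always builds a fresh instance and assigns it to the `db` attribute.
    module.db = Database(**kwargs)


def get_db(**kwargs):
    # Initializes only if no instance exists yet, then returns it.
    if not hasattr(module, 'db'):
        init_db(**kwargs)
    return module.db


first = get_db(use_complexes=True)   # created on first access
second = get_db()                    # the same instance is returned
init_db(use_complexes=False)         # explicit reload replaces it
third = get_db()
```

With this pattern, repeated get_db calls are cheap, while init_db is the explicit way to rebuild the database with different settings.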

bel module

complex module

class pypath.complex.AbstractComplexResource(name, ncbi_tax_id=9606, input_method=None, input_args=None, dump=None, **kwargs)[source]

A resource which provides information about molecular complexes.

class pypath.complex.CellPhoneDB(**kwargs)[source]
class pypath.complex.Compleat(input_args=None, **kwargs)[source]
class pypath.complex.ComplexAggregator(resources=None, pickle_file=None)[source]
reload()[source]

Reloads the object from the module level.

class pypath.complex.ComplexPortal(input_args=None, **kwargs)[source]
class pypath.complex.Corum(input_args=None, **kwargs)[source]
class pypath.complex.GuideToPharmacology(input_args=None, **kwargs)[source]
class pypath.complex.Havugimana(input_args=None, **kwargs)[source]
class pypath.complex.Hpmr(input_args=None, **kwargs)[source]
class pypath.complex.Humap(input_args=None, **kwargs)[source]
class pypath.complex.Pdb(input_args=None, **kwargs)[source]
class pypath.complex.Signor(input_args=None, **kwargs)[source]
pypath.complex.all_complexes()[source]

Returns a set of all complexes in the database which serves as a reference set for many methods, just like uniprot_input.all_uniprots represents the proteome.

pypath.complex.get_db(**kwargs)[source]

Retrieves the current database instance and initializes it if it does not exist yet.

pypath.complex.init_db(**kwargs)[source]

Initializes or reloads the complex database. The database will be assigned to the db attribute of this module.

intera module

This module provides classes to represent and handle structural details of protein interactions, i.e. residues, post-translational modifications, short motifs, domains, domain-motifs and domain-motif interactions, and binding interfaces.

intercell module

class pypath.intercell.IntercellRole(source, role)
property role

Alias for field number 1

property source

Alias for field number 0
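The "Alias for field number" descriptions are the default docstrings of named tuple fields, which suggests that IntercellRole behaves like a two-field named tuple. A minimal equivalent built with the standard library is shown below; the field values are hypothetical.

```python
from collections import namedtuple

# Same field order as the documented signature IntercellRole(source, role).
IntercellRole = namedtuple('IntercellRole', ['source', 'role'])

# Hypothetical values, for illustration only.
item = IntercellRole(source='CellPhoneDB', role='ligand')

# Fields are reachable by name or by position.
by_name = (item.source, item.role)
by_position = (item[0], item[1])
```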

main module

class pypath.main.PyPath(ncbi_tax_id=None, copy=None, name='unnamed', cache_dir=None, outdir='results', loglevel=0, loops=False)[source]

Main network object.

Parameters
  • ncbi_tax_id (int) – Optional, 9606 (Homo sapiens) by default. NCBI Taxonomic identifier of the organism from which the data will be downloaded.

  • default_name_type (dict) – Optional, {'protein': 'uniprot', 'mirna': 'mirbase', 'drug': 'chembl', 'lncrna': 'lncrna-genesymbol'} by default. Contains the default identifier types to which the downloaded data will be converted. If others are used, user may need to provide the format definitions for the conversion tables.

  • copy (pypath.main.PyPath) – Optional, None by default. Other pypath.main.PyPath instance from which the data will be copied.

  • mysql (tuple) – Optional, (None, 'mapping') by default. Contains the MySQL parameters used by the pypath.mapping module to load the ID conversion tables.

  • chembl_mysql (tuple) – Optional, (None, 'chembl') by default. Contains the MySQL parameters used by the pypath.mapping module to load the ChEMBL ID conversion tables.

  • name (str) – Optional, 'unnamed' by default. Session or project name (custom).

  • outdir (str) – Optional, 'results' by default. Output directory where to store all output files.

  • loglevel (int) – Optional, 0 by default. Sets the level of the logger. The higher the level the more messages will be written to the log.

  • loops (bool) – Optional, False by default. Determines if self-loop edges are allowed in the graph.

Variables
  • adjlist (list) – List of [set] containing the adjacency of each node. See PyPath.update_adjlist() method for more information.

  • chembl (pypath.chembl.Chembl) – Contains the ChEMBL data. See pypath.chembl module documentation for more information.

  • chembl_mysql (tuple) – Contains the MySQL parameters used by the pypath.mapping module to load the ChEMBL ID conversion tables.

  • data (dict) – Stores the loaded interaction and attribute table. See PyPath.read_data_file() method for more information.

  • db_dict (dict) – Dictionary of dictionaries. Outer-level keys are 'nodes' and 'edges', corresponding values are [dict] whose keys are the database sources with values of type [set] containing the edge/node indexes for which that database provided some information.

  • dgraph (igraph.Graph) – Directed network graph object.

  • disclaimer (str) – Disclaimer text.

  • dlabDct (dict) – Maps the directed graph node labels [str] (keys) to their indices [int] (values).

  • dnodDct (dict) – Maps the directed graph node names [str] (keys) to their indices [int] (values).

  • dnodInd (set) – Stores the directed graph node names [str].

  • dnodLab (dict) – Maps the directed graph node indices [int] (keys) to their labels [str] (values).

  • dnodNam (dict) – Maps the directed graph node indices [int] (keys) to their names [str] (values).

  • edgeAttrs (dict) – Stores the edge attribute names [str] as keys and their corresponding types (e.g.: set, list, str, …) as values.

  • exp (pandas.DataFrame) – Stores the expression data for the nodes (if loaded).

  • exp_prod (pandas.DataFrame) – Stores the edge expression data (as the product of the normalized expression between the pair of nodes by default). For more details see pypath.main.PyPath.edges_expression().

  • exp_samples (set) – Contains the tissues downloaded from ProteomicsDB. See PyPath.get_proteomicsdb() for more information.

  • failed_edges (list) – List of lists containing information about the failed edges. Each failed edge sublist contains (in this order): [tuple] with the node IDs, [str] names of nodes A and B, [int] IDs of nodes A and B and [int] IDs of the edges in both directions.

  • go (dict) – Contains the organism(s) NCBI taxonomy ID as key [int] and pypath.go.GOAnnotation object as value, which contains the GO annotations for the nodes in the graph. See pypath.go.GOAnnotation for more information.

  • graph (igraph.Graph) – Undirected network graph object.

  • gsea (pypath.gsea.GSEA) – Contains the loaded gene-sets from MSigDB. See pypath.gsea.GSEA for more information.

  • has_cats (set) – Contains the categories (e.g.: resource types) [str] loaded in the current network. Possible categories are: 'm' for PTM/enzyme-substrate resources, 'p' for pathway/activity flow resources, 'i' for undirected/PPI resources, 'r' for process description/reaction resources and 't' for transcription resources.

  • htp (dict) – Contains information about high-throughput data of the network for different thresholds [int] (keys). Values are [dict] containing the number of references ('rnum') [int], number of edges ('enum') [int], number of sources ('snum') [int] and list of PMIDs of the most common references above the given threshold ('htrefs') [set].

  • labDct (dict) – Maps the undirected graph node labels [str] (keys) to their indices [int] (values).

  • lists (dict) – Contains specific lists of nodes (values) for different categories [str] (keys). These can be loaded from a file or a resource. Some methods include pypath.main.PyPath.receptor_list() ('rec'), pypath.main.PyPath.druggability_list() ('dgb'), pypath.main.PyPath.kinases_list() ('kin'), pypath.main.PyPath.tfs_list() ('tf'), pypath.main.PyPath.disease_genes_list() ('dis'), pypath.main.PyPath.signaling_proteins_list() ('sig'), pypath.main.PyPath.proteome_list() ('proteome') and pypath.main.PyPath.cancer_drivers_list() ('cdv').

  • loglevel (str) – The level of the logger.

  • loops (bool) – Whether self-loop edges are allowed in the graph.

  • mapper (pypath.mapping.Mapper) – pypath.mapping.Mapper object for ID conversion and other ID-related operations across resources.

  • mutation_samples (list) – DEPRECATED

  • mysql_conf (tuple) – Contains the MySQL parameters used by the pypath.mapping module to load the ID conversion tables.

  • name (str) – Session or project name (custom).

  • ncbi_tax_id (int) – NCBI Taxonomic identifier of the organism from which the data will be downloaded.

  • negatives (dict) – Contains a list of negative interactions according to a given source (e.g.: Negatome database). See pypath.main.PyPath.apply_negative() for more information.

  • nodDct (dict) – Maps the undirected graph node names [str] (keys) to their indices [int] (values).

  • nodInd (set) – Stores the undirected graph node names [str].

  • nodLab (dict) – Maps the undirected graph node indices [int] (keys) to their labels [str] (values).

  • nodNam (dict) – Maps the undirected graph node indices [int] (keys) to their names [str] (values).

  • outdir (str) – Output directory where to store all output files.

  • palette (list) – Contains a list of hexadecimal [str] of colors. Used for plotting purposes.

  • pathway_types (list) – Contains the names of all the loaded pathway resources [str].

  • pathways (dict) – Contains the list of pathways (values) for each resource (keys) loaded in the network.

  • plots (dict) – DEPRECATED (?)

  • proteomicsdb (pypath.proteomicsdb.ProteomicsDB) – Contains a pypath.proteomicsdb.ProteomicsDB instance, see the class documentation for more information.

  • raw_data (list) – Contains a list of loaded edges [dict] from a data file. See PyPath.read_data_file() for more information.

  • seq (dict) – (?)

  • session (str) – Session ID, five random alphanumeric characters.

  • session_name (str) – Session name and ID (e.g. 'unnamed-abc12').

  • sourceNetEdges (igraph.Graph) – (?)

  • sourceNetNodes (igraph.Graph) – (?)

  • sources (list) – List containing the names of the loaded resources [str].

  • u_pfam (dict) – Dictionary of dictionaries, contains the mapping of UniProt IDs to their respective protein families and other information.

  • uniprot_mapped (list) – DEPRECATED (?)

  • unmapped (list) – Contains the names of unmapped items [str]. See pypath.main.PyPath.map_item() for more information.

  • vertexAttrs (dict) – Stores the node attribute names [str] as keys and their corresponding types (e.g.: set, list, str, …) as values.

acsn_effects(graph=None)[source]
add_genesets(genesets)[source]
add_grouped_eattr(edge, attr, group, value)[source]

Merges (or creates) a given edge attribute as [dict] of [list] values.

Parameters
  • edge (int) – Edge index where the given attribute value is to be merged or created.

  • attr (str) – The name of the attribute. If such attribute does not exist in the network edges, it will be created on all edges (as an empty [dict], value will only be assigned to the given edge and group).

  • group (str) – The key of the attribute dictionary where value is to be assigned.

  • value (list) – The value of the attribute to be assigned/merged.

add_grouped_set_eattr(edge, attr, group, value)[source]

Merges (or creates) a given edge attribute as [dict] of [set] values.

Parameters
  • edge (int) – Edge index where the given attribute value is to be merged or created.

  • attr (str) – The name of the attribute. If such attribute does not exist in the network edges, it will be created on all edges (as an empty [dict], value will only be assigned to the given edge and group).

  • group (str) – The key of the attribute dictionary where value is to be assigned.

  • value (set) – The value of the attribute to be assigned/merged.

add_list_eattr(edge, attr, value)[source]

Merges (or creates) a given edge attribute as [list].

Parameters
  • edge (int) – Edge index where the given attribute value is to be merged or created.

  • attr (str) – The name of the attribute. If such attribute does not exist in the network edges, it will be created on all edges (as an empty [list], value will only be assigned to the given edge).

  • value (list) – The value of the attribute to be assigned/merged.

add_set_eattr(edge, attr, value)[source]

Merges (or creates) a given edge attribute as [set].

Parameters
  • edge (int) – Edge index where the given attribute value is to be merged or created.

  • attr (str) – The name of the attribute. If such attribute does not exist in the network edges, it will be created on all edges (as an empty [set], value will only be assigned to the given edge).

  • value (set) – The value of the attribute to be assigned/merged.
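The merge-or-create behaviour of the set-valued attribute methods can be sketched without igraph. This is a minimal illustration under assumptions: the edges list of attribute dictionaries is a hypothetical stand-in for the graph's edge sequence, and the two helpers only mimic the documented behaviour. The attribute and resource names are placeholders.

```python
# Hypothetical stand-in for the edge sequence: one attribute dict per edge.
edges = [{}, {}, {}]


def add_set_eattr(edges, edge, attr, value):
    """Merge `value` into the set-valued attribute `attr` of one edge."""
    # If the attribute is new, create it on every edge as an empty set.
    for attrs in edges:
        attrs.setdefault(attr, set())
    # Only the requested edge receives the value.
    edges[edge][attr] |= set(value)


def add_grouped_set_eattr(edges, edge, attr, group, value):
    """Same idea, with one set per group key inside a dict."""
    # If the attribute is new, create it on every edge as an empty dict.
    for attrs in edges:
        attrs.setdefault(attr, {})
    edges[edge][attr].setdefault(group, set()).update(value)


add_set_eattr(edges, 1, 'sources', {'resource_a'})
add_set_eattr(edges, 1, 'sources', {'resource_b', 'resource_a'})
add_grouped_set_eattr(edges, 0, 'refs_by_source', 'resource_a', {'ref-1'})
```

The list-valued variants follow the same outline with lists in place of sets.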

add_update_edge(id_a, id_b, source, is_directed, refs, stim, inh, taxon_a, taxon_b, typ, extra_attrs={}, add=False)[source]

Updates the attributes of one edge in the (undirected) network. Optionally it creates a new edge and sets the attributes, but this is not efficient as igraph needs to reindex edges after the operation, so it is better to create new edges in batches.

Parameters
  • id_a (str) – Name of the source node of the edge to be added/updated.

  • id_b (str) – Name of the target node of the edge to be added/updated.

  • source (set) – Or [list], contains the names [str] of the resources supporting that edge.

  • is_directed (bool) – Whether the edge is directed or not.

  • refs (set) – Or [list], contains the instances of the references pypath.refs.Reference for that edge.

  • stim (bool) – Whether the edge is stimulatory or not.

  • inh (bool) – Whether the edge is inhibitory or not.

  • taxon_a (int) – NCBI Taxonomic identifier of the source molecule.

  • taxon_b (int) – NCBI Taxonomic identifier of the target molecule.

  • typ (str) – The type of interaction (e.g.: 'PPI')

  • extra_attrs (dict) – Optional, {} by default. Contains any extra attributes for the edge to be updated.

  • add (bool) – Optional, False by default. If set to True and the edge is not in the network, it will be created. Otherwise, in such case it will raise an error message.

add_update_vertex(default_attrs, original_name, original_name_type, extra_attrs={}, add=False)[source]

Updates the attributes of one node in the (undirected) network. Optionally it creates a new node and sets the attributes, but this is not efficient as igraph needs to reindex vertices after the operation, so it is better to create new nodes in batches.

Parameters
  • default_attrs (dict) – The attribute dictionary of the node to be updated/created.

  • original_name (str) – Original node name (e.g.: UniProt ID).

  • original_name_type (str) – The original node name type (e.g.: for the previous example, this would be 'uniprot').

  • extra_attrs (dict) – Optional, {} by default. Contains any extra attributes for the node to be updated.

  • add (bool) – Optional, False by default. If set to True and the node is not in the network, it will be created. Otherwise, in such case it will raise an error message.

affects(identifier)[source]
all_between(id_a, id_b)[source]

Checks for any edges (in any direction) between the provided nodes.

Parameters
  • id_a (str) – The name of the source node.

  • id_b (str) – The name of the target node.

Returns

(dict) – Contains information on the directionality of the requested edge. Keys are 'ab' and 'ba', denoting the straight/reverse directionalities respectively. Values are [list] whose elements are the edge ID or None according to the existence of that edge in the following categories: undirected, straight and reverse (in that order).

all_neighbours(indices=False)[source]

Looks for the first neighbours of all the nodes and creates an attribute ('neighbours') on each one of them containing a list of their UniProt IDs.

Parameters

indices (bool) – Optional, False by default. Whether to list the neighbour nodes indices or their UniProt IDs.
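The effect of the indices switch can be sketched on a plain edge list. This is not pypath code: the names list, the edges list and the helper below are hypothetical, and the helper returns the neighbour lists instead of storing them as a vertex attribute.

```python
def all_neighbours(names, edges, indices=False):
    """List the first neighbours of every node, by name or by index."""
    neighbours = [set() for _ in names]
    # Undirected view: each edge makes its two endpoints neighbours.
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    if indices:
        return [sorted(n) for n in neighbours]
    return [sorted(names[i] for i in n) for n in neighbours]


# Placeholder node names and edges given as pairs of node indices.
names = ['A', 'B', 'C', 'D']
edges = [(0, 1), (1, 2)]

by_name = all_neighbours(names, edges)
by_index = all_neighbours(names, edges, indices=True)
```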

apply_list(name, node_or_edge='node')[source]

Creates vertex or edge attribute based on a list.

Parameters
  • name (str) – The name of the list to be added as attribute. Must have been previously loaded with pypath.main.PyPath.load_list() or other methods. See description of pypath.main.PyPath.lists attribute for more information.

  • node_or_edge (str) – Optional, 'node' by default. Whether the attribute list is to be added to the nodes or to the edges.

apply_negative(settings)[source]

Loads a negative interaction source (e.g.: Negatome) into the current network.

Parameters

settings (pypath.input_formats.ReadSettings) – pypath.input_formats.ReadSettings instance containing the detailed definition of the input format of the downloaded file. For instance, pypath.data_formats.negative['negatome'].

attach_network(edge_list=False, regulator=False)[source]

Adds edges to the network from edge_list obtained from file or other input method. If none is passed, checks for such data in pypath.main.PyPath.raw_data.

Parameters
  • edge_list (str) – Optional, False by default. The source name of the list of edges to be added. This must have been loaded previously (e.g.: with pypath.main.PyPath.read_data_file()). If none is passed, loads the data directly from pypath.main.PyPath.raw_data.

  • regulator (bool) – Optional, False by default. If set to True, nodes not already in the network will not be added (and hence neither will the edges involving them).

basic_stats(latex=False, caption='', latex_hdr=True, fontsize=8, font='HelveticaNeueLTStd-LtCn', fname=None, header_format='%s', row_order=None, by_category=True, use_cats=['p', 'm', 'i', 'r'], urls=True, annots=False)[source]

Returns basic numbers about the network resources, e.g. edge and node counts.

Parameters

latex (bool) – Optional, False by default. Return the table in a LaTeX document. This can be compiled by PDFLaTeX: latex stats.tex

basic_stats_intergroup(groupA, groupB, header=None)[source]
cancer_drivers_list(intogen_file=None)[source]

Loads the list of cancer drivers. Contains information from COSMIC (needs user login credentials) and IntOGen (if provided) and adds the attribute to the undirected network nodes.

Parameters

intogen_file (str) – Optional, None by default. Path to the data file. Can also be [function] that provides the data. In general, anything accepted by pypath.input_formats.ReadSettings.input.

cancer_gene_census_list()[source]

Loads the list of cancer driver proteins from the COSMIC Cancer Gene Census.

clean_graph(organisms_allowed=None)[source]

Removes multiple edges, unknown molecules and those from the wrong taxon. Multiple edges will be combined by the pypath.main.PyPath.combine_attr() method. Loops will be deleted unless the attribute pypath.main.PyPath.loops is set to True.

Parameters

organisms_allowed (set) – NCBI Taxonomy identifiers [int] of the organisms allowed in the network.

collapse_by_name(graph=None)[source]

Collapses nodes with the same name by copying and merging all edges and attributes. Operates directly on the provided network object.

Parameters

graph (igraph.Graph) – Optional, None by default. The network for which the nodes are to be collapsed. If none is provided, takes pypath.main.PyPath.graph (undirected network) by default.

combine_attr(lst, num_method=<built-in function max>)[source]

Combines multiple attributes into one. This method attempts to find out which is the best way to combine attributes.

  • If there is only one value or one of them is None, then returns the one available.

  • Lists: concatenates unique values of lists.

  • Numbers: returns the greater by default or calls num_method if given.

  • Sets: returns the union.

  • Dictionaries: calls pypath.common.merge_dicts().

  • Direction: calls their special pypath.main.Direction.merge() method.

Works on more than 2 attributes recursively.

Parameters
  • lst (list) – List of one or two attribute values.

  • num_method (function) – Optional, max by default. Method to merge numeric attributes.
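The combination rules listed for combine_attr can be sketched as a small standalone function. This is an illustration, not pypath's implementation: the dictionary and Direction rules are left out because they depend on pypath helpers, num_method is assumed to be called with the two numbers as arguments, and list elements are assumed to be hashable.

```python
def combine(values, num_method=max):
    """Combine attribute values following the rules listed above."""
    if len(values) > 2:
        # More than two values: fold them pairwise, recursively.
        rest = combine(values[1:], num_method)
        return combine([values[0], rest], num_method)
    if len(values) == 1:
        return values[0]
    a, b = values
    # If one value is None, return the one available.
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, list) and isinstance(b, list):
        # Concatenate, keeping each value once in first-seen order.
        return list(dict.fromkeys(a + b))
    if isinstance(a, set) and isinstance(b, set):
        return a | b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return num_method(a, b)
    raise TypeError(
        'no rule for %s and %s' % (type(a).__name__, type(b).__name__)
    )


merged_list = combine([[1, 2], [2, 3]])
merged_sets = combine([{1}, {2}, {3}])
largest = combine([3, 7])
smallest = combine([3, 7], num_method=min)
```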

communities(method, **kwargs)[source]
complex_comembership_network(graph=None, resources=None)[source]
complexes(methods=['3dcomplexes', 'havugimana', 'corum', 'complexportal', 'compleat'])[source]
complexes_in_network(csource='corum', graph=None)[source]
compounds_from_chembl(chembl_mysql=None, nodes=None, crit=None, andor='or', assay_types=['B', 'F'], relationship_types=['D', 'H'], multi_query=False, **kwargs)[source]

Loads compound data from ChEMBL to the network.

Parameters
  • chembl_mysql (tuple) – Optional, None by default. Contains the MySQL parameters used by the pypath.mapping module to load the ChEMBL ID conversion tables. If none is passed, takes the current instance's pypath.main.PyPath.chembl_mysql attribute.

  • nodes (list) – Optional, None by default. List of node indices for which the information is to be loaded. If none is provided calls the method pypath.main.PyPath.get_sub() with the provided crit parameter.

  • crit (dict) – Optional, None by default. Defines the critical attributes to generate a subnetwork to extract the nodes in case nodes is not provided. Keys are 'edge' and 'node' and values are [dict] containing the critical attribute names [str] and values are [set] containing those attributes of the nodes/edges that are to be kept in the subnetwork. If none is provided, takes the whole network.

  • andor (str) – Optional, 'or' by default. Determines the search mode for the subnetwork generation (if nodes=None). See pypath.main.PyPath.search_attr_or() and pypath.main.PyPath.search_attr_and() for more details.

  • assay_types (list) – Optional, ['B', 'F'] by default. Types of assay to query. Options are: 'A' (ADME), 'B' (Binding), 'F' (Functional), 'P' (Physicochemical), 'T' (Toxicity) and/or 'U' (Unassigned).

  • relationship_types (list) – Optional, ['D', 'H'] by default. Assay relationship types to query. Possible values are: 'D' (Direct protein target assigned), 'H' (Homologous protein target assigned), 'M' (Molecular target other than protein assigned), 'N' (Non-molecular target assigned), 'S' (Subcellular target assigned) and/or 'U' (Default value, target has yet to be curated).

  • multi_query (bool) – Optional, False by default. Not used.

  • **kwargs – Additional keyword arguments for pypath.chembl.Chembl.compounds_targets().

consistency()[source]
copy(other)[source]

Copies another pypath.main.PyPath instance into the current one.

Parameters

other (pypath.main.PyPath) – The instance to be copied from.

copy_edges(sources, target, move=False, graph=None)[source]

Copies edges from sources node(s) to another one (target), keeping attributes and directions.

Parameters
  • sources (list) – Contains the vertex index(es) [int] of the node(s) to be copied or moved.

  • target (int) – Vertex index where edges and attributes are to be copied to.

  • move (bool) – Optional, False by default. Whether to move instead of copy, i.e. remove (True) or keep (False) the source edges.

  • graph (igraph.Graph) – Optional, None by default. The network graph object from which the nodes are to be merged. If none is passed, takes the undirected network graph.

count_sol()[source]

Counts the number of nodes with zero degree.

Returns

(int) – The number of nodes with zero degree.
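Counting zero-degree nodes amounts to tallying edge endpoints. The sketch below uses a plain edge list with placeholder data instead of the igraph object that PyPath holds, so the helper name and its arguments are hypothetical.

```python
def count_zero_degree(n_nodes, edges):
    """Count the nodes that are not an endpoint of any edge."""
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sum(1 for d in degree if d == 0)


# Five nodes, of which nodes 3 and 4 take part in no edge.
isolated = count_zero_degree(5, [(0, 1), (1, 2)])
```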

coverage(lst)[source]

Computes the coverage (range [0, 1]) of a list of nodes against the current (undirected) network.

Parameters

lst (set) – Can also be [list] (will be converted to [set]) or [str]. In the latter case it will retrieve the list with that name (if such list exists in pypath.main.PyPath.lists).
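A coverage value in the range [0, 1] can be computed as the fraction of the query list that is present among the network nodes. The sketch below assumes this definition (the size of the list as the denominator); the helper and the sample identifiers are for illustration only.

```python
def coverage(network_nodes, lst):
    """Fraction of `lst` found among the network nodes, in [0, 1]."""
    lst = set(lst)  # lists are converted to sets, as described above
    if not lst:
        # Assumption: an empty query is reported as zero coverage.
        return 0.0
    return len(lst & set(network_nodes)) / len(lst)


network_nodes = {'P00533', 'P04637', 'Q9Y243'}
half = coverage(network_nodes, ['P00533', 'P01116'])
full = coverage(network_nodes, {'P04637'})
```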

curation_effort(sum_by_source=False)[source]

Returns the total number of reference-interactions pairs.

Parameters

sum_by_source (bool) – Optional, False by default. If True, counts the reference-interaction pairs by sources, and returns the sum of these values.

curation_stats(by_category=True)[source]
curation_tab(fname='curation_stats.tex', by_category=True, use_cats=['p', 'm', 'i', 'r'], header_size='normalsize', **kwargs)[source]
curators_work()[source]

Computes and prints an estimate of how many years of curation it took to compile the amount of information in the network.

databases_similarity(index='simpson')[source]

Computes the similarity across the loaded resources (listed in pypath.main.PyPath.sources) according to a given index metric, in terms of nodes and edges separately.

Parameters

index (str) – Optional, 'simpson' by default. The type of index metric to use to compute the similarity. Options are 'simpson', 'sorensen' and 'jaccard'.

Returns

(dict) – Nested dictionaries (three levels). First-level keys are 'nodes' and 'edges', then second and third levels correspond to sources names which map to the similarity index between those sources [float].
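The three index options can be written as set operations. The sketch below assumes the usual definitions, with 'simpson' taken to be the overlap coefficient (intersection size over the size of the smaller set); whether pypath uses exactly these formulas is an assumption. The resource names and members are placeholders, and all sets are assumed to be non-empty.

```python
def simpson(a, b):
    # Overlap coefficient: shared elements over the smaller set.
    return len(a & b) / min(len(a), len(b))


def sorensen(a, b):
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    return len(a & b) / len(a | b)


INDEXES = {'simpson': simpson, 'sorensen': sorensen, 'jaccard': jaccard}


def similarity(members, index='simpson'):
    """Pairwise similarity of resources given their node (or edge) sets."""
    method = INDEXES[index]
    return {
        s1: {s2: method(members[s1], members[s2]) for s2 in members}
        for s1 in members
    }


# Placeholder resources with the nodes each one contributes.
nodes_by_source = {'res_a': {1, 2, 3, 4}, 'res_b': {3, 4, 5}}
result = {'nodes': similarity(nodes_by_source, index='jaccard')}
```

Applying the same function to the edge sets of each resource would give the 'edges' branch of the nested dictionary.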

degree_dist(prefix, g=None, group=None)[source]

Computes the degree distribution over all nodes of the network. If group is provided, also across nodes of that group(s).

Parameters
  • prefix (str) – Prefix for the file name(s).

  • g (igraph.Graph) – Optional, None by default. The network over which to compute the degree distribution. If none is passed, takes the undirected network of the current instance.

  • group (list) – Optional, None by default. Additional group(s) name(s) [str] of node attributes to subset the network and compute its degree distribution.

degree_dists()[source]

Computes the degree distribution for all the different network sources. That is, for each source, the subnetwork comprising all interactions coming from it is extracted and the degree distribution information is computed and saved into a file. A file is created for each resource under the name 'pwnet-<session_id>-degdist-<resource>'. Files are stored in pypath.main.PyPath.outdir ('results' by default).

delete_by_organism(organisms_allowed=None)[source]

Removes the proteins of all organisms which are not given in organisms_allowed.

Parameters

organisms_allowed (list,set) – List of NCBI Taxonomy IDs [int] of the organism(s) that are to be kept.

delete_by_source(source, vertexAttrsToDel=None, edgeAttrsToDel=None)[source]

Deletes nodes and edges from the network according to a provided source name. Optionally can also remove the given list of attributes from nodes and/or edges.

Parameters
  • source (str) – Name of the source from which the nodes and edges have to be removed.

  • vertexAttrsToDel (list) – Optional, None by default. Contains the names [str] of the attributes to be removed from the nodes.

  • edgeAttrsToDel (list) – Optional, None by default. Contains the names [str] of the attributes to be removed from the edges.

delete_unknown(organisms_allowed=None, entity_type='protein', default_name_type=None)[source]

Removes those items which are not in the list of all default IDs of the organisms. By default, it means to remove all protein nodes not having a human UniProt ID.

Parameters
  • entity_type (str) – Optional, 'protein' by default. Determines the molecule type. These can be 'protein', 'drug', 'lncrna', 'mirna' or any other type defined in pypath.main.PyPath.default_name_type.

  • default_name_type (str) – Optional, None by default. The default name type for the given molecular species. If none is specified takes it from pypath.main.PyPath.default_name_type (e.g.: for 'protein', default is 'uniprot').

  • organisms_allowed (set) – NCBI Taxonomy identifiers [int] of the organisms allowed in the network.

delete_unmapped()[source]

Checks the network for any existing unmapped node and removes it.

dgenesymbol(genesymbol)[source]

Returns igraph.Vertex() object if the GeneSymbol can be found in the default directed network, otherwise None.

Parameters

genesymbol (str) – GeneSymbol.

dgenesymbols(genesymbols)[source]
dgs(genesymbol)

Returns igraph.Vertex() object if the GeneSymbol can be found in the default directed network, otherwise None.

Parameters

genesymbol (str) – GeneSymbol.

dgss(genesymbols)
dneighbors(identifier, mode='ALL')[source]
dp(identifier)

Same as PyPath.get_node, just for the directed graph. Returns igraph.Vertex() object if the identifier is a valid vertex index in the default directed graph, or a UniProt ID or GeneSymbol which can be found in the default directed network, otherwise None.

Parameters

identifier (int, str) – Vertex index (int) or GeneSymbol (str) or UniProt ID (str) or igraph.Vertex object.

dproteins(identifiers)
dps(identifiers)
duniprot(uniprot)[source]

Same as PyPath.uniprot(), just for the directed graph. Returns igraph.Vertex() object if the UniProt ID can be found in the default directed network, otherwise None.

Parameters

uniprot (str) – UniProt ID.

duniprots(uniprots)[source]

Returns a list of igraph.Vertex() objects for a list of UniProt IDs, omitting those that could not be found in the default directed graph.

dup(uniprot)

Same as PyPath.uniprot(), just for the directed graph. Returns igraph.Vertex() object if the UniProt ID can be found in the default directed network, otherwise None.

Parameters

uniprot (str) – UniProt ID.

dups(uniprots)

Returns a list of igraph.Vertex() objects for a list of UniProt IDs, omitting those that could not be found in the default directed graph.

dv(identifier)

Same as PyPath.get_node, just for the directed graph. Returns igraph.Vertex() object if the identifier is a valid vertex index in the default directed graph, or a UniProt ID or GeneSymbol which can be found in the default directed network, otherwise None.

Parameters

identifier (int, str) – Vertex index (int) or GeneSymbol (str) or UniProt ID (str) or igraph.Vertex object.

dvs(identifiers)
edge_exists(id_a, id_b)[source]

Returns the edge ID if the edge exists, otherwise a tuple of the vertex indices. Not sensitive to direction.

Parameters
  • id_a (str) – Name of the source node.

  • id_b (str) – Name of the target node.

Returns

(int) – The edge index, if such an edge exists. Otherwise, a [tuple] of [int] corresponding to the node IDs.
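Because the return type depends on whether the edge exists, callers have to branch on it. The following is a minimal sketch of that contract on a toy lookup table; `toy_edge_exists`, `describe` and the data are hypothetical stand-ins for illustration and are not part of pypath.

```python
# Toy stand-in for the network: vertex names -> vertex indices, and an
# undirected edge table keyed by a frozenset of the two vertex indices.
# All names and data here are hypothetical.
VERTEX_INDEX = {'P00533': 0, 'P01133': 1, 'P04637': 2}
EDGE_INDEX = {frozenset((0, 1)): 0}


def toy_edge_exists(id_a, id_b):
    """Mimic the documented contract: the edge index (int) if the edge
    exists, otherwise a tuple with the two vertex indices."""
    nodes = (VERTEX_INDEX[id_a], VERTEX_INDEX[id_b])
    # A frozenset key makes the lookup insensitive to direction.
    return EDGE_INDEX.get(frozenset(nodes), nodes)


def describe(id_a, id_b):
    """Branch on the return type to tell the two cases apart."""
    result = toy_edge_exists(id_a, id_b)
    if isinstance(result, tuple):
        return 'no edge between vertices %u and %u' % result
    return 'edge %u' % result
```

Checking with isinstance() rather than truthiness matters here, since a valid edge index of 0 is falsy.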

edge_loc(graph=None, topn=2)[source]
edge_names(e)[source]

Returns the node names of a given edge.

Parameters

e (int) – The edge index.

Returns

(tuple) – Contains the source and target node names of the edge [str].

edges_3d(methods=['dataio.get_instruct', 'dataio.get_i3d'])[source]
edges_between(group1, group2, directed=True, strict=False)[source]

Selects edges between two groups of vertex IDs. Returns set of edge IDs.

Parameters
  • group1,group2 (set) – List, set or tuple of vertex IDs.

  • directed (bool) – Only edges with direction group1 -> group2 selected.

  • strict (bool) – Edges with no direction information still selected even if directed is False.

edges_expression(func=<function PyPath.<lambda>>)[source]

Executes function func for each pair of connected proteins in the network, for every expression dataset. By default, func simply gives the product of the (normalized) expression values.

Parameters

func (callable) – Function to handle 2 vectors (pandas.Series() objects); should return one vector of the same length.

edges_in_comlexes(csources=['corum'], graph=None)[source]

Creates edge attributes complexes and in_complex. These are both dicts where the keys are complex resources. The values in complexes are the lists of complex names both the source and the target vertices belong to. The values in in_complex are boolean values telling whether there is at least one complex in the given resources that both the source and the target vertex of the edge belong to.

Parameters
  • csources (list) – List of complex resources. Should be already loaded.

  • graph (igraph.Graph) – The graph object to do the calculations on.

edges_ptms()[source]
edgeseq_inverse(edges)[source]

Returns the sequence of all edge indexes that are not in the argument edges.

Parameters

edges (set) – Sequence of edge indices [int] that will not be returned.

Returns

(list) – Contains all edge indices [int] of the undirected network except the ones on edges argument.
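The operation amounts to taking the complement of an index set. A minimal sketch, assuming the total number of edges is passed explicitly (the real method reads it from the network; the `n_edges` parameter is an assumption of this example):

```python
def edgeseq_inverse(n_edges, edges):
    """Return all edge indices in range(n_edges) that are NOT in `edges`."""
    excluded = set(edges)  # set membership tests are O(1)
    return [i for i in range(n_edges) if i not in excluded]
```

For a network with 5 edges, excluding edges 1 and 3 leaves the indices 0, 2 and 4.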

entities_by_resources()[source]

Returns a dict of sets with resources as keys and sets of entity IDs as values.

export_dot(nodes=None, edges=None, directed=True, labels='genesymbol', edges_filter=<function PyPath.<lambda>>, nodes_filter=<function PyPath.<lambda>>, edge_sources=None, dir_sources=None, graph=None, return_object=False, save_dot=None, save_graphics=None, prog='neato', format=None, hide=False, font=None, auto_edges=False, hide_nodes=[], defaults={}, **kwargs)[source]

Builds a pygraphviz.AGraph() object, filtering the edges and vertices by arbitrary criteria. Returns the AGraph object if requested, or exports the dot file, or saves the graphics.

Parameters
  • nodes (list) – List of vertex ids to be included.

  • edges (list) – List of edge ids to be included.

  • directed (bool) – Create a directed or undirected graph.

  • labels (str) – Name type to be used as id/label in the dot format.

  • edges_filter (function) – Function to filter edges, accepting igraph.Edge as argument.

  • nodes_filter (function) – Function to filter vertices, accepting igraph.Vertex as argument.

  • edge_sources (list) – Sources to be included.

  • dir_sources (list) – Direction and effect sources to be included.

  • graph (igraph.Graph) – The graph object to export.

  • return_object (bool) – Whether to return the pygraphviz.AGraph object.

  • save_dot (str) – Filename to export the dot file to.

  • save_graphics (str) – Filename to export the graphics; the extension defines the format.

  • prog (str) – The graphviz layout algorithm to use.

  • format (str) – The graphics format passed to pygraphviz.AGraph().draw().

  • hide (bool) – Hide filtered edges instead of omitting them.

  • hide_nodes (list) – Nodes to hide. List of vertex ids.

  • auto_edges (str) – Automatic, built-in style for edges. 'DIRECTIONS' or 'RESOURCE_CATEGORIES' are supported.

  • font (str) – Font to use for labels. To use more than one font, refer to graphviz attributes with constant values or define callbacks or mapping dictionaries.

  • defaults (dict) – Default values for graphviz attributes, labeled with the entity, e.g. {'edge_penwidth': 0.2}.

  • **kwargs (constant, callable or dict) – Graphviz attributes, labeled by the target entity, e.g. edge_penwidth, vertex_shape or graph_label. If the value is a constant, this value will be used. If the value is a dict and has _name as key, then for every instance of the given entity the value of the attribute defined by _name will be looked up in the dict, and the corresponding value will be given to the graphviz attribute. If the key _name is missing from the dict, igraph vertex and edge indices will be looked up among the keys. If the value is callable, it will be called with the current instance of the entity and the returned value will be used for the graphviz attribute, e.g. edge_arrowhead(edge) or vertex_fillcolor(vertex).

Example:

    import pypath
    from pypath import data_formats
    net = pypath.PyPath()
    net.init_network(pfile = 'cache/default.pickle')
    #net.init_network({'arn': data_formats.omnipath['arn']})
    tgf = [v.index for v in net.graph.vs if 'TGF' in v['slk_pathways']]
    dot = net.export_dot(nodes = tgf, save_graphics = 'tgf_slk.pdf', prog = 'dot',
        main_title = 'TGF-beta pathway', return_object = True,
        label_font = 'HelveticaNeueLTStd Med Cn',
        edge_sources = ['SignaLink3'],
        dir_sources = ['SignaLink3'], hide = True)
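The three ways a graphviz attribute value can be given (constant, dict, callable) can be sketched as a small resolver. This is only an illustration of the rules described for **kwargs above, not pypath's actual code: `resolve_attr` is a hypothetical helper, entities are plain dicts with an 'index' key instead of igraph.Vertex or igraph.Edge objects, and returning None for a missing dict key is an assumption.

```python
def resolve_attr(value, entity):
    """Resolve one graphviz attribute value for one vertex or edge.

    `entity` is a plain dict standing in for an igraph.Vertex or
    igraph.Edge (assumption of this sketch).
    """
    if callable(value):
        # Callable: called with the current entity instance.
        return value(entity)
    if isinstance(value, dict):
        if '_name' in value:
            # Look up the entity's attribute named by `_name` in the dict.
            key = entity[value['_name']]
        else:
            # No `_name`: the dict is keyed by vertex/edge indices.
            key = entity['index']
        return value.get(key)  # None if the key is missing (assumption)
    # Constant: used as is.
    return value
```

With a vertex such as {'index': 3, 'type': 'kinase'}, the value {'_name': 'type', 'kinase': 'red'} resolves to 'red', while {3: 'blue'} resolves to 'blue'.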

export_edgelist(fname, graph=None, names=['name'], edge_attributes=[], sep='\t')[source]

Write edge list to text file with attributes

Parameters
  • fname – the name of the file or a stream to write to.

  • graph – the igraph object containing the network

  • names – list with the vertex attribute names to be printed for source and target vertices

  • edge_attributes – list with the edge attribute names to be printed

  • sep – string used to separate columns

export_graphml(outfile=None, graph=None, name='main')[source]

Saves the network in a .graphml file.

Parameters
  • outfile (str) – Optional, None by default. Name/path of the output file. If none is passed, 'results/network-<session_id>.graphml' is used.

  • graph (igraph.Graph) – Optional, None by default. The network object to be saved. If none is passed, takes the undirected network of the current instance.

  • name (str) – Optional, 'main' by default. The graph name for the output file.

export_ptms_tab(outfile=None)[source]

Exports a tab file containing the PTM interaction information loaded in the network.

Parameters

outfile (str) – Optional, None by default. The output file name/path to store the PTM information. If none is provided, the default is 'results/network-<session_id>.tab'.

Returns

(list) – Contains the edge indices [int] of all PTM interactions.

export_sif(outfile=None)[source]

Exports the network interactions in .sif format (Simple Interaction Format).

Parameters

outfile (str) – Optional, None by default. Name/path of the output file. If none is passed, 'results/network-<session_id>.sif' is used.

export_struct_tab(outfile=None)[source]

Exports a tab file containing the domain interaction information and PTM regulation loaded in the network.

Parameters

outfile (str) – Optional, None by default. The output file name/path to store the PTM information. If none is provided, the default is 'results/network-<session_id>.tab'.

Returns

(list) – Contains the edge indices [int] of all PTM interactions.

export_tab(outfile=None, extra_node_attrs={}, extra_edge_attrs={}, unique_pairs=True, **kwargs)[source]

Exports the network in a tabular format. By default UniProt IDs, Gene Symbols, source databases, literature references, directionality and sign information and interaction type are included.

Parameters
  • outfile (str) – Optional, None by default. Name/path of the output file. If none is passed, 'results/network-<session_id>.tab' is used.

  • extra_node_attrs (dict) – Optional, {} by default. Additional node attributes to be included in the exported table. Keys are column names used in the header while values are names of vertex attributes. In the header '_A' and '_B' suffixes will be appended to the column names so the values can be assigned to A and B side interaction partners.

  • extra_edge_attrs (dict) – Optional, {} by default. Additional edge attributes to be included in the exported table. Keys are column names used in the header while values are names of edge attributes.

  • unique_pairs (bool) – Optional, True by default. If set to True each line corresponds to a unique pair of molecules, all directionality and sign information are covered in other columns. If False, order of 'A' and 'B' IDs corresponds to the direction while sign covered in further columns.

  • kwargs (**) – Additional keyword arguments passed to pypath.export.Export.
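How the extra attribute dictionaries translate into header columns can be sketched as follows. The helper `build_header`, the base column names and the column order are assumptions made for illustration; only the '_A'/'_B' suffix rule comes from the description above.

```python
def build_header(base_columns, extra_node_attrs, extra_edge_attrs):
    """Assemble the header row of the exported table.

    Keys of both dicts are column names; values (vertex or edge
    attribute names) are only needed later, when filling the rows.
    """
    header = list(base_columns)
    for column in extra_node_attrs:
        # Node attributes exist once per interaction partner.
        header.append(column + '_A')
        header.append(column + '_B')
    # Edge attributes belong to the interaction, so one column each.
    header.extend(extra_edge_attrs)
    return header
```

For example, extra_node_attrs={'Degree': 'degree'} would add the columns Degree_A and Degree_B, each filled from the 'degree' vertex attribute of the respective partner.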

filters(line, positive_filters=[], negative_filters=[])[source]
find_all_paths(start, end, attr=None, mode='OUT', maxlen=2, graph=None, silent=False, update_adjlist=True)[source]

Finds all paths up to length maxlen between groups of vertices. This function is needed only because igraph's get_all_shortest_paths() finds only the shortest paths, not any path up to a defined length.

Parameters
  • start (int or list) – Indices of the starting node(s) of the paths.

  • end (int or list) – Indices of the target node(s) of the paths.

  • attr (str) – Name of the vertex attribute to identify the vertices by. Necessary if start and end are not igraph vertex ids but for example vertex names or labels.

  • mode ('IN', 'OUT', 'ALL') – Passed to igraph.Graph.neighbors().

  • maxlen (int) – Maximum length of paths in steps, i.e. if maxlen = 3, then the longest path may consist of 3 edges and 4 nodes.

  • graph (igraph.Graph) – The graph you want to find paths in. self.graph by default.
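The idea of enumerating every path up to a length limit, rather than only the shortest ones, can be sketched with a depth-first search over a plain adjacency dict. This is not pypath's implementation: the adjacency dict replaces the igraph object, start and end must be lists here, and whether the real method continues a path past a target vertex is not stated above, so this sketch simply assumes it does.

```python
def find_all_paths(adjacency, start, end, maxlen=2):
    """Enumerate simple paths of at most `maxlen` edges from any vertex
    in `start` to any vertex in `end`.

    `adjacency` maps a vertex to the list of its successors.
    """
    targets = set(end)
    paths = []

    def extend(path):
        # Record the path if it has at least one edge and ends at a target.
        if len(path) > 1 and path[-1] in targets:
            paths.append(list(path))
        # A path of n vertices has n - 1 edges; stop at the limit.
        if len(path) - 1 == maxlen:
            return
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor not in path:  # keep paths simple, no revisits
                path.append(neighbor)
                extend(path)
                path.pop()

    for vertex in start:
        extend([vertex])
    return paths
```

On the graph {0: [1, 2], 1: [3], 2: [3], 3: [4]}, searching from [0] to [3] with maxlen=2 yields both two-step routes, [0, 1, 3] and [0, 2, 3].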

find_all_paths2(graph, start, end, mode='OUT', maxlen=2, psize=100, update_adjlist=True)[source]
find_complex(search)[source]

Finds complexes by their non-standard names. E.g. to find DNA polymerases you can use the search term DNA pol, which will be tested against complex names in CORUM.

first_neighbours(node, indices=False)[source]

Looks for the first neighbours of a given node and returns a list of their UniProt IDs.

Parameters
  • node (str) – The UniProt ID of the node of interest. Can also be the index of such node [int].

  • indices (bool) – Optional, False by default. Whether to return the neighbour nodes indices or their UniProt IDs.

Returns

(list) – The list containing the first neighbours of the queried node.

fisher_enrichment(lst, attr, ref='proteome')[source]

Computes an enrichment analysis using Fisher’s exact test. The contingency table is built as follows: First row contains the number of nodes in the ref list (such list is considered to be loaded in pypath.main.PyPath.lists) and the number of nodes in the current (undirected) network. Second row contains the number of nodes in lst list (also considered to be already loaded) and the number of nodes in the network with a non-empty attribute attr. Uses scipy.stats.fisher_exact(), see the documentation of the corresponding package for more information.

Parameters
  • lst (str) – Name of the list in pypath.main.PyPath.lists whose number of elements will be the first element in the second row of the contingency table.

  • attr (str) – The node attribute name for which the number of nodes in the network with such attribute will be the second element of the second row of the contingency table.

  • ref (str) – Optional, 'proteome' by default. The name of the list in pypath.main.PyPath.lists whose number of elements will be the first element of the first row of the contingency table.

Returns

  • (float) – Prior odds ratio.

  • (float) – P-value or probability of obtaining a distribution as extreme as the observed, assuming that the null hypothesis is true.
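To make the two returned values concrete, here is a dependency-free sketch of a two-sided Fisher's exact test on a 2x2 table. The method itself is documented as calling scipy.stats.fisher_exact(); the function `fisher_exact_2x2` below is a hypothetical standard-library re-implementation for illustration, and returning an infinite odds ratio when an off-diagonal cell is zero is an assumption of this sketch.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a table [[a, b], [c, d]].

    Returns (odds_ratio, p_value). The p-value sums the probabilities
    of all tables with the same margins that are no more likely than
    the observed one.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell;
        # exact fractions avoid floating point ties.
        return Fraction(comb(row1, x) * comb(row2, col1 - x),
                        comb(total, col1))

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p_value = sum(p for p in map(prob, range(low, high + 1))
                  if p <= observed)
    odds_ratio = float('inf') if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, float(p_value)
```

Following the description above, the table passed in would be [[len(ref), n_nodes], [len(lst), n_nodes_with_attr]], where the four counts are placeholders for values taken from the network and its lists.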

geneset_enrichment(proteins, all_proteins=None, geneset_ids=None, alpha=0.05, correction_method='hommel')[source]

Does not work at the moment because cfisher module should be replaced with scipy.

genesymbol(genesymbol)[source]

Returns igraph.Vertex() object if the GeneSymbol can be found in the default undirected network, otherwise None.

Parameters

genesymbol (str) – GeneSymbol.

genesymbol_labels(graph=None, remap_all=False)[source]

Creates the vertex attribute 'label' and fills it with the corresponding GeneSymbols of all proteins where the GeneSymbol can be looked up based on the default name of the protein vertex (UniProt ID by default). If the attribute 'label' has already been initialized, updates this attribute, or recreates it if remap_all is set to True.

Parameters
  • graph (igraph.Graph) – Optional, None by default. The network graph object where the GeneSymbol labels are to be set/updated. If none is passed, takes the current network undirected graph by default (pypath.main.PyPath.graph).

  • remap_all (bool) – Optional, False by default. Whether to map anew the GeneSymbol labels if those were already initialized.

genesymbols(genesymbols)[source]
get_attrs(line, spec, lnum)[source]
get_directed(graph=False, conv_edges=False, mutual=False, ret=False)[source]

Converts a copy of an undirected igraph.Graph object (graph) to a directed one. By default it converts the current network instance in pypath.main.PyPath.graph and places the directed copy in pypath.main.PyPath.dgraph.

Parameters
  • graph (igraph.Graph) – Optional, False by default. Undirected graph object. If none is passed, takes the current undirected network instance and saves the directed network under the attribute pypath.main.PyPath.dgraph. Otherwise, the directed graph will be returned instead.

  • conv_edges (bool) – Optional, False by default. Whether to convert undirected edges (those without explicit direction information) to an arbitrary direction edge or a pair of opposite edges. Otherwise those will be deleted.

  • mutual (bool) – Optional, False by default. If conv_edges is True, whether to convert the undirected edges to a single, arbitrary directed edge, or a pair of opposite directed edges.

  • ret (bool) – Optional, False by default. Whether to return the directed graph instance, or not. If a graph is provided, its directed version will be returned anyway.

Returns

(igraph.Graph) – If graph is passed or ret is True, returns the copy of the directed graph. Otherwise returns None.
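The interplay of conv_edges and mutual can be sketched on a plain edge list. The tuple encoding of direction information ('->', '<-', '<->' or None) and the helper `to_directed` are hypothetical and serve only to illustrate the rules described above; pypath stores directions in its own edge attributes.

```python
def to_directed(edges, conv_edges=False, mutual=False):
    """Convert (a, b, direction) tuples to a list of directed (src, tgt)
    pairs. `direction` is '->', '<-', '<->' or None (no information)."""
    directed = []
    for a, b, direction in edges:
        if direction == '->':
            directed.append((a, b))
        elif direction == '<-':
            directed.append((b, a))
        elif direction == '<->':
            directed.extend([(a, b), (b, a)])
        elif conv_edges:
            directed.append((a, b))      # single, arbitrary direction
            if mutual:
                directed.append((b, a))  # plus the opposite edge
        # Otherwise the edge has no direction and is dropped.
    return directed
```

With the defaults, an edge lacking direction information disappears from the directed copy; conv_edges=True keeps it as one arbitrary edge, and adding mutual=True keeps it as a pair of opposite edges.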

get_dirs_signs()[source]
get_edge(source, target, directed=True)[source]

Returns igraph.Edge object if an edge exists between the 2 proteins, otherwise None.

Parameters
  • source (int,str) – Vertex index or UniProt ID or GeneSymbol or igraph.Vertex object.

  • target (int,str) – Vertex index or UniProt ID or GeneSymbol or igraph.Vertex object.

  • directed (bool) – To be passed to igraph.Graph.get_eid()

get_edges(sources, targets, directed=True)[source]

Returns a generator with all edges between source and target vertices.

Parameters
  • sources (iterable) – Source vertex IDs, names, labels, or any iterable yielding igraph.Vertex objects.

  • targets (iterable) – Target vertex IDs, names, labels, or any iterable yielding igraph.Vertex objects.

  • directed (bool) – Passed to igraph.Graph.get_eid().

get_function(fun)[source]
get_giant(replace=False, graph=None)[source]

Returns the giant component of the graph, or replaces the igraph.Graph instance with only the giant component if specified.

Parameters
  • replace (bool) – Optional, False by default. Specifies whether to replace the igraph.Graph instance. This can be either the undirected network of the current pypath.main.PyPath instance (default) or the one passed under the keyword argument graph.

  • graph (igraph.Graph) – Optional, None by default. The graph object from which the giant component is to be computed. If none is specified, takes the undirected network of the current pypath.main.PyPath instance.

Returns

(igraph.Graph) – If replace=False, returns a copy of the giant component graph.
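The giant component is the largest connected component of the graph. A minimal breadth-first sketch on a plain edge list shows what is being selected; `giant_component` is a hypothetical helper that returns vertex indices rather than an igraph.Graph, and it does not reflect how pypath or igraph compute it internally.

```python
from collections import deque


def giant_component(n_vertices, edges):
    """Return the sorted vertex indices of the largest connected
    component of an undirected graph."""
    neighbors = {v: set() for v in range(n_vertices)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, best = set(), []
    for root in range(n_vertices):
        if root in seen:
            continue
        # Breadth-first search collects one whole component.
        seen.add(root)
        component, queue = [], deque([root])
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in neighbors[v] - seen:
                seen.add(w)
                queue.append(w)
        if len(component) > len(best):
            best = component
    return sorted(best)
```

For 6 vertices with edges (0, 1), (1, 2) and (3, 4), the giant component consists of vertices 0, 1 and 2; vertex 5 is isolated.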

get_go(organism=None)[source]

Returns the GOAnnotation object for the organism requested (or the default one).

get_max(attrList)[source]
get_network(crit, andor='or', graph=None)[source]

Retrieves a subnetwork according to a set of user-defined attributes. Basically applies pypath.main.PyPath.get_sub() on a given graph.

Parameters
  • crit (dict) – Defines the critical attributes to generate the subnetwork. Keys are 'edge' and 'node' and values are [dict] containing the critical attribute names [str] and values are [set] containing those attributes of the nodes/edges that are to be kept.

  • andor (str) – Optional, 'or' by default. Determines the search mode. See pypath.main.PyPath.search_attr_or() and pypath.main.PyPath.search_attr_and() for more details.

  • graph (igraph.Graph) – Optional, None by default. The graph object where to extract the subnetwork. If none is passed, takes the current network (undirected) graph (pypath.main.PyPath.graph).

Returns

(igraph.Graph) – The subgraph obtained from filtering according to the attributes defined in crit.

get_node(identifier)[source]

Returns igraph.Vertex() object if the identifier is a valid vertex index in the default undirected graph, or a UniProt ID or GeneSymbol which can be found in the default undirected network, otherwise None.

Parameters

identifier (int, str) – Vertex index (int) or GeneSymbol (str) or UniProt ID (str) or igraph.Vertex object.

get_node_d(identifier)[source]

Same as PyPath.get_node, just for the directed graph. Returns igraph.Vertex() object if the identifier is a valid vertex index in the default directed graph, or a UniProt ID or GeneSymbol which can be found in the default directed network, otherwise None.

Parameters

identifier (int, str) – Vertex index (int) or GeneSymbol (str) or UniProt ID (str) or igraph.Vertex object.

get_node_pair(id_a, id_b, directed=False)[source]

Retrieves the node IDs from a pair of node names.

Parameters
  • id_a (str) – Name of the source node.

  • id_b (str) – Name of the target node.

  • directed (bool) – Optional, False by default. Whether to return the node indices from the directed or undirected graph.

Returns

(tuple) – The pair of node IDs of the selected graph. If not found, returns False.

get_nodes(identifiers)[source]
get_nodes_d(identifiers)[source]
get_pathways(source)[source]
get_proteomicsdb(user, passwd, tissues=None, pickle=None)[source]
get_sub(crit, andor='or', graph=None)[source]

Selects the nodes from graph (and edges to be removed) according to a set of user-defined attributes.

Parameters
  • crit (dict) – Defines the critical attributes to generate the subnetwork. Keys are 'edge' and 'node' and values are [dict] containing the critical attribute names [str] and values are [set] containing those attributes of the nodes/edges that are to be kept.

  • andor (str) – Optional, 'or' by default. Determines the search mode. See pypath.main.PyPath.search_attr_or() and pypath.main.PyPath.search_attr_and() for more details.

  • graph (igraph.Graph) – Optional, None by default. The graph object where to extract the subnetwork. If none is passed, takes the current network (undirected) graph (pypath.main.PyPath.graph).

Returns

(dict) – Keys are 'nodes' and 'edges' whose values are [list] of elements (as indexes [int]). Nodes are those to be kept and edges are those to be removed in the extracted subnetwork.
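The shape of crit and the difference between the 'or' and 'and' modes can be illustrated with plain dicts. This sketch rests on several assumptions not confirmed above: nodes and edges are lists of dicts mapping attribute names to sets of values, an element matches a criterion when its values overlap the accepted set, and `matches` and `select` are hypothetical helpers rather than pypath functions.

```python
def matches(attrs, criteria, andor='or'):
    """True if an element's attributes satisfy the criteria.

    `attrs` maps attribute names to sets of values; `criteria` maps
    attribute names to the sets of accepted values.
    """
    hits = [bool(attrs.get(name, set()) & accepted)
            for name, accepted in criteria.items()]
    return all(hits) if andor == 'and' else any(hits)


def select(nodes, edges, crit, andor='or'):
    """Return node indices to keep and edge indices to remove."""
    keep_nodes = [i for i, attrs in enumerate(nodes)
                  if matches(attrs, crit.get('node', {}), andor)]
    remove_edges = [i for i, attrs in enumerate(edges)
                    if not matches(attrs, crit.get('edge', {}), andor)]
    return {'nodes': keep_nodes, 'edges': remove_edges}
```

A crit value in this layout would look like {'node': {'type': {'kinase'}}, 'edge': {'sources': {'SignaLink3'}}}.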

get_taxon(tax_dict, fields)[source]
go_annotate_graph(aspects=('C', 'F', 'P'))[source]

Annotates protein nodes with GO terms. In the go vertex attribute each node is annotated by a dict of sets where keys are one letter codes of GO aspects and values are sets of GO accessions.

go_enrichment(proteins=None, aspect='P', alpha=0.05, correction_method='hommel', all_proteins=None)[source]

Does not work at the moment because cfisher module should be replaced with scipy.

gs(genesymbol)

Returns igraph.Vertex() object if the GeneSymbol can be found in the default undirected network, otherwise None.

Parameters

genesymbol (str) – GeneSymbol.

gs_affected_by(genesymbol)[source]
gs_affects(genesymbol)[source]
gs_edge(source, target, directed=True)[source]

Returns igraph.Edge object if an edge exists between the 2 proteins, otherwise None.

Parameters
  • source (str) – GeneSymbol.

  • target (str) – GeneSymbol.

  • directed (bool) – To be passed to igraph.Graph.get_eid().

gs_in_directed(genesymbol)[source]
gs_in_undirected(genesymbol)[source]
gs_inhibited_by(genesymbol)[source]
gs_inhibits(genesymbol)[source]
gs_neighborhood(genesymbols, order=1, mode='ALL')[source]
gs_neighbors(genesymbol, mode='ALL')[source]
gs_stimulated_by(genesymbol)[source]
gs_stimulates(genesymbol)[source]
gss(genesymbols)
guide2pharma()[source]
having_attr(attr, graph=None, index=True, edges=True)[source]

Checks if edges or nodes of the network have a specific attribute and returns an iterator of the indices (or the edge/node instances) of edges/nodes having such attribute.

Parameters
  • attr (str) – The name of the attribute to look for.

  • graph (igraph.Graph) – Optional, None by default. The graph object where the edge/node attribute is to be searched. If none is passed, takes the undirected network of the current instance.

  • index (bool) – Optional, True by default. Whether to return the iterator of the indices or the node/edge instances.

  • edges (bool) – Optional, True by default. Whether to look for the attribute in the networks edges or nodes instead.

Returns

(generator) – Generator object containing the edge/node indices (or instances) having the specified attribute.

having_eattr(attr, graph=None, index=True)[source]

Checks if edges of the network have a specific attribute and returns an iterator of the indices (or the edge instances) of edges having such attribute.

Parameters
  • attr (str) – The name of the attribute to look for.

  • graph (igraph.Graph) – Optional, None by default. The graph object where the edge/node attribute is to be searched. If none is passed, takes the undirected network of the current instance.

  • index (bool) – Optional, True by default. Whether to return the iterator of the indices or the node/edge instances.

Returns

(generator) – Generator object containing the edge indices (or instances) having the specified attribute.

having_ptm(index=True, graph=None)[source]

Checks if edges of the network have the 'ptm' attribute and returns an iterator of the indices (or the edge instances) of edges having such attribute.

Parameters
  • index (bool) – Optional, True by default. Whether to return the iterator of the indices or the node/edge instances.

  • graph (igraph.Graph) – Optional, None by default. The graph object where the edge/node attribute is to be searched. If none is passed, takes the undirected network of the current instance.

Returns

(generator) – Generator object containing the edge indices (or instances) having the 'ptm' attribute.

having_vattr(attr, graph=None, index=True)[source]

Checks if nodes of the network have a specific attribute and returns an iterator of the indices (or the node instances) of nodes having such attribute.

Parameters
  • attr (str) – The name of the attribute to look for.

  • graph (igraph.Graph) – Optional, None by default. The graph object where the edge/node attribute is to be searched. If none is passed, takes the undirected network of the current instance.

  • index (bool) – Optional, True by default. Whether to return the iterator of the indices or the node/edge instances.

Returns

(generator) – Generator object containing the node indices (or instances) having the specified attribute.

homology_translation(target, source=None, only_swissprot=True, graph=None)

Translates the current object to another organism by orthology. Proteins without known ortholog will be deleted.

Parameters

target (int) – NCBI Taxonomy ID of the target organism. E.g. 10090 for mouse.

htp_stats()[source]
in_complex(csources=['corum'])[source]

Deprecated, will be removed.

in_directed(vertex)[source]
in_undirected(vertex)[source]
info(name)[source]

Given the name of a resource, prints out the information about that source/database. You can check the list of available resource descriptions in pypath.descriptions.descriptions.keys().

Parameters

name (str) – The name of the resource from which to print the information.

init_complex_attr(graph, name)[source]
init_edge_attr(attr)[source]

Fills the edge attribute attr with its default value on every edge where the value is None; creates a [list] if the attribute is registered as [list] in pypath.main.PyPath.edgeAttrs.

Parameters

attr (str) – The attribute name to be initialized on the network edges.

init_gsea(user)[source]

Initializes a pypath.gsea.GSEA object and shows the list of the collections in MSigDB.

init_network(lst=None, exclude=[], cache_files={}, pfile=False, save=False, reread=None, redownload=False, keep_raw=False, **kwargs)[source]

Loads the network data.

This is a lazy way to start the module, load data and build the high confidence, literature curated part of the signaling network.

Parameters
  • lst (dict) – Optional, None by default. Specifies the data input formats for the different resources (keys) [str]. Values are pypath.input_formats.ReadSettings instances containing the information. By default uses the set of resources of OmniPath.

  • exclude (list) – Optional, [] by default. List of resources [str] to exclude from the network.

  • cache_files (dict) – Optional, {} by default. Contains the resource name(s) [str] (keys) and the corresponding cached file name [str]. If provided (and file exists) bypasses the download of the data for that resource and uses the cache file instead.

  • pfile (str) – Optional, False by default. If any, provides the file name or path to a previously saved network pickle file. If True is passed, takes the default path from PyPath.save_network() ('cache/default_network.pickle').

  • save (bool) – Optional, False by default. If set to True, saves the loaded network to its default location ('cache/default_network.pickle').

  • reread (bool) – Optional, None by default. Specifies whether to reread the data files from the cache or omit them (similar to redownload).

  • redownload (bool) – Optional, False by default. Specifies whether to re-download the data and ignore the cache.

  • **kwargs – Not used.

init_vertex_attr(attr)[source]

Fills the vertex attribute attr with its default value on every vertex where the value is None; creates a [list] if the attribute is registered as [list] in pypath.main.PyPath.vertexAttrs.

Parameters

attr (str) – The attribute name to be initialized on the network vertices.

intergroup_shortest_paths(groupA, groupB, random=False)[source]
intogen_cancer_drivers_list(intogen_file)[source]

Loads the list of cancer driver proteins from IntOGen data.

Parameters

intogen_file (str) – Path to the data file. Can also be [function] that provides the data. In general, anything accepted by pypath.input_formats.ReadSettings.input.

iter_interactions()[source]

Iterates over edges and yields interaction records.

jaccard_edges()[source]

Computes the Jaccard similarity index between the sets of first neighbours of all node pairs. NOTE: this method can take a while to compute, e.g.: if the network has 10K nodes, the total number of possible pairs to compute is:

\[\binom{10^4}{2} = 49995000\]
Returns

(list) – Large list of [tuple] elements containing the node pair names [str] and their corresponding first neighbours Jaccard index [float].
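The Jaccard index of two nodes is the size of the intersection of their first-neighbour sets divided by the size of the union. A minimal sketch over a plain dict of neighbour sets (the input layout, the sorted pair order and the value 0.0 for two nodes without any neighbours are assumptions of this example, not pypath's behaviour):

```python
from itertools import combinations


def jaccard_edges(neighbors):
    """Compute the Jaccard index for every pair of nodes.

    `neighbors` maps each node name to the set of its first neighbours.
    Returns a list of (name_a, name_b, jaccard_index) tuples.
    """
    result = []
    for a, b in combinations(sorted(neighbors), 2):
        union = neighbors[a] | neighbors[b]
        shared = neighbors[a] & neighbors[b]
        index = len(shared) / len(union) if union else 0.0
        result.append((a, b, index))
    return result
```

In a triangle A, B, C each pair shares one neighbour out of three distinct ones, so every pair gets an index of 1/3. The number of tuples grows quadratically with the number of nodes, which is why the full computation is slow on large networks.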

jaccard_meta(jedges, critical)[source]

Creates a (undirected) graph from a list of edges filtering by their Jaccard index.

Parameters
  • jedges (list) – List of [tuple] containing the edges node names [str] and their Jaccard index. Basically, the output of pypath.main.PyPath.jaccard_edges().

  • critical (float) – Specifies the threshold of the Jaccard index from above which an edge will be included in the graph.

Returns

(igraph.Graph) – The undirected graph instance containing only the edges whose Jaccard similarity index is above the threshold specified by critical.

kegg_directions(graph=None)[source]
kegg_pathways(graph=None)[source]
kinase_stats()[source]
label(label, idx, what='vertices')[source]

Creates a boolean attribute label True for the vertex or edge IDs in the set idx.

label_by_go(label, go_terms, method='ANY')[source]

Assigns a boolean vertex attribute to nodes which tells whether the node is annotated by all or any of the GO terms.

label_edges(label, edges)[source]

Creates a boolean edge attribute label True for the edge IDs in the set edges.

label_vertices(label, vertices)[source]

Creates a boolean vertex attribute label True for the vertex IDs in the set vertices.

laudanna_directions(graph=None)[source]
laudanna_effects(graph=None)[source]
static license(self)[source]

Prints information about data licences.

static list_resources()[source]

Prints the list of resources through the standard output.

load_3dcomplexes(graph=None)[source]
load_3did_ddi()[source]
load_3did_ddi2(ddi=True, interfaces=False)[source]
load_3did_dmi()[source]
load_3did_interfaces()[source]
load_all_pathways(graph=None)[source]
load_compleat(graph=None)[source]

Loads complexes from Compleat. Loads data into vertex attribute graph.vs['complexes']['compleat']. This resource is human only.

load_complexportal(graph=None)[source]

Loads complexes from ComplexPortal. Loads data into vertex attribute graph.vs['complexes']['complexportal']. This resource is human only.

load_comppi(graph=None)[source]
load_corum(graph=None)[source]

Loads complexes from CORUM database. Loads data into vertex attribute graph.vs['complexes']['corum']. This resource is human only.

load_dbptm(non_matching=False, trace=False, **kwargs)[source]
load_ddi(ddi)[source]

ddi is either a list of intera.DomainDomain objects, or a function returning such a list.

load_ddis(methods=['dataio.get_3dc_ddi', 'dataio.get_domino_ddi', 'self.load_3did_ddi2'])[source]
load_depod_dmi()[source]
load_disgenet(dataset='curated', score=0.0, umls=False, full_data=False)[source]

Assigns DisGeNet disease-gene associations to the proteins in the network. Disease annotations will be added to the dis vertex attribute.

Parameters
  • score (float) – Confidence score from DisGeNet. Only associations above the score provided will be considered.

  • umls (bool) – By default we assign a list of disease names to each protein. To use Unified Medical Language System IDs instead set this to True.

  • full_data (bool) – By default we load only disease names. Set this to True if you wish to load additional annotations like number of PubMed IDs, number of SNPs and original sources.

load_dmi(dmi)[source]

dmi is either a list of intera.DomainMotif objects, or a function returning such a list.

load_dmis(methods=['self.pfam_regions', 'self.load_depod_dmi', 'self.load_dbptm', 'self.load_mimp_dmi', 'self.load_pnetworks_dmi', 'self.load_domino_dmi', 'self.load_pepcyber', 'self.load_psite_reg', 'self.load_psite_phos', 'self.load_ielm', 'self.load_phosphoelm', 'self.load_elm', 'self.load_3did_dmi'])[source]
load_domino_dmi(organism=None)[source]
load_elm()[source]
load_exocarta_attrs(load_samples=False, load_refs=False)[source]

Creates vertex attributes from ExoCarta data. Creates a boolean attribute exocarta_exosomal which tells whether a protein is in ExoCarta, i.e. has been found in exosomes. Optionally creates attributes exocarta_samples and exocarta_refs listing the sample tissue and the PubMed references, respectively.

load_expression(array=False)[source]

Expression data can be loaded into vertex attributes, or into a pandas DataFrame – the latter offers faster ways to process and use these huge matrices.

load_go(organism=None)[source]

Creates a pypath.go.GOAnnotation object for one organism in the dict under go attribute.

Parameters

organism (int) – NCBI Taxonomy ID of the organism.

load_havugimana(graph=None)[source]

Loads complexes from Havugimana 2012. Loads data into vertex attribute graph.vs[‘complexes’][‘havugimana’]. This resource is human only.

load_hpa(normal=True, pathology=True, cancer=True, summarize_pathology=True, tissues=None, quality={'Approved', 'Supported'}, levels={'High': 3, 'Low': 1, 'Medium': 2, 'Not detected': 0}, graph=None, na_value=0)[source]

Loads Human Protein Atlas data into vertex attributes.

load_hprd_ptms(non_matching=False, trace=False, **kwargs)[source]
load_ielm()[source]
load_interfaces()[source]
load_li2012_ptms(non_matching=False, trace=False, **kwargs)[source]
load_ligand_receptor_network(lig_rec_resources=True, inference_from_go=True, sources=None, keep_undirected=False, keep_rec_rec=False, keep_lig_lig=False)[source]

Initializes a ligand-receptor network.

load_lmpid(method)[source]
load_matrisome_attrs(organism=None)[source]

Loads vertex attributes from MatrisomeDB 2.0. Attributes are matrisome_class, matrisome_subclass and matrisome_notes.

load_membranome_attrs()[source]

Loads attributes from Membranome, a database of single-helix transmembrane proteins.

load_mimp_dmi(non_matching=False, trace=False, **kwargs)[source]
load_mutations(attributes=None, gdsc_datadir=None, mutation_file=None)[source]

Mutations are listed in vertex attributes. Mutation() objects offer methods to identify residues and look them up in Ptm(), Motif() and Domain() objects, to check whether those residues are modified or lie in a short motif or domain.

load_negatives()[source]
load_old_omnipath(kinase_substrate_extra=False, remove_htp=False, htp_threshold=1, keep_directed=False, min_refs_undirected=2)[source]

Loads the OmniPath network as it was before August 2016. Furthermore it gives some more options.

load_omnipath(omnipath=None, kinase_substrate_extra=False, ligand_receptor_extra=False, pathway_extra=False, remove_htp=True, htp_threshold=1, keep_directed=True, min_refs_undirected=2, old_omnipath_resources=False, exclude=None)[source]

Loads the OmniPath network.

load_pathways(source, graph=None)[source]

Generic method to load pathway annotations from a resource. We don’t recommend calling this method directly; instead, either call a specific method for a single source, e.g. kegg_pathways() or signor_pathways(), or call load_all_pathways() to load all resources.

Parameters
  • source (str) – Name of the source. This needs to match a method in the dict in the get_pathways() method. The edge and vertex attributes with pathway annotations will be called “<source>_pathways”.

  • graph (igraph.Graph) – A graph, by default the graph attribute of the current instance.

load_pdb(graph=None)[source]

Loads the 3D structure information from PDB into the network. Creates the node attribute 'pdb' containing a [dict] whose keys are the PDB identifier [str] and values are [tuple] of two elements denoting the experimental method [str] (e.g.: 'X-ray', 'NMR', …) and the resolution [float] (if applicable).

Parameters

graph (igraph.Graph) – Optional, None by default. The network object for which the information is to be loaded. If none is passed, takes the undirected network of the current instance.

load_pepcyber()[source]
load_pfam(graph=None)[source]

Loads the protein family information from UniProt into the network. Creates the node attribute 'pfam' containing a [list] of protein family identifier(s) [str].

Parameters

graph (igraph.Graph) – Optional, None by default. The network object for which the information is to be loaded. If none is passed, takes the undirected network of the current instance.

load_pfam2()[source]

Loads the protein family information from Pfam into the network. Creates the node attribute 'pfam' containing a [list] of [dict] whose keys are protein family identifier(s) [str] and corresponding values are [list] of [dict] containing detailed information about the protein family(ies) for regions and isoforms of the protein.

Parameters

graph (igraph.Graph) – Optional, None by default. The network object for which the information is to be loaded. If none is passed, takes the undirected network of the current instance.

load_pfam3()[source]

Loads the protein domain information from Pfam into the network. Creates the node attribute 'doms' containing a [list] of pypath.intera.Domain instances with information about each domain of the protein (see the corresponding class documentation for more information).

load_phospho_dmi(source, trace=False, return_raw=False, **kwargs)[source]
load_phosphoelm(trace=False, **kwargs)[source]
load_pisa(graph=None)[source]
load_pnetworks_dmi(trace=False, **kwargs)[source]
load_psite_phos(trace=False, **kwargs)[source]
load_psite_reg()[source]
load_ptms()[source]
load_ptms2(input_methods=None, map_by_homology_from=[9606], homology_only_swissprot=True, ptm_homology_strict=False, nonhuman_direct_lookup=True, inputargs={}, database=None, force_load=False)[source]

This is a new method which will replace load_ptms. It uses pypath.ptm.PtmAggregator, a newly introduced module for combining enzyme-substrate data from multiple resources, using homology translation on the user's demand.

Parameters
  • input_methods (list) – Resources to collect enzyme-substrate interactions from. E.g. [‘Signor’, ‘phosphoELM’]. By default it contains Signor, PhosphoSitePlus, HPRD, phosphoELM, dbPTM, PhosphoNetworks, Li2012 and MIMP.

  • map_by_homology_from (list) – List of NCBI Taxonomy IDs of source taxons used for homology translation of enzyme-substrate interactions. If you have a human network and you add here [10090, 10116] then mouse and rat interactions from the source databases will be translated to human.

  • homology_only_swissprot (bool) – True by default, which means only SwissProt IDs are accepted at homology translation; TrEMBL IDs will be dropped.

  • ptm_homology_strict (bool) – For homology translation use PhosphoSite’s PTM homology table. This guarantees that only truly homologous sites will be included. Otherwise we only check whether the appropriate residue can be found at the same numeric offset in the homologous sequence.

  • nonhuman_direct_lookup (bool) – Also fetch nonhuman data directly from the resources wherever it’s available. PhosphoSite contains mouse enzyme-substrate interactions and it is possible to extract these directly besides translating the human ones to mouse.

  • inputargs (dict) – Additional arguments passed to PtmProcessor. A dict can be supplied for each resource, e.g. {‘Signor’: {…}, ‘PhosphoSite’: {…}, …}. Those not used by PtmProcessor are forwarded to the pypath.dataio methods.

  • database – A PtmAggregator object. If provided no new database will be created.

  • force_load (bool) – If True the database will be loaded with the parameters provided here; otherwise if the ptm module already has a database no new database will be created. This means the parameters specified in other arguments might have no effect.

load_resource(settings, clean=True, cache_files={}, reread=None, redownload=False, keep_raw=False)[source]

Loads the data from a single resource and attaches it to the network.

Parameters
  • settings (pypath.input_formats.ReadSettings) – pypath.input_formats.ReadSettings instance containing the detailed definition of the input format to the downloaded file.

  • clean (bool) – Optional, True by default. Whether to clean the graph after importing the data or not. See pypath.main.PyPath.clean_graph() for more information.

  • cache_files (dict) – Optional, {} by default. Contains the resource name(s) [str] (keys) and the corresponding cached file name [str]. If provided (and file exists) bypasses the download of the data for that resource and uses the cache file instead.

  • reread (bool) – Optional, False by default. Specifies whether to reread the data files from the cache or omit them (similar to redownload).

  • redownload (bool) – Optional, False by default. Specifies whether to re-download the data and ignore the cache.

load_resources(lst=None, exclude=[], cache_files={}, reread=False, redownload=False, keep_raw=False)[source]

Loads multiple resources, and cleans up after. Looks up ID types, and loads all ID conversion tables from UniProt if necessary. This is much faster than loading the ID conversion and the resources one by one.

Parameters
  • lst (dict) – Optional, None by default. Specifies the data input formats for the different resources (keys) [str]. Values are pypath.input_formats.ReadSettings instances containing the information. By default uses the set of resources of OmniPath.

  • exclude (list) – Optional, [] by default. List of resources [str] to exclude from the network.

  • cache_files (dict) – Optional, {} by default. Contains the resource name(s) [str] (keys) and the corresponding cached file name [str]. If provided (and file exists) bypasses the download of the data for that resource and uses the cache file instead.

  • reread (bool) – Optional, False by default. Specifies whether to reread the data files from the cache or omit them (similar to redownload).

  • redownload (bool) – Optional, False by default. Specifies whether to re-download the data and ignore the cache.

load_signor_ptms(non_matching=False, trace=False, **kwargs)[source]
load_surfaceome_attrs()[source]

Loads vertex attributes from the In Silico Human Surfaceome. Attributes are surfaceome_score, surfaceome_class and surfaceome_subclass.

load_tfregulons(levels={'A', 'B'}, only_curated=False)[source]

Adds TF-target interactions from TF regulons to the network. TF regulons is a comprehensive resource of TF-target interactions combining multiple lines of evidences: literature curated databases, ChIP-Seq data, PWM based prediction using HOCOMOCO and JASPAR matrices and prediction from GTEx expression data by ARACNe.

For details see https://github.com/saezlab/DoRothEA.

Parameters
  • levels (set) – Optional, {'A', 'B'} by default. Confidence levels to be loaded (from A to E) [str].

  • only_curated (bool) – Optional, False by default. Whether to retrieve only the literature curated interactions or not.

load_vesiclepedia_attrs(load_samples=False, load_refs=False, load_vesicle_type=False)[source]

Creates vertex attributes from Vesiclepedia data. Creates a boolean attribute vesiclepedia_in_vesicle which tells whether a protein is in Vesiclepedia, i.e. has been found in extracellular vesicles. Optionally creates attributes vesiclepedia_samples, vesiclepedia_refs and vesiclepedia_vesicles listing the sample tissue, the PubMed references and the vesicle types, respectively.

lookup_cache(name, cache_files, int_cache, edges_cache)[source]

Checks the cache folder for the files of a given resource. First checks whether name is in the cache_files dictionary. If so, loads the interactions if the file is an interactions cache, or the edges otherwise. If not, checks edges_cache, falling back to int_cache.

Parameters
  • name (str) – Name of the resource (lower-case).

  • cache_files (dict) – Contains the resource name(s) [str] (keys) and the corresponding cached file name [str] (values).

  • int_cache (str) – Path to the interactions cache file of the resource.

  • edges_cache (str) – Path to the edges cache file of the resource.

Returns

  • (file) – The loaded pickle file from the cache if the file contains the interactions. None otherwise.

  • (list) – List of mapped edges if the file contains the information from the edges. [] otherwise.

loop_edges(index=True, graph=None)[source]

Returns an iterator of the indices (or the edge instances) of the edges which represent a loop (whose source and target node are the same).

Parameters
  • index (bool) – Optional, True by default. Whether to return the iterator of the indices or the edge instances.

  • graph (igraph.Graph) – Optional, None by default. The graph object where the edge loops are to be searched. If none is passed, takes the undirected network of the current instance.

Returns

(generator) – Generator object containing the edge indices (or instances) containing loops.
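
A minimal sketch of the idea behind this method, written against a plain list of (source, target) tuples instead of an igraph.Graph. The standalone function and the example edge list are assumptions for illustration only, not the library's implementation.

```python
def loop_edges(edges, index=True):
    """Yield self-loops from (source, target) tuples.

    Yields the position of each loop edge if index is True,
    otherwise the edge tuple itself.
    """
    for i, (source, target) in enumerate(edges):
        if source == target:
            yield i if index else (source, target)


# Hypothetical edge list: edges 1 and 3 connect a node to itself.
edges = [(0, 1), (2, 2), (1, 3), (3, 3)]
loop_indices = list(loop_edges(edges))
loop_pairs = list(loop_edges(edges, index=False))
```

Because a generator is returned, wrap the call in list() if the result has to be iterated more than once.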

map_edge(edge, expand_complexes=True)[source]

Translates the identifiers in edge representing an edge. Default name types are defined in pypath.main.PyPath.default_name_type. If the mapping is unsuccessful, the item will be added to the pypath.main.PyPath.unmapped list.

Parameters
  • edge (dict) – Item whose name is to be mapped to a default name type.

  • expand_complexes (bool) – Expand complexes, i.e. create links between each member of the complex and the interacting partner.

Returns

(list) – Contains the edge(s) [dict] with default mapped names.

map_item(item, expand_complexes=True)[source]

Translates the name in item representing a molecule. Default name types are defined in pypath.main.PyPath.default_name_type. If the mapping is unsuccessful, the item will be added to the pypath.main.PyPath.unmapped list.

Parameters
  • item (dict) – Item whose name is to be mapped to a default name type.

  • expand_complexes (bool) – Expand complexes, i.e. create links between each member of the complex and the interacting partner.

Returns

(list) – The default mapped name(s) [str] of item.

map_list(lst, single_list=False, expand_complexes=True)[source]

Maps the names from a list of edges or items (molecules).

Parameters
  • lst (list) – List of items or edge dictionaries whose names have to be mapped.

  • single_list (bool) – Optional, False by default. Determines whether the provided elements are items or edges. That is, whether pypath.main.PyPath.map_edge() or pypath.main.PyPath.map_item() is called to map the names.

  • expand_complexes (bool) – Expand complexes, i.e. create links between each member of the complex and the interacting partner.

Returns

(list) – Copy of lst with their elements’ names mapped.

mean_reference_per_interaction()[source]

Computes the mean number of references per interaction of the network.

Returns

(float) – Mean number of references per interaction.

merge_lists(id_a, id_b, name=None, and_or='and', delete=False, func='max')[source]

Merges two lists from pypath.main.PyPath.lists.

Parameters
  • id_a (str) – Name of the first list to be merged.

  • id_b (str) – Name of the second list to be merged.

  • name (str) – Optional, None by default. Specifies a new name for the merged list. If none is passed, name will be set to id_a_id_b (the two list names joined by an underscore).

  • and_or (str) – Optional, 'and' by default. The logic operation performed in the merging: 'and' performs a union, 'or' an intersection.

  • delete (bool) – Optional, False by default. Whether to delete the former lists or not.

  • func (str) – Optional, 'max' by default. Not used.
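
A minimal sketch of the merge semantics described above, written against a plain dict rather than pypath.main.PyPath.lists. Following the parameter description, 'and' is treated as a union and 'or' as an intersection. The standalone function, the sorted output and the example list names are assumptions for illustration.

```python
def merge_lists(lists, id_a, id_b, name=None, and_or='and', delete=False):
    """Merge two named lists stored in the dict `lists`.

    'and' keeps elements from both lists (union), 'or' keeps only the
    shared elements (intersection), as stated in the parameter description.
    """
    if name is None:
        name = '%s_%s' % (id_a, id_b)
    a, b = set(lists[id_a]), set(lists[id_b])
    merged = a | b if and_or == 'and' else a & b
    if delete:
        # Drop the two input lists once their content has been read.
        del lists[id_a]
        del lists[id_b]
    lists[name] = sorted(merged)
    return name


# Hypothetical lists of UniProt IDs.
lists = {
    'kinases': ['P00533', 'P04626'],
    'receptors': ['P04626', 'P06213'],
}
union_name = merge_lists(lists, 'kinases', 'receptors')
shared_name = merge_lists(lists, 'kinases', 'receptors', name='both', and_or='or')
```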

merge_nodes(nodes, primary=None, graph=None)[source]

Merges all attributes and edges of selected nodes and assigns them to the primary node (by default the one with lowest index).

Parameters
  • nodes (list) – List of node indexes [int] that are to be collapsed.

  • primary (int) – Optional, None by default. ID of the primary node. If none is passed, the node with the lowest index in nodes is selected.

  • graph (igraph.Graph) – Optional, None by default. The network graph object from which the nodes are to be merged. If none is passed, takes the undirected network graph.

mimp_directions(graph=None)[source]
mutated_edges(sample)[source]

Compares the mutated residues and the modified residues in PTMs. Interactions are marked as mutated if the target residue in the underlying PTM is mutated.

name_edgelist(graph=None)[source]

Returns an edge list, i.e. a list with tuples of vertex names.

names2vids(names)[source]

From a list of node names, returns their corresponding indices.

Parameters

names (list) – Contains the node names [str] for which the IDs are to be searched.

Returns

(list) – The queried node IDs [int].

negative_report(lst=True, outFile=None)[source]

Generates a report file with the negative interactions (assumed to be already loaded).

Parameters
  • lst (bool) – Optional, True by default. Whether to return a list of edges containing the edge instances which have negative references.

  • outFile (str) – Optional, None by default. The output file name/path. If none is passed, the default is 'results/<session_id>-negatives'

Returns

(list) – If lst is set to True, returns a [list] with the igraph.Edge instances that contain at least one negative reference.

neighborhood(identifiers, order=1, mode='ALL')[source]
neighbors(identifier, mode='ALL')[source]
neighbourhood_network(center, second=False)[source]
network_by_go(node_categories, network_sources=None, include=None, exclude=None, directed=False, keep_undirected=False, prefix='GO', delete=True, copy=False, vertex_attrs=True, edge_attrs=True)[source]

Creates or filters a network based on Gene Ontology annotations.

Parameters
  • node_categories (dict) –

    A dict with custom category labels as keys and expressions of GO terms as values. E.g. {'extracell': 'GO:0005576 and not GO:0070062', 'plasmamem': 'GO:0005887'}.

  • network_sources (dict) – A dict with anything as keys and network input format definitions (input_formats.ReadSettings instances) as values.

  • include (list) – A list of tuples of category label pairs. By default we keep all edges connecting proteins annotated with any of the defined categories. If include is defined then only edges between category pairs defined here will be kept and all others deleted.

  • exclude (list) – Similarly to include, all edges will be kept but the ones listed in exclude will be deleted.

  • directed (bool) – If True include and exclude relations will be processed with directed (source, target) else direction won’t be considered.

  • keep_undirected (bool) – If True the interactions without direction information will be kept even if directed is True. Passed to edges_between as strict argument.

  • prefix (str) – Prefix for all vertex and edge attributes created in this operation. E.g. if you have a category label ‘bar’ and prefix is ‘foo’ then you will have a new vertex attribute ‘foo__bar’.

  • delete (bool) – Delete the vertices and edges which don’t belong to any of the categories.

  • copy (bool) – Return a copy of the entire PyPath object with the graph filtered by GO terms. By default the object is modified in place and None is returned.

  • vertex_attrs (bool) – Create vertex attributes.

  • edge_attrs (bool) – Create edge attributes.

network_filter(p=2.0)[source]

This function aims to cut the number of edges in the network, without losing nodes, to make the network less connected, less hairball-like, more usable for analysis.

network_stats(outfile=None)[source]

Calculates basic statistics for the whole network and for each of the sources (node and edge counts, average node degree, graph diameter, transitivity, adhesion and cohesion). Writes the results to a tab file. The file is stored in pypath.main.PyPath.outdir ('results' by default).

Parameters

outfile (str) – Optional, None by default. Specifies the file name. If none is specified, this will be 'pwnet-<session_id>-stats'.

new_edges(edges)[source]

Adds new edges from any iterable of edges to the undirected graph. Basically, calls igraph.Graph.add_edges().

Parameters

edges (list) – Contains the edges that are to be added to the network.

new_nodes(nodes)[source]

Adds new nodes from any iterable of nodes to the undirected graph. Basically, calls igraph.Graph.add_vertices().

Parameters

nodes (list) – Contains the nodes that are to be added to the network.

node_exists(name)[source]

Checks if a node exists in the (undirected) network.

Parameters

name (str) – The name of the node to be searched.

Returns

(bool) – Whether the node exists in the network or not.

numof_directed_edges()[source]
numof_reference_interaction_pairs()[source]

Returns the total of unique references per interaction.

Returns

(int) – Total number of unique references per interaction.

numof_references()[source]

Counts the number of references in the network.

Counts the total number of unique references in the edges of the network.

Returns

(int) – Number of unique references in the network.
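
The difference between this count and numof_reference_interaction_pairs() can be illustrated with a small sketch. It assumes that each edge carries a set of reference IDs and that a reference-interaction pair is one (edge, reference) combination. The dict below is a hypothetical stand-in for the edge attributes of the network.

```python
# Hypothetical references per interaction; 'pmid2' supports two edges.
edge_refs = {
    ('P00533', 'P04626'): {'pmid1', 'pmid2'},
    ('P04626', 'P06213'): {'pmid2'},
}

# Unique references across the whole network: each ID is counted once.
n_references = len(set().union(*edge_refs.values()))

# Reference-interaction pairs: a reference counts once per edge it supports.
n_reference_interaction_pairs = sum(len(refs) for refs in edge_refs.values())
```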

numof_undirected_edges()[source]
orthology_translation(target, source=None, only_swissprot=True, graph=None)[source]

Translates the current object to another organism by orthology. Proteins without known ortholog will be deleted.

Parameters

target (int) – NCBI Taxonomy ID of the target organism. E.g. 10090 for mouse.

p(identifier)

Returns igraph.Vertex() object if the identifier is a valid vertex index in the default undirected graph, or a UniProt ID or GeneSymbol which can be found in the default undirected network, otherwise None.

Parameters

identifier (int, str) – Vertex index (int) or GeneSymbol (str) or UniProt ID (str) or igraph.Vertex object.

pathway_attributes(graph=None)[source]
pathway_members(pathway, source)[source]

Returns an iterator with the members of a single pathway. Apart from the pathway name you need to supply its source database too.

pathway_names(source, graph=None)[source]

Returns the names of all pathways having at least one member in the current graph.

pathway_similarity(outfile=None)[source]

Computes Sorensen’s similarity index across nodes and edges for all the available pathway sources (already loaded in the network) and saves them into table files. Files are stored in pypath.main.PyPath.outdir ('results' by default). See pypath.main.PyPath.sorensen_pathways() for more information.

Parameters

outfile (str) – Optional, None by default. Specifies the file name prefix (suffixes will be '-nodes' and '-edges'). If none is specified, this will be 'pwnet-<session_id>-sim-pw'.
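
For reference, the Sorensen index of two sets A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two hypothetical pathway member sets. The function name and the convention of returning 0.0 for two empty sets are assumptions, not taken from the library.

```python
def sorensen_index(a, b):
    """Sorensen similarity of two collections, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty sets have no measurable overlap.
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical member sets of the same pathway in two resources.
pathway_in_source_1 = {'P00533', 'P04626', 'P06213'}
pathway_in_source_2 = {'P04626', 'P06213', 'P01133'}
similarity = sorensen_index(pathway_in_source_1, pathway_in_source_2)
```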

pathways_table(filename='genes_pathways.list', pw_sources=['signalink', 'signor', 'netpath', 'kegg'], graph=None)[source]
pfam_regions()[source]
phosphonetworks_directions(graph=None)[source]
phosphopoint_directions(graph=None)[source]
phosphorylation_directions()[source]
phosphorylation_signs()[source]
phosphosite_directions(graph=None)[source]
prdb_tissue_expr(tissue, prdb=None, graph=None, occurrence=1, group_function=<function PyPath.<lambda>>, na_value=0.0)[source]
process_direction(line, dir_col, dir_val, dir_sep)[source]

Processes the direction information of an interaction according to a data file from a source.

Parameters
  • line (list) – The stripped and separated line from the resource data file containing the information of an interaction.

  • dir_col (int) – The column/position number where the information about the direction is to be found (on line).

  • dir_val (list) – Contains the terms [str] for which that interaction is to be considered directed.

  • dir_sep (str) – Separator for the field in line containing the direction information (if any).

Returns

(bool) – Determines whether the given interaction is directed or not.
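
A minimal sketch of how such a direction check could work on one parsed line, based only on the parameter descriptions above. The standalone function and the example values are assumptions for illustration.

```python
def process_direction(line, dir_col, dir_val, dir_sep=None):
    """Return True if the direction field of `line` contains a term in `dir_val`."""
    if dir_col is None or dir_val is None:
        # No direction information is defined for this resource.
        return False
    field = line[dir_col]
    # Split the field only when a separator is given.
    values = field.split(dir_sep) if dir_sep else [field]
    return any(value.strip() in dir_val for value in values)


# Hypothetical line: source, target, semicolon-separated effect terms.
line = ['P00533', 'P04626', 'binding;phosphorylation']
is_directed = process_direction(line, 2, ['phosphorylation'], ';')
is_directed_no_match = process_direction(line, 2, ['cleavage'], ';')
```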

process_directions(dirs, name, directed=None, stimulation=None, inhibition=None, graph=None, id_type=None, dirs_only=False)[source]
process_dmi(source, **kwargs)[source]

This is a universal function for loading domain-motif objects, like load_phospho_dmi() for PTMs. TODO: this will replace load_elm, load_ielm, etc.

process_sign(signData, signDef)[source]

Processes the sign of an interaction, used when processing an input file.

Parameters
  • signData (str) – Data regarding the sign to be processed.

  • signDef (tuple) – Contains information about how to process signData. This is defined in pypath.data_formats. The first element [int] gives the position of the direction information on each line of the data file, the second element is either [str] or [list] and defines the terms for which an interaction is considered a stimulation, the third element does the same for inhibition, and the fourth (optional) element gives the separator for signData if it contains more than one element.

Returns

  • (bool) – Determines whether the processed interaction is considered stimulation or not.

  • (bool) – Determines whether the processed interaction is considered inhibition or not.
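
A minimal sketch of the sign processing described above, assuming the four-element layout of signDef given in the parameter description. The snake_case names, the helper and the example terms are assumptions for illustration.

```python
def process_sign(sign_data, sign_def):
    """Return (is_stimulation, is_inhibition) for one sign field."""
    stim_terms, inh_terms = sign_def[1], sign_def[2]
    separator = sign_def[3] if len(sign_def) > 3 else None

    def as_set(terms):
        # Terms may be given as a single string or as a list of strings.
        return {terms} if isinstance(terms, str) else set(terms)

    values = set(sign_data.split(separator)) if separator else {sign_data}
    return bool(values & as_set(stim_terms)), bool(values & as_set(inh_terms))


# Hypothetical definition: column 5, one stimulation term,
# two inhibition terms, semicolon as separator.
sign_def = (5, 'activation', ['inhibition', 'repression'], ';')
stimulation, inhibition = process_sign('activation;binding', sign_def)
```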

protein(identifier)

Same as PyPath.get_node, just for the directed graph. Returns igraph.Vertex() object if the identifier is a valid vertex index in the default directed graph, or a UniProt ID or GeneSymbol which can be found in the default directed network, otherwise None.

Parameters

identifier (int, str) – Vertex index (int) or GeneSymbol (str) or UniProt ID (str) or igraph.Vertex object.

protein_edge(source, target, directed=True)

Returns igraph.Edge object if an edge exist between the 2 proteins, otherwise None.

Parameters
  • source (int,str) – Vertex index or UniProt ID or GeneSymbol or igraph.Vertex object.

  • target (int,str) – Vertex index or UniProt ID or GeneSymbol or igraph.Vertex object.

  • directed (bool) – To be passed to igraph.Graph.get_eid()

proteins(identifiers)
ps(identifiers)
random_walk_with_return(q, graph=None, c=0.5, niter=1000)[source]

Random walk with return (RWR) starting from one or more query nodes. Returns affinity (probability) vector of all nodes in the graph.

Parameters
  • q (int,list) – Vertex IDs of query nodes.

  • graph (igraph.Graph) – An igraph.Graph object.

  • c (float) – Probability of restart.

  • niter (int) – Number of iterations.

>>> import igraph
>>> import pypath
>>> pa = pypath.PyPath()
>>> pa.init_network({
...     'signor': pypath.data_formats.pathway['signor']
... })
>>> q = [
...     pa.gs('EGFR').index,
...     pa.gs('ATG4B').index
... ]
>>> rwr = pa.random_walk_with_return(q = q)
>>> palette = igraph.RainbowPalette(n = 100)
>>> colors  = [palette.get(int(round(i))) for i in rwr / max(rwr) * 99]
>>> igraph.plot(pa.graph, vertex_color = colors)
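
To illustrate what the affinity vector represents, here is a self-contained sketch of a random walk with restart on a small adjacency dict, using the common iteration p ← (1 − c)·W·p + c·p0 with uniform transition probabilities. Whether the method above uses exactly this update rule is an assumption. The function below is a standalone illustration and does not use igraph.

```python
def random_walk_with_return(adjacency, query, c=0.5, niter=100):
    """Return a dict of steady-state visiting probabilities per node.

    adjacency maps each node to a list of its neighbours; query is the
    set of start nodes; c is the probability of restarting at a query node.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(query) if n in query else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(niter):
        # Restart term: a fraction c of the total mass jumps back to the query.
        nxt = {n: c * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: its remaining mass also returns to the query.
                for m in nodes:
                    nxt[m] += (1.0 - c) * p[n] * p0[m]
                continue
            share = (1.0 - c) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        p = nxt
    return p


# Hypothetical path graph A - B - C, queried from A.
adjacency = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}
affinity = random_walk_with_return(adjacency, {'A'})
```

Nodes closer to the query nodes receive higher affinity, which is what the colour gradient in the plotting example visualises.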
random_walk_with_return2(q, c=0.5, niter=1000)[source]

Literally does random walks. Only for testing of the other method, to be deleted later.

read_data_file(param, keep_raw=False, cache_files={}, reread=None, redownload=False)[source]

Reads an interaction data file containing node and edge attributes that can be read from simple text based files and adds it to the network data. This function works not only with files, but with lists as well. Any other function can be written to download and preprocess data, and then pass it to this function to finally attach it to the network.

Parameters
  • param (pypath.input_formats.ReadSettings) – pypath.input_formats.ReadSettings instance containing the detailed definition of the input format of the file. Instead of the file name (on the pypath.input_formats.ReadSettings.input attribute) you can give a custom function name, which will be executed, and the returned data will be used instead.

  • keep_raw (bool) – Optional, False by default. Whether to keep the raw data read by this function, for debugging purposes or further use.

  • cache_files (dict) – Optional, {} by default. Contains the resource name(s) [str] (keys) and the corresponding cached file name [str]. If provided (and file exists) bypasses the download of the data for that resource and uses the cache file instead.

  • reread (bool) – Optional, False by default. Specifies whether to reread the data files from the cache or omit them (similar to redownload).

  • redownload (bool) – Optional, False by default. Specifies whether to re-download the data and ignore the cache.

read_from_cache(cache_file)[source]

Reads a pickle file from the cache and returns it. It is assumed that the subfolder cache/ is on the supplied path.

Parameters

cache_file (str) – Path to the cache file that is to be loaded.

Returns

(file) – The loaded pickle file from the cache. Type will depend on the file itself (e.g.: if the pickle was saved from a dictionary, the type will be [dict]).
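
As the return description notes, the type of the result depends on what was pickled. A minimal sketch with the standard pickle module, using a temporary file as a stand-in for a cache entry. The file name and the cached content are hypothetical.

```python
import os
import pickle
import tempfile


def read_from_cache(cache_file):
    """Load and return whatever object was pickled into cache_file."""
    with open(cache_file, 'rb') as fp:
        return pickle.load(fp)


# Round trip: write a dict to a temporary "cache" file, then read it back.
cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, 'example.pickle')
with open(cache_file, 'wb') as fp:
    pickle.dump({'P00533': ['P04626']}, fp)

restored = read_from_cache(cache_file)
```

Only unpickle files from a trusted cache, since loading a pickle can execute arbitrary code.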

read_list_file(settings, **kwargs)[source]

Reads a list from a file and adds it to pypath.main.PyPath.lists.

Parameters
  • settings (pypath.input_formats.ReadList) – pypath.input_formats.ReadList instance specifying the settings of the file to be read. See the class documentation for more details.

  • **kwargs – Extra arguments passed to the file reading function. The name of that function is given in the pypath.input_formats.ReadList.input attribute and the function is defined in pypath.dataio.

reference_edge_ratio()[source]

Computes the average number of references per edge (as in the undirected graph).

Returns

(float) – Average number of references per edge.

reference_hist(filename=None)[source]

Generates a file containing a table with information about the network’s edges. First column contains the source node ID, followed by the target’s ID, third column contains the number of references for that interaction and finally the number of sources. Writes the results in a tab file.

Parameters

filename (str) – Optional, None by default. Specifies the file name and path to save the table. If none is passed, file will be saved in pypath.main.PyPath.outdir ('results' by default) with the name '<session_id>-refs-hist'.

reload()[source]

Reloads the object from the module level.

remove_htp(threshold=50, keep_directed=False)[source]
remove_undirected(min_refs=None)[source]
run_batch(methods, toCall=None)[source]
save_network(pfile=None)[source]

Saves the network object.

Stores the instance into a pickle (binary) file which can be reloaded in the future.

Parameters

pfile (str) – Optional, None by default. The path/file name where the pickle file is to be stored. If not specified, saves the network to its default location ('cache/default_network.pickle').

save_session()[source]

Saves the current session state into a pickle dump. The file will be saved in the current working directory as 'pypath-<session_id>.pickle'.

search_attr_and(obj, lst)[source]

Searches a given collection of attributes in a given object. Only returns True, if all elements of lst can be found in obj.

Parameters
  • obj (object) – Object (dictionary-like) where to search for elements of lst.

  • lst (dict) – Keys are the attribute names [str] and values the collection of elements to be searched in such attribute [set].

Returns

(bool) – True if lst is empty or all of its elements are found in obj. Returns False otherwise (as soon as one element of lst is not found).

search_attr_or(obj, lst)[source]

Searches a given collection of attributes in a given object. As soon as one item is found, returns True, if none could be found then returns False.

Parameters
  • obj (object) – Object (dictionary-like) where to search for elements of lst.

  • lst (dict) – Keys are the attribute names [str] and values the collection of elements to be searched in such attribute [set].

Returns

(bool) – True if lst is empty or any of its elements is found in obj. Returns False only if nothing can be found.
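
The behaviour of search_attr_and() and search_attr_or() can be sketched with plain dicts as follows. The matching rule (a set-valued criterion matches if it shares at least one element with the attribute, a scalar criterion matches on equality) is an assumption based on the parameter descriptions, and the helper name is hypothetical.

```python
def _matches(obj, attr, wanted):
    """Check one attribute of obj against the wanted value(s)."""
    value = obj[attr]
    if isinstance(wanted, (list, set, tuple)):
        have = set(value) if isinstance(value, (list, set, tuple)) else {value}
        return bool(have & set(wanted))
    return value == wanted


def search_attr_and(obj, lst):
    """True if every criterion in lst matches (also True for an empty lst)."""
    return all(_matches(obj, attr, wanted) for attr, wanted in lst.items())


def search_attr_or(obj, lst):
    """True if lst is empty or at least one criterion matches."""
    return not lst or any(_matches(obj, attr, wanted) for attr, wanted in lst.items())


# Hypothetical edge attributes and search criteria.
edge = {'sources': {'SIGNOR', 'SignaLink3'}, 'type': 'PPI'}
criteria = {'sources': {'SIGNOR'}, 'type': 'TF'}
all_match = search_attr_and(edge, criteria)
any_match = search_attr_or(edge, criteria)
```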

second_neighbours(node, indices=False, with_first=False)[source]

Looks for the (first and) second neighbours of a given node and returns a list of their UniProt IDs.

Parameters
  • node (str) – The UniProt ID of the node of interest. Can also be the index of such node [int].

  • indices (bool) – Optional, False by default. Whether to return the neighbour nodes indices or their UniProt IDs.

  • with_first (bool) – Optional, False by default. Whether to also return the first neighbours.

Returns

(list) – The list containing the second neighbours of the queried node (including the first ones if specified).
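A minimal sketch of a first and second neighbour lookup on a plain adjacency dictionary follows. It does not use pypath or igraph. The function name second_neighbours_sketch, the toy adjacency data and the choice to exclude first neighbours from the second shell are assumptions of this example.

```python
def second_neighbours_sketch(adjacency, node, with_first=False):
    """Return second neighbours of node, optionally with the first ones."""
    first = set(adjacency.get(node, ())) - {node}
    second = set()
    for neighbour in first:
        second.update(adjacency.get(neighbour, ()))
    # Keep only nodes that are exactly two steps away.
    second -= first | {node}
    result = first | second if with_first else second
    return sorted(result)


# Toy undirected network keyed by UniProt-like identifiers.
adjacency = {
    'P00533': {'P04626', 'P62993'},
    'P04626': {'P00533', 'P62993'},
    'P62993': {'P00533', 'P04626', 'Q07889'},
    'Q07889': {'P62993', 'P01112'},
    'P01112': {'Q07889'},
}

print(second_neighbours_sketch(adjacency, 'P00533'))
print(second_neighbours_sketch(adjacency, 'P00533', with_first=True))
```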

select_by_go(go_terms)[source]

Retrieves the vertex IDs of all vertices annotated with any of the given Gene Ontology terms or their descendants, or evaluates a string expression (see select_by_go_expr).

Parameters

go_terms (str,set) – A single GO term, a set of GO terms or an expression with GO terms.

select_by_go_all(go_terms)[source]

Selects the nodes annotated by all GO terms in go_terms.

Returns set of vertex IDs.

Parameters

go_terms (list) – List, set or tuple of GO terms.

select_by_go_expr(go_expr)[source]

Selects vertices based on an expression of Gene Ontology terms. Operator precedence is not considered; please use parentheses.

Parameters

go_expr (str) – An expression of Gene Ontology terms. E.g. '(GO:0005576 and not GO:0070062) or GO:0005887'. Parentheses and operators and, or and not can be used.
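One way such an expression could be evaluated against a protein's annotations is sketched below. This is not the library's implementation: the function matches_go_expr and the proteins dictionary are hypothetical, the ontology hierarchy (descendant terms) is ignored, and Python's own operator precedence applies, so the expression should still be fully parenthesised.

```python
import re

GO_TERM = re.compile(r'GO:\d{7}')
TOKEN = re.compile(r'GO:\d{7}|\(|\)|\band\b|\bor\b|\bnot\b')


def matches_go_expr(go_expr, annotations):
    """Evaluate a GO term expression against one set of annotations."""
    # Reject anything other than GO terms, parentheses and operators.
    if TOKEN.sub('', go_expr).strip():
        raise ValueError('Unexpected token in expression: %r' % go_expr)
    # Replace each GO term by True/False depending on membership.
    as_bool = GO_TERM.sub(lambda m: str(m.group(0) in annotations), go_expr)
    return eval(as_bool, {'__builtins__': {}})


# Hypothetical annotations keyed by protein identifier.
proteins = {
    'P1': {'GO:0005576'},
    'P2': {'GO:0005576', 'GO:0070062'},
    'P3': {'GO:0005887'},
    'P4': set(),
}

expr = '(GO:0005576 and not GO:0070062) or GO:0005887'
selected = {p for p, terms in proteins.items() if matches_go_expr(expr, terms)}
print(sorted(selected))
```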

separate()[source]

Separates the undirected network according to the different sources. Basically applies pypath.main.PyPath.get_network() for each resource.

Returns

(dict) – Keys are resource names [str] whose values are the subnetwork [igraph.Graph] containing the elements of that source.

separate_by_category()[source]

Separates the undirected network according to resource categories. Possible categories are:

  • 'm': PTM/enzyme-substrate resources.

  • 'p': Pathway/activity flow resources.

  • 'i': Undirected/PPI resources.

  • 'r': Process description/reaction resources.

  • 't': Transcription resources.

Works in the same way as pypath.main.PyPath.separate().

Returns

(dict) – Keys are category names [str] whose values are the subnetwork [igraph.Graph] containing the elements of those resources corresponding to that category.
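The grouping performed by separate() and separate_by_category() can be pictured with plain dictionaries. The sketch below is only an illustration: the edge list and the resource-to-category mapping are assumptions made for this example, and sets of node pairs stand in for the igraph.Graph subnetworks.

```python
from collections import defaultdict

# Hypothetical edge list: (node_a, node_b, set of supporting resources).
edges = [
    ('P00533', 'P04626', {'SIGNOR', 'IntAct'}),
    ('P04626', 'P62993', {'IntAct'}),
    ('P62993', 'Q07889', {'SIGNOR'}),
]

# Illustrative resource-to-category mapping (an assumption for this sketch).
category_of = {'SIGNOR': 'p', 'IntAct': 'i'}

by_source = defaultdict(set)
by_category = defaultdict(set)

for node_a, node_b, sources in edges:
    for source in sources:
        # An edge is placed in the subnetwork of every resource supporting it.
        by_source[source].add((node_a, node_b))
        by_category[category_of[source]].add((node_a, node_b))

print(sorted(by_source['SIGNOR']))
print(sorted(by_category['i']))
```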

sequences(isoforms=True, update=False)[source]
set_boolean_vattr(attr, vids, negate=False)[source]
set_categories()[source]

Sets the category attribute on the network nodes and edges ('cat') as well as the edge attribute coercing the references by category ('refs_by_cat'). The possible categories are as follows:

  • 'm': PTM/enzyme-substrate resources.

  • 'p': Pathway/activity flow resources.

  • 'i': Undirected/PPI resources.

  • 'r': Process description/reaction resources.

  • 't': Transcription resources.

set_chembl_mysql(title, config_file=None)[source]

Sets the ChEMBL MySQL configuration according to the title section of the config_file ini file.

Parameters
  • title (str) – Section title of the ini file.

  • config_file (str) – Optional, None by default. Specifies the configuration file name. If none is passed, mysql_config/defaults.mysql will be used.

set_disease_genes(dataset='curated')[source]

Creates a vertex attribute named dis with boolean value True if the protein is encoded by a disease-related gene according to DisGeNet.

Parameters

dataset (str) – Which dataset to use from DisGeNet. Default is curated.

set_druggability()[source]

Creates a vertex attribute dgb with value True if the protein is druggable, otherwise False.

set_drugtargets(pchembl=5.0)[source]

Creates a vertex attribute dtg with value True if the protein has at least one compound binding with affinity higher than pchembl, otherwise False.

Parameters

pchembl (float) – Pchembl threshold.

set_kinases()[source]

Creates a vertex attribute kin with value True if the protein is a kinase, otherwise False.

set_plasma_membrane_proteins_cspa()[source]

Creates a vertex attribute cspa with value True if the protein is a plasma membrane protein according to CSPA, otherwise False.

set_plasma_membrane_proteins_cspa_surfaceome(score_threshold=0.0)[source]

Creates a vertex attribute surf with value True if the protein is a plasma membrane protein according either to the Cell Surface Protein Atlas or the In Silico Human Surfaceome.

set_plasma_membrane_proteins_surfaceome(score_threshold=0.0)[source]

Creates a vertex attribute ishs with value True if the protein is a plasma membrane protein according to the In Silico Human Surfaceome, otherwise False.

set_receptors()[source]

Creates a vertex attribute rec with value True if the protein is a receptor, otherwise False.

set_signaling_proteins()[source]

Creates a boolean vertex attribute with value True if the protein is a signaling protein, otherwise False.

set_tfs(classes=['a', 'b', 'other'])[source]
set_transcription_factors(classes=['a', 'b', 'other'])[source]

Creates a vertex attribute tf with value True if the protein is a transcription factor, otherwise False.

Parameters

classes (list) – Classes to use from TF Census. Default is [‘a’, ‘b’, ‘other’].

shortest_path_dist(graph=None, subset=None, outfile=None, **kwargs)[source]

Computes the distribution of shortest paths for each pair of nodes in the network (or between group(s) of nodes if subset is provided). NOTE: this method can take a while to compute, e.g.: if the network has 10K nodes, the total number of possible pairs to compute is:

\[\binom{10^4}{2} = 49995000\]
Parameters
  • graph (igraph.Graph) – Optional, None by default. The network object for which the shortest path distribution is to be computed. If none is passed, takes the undirected network of the current instance.

  • subset (tuple) – Optional, None by default. Contains two lists of node indices defining two groups between which the distribution is to be computed. Can also be [list] if the shortest paths are to be searched within the group. If none is passed, the whole network is taken by default.

  • outfile (str) – Optional, None by default. File name/path to save the shortest path distribution. If none is passed, no file is generated.

  • **kwargs – Additional keyword arguments passed to igraph.Graph.get_shortest_paths().

Returns

(list) – The length of the shortest paths for each pair of nodes of the network (or within/between group/s if subset is provided).
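To make the pair count and the resulting distribution concrete, the sketch below checks the binomial coefficient quoted above and computes shortest path lengths on a toy graph with a breadth-first search. It is a pure Python illustration that does not use igraph. The helper names and the toy adjacency dictionary are assumptions of this example.

```python
from collections import Counter, deque
from math import comb


def bfs_distances(adjacency, start):
    """Hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def shortest_path_lengths(adjacency):
    """Shortest path length for each unordered pair of connected nodes."""
    nodes = sorted(adjacency)
    lengths = []
    for i, source in enumerate(nodes):
        dist = bfs_distances(adjacency, source)
        lengths.extend(dist[t] for t in nodes[i + 1:] if t in dist)
    return lengths


# Toy path graph a - b - c - d.
adjacency = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}
lengths = shortest_path_lengths(adjacency)

print(comb(10**4, 2))    # number of node pairs in a 10K-node network
print(Counter(lengths))  # how many pairs sit at each distance
```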

signaling_proteins_list()[source]

Compiles a list of signaling proteins (as opposed to other proteins like metabolic enzymes, matrix proteins, etc.), by looking up a few simple keywords in the short descriptions of GO terms.

signor_pathways(graph=None)[source]
similarity_groups(groups, index='simpson')[source]

Computes the similarity index across the given groups.

Parameters
  • groups (dict) – Contains the different group names [str] as keys and their corresponding elements [set].

  • index (str) – Optional, 'simpson' by default. The type of index metric to use to compute the similarity. Options are 'simpson', 'sorensen' and 'jaccard'.

Returns

(dict) – Dictionary of dictionaries containing the groups names [str] as keys (for both inner and outer dictionaries) and the index metric as inner value [float] between those groups.
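The three index options can be written down compactly for sets. The sketch below uses the common textbook definitions, taking 'simpson' to mean the overlap coefficient |A ∩ B| / min(|A|, |B|). Whether the library uses exactly these formulas is an assumption, as are the function names. The sketch also assumes non-empty groups.

```python
def simpson(a, b):
    """Overlap coefficient: shared elements over the smaller set."""
    return len(a & b) / min(len(a), len(b))


def sorensen(a, b):
    """Sorensen index: twice the shared elements over the summed sizes."""
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard index: shared elements over the union."""
    return len(a & b) / len(a | b)


INDEX = {'simpson': simpson, 'sorensen': sorensen, 'jaccard': jaccard}


def similarity_groups_sketch(groups, index='simpson'):
    """Dictionary of dictionaries with the index for every pair of groups."""
    metric = INDEX[index]
    return {
        g1: {g2: metric(groups[g1], groups[g2]) for g2 in groups}
        for g1 in groups
    }


groups = {
    'A': {'n1', 'n2', 'n3', 'n4'},
    'B': {'n3', 'n4', 'n5', 'n6', 'n7', 'n8'},
}

for name in ('simpson', 'sorensen', 'jaccard'):
    print(name, similarity_groups_sketch(groups, name)['A']['B'])
```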

small_plot(graph, **kwargs)[source]

This method is deprecated, do not use it.

sorensen_pathways(pwlist=None)[source]

Computes the Sorensen’s similarity index across nodes and edges for the given list of pathway sources (all loaded pathway sources by default).

Parameters

pwlist (list) – Optional, None by default. The list of pathway sources to be compared.

Returns

(dict) – Nested dictionaries (three levels). First-level keys are 'nodes' and 'edges', then second and third levels correspond to <source>__<pathway> names which map to the similarity index between those pathways [float].

source_diagram(outf=None, **kwargs)[source]
source_network(font='HelveticaNeueLTStd')[source]

For EMBL branding, use Helvetica Neue Linotype Standard light

source_similarity(outfile=None)[source]

Computes the Sorensen’s similarity index across nodes and edges for all the sources available (already loaded in the network) and saves them into table files. Files are stored in pypath.main.PyPath.outdir ('results' by default). See pypath.main.PyPath.databases_similarity() for more information.

Parameters

outfile (str) – Optional, None by default. Specifies the file name prefix (suffixes will be '-nodes' and '-edges'). If none is specified, this will be 'pwnet-<session_id>-sim-src'.

source_stats()[source]
sources_hist()[source]

Counts the number of sources per interaction in the graph and saves them into a file named source_num. File is stored in pypath.main.PyPath.outdir ('results' by default).

sources_overlap(diagonal=False)[source]
sources_venn_data(fname=None, return_data=False)[source]

Computes the overlap in number of interactions for all pairs of sources.

Parameters
  • fname (str) – Optional, None by default. If provided, saves the results into a table file. File is stored in pypath.main.PyPath.outdir ('results' by default).

  • return_data (bool) – Optional, False by default. Whether to return the results as a [list].

Returns

(list) – Only if return_data is set to True. List of lists containing the counts for each pair of resources. This is, for instance, number of interactions only in resource A, number of interactions only in resource B and number of common interactions between A and B.
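The pairwise counts described above can be sketched with set operations. The function pairwise_overlap, the row layout (names followed by the three counts) and the toy data are assumptions of this example. The actual layout of the returned lists may differ.

```python
from itertools import combinations


def pairwise_overlap(edges_by_source):
    """Counts of exclusive and shared interactions for each pair of sources."""
    rows = []
    for a, b in combinations(sorted(edges_by_source), 2):
        edges_a, edges_b = edges_by_source[a], edges_by_source[b]
        rows.append([
            a,
            b,
            len(edges_a - edges_b),  # only in a
            len(edges_b - edges_a),  # only in b
            len(edges_a & edges_b),  # common to both
        ])
    return rows


# Hypothetical interactions per resource.
edges_by_source = {
    'A': {('n1', 'n2'), ('n2', 'n3'), ('n3', 'n4')},
    'B': {('n2', 'n3'), ('n4', 'n5')},
}

print(pairwise_overlap(edges_by_source))
```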

straight_between(id_a, id_b)[source]

Finds an edge between the provided node names.

Parameters
  • id_a (str) – The name of the source node.

  • id_b (str) – The name of the target node.

Returns

(int) – The edge ID. If the edge doesn’t exist, returns [list] with the node indices [int].

string_effects(graph=None)[source]
sum_in_complex(csources=['corum'], graph=None)[source]

Returns the total number of edges in the network falling between two members of the same complex. Returns a dict keyed by complex resource. Calls pypath.pypath.Pypath.edges_in_comlexes() to do the calculations.

Parameters
  • csources (list) – List of complex resources. Should be already loaded.

  • graph (igraph.Graph) – The graph object to do the calculations on.

table_latex(fname, header, data, sum_row=True, row_order=None, latex_hdr=True, caption='', font='HelveticaNeueLTStd-LtCn', fontsize=8, sum_label='Total', sum_cols=None, header_format='%s', by_category=True)[source]
third_source_directions(graph=None, use_string_effects=False, use_laudanna_data=False)[source]

This method calls a series of methods to get additional direction & effect information from sources having no literature curated references, but giving sufficient evidence about the directionality for interactions already supported by literature evidences from other sources.

tissue_network(tissue, graph=None)[source]

Returns a network which includes the proteins expressed in a certain tissue according to ProteomicsDB.

Parameters
  • tissue (str) – Tissue name as used in ProteomicsDB.

  • graph (igraph.Graph) – A graph object, by default the graph attribute of the current instance.

transcription_factors()[source]
uniprot(uniprot)[source]

Returns an igraph.Vertex object if the UniProt ID can be found in the default undirected network, otherwise None.

Parameters

uniprot (str) – UniProt ID.

uniprots(uniprots)[source]

Returns a list of igraph.Vertex objects for a list of UniProt IDs, omitting those that could not be found in the default undirected graph.

uniq_node_list(lst)[source]

Returns a given list of nodes containing only the unique elements.

Parameters

lst (list) – List of nodes.

Returns

(list) – Copy of lst containing only unique nodes.

uniq_ptm(ptms)[source]
uniq_ptms()[source]
up(uniprot)

Returns an igraph.Vertex object if the UniProt ID can be found in the default undirected network, otherwise None.

Parameters

uniprot (str) – UniProt ID.

up_affected_by(uniprot)[source]
up_affects(uniprot)[source]
up_edge(source, target, directed=True)[source]

Returns an igraph.Edge object if an edge exists between the two proteins, otherwise None.

Parameters
  • source (str) – UniProt ID.

  • target (str) – UniProt ID.

  • directed (bool) – Passed to igraph.Graph.get_eid().

up_in_directed(uniprot)[source]
up_in_undirected(uniprot)[source]
up_inhibited_by(uniprot)[source]
up_inhibits(uniprot)[source]
up_neighborhood(uniprots, order=1, mode='ALL')[source]
up_neighbors(uniprot, mode='ALL')[source]
up_stimulated_by(uniprot)[source]
up_stimulates(uniprot)[source]
update_adjlist(graph=None, mode='ALL')[source]

Creates an adjacency list in a list of sets format.

update_attrs()[source]

Updates the node and edge attributes. Note that no data is downloaded; it mainly updates the dictionaries of attributes pypath.main.PyPath.edgeAttrs and pypath.main.PyPath.vertexAttrs, which contain the attribute names and their corresponding types, and initializes such attributes in the network nodes/edges if they were not initialized yet.

update_cats()[source]

Makes sure that the pypath.main.PyPath.has_cats attribute is an up to date [set] of all categories in the current network.

update_db_dict()[source]
update_pathway_types()[source]

Updates the pathway types attribute (pypath.main.PyPath.pathway_types) according to the loaded resources of the undirected network.

update_pathways()[source]

Makes sure that the pypath.main.PyPath.pathways attribute is an up to date [dict] of all pathways and their sources in the current network.

update_sources()[source]

Makes sure that the pypath.main.PyPath.sources attribute is an up to date [list] of all sources in the current network.

update_vertex_sources()[source]

Updates all the vertex attributes 'sources' and 'references' according to their related edges (on the undirected graph).

update_vindex()[source]

This is deprecated.

update_vname()[source]

Fast lookup of node names and indexes; these are held in a [list] and a [dict] as well. However, every time new nodes are added, these should be updated. This function is automatically called after all operations affecting node indices.

ups(uniprots)

Returns a list of igraph.Vertex objects for a list of UniProt IDs, omitting those that could not be found in the default undirected graph.

v(identifier)

Returns an igraph.Vertex object if the identifier is a valid vertex index in the default undirected graph, or a UniProt ID or GeneSymbol which can be found in the default undirected network, otherwise None.

Parameters

identifier (int, str) – Vertex index (int), GeneSymbol (str), UniProt ID (str) or igraph.Vertex object.

vertex_pathways()[source]

Some resources assign interactions to pathways, while others assign proteins. This function copies pathway annotations from edge attributes to vertex attributes.

vsgs()[source]

Returns a generator sequence of the node names as GeneSymbols [str] (from the undirected graph).

Returns

(generator) – Sequence containing the node names as GeneSymbols [str].

vsup()[source]

Returns a generator sequence of the node names as UniProt IDs [str] (from the undirected graph).

Returns

(generator) – Sequence containing the node names as UniProt IDs [str].

wang_effects(graph=None)[source]
write_table(tbl, outfile, sep='\t', cut=None, colnames=True, rownames=True)[source]

Writes a given table to a file.

Parameters
  • tbl (dict) – Contains the data of the table. It is assumed that keys are the row names [str] and the values are the corresponding row contents. Column names (if any) are defined with the key 'header'.

  • outfile (str) – File name where to save the table. The file will be saved under the object’s pypath.main.PyPath.outdir ('results' by default).

  • sep (str) – Optional, '\t' (tab) by default. Specifies the separator for the file.

  • cut (int) – Optional, None by default. Specifies the maximum number of characters for the row names.

  • colnames (bool) – Optional, True by default. Specifies whether to write the column names in the file or not.

  • rownames (bool) – Optional, True by default. Specifies whether to write the row names in the file or not.
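The table layout described by these parameters can be sketched as follows. This is not the library's implementation: write_table_sketch is a hypothetical helper that writes to an open handle instead of a file under outdir, and it assumes that the 'header' entry already lists a label for the row-name column.

```python
import io


def write_table_sketch(tbl, handle, sep='\t', cut=None,
                       colnames=True, rownames=True):
    """Write a dict-based table to an open text handle."""
    if colnames and 'header' in tbl:
        handle.write(sep.join(str(c) for c in tbl['header']) + '\n')
    for name, values in tbl.items():
        if name == 'header':
            continue
        cells = [str(v) for v in values]
        if rownames:
            # Truncate the row name to at most `cut` characters if given.
            cells.insert(0, name[:cut])
        handle.write(sep.join(cells) + '\n')


# Hypothetical table: the numbers are placeholders, not real statistics.
tbl = {
    'header': ['resource', 'nodes', 'edges'],
    'SIGNOR': [10, 20],
    'IntAct': [30, 40],
}

buffer = io.StringIO()
write_table_sketch(tbl, buffer)
print(buffer.getvalue())
```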

class pypath.main.Direction(id_a, id_b)[source]

Object storing directionality information of an edge. Also includes information about the reverse direction, mode of regulation and sources of that information.

Parameters
  • id_a (str) – Name of the source node.

  • id_b (str) – Name of the target node.

Variables
  • dirs (dict) – Dictionary containing the presence of directionality of the given edge. Keys are straight, reverse and 'undirected' and their values denote the presence/absence [bool].

  • negative (dict) – Dictionary containing the presence/absence [bool] of negative interactions for both straight and reverse directions.

  • negative_sources (dict) – Contains the resource names [str] supporting a negative interaction on straight and reverse directions.

  • nodes (list) – Contains the node names [str] sorted alphabetically (id_a, id_b).

  • positive (dict) – Dictionary containing the presence/absence [bool] of positive interactions for both straight and reverse directions.

  • positive_sources (dict) – Contains the resource names [str] supporting a positive interaction on straight and reverse directions.

  • reverse (tuple) – Contains the node names [str] in reverse order e.g. (id_b, id_a).

  • sources (dict) – Contains the resource names [str] of a given edge for each directionality (straight, reverse and 'undirected'). Values are sets containing the names of those resources supporting such directionality.

  • straight (tuple) – Contains the node names [str] in the original order e.g. (id_a, id_b).
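The layout of these variables can be pictured with plain dictionaries keyed by direction. The sketch below only mirrors the description above and does not instantiate pypath.main.Direction. The identifiers are chosen to be already in alphabetical order, and the resource name is a placeholder.

```python
id_a, id_b = 'P00533', 'P04626'  # already in alphabetical order

nodes = sorted([id_a, id_b])
straight = (id_a, id_b)
reverse = (id_b, id_a)

# One entry per possible directionality of the edge.
dirs = {straight: False, reverse: False, 'undirected': False}
sources = {straight: set(), reverse: set(), 'undirected': set()}

# Sign information exists only for the two directed keys.
positive = {straight: False, reverse: False}
negative = {straight: False, reverse: False}
positive_sources = {straight: set(), reverse: set()}
negative_sources = {straight: set(), reverse: set()}

# Record that a (placeholder) resource supports id_a -> id_b as activation.
dirs[straight] = True
sources[straight].add('SIGNOR')
positive[straight] = True
positive_sources[straight].add('SIGNOR')

# Directions that carry directionality information.
directed = [d for d in (straight, reverse) if dirs[d]]
print(directed)
```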

check_nodes(nodes)[source]

Checks if nodes is contained in the edge.

Parameters

nodes (list) – Or [tuple], contains the names of the nodes to be checked.

Returns

(bool) – True if all elements in nodes are contained in the object nodes list.

check_param(di)[source]

Checks if di is 'undirected' or contains the nodes of the current edge. Used internally to check that di is a valid key for the object attributes declared on dictionaries.

Parameters

di (tuple) – Or [str], key to be tested for validity.

Returns

(bool) – True if di is 'undirected' or a tuple of node names contained in the edge, False otherwise.

consensus_edges()[source]

Infers the consensus edge(s) according to the number of supporting sources. This includes direction and sign.

Returns

(list) – Contains the consensus edge(s) along with the consensus sign. If there is no major directionality, both are returned. The structure is as follows: ['<source>', '<target>', '<(un)directed>', '<sign>']

get_dir(direction, sources=False)[source]

Returns the state (or sources if specified) of the given direction.

Parameters
  • direction (tuple) – Or [str] (if 'undirected'). Pair of nodes from which direction information is to be retrieved.

  • sources (bool) – Optional, False by default. Specifies if the sources information of the given direction is to be retrieved instead.

Returns

(bool or set) – Presence/absence of the requested direction, or the set of its sources if sources=True. Returns None if direction is not valid.

get_dirs(src, tgt, sources=False)[source]

Returns all directions with boolean values or list of sources.

Parameters
  • src (str) – Source node.

  • tgt (str) – Target node.

  • sources (bool) – Optional, False by default. Specifies whether to return the sources attribute instead of dirs.

Returns

Contains the dirs (or sources if specified) of the given edge.

get_sign(direction, sign=None, sources=False)[source]

Retrieves the sign information of the edge in the given direction. If specified in sign, only that sign’s information will be retrieved. If specified in sources, the sources of that information will be retrieved instead.

Parameters
  • direction (tuple) – Contains the pair of nodes specifying the directionality of the edge from which the information is to be retrieved.

  • sign (str) – Optional, None by default. Denotes whether to retrieve the 'positive' or 'negative' specific information.

  • sources (bool) – Optional, False by default. Specifies whether to return the sources instead of sign.

Returns

(list) – If sign=None containing [bool] values denoting the presence of positive and negative sign on that direction, if sources=True the [set] of sources for each of them will be returned instead. If sign is specified, returns [bool] or [set] (if sources=True) of that specific direction and sign.

has_sign(direction=None)[source]

Checks whether the edge (or for a specific direction) has any signed information (about positive/negative interactions).

Parameters

direction (tuple) – Optional, None by default. If specified, only the information of that direction is checked for sign.

Returns

(bool) – True if there exists any information on the sign of the interaction, False otherwise.

is_directed()[source]

Checks if edge has any directionality information.

Returns

(bool) – True if any of the dirs attribute values is True (except 'undirected'), False otherwise.

is_inhibition(direction=None)[source]

Checks if any (or for a specific direction) interaction is inhibition (negative interaction).

Parameters

direction (tuple) – Optional, None by default. If specified, checks the negative attribute of that specific directionality. If not specified, checks both.

Returns

(bool) – True if any interaction (or the specified direction) is inhibitory (negative).

is_stimulation(direction=None)[source]

Checks if any (or for a specific direction) interaction is activation (positive interaction).

Parameters

direction (tuple) – Optional, None by default. If specified, checks the positive attribute of that specific directionality. If not specified, checks both.

Returns

(bool) – True if any interaction (or the specified direction) is activatory (positive).

majority_dir()[source]

Infers which is the major directionality of the edge by number of supporting sources.

Returns

(tuple) – Contains the pair of nodes denoting the consensus directionality. If the number of sources on both directions is equal, None is returned. If there is no directionality information, 'undirected' will be returned.
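The majority rule for directions can be sketched as a standalone function. This is written from the description above and is not the library's code. The function name, its arguments and the choice to test for missing directionality before comparing counts are assumptions of this example.

```python
def majority_dir_sketch(sources, straight, reverse):
    """Pick the direction supported by more sources."""
    n_straight = len(sources[straight])
    n_reverse = len(sources[reverse])
    if n_straight == 0 and n_reverse == 0:
        return 'undirected'  # no directionality information at all
    if n_straight == n_reverse:
        return None          # tie between the two directions
    return straight if n_straight > n_reverse else reverse


straight = ('P00533', 'P04626')
reverse = ('P04626', 'P00533')

# Placeholder resource names supporting each direction.
sources = {
    straight: {'resource_1', 'resource_2'},
    reverse: {'resource_3'},
    'undirected': set(),
}

print(majority_dir_sketch(sources, straight, reverse))
```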

majority_sign()[source]

Infers which is the major sign (activation/inhibition) of the edge by number of supporting sources on both directions.

Returns

(dict) – Keys are the node tuples on both directions (straight/reverse) and values can be either None if that direction has no sign information or a list of two [bool] elements corresponding to majority of positive and majority of negative support. In case both elements of the list are True, this means the number of supporting sources for both signs in that direction is equal.
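The returned structure can be illustrated with a small function that compares the number of supporting sources per sign. It follows the description above rather than the library's source, and the function name, its arguments and the resource names are assumptions of this example.

```python
def majority_sign_sketch(positive_sources, negative_sources, directions):
    """For each direction: None, or [positive majority, negative majority]."""
    result = {}
    for direction in directions:
        n_pos = len(positive_sources[direction])
        n_neg = len(negative_sources[direction])
        if n_pos == 0 and n_neg == 0:
            result[direction] = None  # no sign information
        else:
            # Both values are True when the support is tied.
            result[direction] = [n_pos >= n_neg, n_neg >= n_pos]
    return result


straight = ('P00533', 'P04626')
reverse = ('P04626', 'P00533')

# Placeholder resource names supporting each sign.
positive_sources = {straight: {'resource_1', 'resource_2'}, reverse: set()}
negative_sources = {straight: {'resource_3'}, reverse: set()}

print(majority_sign_sketch(positive_sources, negative_sources,
                           (straight, reverse)))
```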

merge(other)[source]

Merges current edge with another (if and only if they are the same class and contain the same nodes). Updates the attributes dirs, sources, positive, negative, positive_sources and negative_sources.

Parameters

other (pypath.main.Direction) – The new edge object to be merged with the current one.

negative_reverse()[source]

Checks if the reverse directionality is a negative interaction.

Returns

(bool) – True if there is supporting information on the reverse direction of the edge as inhibition. False otherwise.

negative_sources_reverse()[source]

Retrieves the list of sources for the reverse direction and negative sign.

Returns

(set) – Contains the names of the sources supporting the reverse directionality of the edge with a negative sign.

negative_sources_straight()[source]

Retrieves the list of sources for the straight direction and negative sign.

Returns

(set) – Contains the names of the sources supporting the straight directionality of the edge with a negative sign.

negative_straight()[source]

Checks if the straight directionality is a negative interaction.

Returns

(bool) – True if there is supporting information on the straight direction of the edge as inhibition. False otherwise.

positive_reverse()[source]

Checks if the reverse directionality is a positive interaction.

Returns

(bool) – True if there is supporting information on the reverse direction of the edge as activation. False otherwise.

positive_sources_reverse()[source]

Retrieves the list of sources for the reverse direction and positive sign.

Returns

(set) – Contains the names of the sources supporting the reverse directionality of the edge with a positive sign.

positive_sources_straight()[source]

Retrieves the list of sources for the straight direction and positive sign.

Returns

(set) – Contains the names of the sources supporting the straight directionality of the edge with a positive sign.

positive_straight()[source]

Checks if the straight directionality is a positive interaction.

Returns

(bool) – True if there is supporting information on the straight direction of the edge as activation. False otherwise.

set_dir(direction, source)[source]

Adds directionality information with the corresponding data source named. Modifies self attributes dirs and sources.

Parameters
  • direction (tuple) – Or [str], the directionality key for which the value on dirs has to be set True.

  • source (set) – Contains the name(s) of the source(s) from which such information was obtained.

set_sign(direction, sign, source)[source]

Sets sign and source information on a given direction of the edge. Modifies the attributes positive and positive_sources or negative and negative_sources depending on the sign. Direction is also updated accordingly, which also modifies the attributes dirs and sources.

Parameters
  • direction (tuple) – Pair of edge nodes specifying the direction from which the information is to be set/updated.

  • sign (str) – Specifies the type of interaction. If 'positive', is considered activation, otherwise, is assumed to be negative (inhibition).

  • source (set) – Contains the name(s) of the source(s) from which the information was obtained.

sources_reverse()[source]

Retrieves the list of sources for the reverse direction.

Returns

(set) – Contains the names of the sources supporting the reverse directionality of the edge.

sources_straight()[source]

Retrieves the list of sources for the straight direction.

Returns

(set) – Contains the names of the sources supporting the straight directionality of the edge.

sources_undirected()[source]

Retrieves the list of sources without directed information.

Returns

(set) – Contains the names of the sources supporting the edge presence but without specific directionality information.

src(undirected=False)[source]

Returns the name(s) of the source node(s) for each existing direction on the interaction.

Parameters

undirected (bool) – Optional, False by default.

Returns

(list) – Contains the name(s) for the source node(s). This means if the interaction is bidirectional, the list will contain both identifiers on the edge. If the interaction is undirected, an empty list will be returned.

src_by_source(source)[source]

Returns the name(s) of the source node(s) for each existing direction on the interaction for a specific source.

Parameters

source (str) – Name of the source according to which the information is to be retrieved.

Returns

(list) – Contains the name(s) for the source node(s) according to the specified source. This means if the interaction is bidirectional, the list will contain both identifiers on the edge. If the specified source is not found or invalid, an empty list will be returned.

tgt(undirected=False)[source]

Returns the name(s) of the target node(s) for each existing direction on the interaction.

Parameters

undirected (bool) – Optional, False by default.

Returns

(list) – Contains the name(s) for the target node(s). This means if the interaction is bidirectional, the list will contain both identifiers on the edge. If the interaction is undirected, an empty list will be returned.

tgt_by_source(source)[source]

Returns the name(s) of the target node(s) for each existing direction on the interaction for a specific source.

Parameters

source (str) – Name of the source according to which the information is to be retrieved.

Returns

(list) – Contains the name(s) for the target node(s) according to the specified source. This means if the interaction is bidirectional, the list will contain both identifiers on the edge. If the specified source is not found or invalid, an empty list will be returned.

translate(ids)[source]

Translates the node names/identifiers according to the dictionary ids.

Parameters

ids (dict) – Dictionary containing (at least) the current names of the nodes as keys and their translation as values.

Returns

(pypath.main.Direction) – The copy of current edge object with translated node names.

unset_dir(direction, source=None)[source]

Removes directionality and/or source information of the specified direction. Modifies attribute dirs and sources.

Parameters
  • direction (tuple) – Or [str] (if 'undirected') the pair of nodes specifying the directionality from which the information is to be removed.

  • source (set) – Optional, None by default. If specified, determines which specific source(s) is(are) to be removed from sources attribute in the specified direction.

unset_sign(direction, sign, source=None)[source]

Removes sign and/or source information of the specified direction and sign. Modifies attribute positive and positive_sources or negative and negative_sources (or positive_sources/negative_sources only if source is specified).

Parameters
  • direction (tuple) – The pair of nodes specifying the directionality from which the information is to be removed.

  • sign (str) – Sign from which the information is to be removed. Must be either 'positive' or 'negative'.

  • source (set) – Optional, None by default. If specified, determines which source(s) is(are) to be removed from the sources in the specified direction and sign.

which_dirs()[source]

Returns the pair(s) of nodes for which there is information about their directionality.

Returns

(list) – List of tuples containing the nodes for which their attribute dirs is True.

class pypath.main.AttrHelper(value, name=None, defaults={})[source]

Attribute helper class.

  • Initialization arguments:
    • value [dict/str]?:

    • name [str]?: Optional, None by default.

    • defaults [dict]:

  • Attributes:
    • value [dict]?:

    • name [str]?:

    • defaults [dict]:

    • id_type [type]:

  • Call arguments:
    • instance []:

    • this_directed [tuple?]: Optional, None by default.

    • thisSign []: Optional, None by default.

    • this_directedSources []: Optional, None by default.

    • thisSources []: Optional, None by default.

  • Returns:

ptm module