pypath.core.annot.CustomAnnotation

class pypath.core.annot.CustomAnnotation(class_definitions=None, excludes=None, excludes_extra=None, build=True, pickle_file=None, annotdb_pickle_file=None, composite_resource_name=None)

Bases: Logger

__init__(class_definitions=None, excludes=None, excludes_extra=None, build=True, pickle_file=None, annotdb_pickle_file=None, composite_resource_name=None)
Parameters:
  • class_definitions (tuple) – A series of annotation class definitions, each represented by an instance of pypath.internals.annot_formats.AnnotDef. These definitions carry the attributes and instructions to populate the classes.

  • excludes (dict) – A dict with parent category names (strings) or category keys (tuples) as keys and sets of identifiers as values. The identifiers in this dict are excluded from the respective categories while building the database. E.g. if the UniProt ID P00533 (EGFR) is in the set under the key adhesion, it is excluded from the category adhesion and all its direct children.

  • excludes_extra (dict) – Same kind of dict as excludes, but it is added to the built-in default: the built-in and the provided extra sets are merged. To overwrite or modify the built-in sets, provide your custom dict as excludes instead.

  • build (bool) – If True, execute the build upon instantiation; if False, set up an empty object on which the build can be executed later.
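
The exclusion rule described for excludes and excludes_extra can be illustrated with plain dicts and sets. This is a minimal sketch, not pypath's implementation: the helper names merge_excludes and apply_excludes, the children mapping and the toy data are all assumptions made for illustration, and only string keys are handled (the tuple category keys are left out).

```python
# Illustrative sketch only: plain dicts and sets stand in for the real
# annotation database; none of these helper names are pypath API.

def merge_excludes(builtin, extra):
    """Merge two excludes dicts key by key (the excludes_extra behaviour)."""
    merged = {key: set(ids) for key, ids in builtin.items()}
    for key, ids in extra.items():
        merged.setdefault(key, set()).update(ids)
    return merged


def apply_excludes(classes, children, excludes):
    """Remove excluded identifiers from a category and its direct children."""
    result = {name: set(members) for name, members in classes.items()}
    for parent, ids in excludes.items():
        for name in {parent} | children.get(parent, set()):
            if name in result:
                result[name] -= ids
    return result


# Toy data (assumed): category name -> set of UniProt IDs
classes = {
    'adhesion': {'P00533', 'P12830'},
    'cell_adhesion_receptor': {'P00533', 'P05556'},
    'receptor': {'P00533'},
}
children = {'adhesion': {'cell_adhesion_receptor'}}  # assumed hierarchy
builtin = {'adhesion': {'P00533'}}                   # assumed built-in default
extra = {'receptor': {'Q99999'}}                     # user-provided extras

excludes = merge_excludes(builtin, extra)
cleaned = apply_excludes(classes, children, excludes)
```

In this toy run P00533 disappears from adhesion and its direct child, but stays in receptor, which is not a child of adhesion.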

Methods

__init__([class_definitions, excludes, ...])

add_class_definitions(class_definitions)

all_resources()

browse([start])

Print gene information as a table.

build()

class_to_class_connections(**kwargs)

kwargs passed to filter_interclass_network.

class_to_class_connections_directed(**kwargs)

class_to_class_connections_inhibitory(**kwargs)

class_to_class_connections_signed(**kwargs)

class_to_class_connections_stimulatory(**kwargs)

class_to_class_connections_undirected(**kwargs)

classes_by_entity(element[, labels])

Returns a set of class keys with the classes containing at least one of the elements.

complexes_by_resource()

consensus_score(name, entity)

consensus_score_normalized(name, entity)

count_inter_class_connections([...])

count_inter_class_connections_all([...])

count_inter_class_connections_directed([...])

count_inter_class_connections_inhibitory([...])

count_inter_class_connections_signed([...])

count_inter_class_connections_stimulatory([...])

count_inter_class_connections_undirected([...])

counts([entity_type, labels])

Returns a dict with number of elements in each class.

counts_by_class([entity_type, labels])

Returns a dict with number of elements in each class.

counts_by_resource([entity_types])

counts_df([groupby])

create_class(classdef[, override])

Creates a category of entities by processing a custom definition.

degree_inter_class_network([...])

Counts the degrees for the source or the target class.

degree_inter_class_network_2([degrees_of, ...])

degree_inter_class_network_directed([...])

degree_inter_class_network_directed_2(**kwargs)

degree_inter_class_network_inhibitory([...])

degree_inter_class_network_inhibitory_2(**kwargs)

degree_inter_class_network_stimulatory([...])

degree_inter_class_network_stimulatory_2(...)

degree_inter_class_network_undirected([...])

degree_inter_class_network_undirected_2(**kwargs)

difference(*args)

ensure_annotdb()

entities_by_resource([entity_types])

export(fname, **kwargs)

filter([entity_type])

Filters the annotated entities by annotation class attributes and entity_type.

filter_classes(classes, **kwargs)

Returns a list of annotation classes filtered by their attributes.

filter_df(annot_df[, entities, postfix])

filter_entity_type(cls[, entity_type])

filter_interclass_network([annot_df, ...])

Combines the annotation data frame and a network data frame.

filtered([annot_df, entities])

get_aspect(name[, parent, resource])

get_class(definition[, parent, resource, ...])

Retrieves a class by its name or definition.

get_class_label(name[, parent, resource])

get_class_scope(name[, parent, resource])

get_complexes()

get_df()

Returns the data frame of custom annotations.

get_entities([entity_types])

get_interclass_network_df(**kwargs)

If an interclass network is already present, the network and other kwargs provided are not considered.

get_mirnas()

get_parent(name[, parent, resource])

get_parents(name[, parent, resource])

As names should be unique for resources, a combination of a name and resource determines the parent category.

get_proteins()

get_resource(name[, parent])

For a category name and its parent returns a single resource name.

get_resources(name[, parent])

Returns a set with the names of all resources defining a category with the given name and parent.

get_source(name[, parent, resource])

inter_class_network([annot_args_source, ...])

inter_class_network_directed([...])

inter_class_network_inhibitory([...])

inter_class_network_signed([...])

inter_class_network_stimulatory([...])

inter_class_network_undirected([...])

intersection(*args)

isdisjoint(*args)

iter_classes(**kwargs)

labels(name[, parent, resource, entity_type])

Same as select but returns a list of labels (more human readable).

load()

load_from_pickle(pickle_file)

make_df([all_annotations, full_name])

Creates a pandas.DataFrame where each record assigns a molecular entity to an annotation category.

mirnas_by_resource()

network_df([annot_df, network, combined_df, ...])

Combines the annotation data frame and a network data frame.

numof_classes()

numof_complex_records()

numof_complexes()

numof_entities([entity_types])

numof_mirna_records()

numof_mirnas()

numof_protein_records()

numof_proteins()

numof_records([entity_types])

populate_classes([update])

Creates a classification of proteins according to the custom annotation definitions.

populate_scores()

Creates the consensus score dictionaries based on the number of resources annotating an entity for each composite category.

post_load()

pre_build()

process_annot(classdef)

Processes an annotation definition and returns a set of identifiers.

proteins_by_resource()

quality_check_table([path, fmt, ...])

Exports a table in tsv format for quality check and browsing purposes.

register_network(network)

Sets network as the default network dataset for the instance.

reload()

Reloads the object from the module level.

resources_in_category(key)

Returns a list of resources contributing to the definition of a category.

save_to_pickle(pickle_file)

select(definition[, parent, resource, ...])

Retrieves a class by its name or definition.

set_interclass_network_df(**kwargs)

Creates a data frame of the whole inter-class network and keeps it assigned to the instance in order to make subsequent queries faster.

sets(*args)

show(name[, parent, resource])

Same as select but prints a table to the console with basic information from the UniProt datasheets.

symmetric_difference(*args)

union(*args)

unset_interclass_network_df()

update_excludes()

update_parents()

Creates a dict children with parent class names as keys and sets of children class keys as values.

browse(start: int = 0, **kwargs)

Print gene information as a table.

Presents information about annotation classes as ASCII tables printed in the terminal. If one class is provided, prints one table. If multiple classes are provided, prints a table for each of them one by one, proceeding to the next one when you hit return. If no classes are provided, goes through all classes.

kwargs passed to pypath.utils.uniprot.info.

class_to_class_connections(**kwargs)

kwargs passed to filter_interclass_network.

classes_by_entity(element, labels=False)

Returns a set of class keys with the classes containing at least one of the elements.

Parameters:
  • element (str,set) – One or more elements (entities) to search for in the classes.

  • labels (bool) – Return labels instead of keys.
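
The lookup performed by classes_by_entity can be pictured as a set intersection test over all classes. The following is a minimal sketch under stated assumptions: class keys are taken to be (name, parent, resource) tuples, the labels are supplied through a hypothetical labels dict, and ResourceA is a made-up resource name.

```python
# Sketch only: a standalone function, not the pypath method.

def classes_by_entity(classes, element, labels=None):
    """Return the keys (or labels) of classes containing any of the elements."""
    # Accept a single identifier or a collection of identifiers.
    elements = {element} if isinstance(element, str) else set(element)
    keys = {key for key, members in classes.items() if members & elements}
    if labels is not None:
        return {labels[key] for key in keys}
    return keys


receptor = ('receptor', 'receptor', 'ResourceA')  # assumed key layout
ligand = ('ligand', 'ligand', 'ResourceA')
classes = {
    receptor: {'P00533', 'P04626'},
    ligand: {'P01133'},
}
labels = {receptor: 'Receptor', ligand: 'Ligand'}  # hypothetical labels

found_one = classes_by_entity(classes, 'P00533')
found_many = classes_by_entity(classes, {'P01133', 'P04626'})
found_labels = classes_by_entity(classes, 'P01133', labels=labels)
```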

counts(entity_type='protein', labels=True, **kwargs)

Returns a dict with number of elements in each class.

Parameters:
  • labels (bool) – Use keys or labels as keys in the returned dict.

All other arguments passed to iter_classes.
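
As a rough illustration of the returned dict, the sketch below counts the members of each class and switches between keys and labels. The key layout (name, parent, resource), the label format and the resource name ResourceA are assumptions; filtering by entity_type and the arguments forwarded to iter_classes are left out.

```python
# Sketch only: toy data and a standalone function, not the pypath method.

def label_of(key):
    """Assumed label format: 'name [resource]'."""
    name, parent, resource = key
    return '%s [%s]' % (name, resource)


def counts(classes, labels=True):
    """Map each class (by label or by key) to its number of members."""
    return {
        (label_of(key) if labels else key): len(members)
        for key, members in classes.items()
    }


classes = {
    ('receptor', 'receptor', 'ResourceA'): {'P00533', 'P04626'},
    ('ligand', 'ligand', 'ResourceA'): {'P01133'},
}

by_label = counts(classes)
by_key = counts(classes, labels=False)
```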

counts_by_class(entity_type='protein', labels=True, **kwargs)

Returns a dict with number of elements in each class.

Parameters:
  • labels (bool) – Use keys or labels as keys in the returned dict.

All other arguments passed to iter_classes.

create_class(classdef, override=False)

Creates a category of entities by processing a custom definition.

degree_inter_class_network(annot_args_source=None, annot_args_target=None, degrees_of='target', **kwargs)

Parameters:
  • degrees_of (str) – Either source or target. Count the degrees for the source or the target class.

filter(entity_type=None, **kwargs)

Filters the annotated entities by annotation class attributes and entity_type. kwargs passed to filter_classes.

static filter_classes(classes, **kwargs)

Returns a list of annotation classes filtered by their attributes. kwargs contains attributes and values.

filter_interclass_network(annot_df=None, network=None, combined_df=None, network_args=None, annot_args=None, annot_args_source=None, annot_args_target=None, entities=None, entities_source=None, entities_target=None, only_directed=False, only_undirected=False, only_signed=None, only_effect=None, only_proteins=False, swap_undirected=True, undirected_orientation=None, entities_or=False)

Combines the annotation data frame and a network data frame. Creates a pandas.DataFrame where each record is an interaction between a pair of molecular entities labeled by their annotations.

Parameters:
  • network (pypath.network.Network, pandas.DataFrame) – A pypath.network.Network object or a data frame with network data.

  • combined_df (pandas.DataFrame) – Optional, a network data frame already combined with annotations, for filtering only.

  • resources (set, None) – Use only these network resources.

  • entities (set, None) – Limit the network only to these molecular entities.

  • entities_source (set, None) – Limit the source side of network connections only to these molecular entities.

  • entities_target (set, None) – Limit the target side of network connections only to these molecular entities.

  • annot_args (dict, None) – Parameters for filtering annotation classes. Note that the defaults might include some filtering; provide an empty dict if you want no filtering at all, although this might result in a huge data frame and consequently memory issues. Passed to the filtered method.

  • annot_args_source (dict, None) – Same as annot_args but only for the source side of the network connections. These override annot_args, but all the criteria not defined here are applied from annot_args.

  • annot_args_target (dict, None) – Same as annot_args but only for the target side of the network connections. These override annot_args, but all the criteria not defined here are applied from annot_args.

  • only_directed (bool) – Use only the directed interactions.

  • only_undirected (bool) – Use only the undirected interactions. Specifically for retrieving and counting the interactions without direction information.

  • only_effect (int, None) – Use only the interactions with this effect. Either -1 or 1.

  • only_proteins (bool) – Use only the interactions where each of the partners is a protein (i.e. not a complex, miRNA, small molecule or other kind of entity).

  • swap_undirected (bool) – Convert undirected interactions to a pair of mutual interactions.

  • undirected_orientation (str, None) – Ignore the direction of all interactions and make sure all of them have a uniform orientation. If id, all interactions are oriented by the identifiers of the partners; if category, the interactions are oriented by the categories of the partners.
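
The way annot_args_source and annot_args_target override annot_args while inheriting the criteria they do not define can be sketched as a dict merge. This is only an illustration of the described precedence: the helper name side_args and the filter keys used in the toy dicts are hypothetical, not documented pypath arguments.

```python
# Sketch only: shows the override-with-fallback precedence, nothing more.

def side_args(annot_args=None, side_specific=None):
    """Start from the shared criteria, then let the side-specific ones win."""
    merged = dict(annot_args or {})
    merged.update(side_specific or {})
    return merged


# Hypothetical filter criteria, made up for this example
annot_args = {'scope': 'generic', 'entity_type': 'protein'}
annot_args_source = {'scope': 'specific'}
annot_args_target = None

source_args = side_args(annot_args, annot_args_source)
target_args = side_args(annot_args, annot_args_target)
```

Here the source side ends up with the overridden scope and the inherited entity_type, while the target side simply falls back to annot_args.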

get_class(definition, parent=None, resource=None, entity_type=None, **kwargs)

Retrieves a class by its name or definition. The definition can be a class name (string) or a set of entities, or an AnnotDef object defining the contents based on original resources or an AnnotOp which defines the contents as an operation over other definitions.

get_df()

Returns the data frame of custom annotations. If it does not exist yet, builds the data frame first.

get_interclass_network_df(**kwargs)

If an interclass network is already present, the network and other kwargs provided are not considered. Otherwise these are passed to network_df.

get_parents(name, parent=None, resource=None)

As names should be unique for resources, a combination of a name and resource determines the parent category. This method looks up the parent for a pair of name and resource.

get_resource(name, parent=None)

For a category name and its parent returns a single resource name. If a category belonging to the composite database matches the name and the parent, the name of the composite database is returned; otherwise the resource name that comes first in alphabetical order.

get_resources(name, parent=None)

Returns a set with the names of all resources defining a category with the given name and parent.
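
The selection rule of get_resource and its relation to get_resources can be sketched as follows. This is a minimal illustration, not the pypath code: class keys are assumed to be (name, parent, resource) tuples, and Composite, ResourceA and ResourceB are placeholder names.

```python
# Sketch only: standalone functions over a set of assumed class keys.

def get_resources(keys, name, parent):
    """All resources defining a category with this name and parent."""
    return {res for (n, p, res) in keys if n == name and p == parent}


def get_resource(keys, name, parent, composite_name='Composite'):
    """Prefer the composite database, else the alphabetically first resource."""
    resources = get_resources(keys, name, parent)
    if not resources:
        return None
    if composite_name in resources:
        return composite_name
    return min(resources)


keys = {
    ('receptor', 'receptor', 'ResourceB'),
    ('receptor', 'receptor', 'Composite'),
    ('ligand', 'ligand', 'ResourceB'),
    ('ligand', 'ligand', 'ResourceA'),
}

receptor_resources = get_resources(keys, 'receptor', 'receptor')
receptor_resource = get_resource(keys, 'receptor', 'receptor')
ligand_resource = get_resource(keys, 'ligand', 'ligand')
```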

labels(name, parent=None, resource=None, entity_type=None)

Same as select but returns a list of labels (more human readable).

make_df(all_annotations=False, full_name=False)

Creates a pandas.DataFrame where each record assigns a molecular entity to an annotation category. The data frame will be assigned to the df attribute.

network_df(annot_df=None, network=None, combined_df=None, network_args=None, annot_args=None, annot_args_source=None, annot_args_target=None, entities=None, entities_source=None, entities_target=None, only_directed=False, only_undirected=False, only_signed=None, only_effect=None, only_proteins=False, swap_undirected=True, undirected_orientation=None, entities_or=False)

Combines the annotation data frame and a network data frame. Creates a pandas.DataFrame where each record is an interaction between a pair of molecular entities labeled by their annotations.

Parameters:
  • network (pypath.network.Network, pandas.DataFrame) – A pypath.network.Network object or a data frame with network data.

  • combined_df (pandas.DataFrame) – Optional, a network data frame already combined with annotations, for filtering only.

  • resources (set, None) – Use only these network resources.

  • entities (set, None) – Limit the network only to these molecular entities.

  • entities_source (set, None) – Limit the source side of network connections only to these molecular entities.

  • entities_target (set, None) – Limit the target side of network connections only to these molecular entities.

  • annot_args (dict, None) – Parameters for filtering annotation classes. Note that the defaults might include some filtering; provide an empty dict if you want no filtering at all, although this might result in a huge data frame and consequently memory issues. Passed to the filtered method.

  • annot_args_source (dict, None) – Same as annot_args but only for the source side of the network connections. These override annot_args, but all the criteria not defined here are applied from annot_args.

  • annot_args_target (dict, None) – Same as annot_args but only for the target side of the network connections. These override annot_args, but all the criteria not defined here are applied from annot_args.

  • only_directed (bool) – Use only the directed interactions.

  • only_undirected (bool) – Use only the undirected interactions. Specifically for retrieving and counting the interactions without direction information.

  • only_effect (int, None) – Use only the interactions with this effect. Either -1 or 1.

  • only_proteins (bool) – Use only the interactions where each of the partners is a protein (i.e. not a complex, miRNA, small molecule or other kind of entity).

  • swap_undirected (bool) – Convert undirected interactions to a pair of mutual interactions.

  • undirected_orientation (str, None) – Ignore the direction of all interactions and make sure all of them have a uniform orientation. If id, all interactions are oriented by the identifiers of the partners; if category, the interactions are oriented by the categories of the partners.
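
The effect of swap_undirected and undirected_orientation can be pictured on a toy edge list. The sketch below is an interpretation of the descriptions above, not the pypath code: edges are (source, target, directed) tuples, the function names are hypothetical, and ordering partners by sorting their identifiers or categories is an assumed reading of "oriented by".

```python
# Sketch only: plain tuples stand in for rows of the network data frame.

def swap_undirected(edges):
    """Represent each undirected edge as a pair of mutual interactions."""
    out = []
    for source, target, directed in edges:
        out.append((source, target, directed))
        if not directed:
            out.append((target, source, directed))
    return out


def orient(edges, by='id', category=None):
    """Drop directions and give every pair of partners a uniform order."""
    oriented = set()
    for source, target, _ in edges:
        if by == 'id':
            pair = tuple(sorted((source, target)))
        else:  # by == 'category': order the partners by their category
            pair = tuple(sorted((source, target), key=lambda e: category[e]))
        oriented.add(pair)
    return sorted(oriented)


edges = [
    ('P01133', 'P00533', True),   # directed
    ('P04626', 'P00533', False),  # undirected
]
category = {'P01133': 'ligand', 'P00533': 'receptor', 'P04626': 'receptor'}

mutual = swap_undirected(edges)
by_id = orient(edges, by='id')
by_category = orient(edges, by='category', category=category)
```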

populate_classes(update=False)

Creates a classification of proteins according to the custom annotation definitions.

populate_scores()

Creates the consensus score dictionaries based on the number of resources annotating an entity for each composite category.
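
A consensus score of this kind can be sketched by counting, for each entity, how many resources annotate it within one composite category. This is a minimal illustration with made-up resource names; the normalization shown (dividing by the number of contributing resources) is an assumption about what consensus_score_normalized might compute, not something stated here.

```python
from collections import Counter

# Sketch only: one composite category, resource name -> annotated entities.

def consensus_scores(resource_sets):
    """Number of resources annotating each entity."""
    scores = Counter()
    for members in resource_sets.values():
        scores.update(members)
    return dict(scores)


def normalize(scores, n_resources):
    """Assumed normalization: fraction of resources supporting the entity."""
    return {entity: score / n_resources for entity, score in scores.items()}


receptor_by_resource = {
    'ResourceA': {'P00533', 'P04626'},
    'ResourceB': {'P00533'},
    'ResourceC': {'P00533', 'P01133'},
}

scores = consensus_scores(receptor_by_resource)
scores_normalized = normalize(scores, len(receptor_by_resource))
```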

process_annot(classdef)

Processes an annotation definition and returns a set of identifiers.

quality_check_table(path=None, fmt='tsv', only_swissprot=True, top=None, **kwargs)

Exports a table in tsv format for quality check and browsing purposes. Each protein is represented by one row of this table, with basic data from UniProt and the list of annotation categories from this database.

Parameters:
  • path (str) – Path for the exported file.

  • fmt (str) – Format: either tsv or latex.

register_network(network)

Sets network as the default network dataset for the instance. All methods afterwards will use this network. It also discards the interclass network data frame, if present, to make sure future queries address the network registered here.
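
The interplay of register_network, set_interclass_network_df, unset_interclass_network_df and get_interclass_network_df can be sketched as a stored result that is invalidated when the network changes. This toy class is an illustration only: a list stands in for the data frame, build is a hypothetical placeholder for the expensive combination step, and the choice not to store a result built on the fly is an assumption.

```python
# Sketch only: a toy class, not the CustomAnnotation implementation.

class InterclassStore:
    """Keeps a prebuilt inter-class network and drops it on network change."""

    def __init__(self, network):
        self.network = network
        self.interclass_df = None  # nothing stored initially

    def build(self, network=None):
        # Stand-in for combining annotations with the network.
        return list(self.network if network is None else network)

    def set_interclass_network_df(self):
        self.interclass_df = self.build()

    def unset_interclass_network_df(self):
        self.interclass_df = None

    def get_interclass_network_df(self, network=None):
        if self.interclass_df is not None:
            return self.interclass_df  # provided arguments are ignored
        return self.build(network)

    def register_network(self, network):
        self.network = network
        self.unset_interclass_network_df()  # avoid answering from stale data


store = InterclassStore([('A', 'B')])
store.set_interclass_network_df()
stored = store.get_interclass_network_df(network=[('X', 'Y')])
store.register_network([('C', 'D')])
fresh = store.get_interclass_network_df()
```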

reload()

Reloads the object from the module level.

resources_in_category(key)

Returns a list of resources contributing to the definition of a category.

select(definition, parent=None, resource=None, entity_type=None, **kwargs)

Retrieves a class by its name or definition. The definition can be a class name (string) or a set of entities, or an AnnotDef object defining the contents based on original resources or an AnnotOp which defines the contents as an operation over other definitions.
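
How a definition of different kinds resolves to a set of entities can be sketched with a small recursive function. This is a simplified illustration: Op is a hypothetical stand-in for AnnotOp (its fields are assumptions, not the real signature), the AnnotDef case is omitted because it needs the original resources, and classes is toy data.

```python
from collections import namedtuple

# Hypothetical stand-in for AnnotOp: operands plus a set operation.
Op = namedtuple('Op', ['operands', 'op'])


def select(definition, classes):
    """Resolve a class name, a set of entities or an operation to a set."""
    if isinstance(definition, str):
        return set(classes[definition])
    if isinstance(definition, (set, frozenset)):
        return set(definition)
    if isinstance(definition, Op):
        # Operands may themselves be names, sets or nested operations.
        operands = [select(d, classes) for d in definition.operands]
        return definition.op(*operands)
    raise TypeError('Unsupported definition: %r' % (definition,))


classes = {
    'receptor': {'P00533', 'P04626'},
    'adhesion': {'P00533', 'P12830'},
}

both = select(Op(('receptor', 'adhesion'), set.intersection), classes)
either = Op(('receptor', 'adhesion'), set.union)
without_cdh1 = select(Op((either, {'P12830'}), set.difference), classes)
```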

set_interclass_network_df(**kwargs)

Creates a data frame of the whole inter-class network and keeps it assigned to the instance in order to make subsequent queries faster.

show(name, parent=None, resource=None, **kwargs)

Same as select but prints a table to the console with basic information from the UniProt datasheets.

update_parents()

Creates a dict children with parent class names as keys and sets of children class keys as values. Also a dict parents with children class keys as keys and parent class keys as values.
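
The two lookup dicts can be sketched from a collection of class keys. The sketch assumes keys of the form (name, parent, resource) where a top-level category has its own name as parent, and that each child maps to a single parent key; both are assumptions for illustration, and ResourceA is a placeholder resource name.

```python
from collections import defaultdict

# Sketch only: a standalone function, not the pypath method.

def build_hierarchy(class_keys):
    """Derive children (by parent name) and parents (by child key)."""
    children = defaultdict(set)
    parents = {}
    for key in class_keys:
        name, parent, resource = key
        if name != parent:  # assumed: top-level classes are their own parent
            children[parent].add(key)
            parents[key] = (parent, parent, resource)
    return dict(children), parents


class_keys = [
    ('adhesion', 'adhesion', 'ResourceA'),
    ('cell_adhesion_receptor', 'adhesion', 'ResourceA'),
    ('receptor', 'receptor', 'ResourceA'),
]

children, parents = build_hierarchy(class_keys)
```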