pypath.core.annot.CustomAnnotation
- class pypath.core.annot.CustomAnnotation(class_definitions=None, excludes=None, excludes_extra=None, build=True, pickle_file=None, annotdb_pickle_file=None, composite_resource_name=None)
Bases: Logger
- __init__(class_definitions=None, excludes=None, excludes_extra=None, build=True, pickle_file=None, annotdb_pickle_file=None, composite_resource_name=None)
- Parameters:
class_definitions (tuple) – A series of annotation class definitions, each represented by an instance of pypath.internals.annot_formats.AnnotDef. These definitions carry the attributes and instructions to populate the classes.
excludes (dict) – A dict with parent category names (strings) or category keys (tuples) as keys and sets of identifiers as values. The identifiers in this dict will be excluded from all the respective categories while building the database. E.g. if the UniProt ID P00533 (EGFR) is in the set under the key adhesion, it will be excluded from the category adhesion and all its direct children.
excludes_extra (dict) – Same kind of dict as excludes, but it will be added to the built-in default: the built-in and the provided extra sets will be merged. If you want to overwrite or modify the built-in sets, provide your custom dict as excludes.
build (bool) – Execute the build upon instantiation, or set up an empty object on which the build can be executed later.
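The exclusion and merging rules described for excludes and excludes_extra can be sketched in plain Python. This is only an illustration of the documented behaviour, not pypath's implementation: the functions apply_excludes and merge_excludes, the dict layouts, the child category name and all identifiers other than P00533 are assumptions made for the example.

```python
def merge_excludes(builtin, extra):
    """Merge two excludes dicts key by key (hypothetical helper)."""
    merged = {key: set(ids) for key, ids in builtin.items()}
    for key, ids in extra.items():
        merged.setdefault(key, set()).update(ids)
    return merged


def apply_excludes(classes, children, excludes):
    """Return a copy of `classes` with the excluded identifiers removed.

    classes  -- dict: category name -> set of identifiers
    children -- dict: parent category name -> set of direct child names
    excludes -- dict: category name -> set of identifiers to remove
    """
    result = {name: set(members) for name, members in classes.items()}
    for parent, excluded_ids in excludes.items():
        # The exclusion covers the category itself and its direct children.
        for name in {parent} | children.get(parent, set()):
            if name in result:
                result[name] -= excluded_ids
    return result


# Toy data: only P00533 (EGFR) comes from the text above.
classes = {
    'adhesion': {'P00533', 'P12830'},
    'cell_adhesion_receptor': {'P00533', 'P05556'},
    'ligand': {'P00533', 'P01133'},
}
children = {'adhesion': {'cell_adhesion_receptor'}}

excludes = merge_excludes({'adhesion': {'P00533'}}, {'ligand': {'P00533'}})
cleaned = apply_excludes(classes, children, excludes)
```

In this toy setup P00533 disappears from adhesion and from its direct child, and also from ligand because of the merged extra set.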
Methods
__init__([class_definitions, excludes, ...])
add_class_definitions(class_definitions)
all_resources()
browse([start]) – Print gene information as a table.
build()
class_to_class_connections(**kwargs) – kwargs passed to filter_interclass_network.
class_to_class_connections_directed(**kwargs)
class_to_class_connections_inhibitory(**kwargs)
class_to_class_connections_signed(**kwargs)
class_to_class_connections_stimulatory(**kwargs)
class_to_class_connections_undirected(**kwargs)
classes_by_entity(element[, labels]) – Returns a set of class keys with the classes containing at least one of the elements.
complexes_by_resource()
consensus_score(name, entity)
consensus_score_normalized(name, entity)
count_inter_class_connections([...])
count_inter_class_connections_all([...])
count_inter_class_connections_directed([...])
count_inter_class_connections_inhibitory([...])
count_inter_class_connections_signed([...])
count_inter_class_connections_stimulatory([...])
count_inter_class_connections_undirected([...])
counts([entity_type, labels]) – Returns a dict with number of elements in each class.
counts_by_class([entity_type, labels]) – Returns a dict with number of elements in each class.
counts_by_resource([entity_types])
counts_df([groupby])
create_class(classdef[, override]) – Creates a category of entities by processing a custom definition.
degree_inter_class_network([...])
degree_inter_class_network_2([degrees_of, ...])
degree_inter_class_network_directed([...])
degree_inter_class_network_directed_2(**kwargs)
degree_inter_class_network_inhibitory([...])
degree_inter_class_network_inhibitory_2(**kwargs)
degree_inter_class_network_stimulatory([...])
degree_inter_class_network_stimulatory_2(...)
degree_inter_class_network_undirected([...])
degree_inter_class_network_undirected_2(**kwargs)
difference(*args)
ensure_annotdb()
entities_by_resource([entity_types])
export(fname, **kwargs)
filter([entity_type]) – Filters the annotated entities by annotation class attributes and entity_type.
filter_classes(classes, **kwargs) – Returns a list of annotation classes filtered by their attributes.
filter_df(annot_df[, entities, postfix])
filter_entity_type(cls[, entity_type])
filter_interclass_network([annot_df, ...]) – Combines the annotation data frame and a network data frame.
filtered([annot_df, entities])
get_aspect(name[, parent, resource])
get_class(definition[, parent, resource, ...]) – Retrieves a class by its name or definition.
get_class_label(name[, parent, resource])
get_class_scope(name[, parent, resource])
get_complexes()
get_df() – Returns the data frame of custom annotations.
get_entities([entity_types])
get_interclass_network_df(**kwargs) – If an interclass network is already present, the network and other kwargs provided are not considered.
get_mirnas()
get_parent(name[, parent, resource])
get_parents(name[, parent, resource]) – As names should be unique for resources, a combination of a name and resource determines the parent category.
get_proteins()
get_resource(name[, parent]) – For a category name and its parent returns a single resource name.
get_resources(name[, parent]) – Returns a set with the names of all resources defining a category with the given name and parent.
get_source(name[, parent, resource])
inter_class_network([annot_args_source, ...])
inter_class_network_directed([...])
inter_class_network_inhibitory([...])
inter_class_network_signed([...])
inter_class_network_stimulatory([...])
inter_class_network_undirected([...])
intersection(*args)
isdisjoint(*args)
iter_classes(**kwargs)
labels(name[, parent, resource, entity_type]) – Same as select but returns a list of labels (more human readable).
load()
load_from_pickle(pickle_file)
make_df([all_annotations, full_name]) – Creates a pandas.DataFrame where each record assigns a molecular entity to an annotation category.
mirnas_by_resource()
network_df([annot_df, network, combined_df, ...]) – Combines the annotation data frame and a network data frame.
numof_classes()
numof_complex_records()
numof_complexes()
numof_entities([entity_types])
numof_mirna_records()
numof_mirnas()
numof_protein_records()
numof_proteins()
numof_records([entity_types])
populate_classes([update]) – Creates a classification of proteins according to the custom annotation definitions.
populate_scores() – Creates the consensus score dictionaries based on the number of resources annotating an entity for each composite category.
post_load()
pre_build()
process_annot(classdef) – Processes an annotation definition and returns a set of identifiers.
proteins_by_resource()
quality_check_table([path, fmt, ...]) – Exports a table in tsv format for quality check and browsing purposes.
register_network(network) – Sets network as the default network dataset for the instance.
reload() – Reloads the object from the module level.
resources_in_category(key) – Returns a list of resources contributing to the definition of a category.
save_to_pickle(pickle_file)
select(definition[, parent, resource, ...]) – Retrieves a class by its name or definition.
set_interclass_network_df(**kwargs) – Creates a data frame of the whole inter-class network and keeps it assigned to the instance in order to make subsequent queries faster.
sets(*args)
show(name[, parent, resource]) – Same as select but prints a table to the console with basic information from the UniProt datasheets.
symmetric_difference(*args)
union(*args)
unset_interclass_network_df()
update_excludes() – Creates a dict children with parent class names as keys and sets of children class keys as values.
- browse(start: int = 0, **kwargs)
Print gene information as a table.
Presents information about annotation classes as ASCII tables printed in the terminal. If one class is provided, prints one table. If multiple classes are provided, prints a table for each of them one by one, proceeding to the next one once you hit return. If no classes are provided, goes through all classes.
kwargs are passed to pypath.utils.uniprot.info.
- classes_by_entity(element, labels=False)
Returns a set of class keys with the classes containing at least one of the elements.
- Parameters:
element (str,set) – One or more elements (entities) to search for in the classes.
labels (bool) – Return labels instead of keys.
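The lookup described above amounts to a set intersection test per class. The following is a minimal sketch under stated assumptions, not the method itself: the standalone function, the (name, parent, resource) key layout, the resource names and the identifiers are all hypothetical.

```python
def classes_by_entity(classes, element):
    """Return the keys of the classes containing at least one element.

    classes -- dict: class key (assumed tuple) -> set of identifiers
    element -- a single identifier (str) or a set of identifiers
    """
    # Accept either one identifier or a collection of identifiers.
    elements = {element} if isinstance(element, str) else set(element)
    return {key for key, members in classes.items() if members & elements}


# Hypothetical class keys and members.
classes = {
    ('receptor', 'receptor', 'ResourceA'): {'P00533', 'P05556'},
    ('ligand', 'ligand', 'ResourceA'): {'P01133'},
    ('adhesion', 'adhesion', 'ResourceB'): {'P05556', 'P12830'},
}

hits_single = classes_by_entity(classes, 'P05556')
hits_multi = classes_by_entity(classes, {'P01133', 'P12830'})
```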
- counts(entity_type='protein', labels=True, **kwargs)
Returns a dict with number of elements in each class.
- Parameters:
labels (bool) – Use keys or labels as keys in the returned dict.
All other arguments are passed to iter_classes.
- counts_by_class(entity_type='protein', labels=True, **kwargs)
Returns a dict with number of elements in each class.
- Parameters:
labels (bool) – Use keys or labels as keys in the returned dict.
All other arguments are passed to iter_classes.
- create_class(classdef, override=False)
Creates a category of entities by processing a custom definition.
- degree_inter_class_network(annot_args_source=None, annot_args_target=None, degrees_of='target', **kwargs)
- degrees_of (str) – Either source or target. Count the degrees for the source or the target class.
- filter(entity_type=None, **kwargs)
Filters the annotated entities by annotation class attributes and entity_type. kwargs are passed to filter_classes.
- static filter_classes(classes, **kwargs)
Returns a list of annotation classes filtered by their attributes. kwargs contains attributes and values.
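Filtering objects by attribute and value pairs given as keyword arguments can be illustrated as follows. This sketch assumes simple equality matching; how pypath actually compares the values is not stated here, and the AnnotClass record with its fields is a hypothetical stand-in for the real annotation class objects.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotation class object.
AnnotClass = namedtuple('AnnotClass', ['name', 'parent', 'resource', 'scope'])


def filter_classes(classes, **kwargs):
    """Keep the classes whose attributes equal all the given values."""
    return [
        cls
        for cls in classes
        if all(getattr(cls, attr) == value for attr, value in kwargs.items())
    ]


classes = [
    AnnotClass('receptor', 'receptor', 'ResourceA', 'generic'),
    AnnotClass('receptor', 'receptor', 'ResourceB', 'generic'),
    AnnotClass('ligand', 'ligand', 'ResourceA', 'specific'),
]

from_a = filter_classes(classes, resource='ResourceA')
generic_receptors = filter_classes(classes, name='receptor', scope='generic')
```

Calling the function without keyword arguments returns every class, since an empty set of criteria is trivially satisfied.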
- filter_interclass_network(annot_df=None, network=None, combined_df=None, network_args=None, annot_args=None, annot_args_source=None, annot_args_target=None, entities=None, entities_source=None, entities_target=None, only_directed=False, only_undirected=False, only_signed=None, only_effect=None, only_proteins=False, swap_undirected=True, undirected_orientation=None, entities_or=False)
Combines the annotation data frame and a network data frame. Creates a pandas.DataFrame where each record is an interaction between a pair of molecular entities labeled by their annotations.
- network (pypath.network.Network, pandas.DataFrame) – A pypath.network.Network object or a data frame with network data.
- combined_df (pandas.DataFrame) – Optional, a network data frame already combined with annotations, for filtering only.
- resources (set, None) – Use only these network resources.
- entities (set, None) – Limit the network only to these molecular entities.
- entities_source (set, None) – Limit the source side of network connections only to these molecular entities.
- entities_target (set, None) – Limit the target side of network connections only to these molecular entities.
- annot_args (dict, None) – Parameters for filtering annotation classes. Note that the defaults might include some filtering; provide an empty dict if you want no filtering at all, although this might result in a huge data frame and consequently memory issues. Passed to the filtered method.
- annot_args_source (dict, None) – Same as annot_args but only for the source side of the network connections. These override annot_args, but all the criteria not defined here will be applied from annot_args.
- annot_args_target (dict, None) – Same as annot_args but only for the target side of the network connections. These override annot_args, but all the criteria not defined here will be applied from annot_args.
- only_directed (bool) – Use only the directed interactions.
- only_undirected (bool) – Use only the undirected interactions. Specifically for retrieving and counting the interactions without direction information.
- only_effect (int, None) – Use only the interactions with this effect. Either -1 or 1.
- only_proteins (bool) – Use only the interactions where each of the partners is a protein (i.e. not complex, miRNA, small molecule or other kind of entity).
- swap_undirected (bool) – Convert undirected interactions to a pair of mutual interactions.
- undirected_orientation (str, None) – Ignore the direction of all interactions and make sure all of them have a uniform orientation. If id, all interactions will be oriented by the identifiers of the partners; if category, the interactions will be oriented by the categories of the partners.
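The override rule for annot_args_source and annot_args_target (side-specific criteria win, everything else falls back to annot_args) behaves like a dict update. A minimal sketch, assuming the criteria are plain key and value pairs; the helper name effective_annot_args and the example criteria are hypothetical and not part of pypath.

```python
def effective_annot_args(annot_args=None, side_args=None):
    """Combine the general criteria with side-specific overrides."""
    merged = dict(annot_args or {})
    # Side-specific criteria take precedence over the general ones.
    merged.update(side_args or {})
    return merged


# Hypothetical filtering criteria.
annot_args = {'entity_type': 'protein', 'scope': 'generic'}
annot_args_source = {'scope': 'specific'}

source_side = effective_annot_args(annot_args, annot_args_source)
target_side = effective_annot_args(annot_args, None)
```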
- get_class(definition, parent=None, resource=None, entity_type=None, **kwargs)§
Retrieves a class by its name or definition. The definition can be a class name (string) or a set of entities, or an AnnotDef object defining the contents based on original resources or an AnnotOp which defines the contents as an operation over other definitions.
- get_df()
Returns the data frame of custom annotations. If it does not exist yet, builds the data frame.
- get_interclass_network_df(**kwargs)
If an interclass network is already present, the network and other kwargs provided are not considered. Otherwise these are passed to network_df.
- get_parents(name, parent=None, resource=None)
As names should be unique for resources, a combination of a name and resource determines the parent category. This method looks up the parent for a pair of name and resource.
- get_resource(name, parent=None)
For a category name and its parent returns a single resource name. If a category belonging to the composite database matches the name and the parent, the name of the composite database will be returned; otherwise the resource name that comes first in alphabetic order.
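The selection rule above (prefer the composite database, otherwise take the alphabetically first resource) can be sketched as follows. This is an illustration only: the standalone function with its extra arguments, the (name, parent, resource) key layout, the resource names and the None return value for no match are assumptions.

```python
def get_resource(name, parent, class_keys, composite_resource_name):
    """Pick one resource for the category identified by name and parent."""
    resources = {
        resource
        for (cls_name, cls_parent, resource) in class_keys
        if cls_name == name and cls_parent == parent
    }
    if not resources:
        return None  # assumption: nothing matches
    if composite_resource_name in resources:
        return composite_resource_name
    # min() compares strings by code point, which matches alphabetic
    # order only for names with consistent capitalisation.
    return min(resources)


# Hypothetical class keys.
class_keys = {
    ('ligand', 'ligand', 'ResourceB'),
    ('ligand', 'ligand', 'ResourceA'),
    ('receptor', 'receptor', 'Composite'),
    ('receptor', 'receptor', 'ResourceA'),
}

receptor_resource = get_resource('receptor', 'receptor', class_keys, 'Composite')
ligand_resource = get_resource('ligand', 'ligand', class_keys, 'Composite')
```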
- get_resources(name, parent=None)
Returns a set with the names of all resources defining a category with the given name and parent.
- labels(name, parent=None, resource=None, entity_type=None)
Same as select but returns a list of labels (more human readable).
- make_df(all_annotations=False, full_name=False)
Creates a pandas.DataFrame where each record assigns a molecular entity to an annotation category. The data frame will be assigned to the df attribute.
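The "one record per entity and category" layout can be pictured with plain Python before any data frame is involved. The sketch below builds a list of dicts; the function name make_records, the key layout and the column names are assumptions for illustration and need not match the columns pypath produces. Such a list could be handed to the pandas.DataFrame constructor.

```python
def make_records(classes):
    """Flatten a class -> members mapping into one record per membership."""
    records = []
    for (name, parent, resource), members in sorted(classes.items()):
        for entity in sorted(members):
            records.append({
                'category': name,      # hypothetical column names
                'parent': parent,
                'resource': resource,
                'entity': entity,
            })
    return records


# Hypothetical classes keyed by (name, parent, resource).
classes = {
    ('receptor', 'receptor', 'ResourceA'): {'P00533', 'P05556'},
    ('ligand', 'ligand', 'ResourceA'): {'P01133'},
}

records = make_records(classes)
```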
- network_df(annot_df=None, network=None, combined_df=None, network_args=None, annot_args=None, annot_args_source=None, annot_args_target=None, entities=None, entities_source=None, entities_target=None, only_directed=False, only_undirected=False, only_signed=None, only_effect=None, only_proteins=False, swap_undirected=True, undirected_orientation=None, entities_or=False)
Combines the annotation data frame and a network data frame. Creates a pandas.DataFrame where each record is an interaction between a pair of molecular entities labeled by their annotations.
- network (pypath.network.Network, pandas.DataFrame) – A pypath.network.Network object or a data frame with network data.
- combined_df (pandas.DataFrame) – Optional, a network data frame already combined with annotations, for filtering only.
- resources (set, None) – Use only these network resources.
- entities (set, None) – Limit the network only to these molecular entities.
- entities_source (set, None) – Limit the source side of network connections only to these molecular entities.
- entities_target (set, None) – Limit the target side of network connections only to these molecular entities.
- annot_args (dict, None) – Parameters for filtering annotation classes. Note that the defaults might include some filtering; provide an empty dict if you want no filtering at all, although this might result in a huge data frame and consequently memory issues. Passed to the filtered method.
- annot_args_source (dict, None) – Same as annot_args but only for the source side of the network connections. These override annot_args, but all the criteria not defined here will be applied from annot_args.
- annot_args_target (dict, None) – Same as annot_args but only for the target side of the network connections. These override annot_args, but all the criteria not defined here will be applied from annot_args.
- only_directed (bool) – Use only the directed interactions.
- only_undirected (bool) – Use only the undirected interactions. Specifically for retrieving and counting the interactions without direction information.
- only_effect (int, None) – Use only the interactions with this effect. Either -1 or 1.
- only_proteins (bool) – Use only the interactions where each of the partners is a protein (i.e. not complex, miRNA, small molecule or other kind of entity).
- swap_undirected (bool) – Convert undirected interactions to a pair of mutual interactions.
- undirected_orientation (str, None) – Ignore the direction of all interactions and make sure all of them have a uniform orientation. If id, all interactions will be oriented by the identifiers of the partners; if category, the interactions will be oriented by the categories of the partners.
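The two ways of handling undirected interactions described for swap_undirected and undirected_orientation can be pictured on a small edge list. This is a sketch under assumptions, not pypath's code: edges are (source, target, is_directed) tuples, the function names are hypothetical, and "oriented by the identifiers" is interpreted here as placing the smaller identifier first.

```python
def swap_undirected(edges):
    """Represent every undirected edge as a pair of mutual edges."""
    result = []
    for source, target, is_directed in edges:
        result.append((source, target))
        if not is_directed:
            # Add the reverse edge for interactions without direction.
            result.append((target, source))
    return result


def orient_by_id(edges):
    """Ignore direction and give each pair a uniform orientation."""
    return sorted({tuple(sorted((source, target))) for source, target, _ in edges})


# Hypothetical interactions: (source, target, is_directed).
edges = [
    ('P01133', 'P00533', True),
    ('P05556', 'P12830', False),
]

mutual = swap_undirected(edges)
uniform = orient_by_id(edges)
```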
- populate_classes(update=False)
Creates a classification of proteins according to the custom annotation definitions.
- populate_scores()
Creates the consensus score dictionaries based on the number of resources annotating an entity for each composite category.
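A consensus score based on the number of annotating resources can be computed with a counter. The following is a minimal sketch of that idea for a single category; the function name, the input layout (resource name mapped to its set of members) and all identifiers are assumptions, and any normalization applied by consensus_score_normalized is left out because it is not described here.

```python
from collections import Counter


def consensus_scores(members_by_resource):
    """Count, for each entity, how many resources annotate it."""
    scores = Counter()
    for members in members_by_resource.values():
        # Each resource contributes at most one vote per entity.
        scores.update(set(members))
    return scores


# Hypothetical resources contributing to one composite category.
members_by_resource = {
    'ResourceA': {'P00533', 'P01133'},
    'ResourceB': {'P00533'},
    'ResourceC': {'P00533', 'P05556'},
}

scores = consensus_scores(members_by_resource)
```

A Counter returns 0 for entities that no resource annotates, which is convenient for lookups of unknown identifiers.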
- process_annot(classdef)
Processes an annotation definition and returns a set of identifiers.
- quality_check_table(path=None, fmt='tsv', only_swissprot=True, top=None, **kwargs)
Exports a table in tsv format for quality check and browsing purposes. Each protein is represented in one row of this table with basic data from UniProt and the list of annotation categories from this database.
- Parameters:
path (str) – Path for the exported file.
fmt (str) – Format: either tsv or latex.
- register_network(network)
Sets network as the default network dataset for the instance. All methods afterwards will use this network. It also discards the interclass network data frame, if present, to make sure future queries will address the network registered here.
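The interplay between registering a network and the cached interclass data frame follows a common build-once, invalidate-on-change pattern. The toy class below only illustrates that pattern; the class name, its attributes and the list standing in for the data frame are hypothetical and unrelated to pypath's internals.

```python
class InterclassCache:
    """Toy model of a cached result that depends on a registered network."""

    def __init__(self):
        self.network = None
        self._interclass_df = None
        self.builds = 0  # counts how often the expensive step runs

    def register_network(self, network):
        self.network = network
        # Discard the cached result so it cannot refer to the old network.
        self._interclass_df = None

    def get_interclass_network_df(self):
        if self._interclass_df is None:
            self.builds += 1
            # Stand-in for the expensive combination of annotations and network.
            self._interclass_df = list(self.network or [])
        return self._interclass_df


cache = InterclassCache()
cache.register_network([('P01133', 'P00533')])
first = cache.get_interclass_network_df()
second = cache.get_interclass_network_df()   # served from the cache
builds_before = cache.builds

cache.register_network([('P05556', 'P12830')])
third = cache.get_interclass_network_df()    # rebuilt for the new network
```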
- resources_in_category(key)
Returns a list of resources contributing to the definition of a category.
- select(definition, parent=None, resource=None, entity_type=None, **kwargs)
Retrieves a class by its name or definition. The definition can be a class name (string) or a set of entities, or an AnnotDef object defining the contents based on original resources or an AnnotOp which defines the contents as an operation over other definitions.
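Resolving a definition that may be a name, a set of entities, or an operation over other definitions lends itself to a small recursive function. The sketch below covers these three cases only (AnnotDef objects are omitted); the Op record is a hypothetical stand-in for AnnotOp, and the function signature, class names and identifiers are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for AnnotOp: a set operation and its operands.
Op = namedtuple('Op', ['op', 'args'])


def select(definition, classes):
    """Resolve a definition to a set of entities."""
    if isinstance(definition, str):
        return set(classes[definition])          # class name
    if isinstance(definition, (set, frozenset)):
        return set(definition)                   # explicit set of entities
    if isinstance(definition, Op):
        # Resolve the operands first, then apply the set operation.
        operands = [select(arg, classes) for arg in definition.args]
        return definition.op(*operands)
    raise TypeError('Unsupported definition: %r' % (definition,))


# Hypothetical classes.
classes = {
    'receptor': {'P00533', 'P05556'},
    'adhesion': {'P05556', 'P12830'},
}

both = select(Op(set.intersection, ('receptor', 'adhesion')), classes)
receptor_only = select(Op(set.difference, ('receptor', {'P05556'})), classes)
nested = select(Op(set.union, (Op(set.intersection, ('receptor', 'adhesion')), {'P01133'})), classes)
```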
- set_interclass_network_df(**kwargs)
Creates a data frame of the whole inter-class network and keeps it assigned to the instance in order to make subsequent queries faster.