pypath.core.annot.UniprotFamilies§
- class pypath.core.annot.UniprotFamilies(ncbi_tax_id=9606, **kwargs)[source]§
Bases: AnnotationBase
Methods
__init__([ncbi_tax_id]): Protein families from UniProt.
add_complexes_by_inference([complexes]): Creates complex annotations by in silico inference and adds them to this annotation set.
all_complexes(): All protein complexes annotated in this resource.
all_entities([entity_types]): All entities annotated in this resource.
all_mirnas(): All miRNAs annotated in this resource.
all_proteins(): All UniProt IDs annotated in this resource.
all_refs(): Some annotations contain references.
annotate_complex(cplex): Infers annotations for a single complex.
browse([field, start]): Print gene information as a table.
complex_inference([complexes]): Annotates all complexes in complexes, by default in the default complex database (existing in the complex module or generated on demand according to the module's current settings).
coverage(other): Calculates the coverage of the annotation i.e. the proportion of entities having at least one record in this annotation resource for an arbitrary set of entities.
curation_effort(): Counts the reference-record pairs.
from_dump()
get_entity_type(key)
get_names(): Returns the list of field names in the records.
get_reference_set([proteins, complexes, ...]): Retrieves the reference set i.e. the set of all entities which potentially have annotation in this resource.
get_subset([method, entity_type]): Retrieves a subset by filtering based on kwargs.
get_subset_bool_array([reference_set]): Returns a boolean vector with True and False values for each entity in the reference set.
get_values(name[, exclude_none]): Returns the set of all possible values of a field.
has_complexes()
is_complex(key)
is_mirna(key)
is_protein(key)
labels([method]): Same as select but returns a list of labels (more human readable).
load(): Loads the annotation data by calling the input method.
load_data(): Loads the data by calling input_method.
load_proteins(): Retrieves a set of all UniProt IDs to have a base set of the entire proteome.
make_df([rebuild]): Compiles a pandas.DataFrame from the annotation data.
numof_complex_records()
numof_complexes()
numof_entities(): The number of annotated entities in the resource.
numof_mirna_records()
numof_mirnas()
numof_protein_records()
numof_proteins()
numof_records([entity_types]): The total number of annotation records.
numof_references(): Some annotations contain references.
process(): Calls the _process_method.
proportion(other)
reload(): Reloads the object from the module level.
save_to_pickle(pickle_file)
select([method, entity_type]): Retrieves a subset by filtering based on kwargs.
set_method(): Sets the data input method by looking up in inputs module if necessary.
set_reference_set(): Assigns the reference set to the reference_set attribute.
show([method, table_param]): Same as select but prints a table to the console with basic information from the UniProt datasheets.
subset_intersection(universe, **kwargs): Calculates the proportion of entities in a subset occurring in the set universe.
to_array([reference_set, use_fields]): Returns an entity vs feature array.
to_bool_array(reference_set): Returns a presence/absence boolean array for a reference set.
to_set(): Returns the entities present in this annotation resource as a set.
Attributes
has_fields
summary
- add_complexes_by_inference(complexes=None)§
Creates complex annotations by in silico inference and adds them to this annotation set.
- all_complexes()§
All protein complexes annotated in this resource.
- all_entities(entity_types=None)§
All entities annotated in this resource.
- all_mirnas()§
All miRNAs annotated in this resource.
- all_proteins()§
All UniProt IDs annotated in this resource.
- all_refs()§
Some annotations contain references. The field name for references is always pmid (PubMed ID). This method collects the references across all records and returns them as a list.
- annotate_complex(cplex)§
Infers annotations for a single complex.
- browse(field: str | list[str] | dict[str, str] | None = None, start: int = 0, **kwargs)§
Print gene information as a table.
Presents information about annotation categories as ASCII tables printed in the terminal. If one category is provided, prints one table. If multiple categories are provided, prints a table for each of them one by one, proceeding to the next one when you hit return. If no categories are provided, goes through all levels of the primary category.
- Args
- field:
The field to browse categories by.
If None, the primary field will be selected. If this annotation resource doesn’t have fields, all proteins will be presented as one single category.
If a string, it will be considered a field name and the method will browse through all levels of this field.
If a list, set or tuple, it will be considered either a list of field names or a list of values from the primary field. In the former case all combinations of the values of the fields will be presented; in the latter case the browsing will be limited to the levels of the primary field contained in field.
If a dict, keys are supposed to be field names and values lists of levels. If any of the values are None, all levels from that field will be used.
- start:
Start browsing from this category. E.g. if there are 500 categories and start is 250 it will skip everything before the 250th.
- kwargs:
Passed to pypath.utils.uniprot.info.
- complex_inference(complexes=None)§
Annotates all complexes in complexes, by default in the default complex database (existing in the complex module or generated on demand according to the module’s current settings).
- Returns:
Dict with complexes as keys and sets of annotations as values.
Complexes with no valid information in this annotation resource
won’t be in the dict.
- Parameters:
complexes (iterable) – Iterable yielding complexes.
- coverage(other)§
Calculates the coverage of the annotation i.e. the proportion of entities having at least one record in this annotation resource for an arbitrary set of entities.
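The proportion described above can be sketched in plain Python. This is a minimal illustration and does not call pypath: the dict-of-sets layout, the FamilyRecord type, the placeholder IDs and the handling of an empty input are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record type and toy data, not taken from pypath.
FamilyRecord = namedtuple('FamilyRecord', ['family', 'subfamily'])

annot = {
    'P00001': {FamilyRecord('family_a', 'sub_1')},
    'P00002': {FamilyRecord('family_b', None)},
    'P00003': set(),  # known entity without any record
}


def coverage(annot, other):
    """Proportion of entities in `other` having at least one record."""
    other = set(other)
    if not other:
        # Assumption: an empty query set yields 0.0 instead of an error.
        return 0.0
    covered = {entity for entity in other if annot.get(entity)}
    return len(covered) / len(other)


result = coverage(annot, ['P00001', 'P00002', 'P00003', 'P00004'])
print(result)
```

Here two of the four queried entities have at least one record, so the proportion is one half.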
- curation_effort()§
Counts the reference-record pairs.
- get_names()§
Returns the list of field names in the records. The annotation consists of uniform records and each entity might be annotated with one or more records. Each record is a tuple of fields, for example ('cell_type', 'expression_level', 'score').
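The idea of uniform records with named fields can be illustrated with collections.namedtuple. This is a sketch only: the Record type, the toy data and the helper below are hypothetical, and whether pypath stores its records exactly this way is an assumption.

```python
from collections import namedtuple

# Hypothetical record type using the field names from the example above.
Record = namedtuple('Record', ['cell_type', 'expression_level', 'score'])

annot = {
    'P00001': {Record('macrophage', 'high', 0.9)},
    'P00002': {Record('epithelial_cell', 'low', 0.2)},
}


def get_names(annot):
    """Field names of the first record found, or an empty tuple."""
    for records in annot.values():
        for record in records:
            return record._fields
    return ()


names = get_names(annot)
print(names)
```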
- static get_reference_set(proteins=(), complexes=(), use_complexes=False, ncbi_tax_id=9606, swissprot_only=True)§
Retrieves the reference set i.e. the set of all entities which potentially have annotation in this resource. Typically this is the proteome of the organism from UniProt optionally with all the protein complexes from the complex database.
- get_subset(method=None, entity_type=None, **kwargs)§
Retrieves a subset by filtering based on kwargs. Each argument should be a name and a value or set of values. Elements having the provided values in the annotation will be returned. Returns a set of UniProt IDs.
- get_subset_bool_array(reference_set=None, **kwargs)§
Returns a boolean vector with True and False values for each entity in the reference set. In the simplest case the values represent presence/absence data, but by providing kwargs any kind of matching and filtering is possible. kwargs are passed to the select method.
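A boolean vector over a reference set can be sketched as follows. The helper name, the placeholder IDs and the choice to sort the reference set for a stable order are assumptions for illustration; the ordering pypath actually uses is not stated here.

```python
def subset_bool_vector(reference_set, selected):
    """One boolean per reference entity: True if it is in `selected`."""
    # Assumption: sort the reference set so the vector has a stable order.
    ordered = sorted(reference_set)
    flags = [entity in selected for entity in ordered]
    return ordered, flags


reference_set = {'P00003', 'P00001', 'P00002'}
selected = {'P00001', 'P00003'}  # e.g. the result of a filtering step

ordered, flags = subset_bool_vector(reference_set, selected)
print(list(zip(ordered, flags)))
```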
- get_values(name, exclude_none=True)§
Returns the set of all possible values of a field. E.g. if the records of this annotation have a field cell_type, then calling this method can tell you that across all records the values of this field might be {'macrophage', 'epithelial_cell', ...}.
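Collecting the values of one field across all records might look like the sketch below. It does not use pypath; the Record type and the data are hypothetical, and treating exclude_none as "drop None from the result" is an assumption based on the parameter name.

```python
from collections import namedtuple

# Hypothetical record type and toy data.
Record = namedtuple('Record', ['cell_type', 'score'])

annot = {
    'P00001': {Record('macrophage', 0.9), Record('epithelial_cell', 0.1)},
    'P00002': {Record('macrophage', 0.5)},
    'P00003': {Record(None, 0.3)},
}


def get_values(annot, name, exclude_none=True):
    """Set of all values the field `name` takes across all records."""
    values = {
        getattr(record, name)
        for records in annot.values()
        for record in records
    }
    if exclude_none:
        values.discard(None)
    return values


cell_types = get_values(annot, 'cell_type')
with_none = get_values(annot, 'cell_type', exclude_none=False)
print(cell_types)
```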
- labels(method=None, **kwargs)§
Same as select but returns a list of labels (more human readable).
- load()§
Loads the annotation data by calling the input method. Infers annotations for complexes in the complex database if the infer_complexes attribute is True.
- load_data()§
Loads the data by calling input_method.
- load_proteins()§
Retrieves a set of all UniProt IDs to have a base set of the entire proteome.
- make_df(rebuild=False)§
Compiles a pandas.DataFrame from the annotation data. The data frame will be assigned to the df attribute.
- numof_entities()§
The number of annotated entities in the resource.
- numof_records(entity_types=None)§
The total number of annotation records.
- numof_references()§
Some annotations contain references. The field name for references is always pmid (PubMed ID). This method collects and counts the references across all records.
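Collecting and counting references can be sketched as below. The records, the placeholder reference strings, and the decisions to skip None and to count each reference once are assumptions for illustration; this resource's own records may not carry a pmid field at all.

```python
from collections import namedtuple

# Hypothetical record type with a pmid field; values are placeholders.
Record = namedtuple('Record', ['family', 'pmid'])

annot = {
    'P00001': {Record('family_a', 'pmid_1'), Record('family_b', 'pmid_2')},
    'P00002': {Record('family_a', 'pmid_1')},
    'P00003': {Record('family_c', None)},
}


def all_refs(annot):
    """Sorted list of the distinct references found across all records."""
    refs = {
        record.pmid
        for records in annot.values()
        for record in records
        if record.pmid is not None
    }
    return sorted(refs)


refs = all_refs(annot)
n_refs = len(refs)
print(refs, n_refs)
```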
- process()§
Calls the _process_method.
- reload()§
Reloads the object from the module level.
- select(method=None, entity_type=None, **kwargs)§
Retrieves a subset by filtering based on kwargs. Each argument should be a name and a value or set of values. Elements having the provided values in the annotation will be returned. Returns a set of UniProt IDs.
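The name/value filtering described above can be sketched in plain Python. This is a simplified stand-in and not pypath's implementation: the Record type, the field names family and subfamily, and the rule that a single record must satisfy all conditions are assumptions, and the method and entity_type arguments are left out.

```python
from collections import namedtuple

# Hypothetical record type and toy data.
Record = namedtuple('Record', ['family', 'subfamily'])

annot = {
    'P00001': {Record('family_a', 'sub_1')},
    'P00002': {Record('family_a', 'sub_2')},
    'P00003': {Record('family_b', None)},
}


def _matches(value, expected):
    # A collection means "any of these values"; anything else is equality.
    if isinstance(expected, (set, frozenset, list, tuple)):
        return value in expected
    return value == expected


def select(annot, **kwargs):
    """Entities with at least one record matching every keyword argument."""
    selected = set()
    for entity, records in annot.items():
        for record in records:
            if all(
                _matches(getattr(record, name), expected)
                for name, expected in kwargs.items()
            ):
                selected.add(entity)
                break
    return selected


by_family = select(annot, family='family_a')
by_both = select(annot, family={'family_a', 'family_b'}, subfamily='sub_2')
print(by_family, by_both)
```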
- set_method()§
Sets the data input method by looking it up in the inputs module if necessary.
- set_reference_set()§
Assigns the reference set to the reference_set attribute. The reference set is the set of all entities which potentially have annotation in this resource. Typically this is the proteome of the organism from UniProt optionally with all the protein complexes from the complex database.
- show(method=None, table_param=None, **kwargs)§
Same as select but prints a table to the console with basic information from the UniProt datasheets.
- subset_intersection(universe, **kwargs)§
Calculates the proportion of entities in a subset occurring in the set universe. The subset is selected by passing kwargs to the select method.
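One reading of this proportion is the size of the overlap divided by the size of the subset. The sketch below follows that reading; the choice of denominator, the handling of an empty subset, and the toy data are assumptions, and the subset is given directly instead of being selected from real annotations.

```python
def subset_intersection(subset, universe):
    """Proportion of `subset` entities that also occur in `universe`."""
    subset = set(subset)
    if not subset:
        # Assumption: an empty subset yields 0.0 instead of an error.
        return 0.0
    return len(subset & set(universe)) / len(subset)


# Hypothetical subset, e.g. the result of filtering by one family.
subset = {'P00001', 'P00002'}
universe = {'P00002', 'P00003'}

proportion = subset_intersection(subset, universe)
print(proportion)
```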
- to_array(reference_set=None, use_fields=None)§
Returns an entity vs feature array. In case of more complex annotations this might be huge.
- to_bool_array(reference_set)§
Returns a presence/absence boolean array for a reference set.
- to_set()§
Returns the entities present in this annotation resource as a set.