pypath.utils.orthology.OrthologyManager
- class pypath.utils.orthology.OrthologyManager(cleanup_period: int = 10, lifetime: int = 300, **kwargs)
Bases: Logger
- __init__(cleanup_period: int = 10, lifetime: int = 300, **kwargs)
Make this instance a logger.
- Parameters:
name – The label of this instance that will be prepended to all messages it sends to the logger.
module – Send the messages by the logger of this module.
Methods
__init__([cleanup_period, lifetime]) – Make this instance a logger.
get_df(target[, source, id_type, ...]) – Create a data frame for one source organism and ID type.
get_dict(target[, source, id_type, ...]) – Create a dictionary for one source organism and ID type.
load(key)
reload()
translate(identifiers, target[, source, ...]) – Translate one or more identifiers by orthologous gene pairs.
translate_df(df, target[, source, cols, ...]) – Translate columns in a data frame.
which_table(target[, source, ...])
Attributes
RESOURCE_PARAM
TRANSLATION_PARAM
- get_df(target: str | int, source: str | int = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool | None = None, homologene: bool | None = None, ensembl: bool | None = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, full_records: bool = False, **kwargs) → DataFrame
Create a data frame for one source organism and ID type.
- Parameters:
target – Name or NCBI Taxonomy ID of the target organism.
source – Name or NCBI Taxonomy ID of the source organism.
id_type – The identifier type to use.
only_swissprot – Use only SwissProt IDs.
oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.
homologene – Use orthology information from NCBI HomoloGene.
ensembl – Use orthology information from Ensembl.
oma_rel_type – Restrict OMA relations to certain types: one or more of 1:1, 1:n, m:1 and m:n.
oma_score – Lower threshold for the OMA similarity score.
ensembl_hc – Use only the high confidence orthology relations from Ensembl.
ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.
full_records – Include not only the identifiers, but also some properties of the orthology relationships.
kwargs – Ignored.
- Returns:
A data frame with pairs of orthologous identifiers, in two columns: “source” and “target”.
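The filtering parameters above (oma_rel_type, oma_score, ensembl_types) restrict which relations end up as source/target pairs. The sketch below illustrates that idea in plain Python; the Relation record and the filter_pairs function are hypothetical and do not reflect pypath's internal data structures. It also assumes that, when a score threshold is set, relations without a score are dropped.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical, simplified orthology record; pypath's own record classes differ.
@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    rel_type: str            # e.g. '1:1', '1:n', 'm:1', 'm:n'
    score: Optional[float]   # similarity score, None if unavailable


def filter_pairs(
    relations: Iterable[Relation],
    rel_types: Optional[set[str]] = None,
    min_score: Optional[float] = None,
) -> list[tuple[str, str]]:
    """Keep relations passing the type and score filters, as (source, target) pairs."""
    pairs = []
    for rel in relations:
        # Type filter: None means no restriction.
        if rel_types is not None and rel.rel_type not in rel_types:
            continue
        # Score filter: a lower threshold; records without a score are dropped (assumption).
        if min_score is not None and (rel.score is None or rel.score < min_score):
            continue
        pairs.append((rel.source, rel.target))
    return pairs


# Placeholder identifiers, not real accessions.
relations = [
    Relation('P1', 'Q1', '1:1', 0.9),
    Relation('P2', 'Q2', '1:n', 0.4),
    Relation('P2', 'Q3', '1:n', 0.8),
]
print(filter_pairs(relations, rel_types={'1:1', '1:n'}, min_score=0.5))
# [('P1', 'Q1'), ('P2', 'Q3')]
```

Each resulting tuple corresponds to one row of the "source" and "target" columns described above.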
- get_dict(target: str | int, source: str | int = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool | None = None, homologene: bool | None = None, ensembl: bool | None = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, full_records: bool = False) → dict[str, set[OrthologBase]]
Create a dictionary for one source organism and ID type.
- Parameters:
target – Name or NCBI Taxonomy ID of the target organism.
source – Name or NCBI Taxonomy ID of the source organism.
id_type – The identifier type to use.
only_swissprot – Use only SwissProt IDs.
oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.
homologene – Use orthology information from NCBI HomoloGene.
ensembl – Use orthology information from Ensembl.
oma_rel_type – Restrict OMA relations to certain types: one or more of 1:1, 1:n, m:1 and m:n.
oma_score – Lower threshold for the OMA similarity score.
ensembl_hc – Use only the high confidence orthology relations from Ensembl.
ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.
full_records – Include not only the identifiers, but also some properties of the orthology relationships.
- Returns:
A dict with identifiers of the source organism as keys, and sets of their orthologs as values.
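The relationship between the two-column layout of get_df and the dict returned by get_dict can be sketched as a simple grouping step. This is an illustration only: pairs_to_dict is a hypothetical helper, and it stores plain identifier strings where get_dict stores OrthologBase objects.

```python
from collections import defaultdict
from typing import Iterable


def pairs_to_dict(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Group (source, target) pairs into {source: {targets}}."""
    mapping: defaultdict[str, set[str]] = defaultdict(set)
    for source, target in pairs:
        mapping[source].add(target)
    # Return a plain dict so lookups of missing keys do not silently create entries.
    return dict(mapping)


# Placeholder identifiers, not real accessions.
pairs = [('P1', 'Q1'), ('P2', 'Q2'), ('P2', 'Q3')]
orthologs = pairs_to_dict(pairs)
print(orthologs['P2'] == {'Q2', 'Q3'})  # True
```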
- translate(identifiers: str | Iterable[str], target: str | int, source: str | int = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool = None, homologene: bool = None, ensembl: bool = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, full_records: bool = False)
Translate one or more identifiers by orthologous gene pairs.
- Parameters:
identifiers – One or more identifiers of the source organism, of ID type id_type.
target – Name or NCBI Taxonomy ID of the target organism.
source – Name or NCBI Taxonomy ID of the source organism.
id_type – The identifier type to use.
only_swissprot – Use only SwissProt IDs.
oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.
homologene – Use orthology information from NCBI HomoloGene.
ensembl – Use orthology information from Ensembl.
oma_rel_type – Restrict OMA relations to certain types: one or more of 1:1, 1:n, m:1 and m:n.
oma_score – Lower threshold for the OMA similarity score.
ensembl_hc – Use only the high confidence orthology relations from Ensembl.
ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.
full_records – Include not only the identifiers, but also some properties of the orthology relationships.
- Returns:
Set of identifiers of orthologous genes or proteins in the target taxon.
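Translating several identifiers amounts to looking each one up and pooling the orthologs found. A minimal sketch, assuming a dict that maps source identifiers to sets of target identifiers, and assuming that unknown identifiers contribute nothing rather than raising an error; translate_ids is hypothetical and not part of pypath.

```python
from typing import Iterable, Union


def translate_ids(
    identifiers: Union[str, Iterable[str]],
    mapping: dict[str, set[str]],
) -> set[str]:
    """Union of the orthologs of all given identifiers; unknown IDs contribute nothing."""
    # A bare string is a single identifier, not an iterable of characters.
    if isinstance(identifiers, str):
        identifiers = [identifiers]
    result: set[str] = set()
    for identifier in identifiers:
        result |= mapping.get(identifier, set())
    return result


# Placeholder identifiers, not real accessions.
mapping = {'P1': {'Q1'}, 'P2': {'Q2', 'Q3'}}
print(sorted(translate_ids('P2', mapping)))          # ['Q2', 'Q3']
print(sorted(translate_ids(['P1', 'X9'], mapping)))  # ['Q1']
```

Because the result is a set, orthologs shared by several input identifiers appear only once.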
- translate_df(df: DataFrame, target: str | int, source: str | int = 9606, cols: str | list[str] | dict[str, str] | None = None, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool | None = None, homologene: bool | None = None, ensembl: bool | None = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, **kwargs: str | tuple[str, str]) → DataFrame
Translate columns in a data frame.
- Parameters:
df – A data frame.
cols – One or more columns to be translated: a single column name, an iterable of column names, or a dict where keys are column names and values are ID types. Except in this last case, identifiers are assumed to be of type id_type.
target – Name or NCBI Taxonomy ID of the target organism.
source – Name or NCBI Taxonomy ID of the source organism.
id_type – The default identifier type, used for all columns whose ID type is not specified.
only_swissprot – Use only SwissProt IDs.
oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.
homologene – Use orthology information from NCBI HomoloGene.
ensembl – Use orthology information from Ensembl.
oma_rel_type – Restrict OMA relations to certain types: one or more of 1:1, 1:n, m:1 and m:n.
oma_score – Lower threshold for the OMA similarity score.
ensembl_hc – Use only the high confidence orthology relations from Ensembl.
ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.
kwargs – Same as providing a dict to cols, but beware: keys (column names) cannot match existing argument names of this function.
- Returns:
A data frame with the same column layout as the input, and the identifiers translated as requested. Rows that could not be translated are omitted.
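The cols and kwargs rules above, together with the note that untranslatable rows are omitted, can be sketched without pandas using a list of dict rows. normalize_cols and translate_rows are hypothetical helpers, not pypath code. The sketch assumes that an identifier with several orthologs yields one output row per ortholog, and it models only the str form of kwargs values (the tuple form is not described here).

```python
from itertools import product
from typing import Iterable, Optional, Union


def normalize_cols(
    cols: Union[str, Iterable[str], dict[str, str], None],
    id_type: str,
    **kwargs: str,
) -> dict[str, str]:
    """Map each column to translate onto its ID type."""
    if cols is None:
        result: dict[str, str] = {}
    elif isinstance(cols, str):
        result = {cols: id_type}
    elif isinstance(cols, dict):
        result = dict(cols)
    else:
        result = {col: id_type for col in cols}
    # Keyword arguments behave like extra entries of a cols dict.
    result.update(kwargs)
    return result


def translate_rows(
    rows: list[dict[str, str]],
    cols: dict[str, str],
    mappings: dict[str, dict[str, set[str]]],
) -> list[dict[str, str]]:
    """Translate the given columns; rows with an untranslatable cell are dropped."""
    out = []
    for row in rows:
        # Candidate values per column: orthologs for translated columns, else the original value.
        options = [
            sorted(mappings[cols[col]].get(value, set())) if col in cols else [value]
            for col, value in row.items()
        ]
        # product() yields nothing if any column has no orthologs, so the row is omitted.
        for combo in product(*options):
            out.append(dict(zip(row.keys(), combo)))
    return out


# Placeholder identifiers and values, keyed by ID type.
mappings = {'uniprot': {'P1': {'Q1'}, 'P2': {'Q2', 'Q3'}}}
cols = normalize_cols('protein', 'uniprot')
rows = [
    {'protein': 'P1', 'score': '5'},
    {'protein': 'P2', 'score': '3'},
    {'protein': 'X9', 'score': '1'},
]
for row in translate_rows(rows, cols, mappings):
    print(row)
# {'protein': 'Q1', 'score': '5'}
# {'protein': 'Q2', 'score': '3'}
# {'protein': 'Q3', 'score': '3'}
```

Untranslated columns such as score are carried over unchanged, so the output keeps the input's column layout.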