pypath.utils.orthology.OrthologyManager

class pypath.utils.orthology.OrthologyManager(cleanup_period: int = 10, lifetime: int = 300, **kwargs)

Bases: Logger

__init__(cleanup_period: int = 10, lifetime: int = 300, **kwargs)

Make this instance a logger.

Parameters:
  • name – The label of this instance that will be prepended to all messages it sends to the logger.

  • module – The module whose logger will send the messages.

Methods

__init__([cleanup_period, lifetime])

Make this instance a logger.

get_df(target[, source, id_type, ...])

Create a data frame for one source organism and ID type.

get_dict(target[, source, id_type, ...])

Create a dictionary for one source organism and ID type.

load(key)

reload()

translate(identifiers, target[, source, ...])

Translate one or more identifiers by orthologous gene pairs.

translate_df(df, target[, source, cols, ...])

Translate columns in a data frame.

which_table(target[, source, ...])

Attributes

RESOURCE_PARAM

TRANSLATION_PARAM

get_df(target: str | int, source: str | int = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool | None = None, homologene: bool | None = None, ensembl: bool | None = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, full_records: bool = False, **kwargs) → DataFrame

Create a data frame for one source organism and ID type.

Parameters:
  • target – Name or NCBI Taxonomy ID of the target organism.

  • source – Name or NCBI Taxonomy ID of the source organism.

  • id_type – The identifier type to use.

  • only_swissprot – Use only SwissProt IDs.

  • oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.

  • homologene – Use orthology information from NCBI HomoloGene.

  • ensembl – Use orthology information from Ensembl.

  • oma_rel_type – Restrict OMA relations to the given types. Possible values are 1:1, 1:n, m:1 and m:n.

  • oma_score – Lower threshold for the OMA similarity metric.

  • ensembl_hc – Use only the high confidence orthology relations from Ensembl.

  • ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.

  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Ignored.

Returns:

A data frame with pairs of orthologous identifiers, in two columns: “source” and “target”.
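The effect of oma_rel_type and oma_score can be pictured as a filter applied to orthology records before they are reduced to source and target pairs. The sketch below is not pypath code: the Record tuple, the placeholder identifiers and the helper names are hypothetical, and treating the score threshold as inclusive is an assumption.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical stand-in for one orthology relation; not a pypath type.
    source: str
    target: str
    rel_type: str
    score: float


# Placeholder identifiers, not real database entries.
RECORDS = [
    Record("SRC_A", "TGT_A", "1:1", 0.95),
    Record("SRC_B", "TGT_B1", "1:n", 0.80),
    Record("SRC_B", "TGT_B2", "1:n", 0.40),
    Record("SRC_C", "TGT_C", "m:n", 0.70),
]


def filter_records(records, rel_types=None, min_score=None):
    """Keep records matching the relation types and the score threshold.

    Passing None disables the corresponding filter. The threshold is
    inclusive here, which is an assumption of this sketch.
    """
    kept = []
    for rec in records:
        if rel_types is not None and rec.rel_type not in rel_types:
            continue
        if min_score is not None and rec.score < min_score:
            continue
        kept.append(rec)
    return kept


def to_pairs(records):
    """Reduce records to (source, target) pairs, like a two-column table."""
    return [(rec.source, rec.target) for rec in records]


pairs = to_pairs(
    filter_records(RECORDS, rel_types={"1:1", "1:n"}, min_score=0.5)
)
print(pairs)
```

Only the records that pass both filters survive, so the low-scoring 1:n relation and the m:n relation are dropped from the toy data.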

get_dict(target: str | int, source: str | int = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool | None = None, homologene: bool | None = None, ensembl: bool | None = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, full_records: bool = False) → dict[str, set[OrthologBase]]

Create a dictionary for one source organism and ID type.

Parameters:
  • target – Name or NCBI Taxonomy ID of the target organism.

  • source – Name or NCBI Taxonomy ID of the source organism.

  • id_type – The identifier type to use.

  • only_swissprot – Use only SwissProt IDs.

  • oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.

  • homologene – Use orthology information from NCBI HomoloGene.

  • ensembl – Use orthology information from Ensembl.

  • oma_rel_type – Restrict OMA relations to the given types. Possible values are 1:1, 1:n, m:1 and m:n.

  • oma_score – Lower threshold for the OMA similarity metric.

  • ensembl_hc – Use only the high confidence orthology relations from Ensembl.

  • ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.

  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

Returns:

A dict with identifiers of the source organism as keys, and sets of their orthologs as values.
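The shape of the returned dictionary can be illustrated by grouping identifier pairs by their source identifier. This is a minimal sketch, not the pypath implementation: the identifiers are placeholders, pairs_to_dict is a hypothetical helper, and plain strings are used as values where the documented return type holds OrthologBase objects.

```python
from collections import defaultdict

# Placeholder (source, target) identifier pairs, not real database entries.
PAIRS = [
    ("SRC_A", "TGT_A"),
    ("SRC_B", "TGT_B1"),
    ("SRC_B", "TGT_B2"),
]


def pairs_to_dict(pairs):
    """Group target identifiers into a set under each source identifier."""
    mapping = defaultdict(set)
    for source_id, target_id in pairs:
        mapping[source_id].add(target_id)
    # Return a plain dict so that missing keys raise KeyError as usual.
    return dict(mapping)


orthologs = pairs_to_dict(PAIRS)
print(orthologs)
```

A source identifier with several orthologs, such as SRC_B in the toy data, maps to a set with more than one element.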

translate(identifiers: str | Iterable[str], target: str | int, source: str | int = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool = None, homologene: bool = None, ensembl: bool = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, full_records: bool = False)

Translate one or more identifiers by orthologous gene pairs.

Parameters:
  • identifiers – One or more identifiers of the source organism, of ID type id_type.

  • target – Name or NCBI Taxonomy ID of the target organism.

  • source – Name or NCBI Taxonomy ID of the source organism.

  • id_type – The identifier type to use.

  • only_swissprot – Use only SwissProt IDs.

  • oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.

  • homologene – Use orthology information from NCBI HomoloGene.

  • ensembl – Use orthology information from Ensembl.

  • oma_rel_type – Restrict OMA relations to the given types. Possible values are 1:1, 1:n, m:1 and m:n.

  • oma_score – Lower threshold for the OMA similarity metric.

  • ensembl_hc – Use only the high confidence orthology relations from Ensembl.

  • ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.

  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

Returns:

Set of identifiers of orthologous genes or proteins in the target taxon.
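The contract described here, one or more identifiers in and a single set of target identifiers out, can be sketched with a toy lookup table. The code below does not call pypath: ORTHOLOGS and translate_ids are hypothetical, the identifiers are placeholders, and silently skipping identifiers without known orthologs is an assumption of this sketch.

```python
from typing import Dict, Iterable, Set, Union

# Toy table of source identifiers and their orthologs (placeholders).
ORTHOLOGS: Dict[str, Set[str]] = {
    "SRC_A": {"TGT_A"},
    "SRC_B": {"TGT_B1", "TGT_B2"},
}


def translate_ids(
    identifiers: Union[str, Iterable[str]],
    table: Dict[str, Set[str]],
) -> Set[str]:
    """Return the union of the orthologs of all given identifiers."""
    # A bare string is one identifier, not an iterable of characters.
    if isinstance(identifiers, str):
        identifiers = [identifiers]
    result: Set[str] = set()
    for identifier in identifiers:
        # Identifiers missing from the table contribute nothing (assumption).
        result |= table.get(identifier, set())
    return result


print(translate_ids("SRC_B", ORTHOLOGS))
print(translate_ids(["SRC_A", "SRC_X"], ORTHOLOGS))
```

Because the result is one flat set, it no longer records which input identifier produced which ortholog; a dictionary such as the one returned by get_dict keeps that association.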

translate_df(df: DataFrame, target: str | int, source: str | int = 9606, cols: str | list[str] | dict[str, str] | None = None, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool | None = None, homologene: bool | None = None, ensembl: bool | None = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, **kwargs: str | tuple[str, str]) → DataFrame

Translate columns in a data frame.

Parameters:
  • df – A data frame.

  • cols – One or more columns to be translated. It can be a single column name, an iterable of column names, or a dict where keys are column names and values are ID types. Except in the dict case, identifiers are assumed to be of type id_type.

  • target – Name or NCBI Taxonomy ID of the target organism.

  • source – Name or NCBI Taxonomy ID of the source organism.

  • id_type – The default identifier type, used for all columns where no ID type is specified.

  • only_swissprot – Use only SwissProt IDs.

  • oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.

  • homologene – Use orthology information from NCBI HomoloGene.

  • ensembl – Use orthology information from Ensembl.

  • oma_rel_type – Restrict OMA relations to the given types. Possible values are 1:1, 1:n, m:1 and m:n.

  • oma_score – Lower threshold for the OMA similarity metric.

  • ensembl_hc – Use only the high confidence orthology relations from Ensembl.

  • ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.

  • kwargs – Same as providing a dict to cols. Beware that keys (column names) cannot match existing argument names of this function.

Returns:

A data frame with the same column layout as the input, with the identifiers in the selected columns translated. Rows that could not be translated are omitted.
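The row-wise behaviour described above can be sketched in plain Python, using a list of dicts as a stand-in for a data frame. This is not the pypath implementation: the column names, the placeholder identifiers and translate_rows are hypothetical, and expanding a row into one row per combination of orthologs is an assumption about how one-to-many relations could be handled.

```python
from itertools import product

# Toy table of source identifiers and their orthologs (placeholders).
ORTHOLOGS = {
    "SRC_A": {"TGT_A"},
    "SRC_B": {"TGT_B1", "TGT_B2"},
}

# Stand-in for a data frame: each dict is one row, keys are column names.
ROWS = [
    {"ligand": "SRC_A", "receptor": "SRC_B", "weight": 1},
    {"ligand": "SRC_A", "receptor": "SRC_X", "weight": 2},
]


def translate_rows(rows, cols, table):
    """Translate the given columns, dropping rows that cannot be translated."""
    translated = []
    for row in rows:
        # Candidate orthologs for each column to translate, in stable order.
        options = [sorted(table.get(row[col], ())) for col in cols]
        # product() yields nothing if any column has no ortholog, so rows
        # that cannot be fully translated are omitted.
        for combo in product(*options):
            new_row = dict(row)  # the copy keeps the original column order
            new_row.update(zip(cols, combo))
            translated.append(new_row)
    return translated


result = translate_rows(ROWS, ["ligand", "receptor"], ORTHOLOGS)
for row in result:
    print(row)
```

In the toy data the first row expands into two rows because SRC_B has two orthologs, while the second row disappears because SRC_X has none. Columns that were not selected, such as weight, are carried over unchanged.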