pypath.utils.orthology.ProteinOrthology

class pypath.utils.orthology.ProteinOrthology(target: str | int, source: str | int | None = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, **kwargs)

Bases: Proteomes

__init__(target: str | int, source: str | int | None = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, **kwargs)

This class translates between homologous UniProt IDs of two organisms based on NCBI HomoloGene and Ensembl data. In the case of HomoloGene, the UniProt-UniProt translation table is created by translating the source organism UniProt IDs to RefSeq and Entrez IDs, finding the homologues (orthologues) for these IDs, and then translating them to the target organism UniProt IDs. In the case of Ensembl, the data is obtained with Ensembl protein identifiers, which are then translated to UniProt IDs.

Args
target:

Name or NCBI Taxonomy ID of the target organism.

source:

Name or NCBI Taxonomy ID of the source organism.

id_type:

The identifier type to use.

only_swissprot:

Use only SwissProt IDs.

kwargs:

Resource specific parameters.
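
The HomoloGene route described above amounts to chaining three lookups. The following is a minimal sketch of that idea only, not pypath's implementation: the function name `build_translation_table`, the three lookup tables and every identifier in them are hypothetical placeholders.

```python
# Illustrative stand-in for the HomoloGene route; this is not pypath code.
# All identifiers below are made-up placeholders, not real accessions.

def build_translation_table(uniprot_to_gene, gene_orthologs, gene_to_uniprot):
    """Chain source UniProt -> gene ID -> orthologous gene ID -> target UniProt."""
    table = {}
    for source_id, gene_ids in uniprot_to_gene.items():
        targets = set()
        for gene_id in gene_ids:
            # A gene may have zero, one or several orthologues.
            for ortholog_gene in gene_orthologs.get(gene_id, ()):
                targets.update(gene_to_uniprot.get(ortholog_gene, ()))
        # Source IDs without any target are left out of the table.
        if targets:
            table[source_id] = targets
    return table


uniprot_to_gene = {"SRC_A": {"gene1"}, "SRC_B": {"gene2"}, "SRC_C": {"gene3"}}
gene_orthologs = {"gene1": {"gene1x"}, "gene2": {"gene2x", "gene2y"}}
gene_to_uniprot = {"gene1x": {"TGT_A"}, "gene2x": {"TGT_B1"}, "gene2y": {"TGT_B2"}}

table = build_translation_table(uniprot_to_gene, gene_orthologs, gene_to_uniprot)
print(table)
```

In this toy data `SRC_C` has no orthologous gene, so it does not appear in the resulting table.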

Methods

__init__(target[, source, id_type, ...])

This class translates between homologous UniProt IDs of two organisms based on NCBI HomoloGene and Ensembl data.

asdict([full_records])

Create a dictionary from the translation table.

df([full_records])

Orthologous pairs as data frame.

get_taxon(protein[, only_swissprot])

get_taxon_trembl(protein)

has_protein(protein)

is_swissprot(protein)

load([source])

load_proteome(taxon[, only_swissprot])

load_taxonomy()

match(ortholog, **kwargs)

reload()

translate(identifier[, full_records])

For one or more UniProt IDs of the source organism, returns all orthologues from the target organism.

translate_df(df[, cols, ortho_df])

Translate columns in a data frame.

Attributes

key

pickle_path

asdict(full_records: bool = False, **kwargs) -> dict[str, set[OrthologBase]]

Create a dictionary from the translation table.

Parameters:
  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific filtering criteria.

Returns:

A dict with identifiers of the source organism as keys, and sets of their orthologs as values.
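
To make the shape of the returned value concrete, here is a small sketch that groups (source, target) pairs into a dict of sets. It only mirrors the documented layout with plain identifiers (the `full_records=False` case); the helper `pairs_to_dict` and the placeholder identifiers are assumptions for illustration, not part of pypath.

```python
from collections import defaultdict

# Placeholder (source, target) pairs; not real accessions.
pairs = [
    ("SRC_A", "TGT_A"),
    ("SRC_B", "TGT_B1"),
    ("SRC_B", "TGT_B2"),
]


def pairs_to_dict(pairs):
    """Group targets by source: source IDs become keys, sets of targets values."""
    result = defaultdict(set)
    for source_id, target_id in pairs:
        result[source_id].add(target_id)
    return dict(result)


orthologs = pairs_to_dict(pairs)
print(orthologs)
```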

df(full_records: bool = False, **kwargs) -> DataFrame

Orthologous pairs as data frame.

Parameters:
  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific filtering criteria.

Returns:

A data frame with pairs of orthologous identifiers, in two columns: “source” and “target”.

translate(identifier: str | Iterable[str], full_records: bool = False, **kwargs) -> set[str]

For one or more UniProt IDs of the source organism, returns all orthologues from the target organism.

Parameters:
  • identifier – An identifier corresponding to the ID type and source organism of the instance.

  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific translation parameters.

Returns:

A set of identifiers of orthologues in the target taxon.
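
The signature accepts either a single identifier or an iterable of identifiers and returns a set. The sketch below imitates that contract on a toy table. The helper `translate_ids`, the table and its identifiers are hypothetical, and returning an empty set for an unknown identifier is an assumption made here, not documented behaviour.

```python
# Toy translation table with placeholder identifiers (not real accessions).
table = {"SRC_A": {"TGT_A"}, "SRC_B": {"TGT_B1", "TGT_B2"}}


def translate_ids(identifier, table):
    """Return the union of target IDs for one identifier or several."""
    # A bare string is treated as a single identifier, not as characters.
    identifiers = [identifier] if isinstance(identifier, str) else identifier
    result = set()
    for source_id in identifiers:
        # Assumption: unknown identifiers simply contribute nothing.
        result |= table.get(source_id, set())
    return result


single = translate_ids("SRC_B", table)
several = translate_ids(["SRC_A", "SRC_Z"], table)
print(single, several)
```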

translate_df(df: DataFrame, cols: str | list[str] | None = None, ortho_df: DataFrame | None = None, **kwargs)

Translate columns in a data frame.

Parameters:
  • df – A data frame.

  • cols – One or more columns to be translated. It can be a single column name, an iterable of column names, or a dict where keys are column names and values are ID types. Except in this last case, identifiers are assumed to be UniProt IDs.

  • ortho_df – Override the translation data frame. If provided, the parameters in kwargs won’t have an effect. Must have columns “source” and “target”.

  • kwargs – Resource specific translation parameters.

Returns:

A data frame with the same column layout as the input, with the identifiers in the selected columns translated. Rows that could not be translated are omitted.
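
The row-level effect described above (same columns, untranslatable rows dropped) can be sketched with plain Python rows standing in for a data frame. This is not pypath's implementation: the helper `translate_rows` and the identifiers are hypothetical, and expanding a one-to-many match into several rows is an assumption of this sketch.

```python
# Toy translation table with placeholder identifiers (not real accessions).
table = {"SRC_A": {"TGT_A"}, "SRC_B": {"TGT_B1", "TGT_B2"}}

# A list of dicts stands in for the rows of a data frame.
rows = [
    {"id": "SRC_A", "score": 1},
    {"id": "SRC_B", "score": 2},
    {"id": "SRC_Z", "score": 3},  # no orthologue in the table
]


def translate_rows(rows, cols, table):
    """Translate the given columns; drop rows without a translation."""
    translated = []
    for row in rows:
        candidates = [dict(row)]
        for col in cols:
            expanded = []
            for candidate in candidates:
                # Assumption: each matching target yields its own row.
                for target_id in sorted(table.get(candidate[col], ())):
                    expanded.append({**candidate, col: target_id})
            # If nothing matched, no candidates remain and the row is dropped.
            candidates = expanded
        translated.extend(candidates)
    return translated


result = translate_rows(rows, ["id"], table)
print(result)
```

The `score` column is carried over unchanged, and the `SRC_Z` row disappears because the toy table has no entry for it.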