pypath.utils.orthology.HomologeneOrthology

class pypath.utils.orthology.HomologeneOrthology(target: str | int, source: str | int | None = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, **kwargs)

Bases: ProteinOrthology

__init__(target: str | int, source: str | int | None = 9606, id_type: str = 'uniprot', only_swissprot: bool = True, **kwargs)

This class translates between homologous UniProt IDs of two organisms based on NCBI HomoloGene and Ensembl data. For HomoloGene, the UniProt-to-UniProt translation table is built in three steps: the source organism's UniProt IDs are translated to RefSeq and Entrez IDs, the homologues (orthologues) of these IDs are looked up, and the results are translated to UniProt IDs of the target organism. For Ensembl, the data comes with Ensembl protein identifiers, which are then translated to UniProt.
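The HomoloGene route described above can be pictured as a chain of lookups. The following is a minimal sketch over toy in-memory tables, not pypath's implementation: the function name `chain_orthologs`, the table names, the group label and the `SRC_NO_GROUP` placeholder are all hypothetical, and only Entrez IDs are used as the intermediate step.

```python
from collections import defaultdict


def chain_orthologs(src_uniprot_to_gene, gene_to_group,
                    group_to_tgt_genes, tgt_gene_to_uniprot):
    """Compose four lookups into a source UniProt -> target UniProt table."""
    table = defaultdict(set)
    for uniprot, genes in src_uniprot_to_gene.items():
        for gene in genes:
            # Genes without a homology group map to None and contribute nothing.
            group = gene_to_group.get(gene)
            for tgt_gene in group_to_tgt_genes.get(group, ()):
                table[uniprot] |= tgt_gene_to_uniprot.get(tgt_gene, set())
    # Keep only source IDs that ended up with at least one orthologue.
    return {key: value for key, value in table.items() if value}


# Toy tables for illustration only (human TP53 and mouse Trp53).
src_uniprot_to_gene = {"P04637": {"7157"}, "SRC_NO_GROUP": {"0"}}
gene_to_group = {"7157": "group_tp53"}           # hypothetical group label
group_to_tgt_genes = {"group_tp53": {"22059"}}
tgt_gene_to_uniprot = {"22059": {"P02340"}}

table = chain_orthologs(
    src_uniprot_to_gene, gene_to_group,
    group_to_tgt_genes, tgt_gene_to_uniprot,
)
print(table)
```

Source identifiers that fall out at any step of the chain simply do not appear in the resulting table.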

Args
target:

Name or NCBI Taxonomy ID of the target organism.

source:

Name or NCBI Taxonomy ID of the source organism.

id_type:

The identifier type to use.

only_swissprot:

Use only SwissProt IDs.

kwargs:

Resource specific parameters.

Methods

__init__(target[, source, id_type, ...])

This class translates between homologous UniProt IDs of two organisms based on NCBI HomoloGene and Ensembl data.

asdict([full_records])

Create a dictionary from the translation table.

df([full_records])

Orthologous pairs as data frame.

get_taxon(protein[, only_swissprot])

get_taxon_trembl(protein)

has_protein(protein)

is_swissprot(protein)

load()

Load orthology data from NCBI HomoloGene.

load_proteome(taxon[, only_swissprot])

load_taxonomy()

match(ortholog, **kwargs)

reload()

translate(identifier[, full_records])

For one UniProt ID of the source organism, returns all orthologues from the target organism.

translate_df(df[, cols, ortho_df])

Translate columns in a data frame.

Attributes

key

pickle_path

resource

asdict(full_records: bool = False, **kwargs) → dict[str, set[OrthologBase]]

Create a dictionary from the translation table.

Parameters:
  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific filtering criteria.

Returns:

A dict with identifiers of the source organism as keys, and sets of their orthologs as values.
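To make the return shape concrete, here is a small sketch that collapses (source, target) pairs into a dict of sets. `pairs_to_dict` is a hypothetical helper and the identifiers are placeholders; the signature above types the values as sets of OrthologBase, which this sketch replaces with plain strings.

```python
def pairs_to_dict(pairs):
    """Group target identifiers by their source identifier."""
    result = {}
    for source_id, target_id in pairs:
        result.setdefault(source_id, set()).add(target_id)
    return result


# Placeholder identifiers, not real accessions.
pairs = [("SRC1", "TGT1"), ("SRC1", "TGT2"), ("SRC2", "TGT3")]
mapping = pairs_to_dict(pairs)
print(mapping)
```

A set per key lets one source identifier carry several orthologues without duplicates.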

df(full_records: bool = False, **kwargs) → DataFrame

Orthologous pairs as data frame.

Parameters:
  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific filtering criteria.

Returns:

A data frame with pairs of orthologous identifiers, in two columns: “source” and “target”.

load()

Load orthology data from NCBI HomoloGene.

Builds the orthology translation table as a dict from NCBI HomoloGene data. If id_type is supported by HomoloGene (Gene Symbol, RefSeq, Entrez, GI), the data is loaded directly. For other ID types, the HomoloGene Gene Symbol, RefSeq and Entrez tables are first translated to UniProt, and the resulting orthologous UniProt pairs are then translated to the desired ID type.

translate(identifier: str | Iterable[str], full_records: bool = False, **kwargs) → set[str]

For one UniProt ID of the source organism, returns all orthologues from the target organism.

Parameters:
  • identifier – An identifier corresponding to the ID type and source organism of the instance.

  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific translation parameters.

Returns:

A set of identifiers of orthologues in the target taxon.
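The signature accepts either a single identifier or an iterable of identifiers and returns one set, so a plausible reading is that results for several identifiers are pooled. The sketch below shows that reading over a toy dict of sets; `translate_ids` is a hypothetical stand-in, the identifiers are placeholders, and the pooling behaviour is an assumption rather than something the text above confirms.

```python
def translate_ids(identifier, table):
    """Look up one identifier or several, pooling the orthologues into one set."""
    # A lone string is one identifier, not an iterable of characters.
    identifiers = [identifier] if isinstance(identifier, str) else identifier
    result = set()
    for one_id in identifiers:
        # Unknown identifiers contribute an empty set.
        result |= table.get(one_id, set())
    return result


# Placeholder translation table, not real accessions.
table = {"SRC1": {"TGT1", "TGT2"}, "SRC2": {"TGT3"}}

single = translate_ids("SRC1", table)
several = translate_ids(["SRC1", "SRC2", "SRC9"], table)
unknown = translate_ids("SRC9", table)
print(single, several, unknown)
```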

translate_df(df: DataFrame, cols: str | list[str] | None = None, ortho_df: DataFrame | None = None, **kwargs)

Translate columns in a data frame.

Parameters:
  • df – A data frame.

  • cols – One or more columns to be translated. It can be a single column name, an iterable of column names, or a dict where keys are column names and values are ID types. Except in this last case, identifiers are assumed to be UniProt.

  • ortho_df – Override the translation data frame. If provided, the parameters in kwargs won’t have an effect. Must have columns “source” and “target”.

  • kwargs – Resource specific translation parameters.

Returns:

A data frame with the same column layout as the input, with the identifiers in the selected columns translated. Rows that could not be translated are omitted.
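The row-dropping rule can be illustrated without a data frame library. The sketch below treats a table as a list of dicts and translates one column through a dict of sets. `translate_rows` is a hypothetical helper and the identifiers are placeholders. Emitting one output row per orthologue is an assumption about one-to-many cases, which the description above does not settle.

```python
def translate_rows(rows, col, table):
    """Translate one column; rows without any orthologue are left out."""
    out = []
    for row in rows:
        # An untranslatable identifier yields no targets, so its row vanishes.
        for target_id in sorted(table.get(row[col], ())):
            # Copy the row and overwrite only the translated column.
            out.append({**row, col: target_id})
    return out


# Placeholder translation table and input rows, not real data.
table = {"SRC1": {"TGT1", "TGT2"}, "SRC2": {"TGT3"}}
rows = [
    {"protein": "SRC1", "score": 0.9},
    {"protein": "SRC9", "score": 0.4},  # no orthologue, will be dropped
    {"protein": "SRC2", "score": 0.7},
]

translated = translate_rows(rows, "protein", table)
for row in translated:
    print(row)
```

Every output row keeps the keys of the input rows, matching the "same column layout" described above.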