pypath.utils.orthology.EnsemblOrthology

class pypath.utils.orthology.EnsemblOrthology(target: int | str, source: int | str = 9606, id_type: str = 'uniprot', only_swissprot: bool | None = None, hc: bool | None = None, types: list[Literal['one2one', 'one2many', 'many2many']] | None = None)

Bases: ProteinOrthology

__init__(target: int | str, source: int | str = 9606, id_type: str = 'uniprot', only_swissprot: bool | None = None, hc: bool | None = None, types: list[Literal['one2one', 'one2many', 'many2many']] | None = None)

Orthology translation with Ensembl data.

Parameters:
  • target – Name or NCBI Taxonomy ID of the target organism.

  • source – Name or NCBI Taxonomy ID of the source organism.

  • id_type – The identifier type to use.

  • only_swissprot – Use only SwissProt IDs.

  • hc – Use only high confidence orthology relations from Ensembl. Defaults to True. Can also be set through the ensembl_hc attribute.

  • types – The Ensembl orthology relationship types to use. Possible values are one2one, one2many and many2many. By default, only one2one is used. Can also be set through the ensembl_types attribute.
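As a minimal sketch of how the constructor arguments fit together, the helper below collects keyword arguments and checks `types` against the three documented values before anything is loaded. The names `ensembl_orthology_kwargs` and `mouse_orthology` are hypothetical and not part of pypath. Instantiating the class requires pypath and presumably triggers data loading, so the import is deferred to the function that needs it.

```python
ALLOWED_TYPES = ('one2one', 'one2many', 'many2many')


def ensembl_orthology_kwargs(target, source=9606, id_type='uniprot',
                             hc=None, types=None):
    """Hypothetical helper: validate and collect constructor arguments."""
    if types is not None:
        unknown = sorted(set(types) - set(ALLOWED_TYPES))
        if unknown:
            raise ValueError(f'Unknown orthology types: {unknown}')
    return {
        'target': target,
        'source': source,
        'id_type': id_type,
        'hc': hc,
        'types': types,
    }


def mouse_orthology():
    """Human (9606) to mouse (10090) orthology; requires pypath."""
    # Imported here so the validation helper works without pypath.
    from pypath.utils.orthology import EnsemblOrthology

    kwargs = ensembl_orthology_kwargs(
        target=10090,
        types=['one2one', 'one2many'],
    )
    return EnsemblOrthology(**kwargs)
```

Leaving `hc` and `types` as None keeps the documented defaults, so only the arguments that differ from them need to be spelled out.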

Methods

__init__(target[, source, id_type, ...])

Orthology translation with Ensembl data.

asdict([full_records])

Create a dictionary from the translation table.

df([full_records])

Orthologous pairs as data frame.

get_taxon(protein[, only_swissprot])

get_taxon_trembl(protein)

has_protein(protein)

is_swissprot(protein)

load()

load_proteome(taxon[, only_swissprot])

load_taxonomy()

match(ortholog, **kwargs)

Check an ortholog against filtering criteria.

reload()

translate(identifier[, full_records])

Returns all orthologues from the target organism for one UniProt ID of the source organism.

translate_df(df[, cols, ortho_df])

Translate columns in a data frame.

Attributes

key

pickle_path

resource

asdict(full_records: bool = False, **kwargs) → dict[str, set[OrthologBase]]

Create a dictionary from the translation table.

Parameters:
  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific filtering criteria.

Returns:

A dict with identifiers of the source organism as keys, and sets of their orthologs as values.
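The return shape described above can be pictured with a plain dictionary. The sketch below uses placeholder identifiers rather than real accessions and assumes the sets hold plain string identifiers, which is a guess at the `full_records=False` case. It shows one way to invert such a mapping; `invert_orthology` is a hypothetical helper, not part of pypath.

```python
from collections import defaultdict


def invert_orthology(mapping):
    """Turn {source_id: {target_id, ...}} into {target_id: {source_id, ...}}."""
    inverted = defaultdict(set)
    for source_id, target_ids in mapping.items():
        for target_id in target_ids:
            inverted[target_id].add(source_id)
    return dict(inverted)


# Placeholder identifiers, not real UniProt accessions.
source_to_target = {
    'SRC_A': {'TGT_A'},
    'SRC_B': {'TGT_B1', 'TGT_B2'},
    'SRC_C': {'TGT_B1'},
}

target_to_source = invert_orthology(source_to_target)
```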

df(full_records: bool = False, **kwargs) → DataFrame

Orthologous pairs as data frame.

Parameters:
  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific filtering criteria.

Returns:

A data frame with pairs of orthologous identifiers, in two columns: “source” and “target”.

match(ortholog: OrthologBase, **kwargs) → bool

Check an ortholog against filtering criteria.

Parameters:
  • ortholog – An ortholog record.

  • kwargs – Override default filtering parameters.

Returns:

True if the ortholog meets the criteria.
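To illustrate what checking an ortholog against filtering criteria might look like, the sketch below applies the two Ensembl-specific filters described for the constructor, with keyword arguments overriding the defaults. The record type `OrthologRecord`, its fields `hc` and `types`, and the function `matches` are stand-ins invented for this example; the actual record fields and logic in pypath may differ.

```python
from typing import NamedTuple


class OrthologRecord(NamedTuple):
    """Hypothetical stand-in for an ortholog record."""
    id: str
    hc: bool       # high-confidence flag
    types: str     # 'one2one', 'one2many' or 'many2many'


# Documented defaults: high confidence only, one2one only.
DEFAULTS = {'hc': True, 'types': ('one2one',)}


def matches(ortholog, **kwargs):
    """Return True if the record passes the (possibly overridden) filters."""
    params = {**DEFAULTS, **kwargs}
    if params['hc'] and not ortholog.hc:
        return False
    return ortholog.types in params['types']


# Placeholder record: low confidence, one-to-many relationship.
record = OrthologRecord(id='TGT_A', hc=False, types='one2many')
```

With the defaults this record is rejected on both counts; it passes only when `hc=False` and a `types` value that includes `one2many` are supplied together.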

translate(identifier: str | Iterable[str], full_records: bool = False, **kwargs) → set[str]

Returns all orthologues from the target organism for one UniProt ID of the source organism.

Parameters:
  • identifier – An identifier corresponding to the ID type and source organism of the instance.

  • full_records – Include not only the identifiers, but also some properties of the orthology relationships.

  • kwargs – Resource specific translation parameters.

Returns:

A set of identifiers of orthologues in the target taxon.
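The signature accepts either a single identifier or an iterable of them and returns one set. A rough sketch of that behaviour over a plain dictionary is shown below. It assumes that results for several identifiers are merged into one set and that unknown identifiers contribute nothing; neither is confirmed by the text above. The function `translate_ids` and the placeholder identifiers are hypothetical.

```python
def translate_ids(mapping, identifier):
    """Union of the orthologs of one identifier or of several."""
    # A single string is one identifier, not an iterable of characters.
    identifiers = [identifier] if isinstance(identifier, str) else identifier
    result = set()
    for one_id in identifiers:
        # Identifiers without a known ortholog contribute nothing.
        result |= mapping.get(one_id, set())
    return result


# Placeholder identifiers, not real UniProt accessions.
mapping = {
    'SRC_A': {'TGT_A'},
    'SRC_B': {'TGT_B1', 'TGT_B2'},
}

single = translate_ids(mapping, 'SRC_A')
several = translate_ids(mapping, ['SRC_A', 'SRC_B', 'SRC_X'])
```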

translate_df(df: DataFrame, cols: str | list[str] | None = None, ortho_df: DataFrame | None = None, **kwargs)

Translate columns in a data frame.

Parameters:
  • df – A data frame.

  • cols – One or more columns to be translated. It can be a single column name, an iterable of column names or a dict where keys are column names and values are ID types. Except in this last case, identifiers are assumed to be UniProt.

  • ortho_df – Override the translation data frame. If provided, the parameters in kwargs won’t have an effect. Must have columns “source” and “target”.

  • kwargs – Resource specific translation parameters.

Returns:

A data frame with the same column layout as the input, with the identifiers translated as requested. Rows that could not be translated are omitted.
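The row-dropping behaviour can be sketched without pandas, using lists of dicts in place of data frames and (source, target) tuples in place of the translation data frame. The function `translate_rows` is a hypothetical stand-in, not pypath code. The expansion of one row into several when an identifier has more than one ortholog is an assumption; the text above only states that untranslatable rows are omitted.

```python
def translate_rows(rows, cols, ortho_pairs):
    """Translate the given columns of each row; drop untranslatable rows."""
    # Build a lookup from the (source, target) pairs.
    lookup = {}
    for source_id, target_id in ortho_pairs:
        lookup.setdefault(source_id, []).append(target_id)

    translated = list(rows)
    for col in cols:
        next_rows = []
        for row in translated:
            # One output row per ortholog; none if there is no ortholog.
            for target_id in lookup.get(row[col], []):
                next_rows.append({**row, col: target_id})
        translated = next_rows
    return translated


# Placeholder identifiers, not real UniProt accessions.
rows = [
    {'protein': 'SRC_A', 'score': 1},
    {'protein': 'SRC_B', 'score': 2},
    {'protein': 'SRC_X', 'score': 3},  # no ortholog: will be omitted
]
ortho_pairs = [
    ('SRC_A', 'TGT_A'),
    ('SRC_B', 'TGT_B1'),
    ('SRC_B', 'TGT_B2'),
]

result = translate_rows(rows, ['protein'], ortho_pairs)
```

The output rows keep the same keys as the input, which mirrors the "same column layout" wording of the return description.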