pypath.utils.orthology.translate_df

pypath.utils.orthology.translate_df(df: DataFrame, target: str | int, source: str | int = 9606, cols: str | list[str] | dict[str, str] | None = None, id_type: str = 'uniprot', only_swissprot: bool = True, oma: bool | None = None, homologene: bool | None = None, ensembl: bool | None = None, oma_rel_type: set[Literal['1:1', '1:n', 'm:1', 'm:n']] | None = None, oma_score: float | None = None, ensembl_hc: bool = True, ensembl_types: list[Literal['one2one', 'one2many', 'many2many']] | None = None, **kwargs: str | tuple[str, str]) → DataFrame

Translate the identifiers in one or more columns of a data frame to their orthologues in another organism.

Parameters:
  • df – A data frame.

  • target – Name or NCBI Taxonomy ID of the target organism.

  • source – Name or NCBI Taxonomy ID of the source organism. Defaults to 9606 (human).

  • cols – One or more columns to be translated: a single column name, an iterable of column names, or a dict whose keys are column names and whose values are ID types. Except in this last case, identifiers are assumed to be of type id_type.

  • id_type – The default identifier type, used for all columns whose ID type is not specified.

  • only_swissprot – Use only SwissProt (reviewed UniProtKB) IDs.

  • oma – Use orthology information from the Orthologous Matrix (OMA). Currently this is the recommended source for orthology data.

  • homologene – Use orthology information from NCBI HomoloGene.

  • ensembl – Use orthology information from Ensembl.

  • oma_rel_type – Restrict OMA relations to certain types (1:1, 1:n, m:1, m:n).

  • oma_score – Lower threshold for the OMA similarity score.

  • ensembl_hc – Use only the high-confidence orthology relations from Ensembl.

  • ensembl_types – Ensembl orthology relation types to use. Possible values are one2one, one2many and many2many. By default only one2one is used.

  • kwargs – Same as providing a dict to cols; however, the keys (column names) cannot coincide with the names of this function's arguments.
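The cols, id_type and kwargs parameters together determine which columns are translated and which identifier type each column holds. The sketch below restates that documented behaviour as a small helper. normalize_cols is a hypothetical name, not part of pypath. Treating cols=None as "no columns beyond the keyword arguments" is an assumption, and the tuple form of keyword values that appears in the signature is not modelled.

```python
from __future__ import annotations

from collections.abc import Iterable


def normalize_cols(
    cols: str | Iterable[str] | dict[str, str] | None = None,
    id_type: str = 'uniprot',
    **kwargs: str,
) -> dict[str, str]:
    """Hypothetical helper: map each column to translate onto its ID type."""
    if cols is None:
        # Assumption: no explicit columns, only keyword arguments count.
        mapping: dict[str, str] = {}
    elif isinstance(cols, str):
        # A single column name: the default identifier type applies.
        mapping = {cols: id_type}
    elif isinstance(cols, dict):
        # Explicit column -> ID type pairs are kept as given.
        mapping = dict(cols)
    else:
        # Any other iterable of column names: default identifier type.
        mapping = {col: id_type for col in cols}
    # Keyword arguments act like extra dict entries (column=ID type).
    mapping.update(kwargs)
    return mapping
```

For example, normalize_cols({'a': 'uniprot'}, b='genesymbol') yields {'a': 'uniprot', 'b': 'genesymbol'}. The name-collision rule follows from how Python binds keyword arguments: a keyword such as id_type='genesymbol' binds to the parameter of that name and never reaches **kwargs, so it cannot be used to name a column.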

Returns:

A data frame with the same column layout as the input, and the identifiers translated as demanded. Rows that could not be translated are omitted.
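The output contract above (same columns, translated identifiers, untranslatable rows dropped) can be illustrated with a toy model in pandas. This is a minimal sketch, not pypath's implementation. toy_translate_df is a hypothetical name, the hand-written orthologs dict with made-up identifiers stands in for the OMA, HomoloGene or Ensembl data that translate_df actually uses, and emitting one output row per orthologue for one-to-many relations is an assumption.

```python
import pandas as pd


def toy_translate_df(
    df: pd.DataFrame,
    orthologs: dict[str, set[str]],
    cols: list[str],
) -> pd.DataFrame:
    """Toy model of the documented output contract, not pypath's code."""
    out = df.copy()
    for col in cols:
        # Replace each ID by the sorted list of its orthologues (empty if none).
        out[col] = out[col].map(lambda x: sorted(orthologs.get(x, ())))
        # Assumption: one output row per orthologue. Empty lists become NaN
        # after explode, and those rows are dropped as untranslatable.
        out = out.explode(col, ignore_index=True)
        out = out[out[col].notna()].reset_index(drop=True)
    # explode keeps the original column order, so the layout is unchanged.
    return out
```

With orthologs = {'A1': {'a1'}, 'B1': {'b1', 'b2'}}, a row whose source is A1 and whose target is B1 becomes two rows (a1, b1) and (a1, b2), while any row containing C1 is dropped because C1 has no entry in the mapping.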