pypath.inputs.hmdb.common.processed

pypath.inputs.hmdb.common.processed(*fields: str | tuple, dataset: Literal['metabolites', 'proteins'], head: int | None = None, **named_fields: str | tuple) → DataFrame

Parse various simple and nested array fields from HMDB into a data frame.

Parameters:
  • fields – Fields to include in the data frame. These must be keys in the schema and will also be used as column names. Alternatively, tuples of sequential processing steps can be provided: strings are used as keys in nested dicts; tuples are used as multiple keys in dicts, each yielding a separate column; the special symbol “*” means all keys in the sub-dict; and “@” means expand arrays into multiple rows. Be careful with this last option because it is applied combinatorially: expanding one array into 5 rows and another into 7 rows already yields 35 rows from a single record. This can lead to excessive memory use and processing time.

  • named_fields – Same as fields, but the column name can differ from the top-level key: argument names are used as column names, and values are used as processing steps.

  • head – Process the first N records only. Useful for peeking into the data.
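
The combinatorial effect of “@” described for fields can be illustrated with a small, self-contained sketch. This is not pypath's implementation: `expand_rows` is a hypothetical helper, and the record, its keys, and its values are made up for illustration. It only mimics the idea that expanding several arrays yields one row per combination of their elements.

```python
from itertools import product


def expand_rows(record, array_fields):
    """Toy stand-in for "@" expansion (hypothetical helper, not part of pypath).

    Returns one row per combination of elements of the listed array fields.
    """
    # Fields that are not expanded are copied unchanged into every row.
    scalars = {k: v for k, v in record.items() if k not in array_fields}
    arrays = [record[k] for k in array_fields]
    return [
        {**scalars, **dict(zip(array_fields, combo))}
        for combo in product(*arrays)
    ]


# A made-up record with two array fields of length 5 and 7.
record = {
    'accession': 'HMDB0000001',
    'synonyms': [f'syn{i}' for i in range(5)],
    'pathways': [f'pw{i}' for i in range(7)],
}

rows = expand_rows(record, ['synonyms', 'pathways'])
print(len(rows))  # 5 * 7 = 35 rows from a single record
```

Because the row count is the product of the array lengths, it is worth checking the result on a small sample first. With the real function, that might look like `processed('accession', 'name', dataset='metabolites', head=10)`, assuming `accession` and `name` are keys in the schema.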