
class pypath.utils.proteomicsdb.ProteomicsDB(username, password, output_format='json')

Bases: object

__init__(username, password, output_format='json')

This is an extensible class for downloading and processing data from ProteomicsDB. Currently, 2 of the 10 available APIs are implemented here, but feel free to write functions for the other APIs. To find out more about ProteomicsDB, take a look at Wilhelm et al. 2014, Nature. To read a comprehensive description of the APIs, visit here:


username: Registered and API-enabled ProteomicsDB user. To obtain such a user, you first need to register, and then write an e-mail to the address given on the webpage. Within a couple of days, the admins will enable the API for your user.


password: Password of the user.


output_format: Either ‘json’ or ‘xml’. Some functions in this module process the JSON further and return processed objects.


__init__(username, password[, output_format])

This is an extensible class for downloading and processing data from ProteomicsDB.

get_expression([normalized, tissue_average])

Extracts normalized or unnormalized expression data from previously downloaded data, stored on disk, and opened for reading in file object ProteomicsDB.result.


get_pieces([size, delimiters])

A generator for reading huge files (hundreds of MBs).

get_proteins(tissue_id[, ...])


Gets an annotated list of all tissues for which ProteomicsDB has expression data.



Returns expression data in a pandas matrix.

query(api, param[, silent, large])

Retrieves data from the API.



tissues_x_proteins([normalized, tissues])

Downloads the expression of all proteins for all tissues.

which_tissues(name, value)

get_expression(normalized=True, tissue_average=False)

Extracts normalized or unnormalized expression data from previously downloaded data, stored on disk, and opened for reading in file object ProteomicsDB.result. Optionally averages data per tissue.


normalized: If True, read normalized expression values; otherwise read unnormalized values.


tissue_average: If True, keep only the mean value per tissue; otherwise read and store data for each sample.
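
The per-tissue averaging described above can be sketched as follows. This is a minimal standalone illustration, not pypath's actual code: the function name `average_per_tissue` and the record keys (`tissue`, `sample`, `protein`, `value`) are hypothetical and do not reflect the field names used by ProteomicsDB.

```python
from collections import defaultdict
from statistics import mean


def average_per_tissue(records):
    """Collapse per-sample values into one mean per (tissue, protein) pair.

    The record layout is an assumption made for this sketch only.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec['tissue'], rec['protein'])].append(rec['value'])
    return {key: mean(values) for key, values in grouped.items()}


# Made-up example data: two liver samples and one lung sample.
records = [
    {'tissue': 'liver', 'sample': 's1', 'protein': 'P04637', 'value': 4.0},
    {'tissue': 'liver', 'sample': 's2', 'protein': 'P04637', 'value': 6.0},
    {'tissue': 'lung', 'sample': 's3', 'protein': 'P04637', 'value': 2.0},
]
averages = average_per_tissue(records)
```

With `tissue_average=False`, the equivalent step would presumably keep the per-sample values instead of reducing them to a mean.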

get_pieces(size=20480, delimiters=('{', '}'))

A generator for reading huge files (hundreds of MBs). Reads segments of the given size, searches for self-contained JSON objects, and returns a list of them.


size: Number of bytes to read at once.


delimiters: Opening and closing delimiters. By default, these are curly braces, so that individual JSON objects of the largest possible size are returned.
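
The chunked reading strategy can be sketched roughly as below. This is an illustration under stated assumptions, not the implementation in pypath: the function name `iter_json_objects` is hypothetical, the file is assumed to be opened in text mode and to contain a sequence of objects (for example inside a JSON array), and the delimiters are assumed not to occur inside string values.

```python
import io
import json


def iter_json_objects(fileobj, size=20480, delimiters=('{', '}')):
    """Yield lists of complete outermost objects found after each read."""
    opening, closing = delimiters
    buffer = ''
    while True:
        chunk = fileobj.read(size)
        if not chunk:
            break
        buffer += chunk
        pieces = []
        depth = 0      # current nesting level of delimiters
        start = None   # index where the current outermost object begins
        consumed = 0   # end of the last complete object in the buffer
        for i, char in enumerate(buffer):
            if char == opening:
                if depth == 0:
                    start = i
                depth += 1
            elif char == closing and depth > 0:
                depth -= 1
                if depth == 0:
                    pieces.append(json.loads(buffer[start:i + 1]))
                    consumed = i + 1
        # Keep only the unfinished tail for the next round.
        buffer = buffer[consumed:]
        if pieces:
            yield pieces


# Small in-memory example with a deliberately tiny chunk size.
data = io.StringIO('[{"a": 1}, {"b": {"c": 2}}, {"d": 3}]')
objects = [obj for pieces in iter_json_objects(data, size=7) for obj in pieces]
```

Tracking the nesting depth is what makes each yielded object self-contained: an object is parsed only once its outermost closing delimiter has been read, so incomplete fragments stay in the buffer until the next segment arrives.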

get_proteins(tissue_id, calculation_method=0, swissprot_only=1, no_isoform=1)

Gets an annotated list of all tissues for which ProteomicsDB has expression data. The result is stored in ProteomicsDB.tissues.


Returns expression data in a pandas matrix. Not implemented.

query(api, param, silent=False, large=False)

Retrieves data from the API.


api: Should be one of the 10 available API sections.


param: Tuple of the parameters according to the API.


large: Passed to the curl wrapper function. If True, the file is written to disk and a file object open for reading is returned. If False, the raw data is returned: JSON is converted to a Python object, and XML is returned as a string.
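
The three return behaviours described for query can be illustrated with a small stand-in. This sketch performs no network access and does not use pypath's curl wrapper: the function name `deliver` is hypothetical, and a temporary file is assumed here in place of whatever location the real wrapper writes to.

```python
import json
import tempfile


def deliver(raw, output_format='json', large=False):
    """Return already-downloaded text in the form the caller asked for."""
    if large:
        # Write to disk and hand back a file object open for reading.
        fileobj = tempfile.TemporaryFile(mode='w+', encoding='utf-8')
        fileobj.write(raw)
        fileobj.seek(0)
        return fileobj
    if output_format == 'json':
        # Small JSON responses are converted to Python objects.
        return json.loads(raw)
    # XML is passed through unchanged as a string.
    return raw


parsed = deliver('{"a": 1}')
xml_text = deliver('<a/>', output_format='xml')
with deliver('{"a": 1}', large=True) as handle:
    from_disk = handle.read()
```

Returning a file object for large responses avoids holding the parsed data in memory at once, which fits the chunked reading that get_pieces provides.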

tissues_x_proteins(normalized=True, tissues=None)

Downloads the expression of all proteins for all tissues. The result is a dict of dicts holding the expression values of each protein, grouped by sample.
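
A dict of dicts of this kind could look like the sketch below. The layout is an assumption made for illustration: sample identifiers are taken as the outer keys and protein identifiers as the inner keys, and the function name `group_by_sample` and the example values are hypothetical.

```python
def group_by_sample(rows):
    """Build {sample: {protein: value}} from (sample, protein, value) rows."""
    result = {}
    for sample, protein, value in rows:
        result.setdefault(sample, {})[protein] = value
    return result


# Made-up rows, only to show the shape of the nested result.
rows = [
    ('s1', 'P04637', 4.0),
    ('s1', 'P38398', 1.5),
    ('s2', 'P04637', 6.0),
]
expression = group_by_sample(rows)
```

With this layout, `expression['s1']['P04637']` looks up one protein in one sample, and a protein missing from a sample is simply an absent inner key.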