How to load DoRothEA in pypath

pypath is a Python module for building custom molecular interaction networks. It has built-in methods to load TF-target interaction data from DoRothEA: it downloads the data from DoRothEA's git repository, processes it, and builds an igraph object from it. Below I show how to load the data. Note that you can also access DoRothEA data through the web service at omnipathdb.org or as static files in the DoRothEA git repo.

First, make sure you have pypath installed. You can find installation instructions here. Once pypath is installed, import it and create a pypath.PyPath() object.

In [2]:
	=== d i s c l a i m e r ===

	All data coming with this module
	either as redistributed copy or downloaded using the
	programmatic interfaces included in the present module
	are available under public domain, are free to use at
	least for academic research or education purposes.
	Please be aware of the licences of all the datasets
	you use in your analysis, and please give appropriate
	credits for the original sources when you publish your
	results. To find out more about data sources please
	look at `pypath.descriptions` and
	`pypath.data_formats.urls`.

	> New session started,
	session ID: '6epsq'
	logfile: './log/6epsq.log'
	pypath version: 0.7.110

There is a method dedicated to DoRothEA. You can select the confidence levels to include (A, B, C, D) or use only literature-curated interactions.

In [3]:
	:: Loading data from cache previously downloaded from www.uniprot.org
	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/denes/.pypath/cache/784f0a43e5831454b1d10db1b9480df7-`.
 > TFRegulons
	:: Loading data from cache previously downloaded from github.com
	:: Ready. Resulted `zip extracted data` of type dict of file objects.                                                                                
	:: Local file at `/home/denes/.pypath/cache/d43d9194b1ff704be636c4a5732203a9-database_20180915.csv.zip`.
	:: Loading 'genesymbol' to 'uniprot' mapping table
	:: Loading 'uniprot-sec' to 'uniprot-pri' mapping table
	:: Loading data from cache previously downloaded from www.uniprot.org
	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/denes/.pypath/cache/c87b574b25efc888967e7ab939302989-`.
	:: Loading data from cache previously downloaded from ftp.uniprot.org
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/denes/.pypath/cache/49314fe217bf0f2a5544a2c4314b4adf-sec_ac.txt`.
        Reading from file -- finished: : 0.00it [00:00, ?it/s]
	:: Loading 'genesymbol' to 'trembl' mapping table
	:: Loading 'genesymbol-syn' to 'uniprot' mapping table
        Processing nodes -- finished: 100%|██████████| 4.84k/4.84k [00:00<00:00, 203kit/s]
        Processing edges -- finished: 100%|██████████| 4.84k/4.84k [00:00<00:00, 99.4kit/s]
        Processing attributes -- finished: 100%|██████████| 4.84k/4.84k [00:01<00:00, 2.47kit/s]
 :: Comparing with reference lists... done.

 > 4696 interactions between 2313 nodes
 from 17 resources have been loaded,
 for details see the log: ./log/6epsq.log
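As a side illustration of what selecting confidence levels means, here is a minimal sketch in plain Python that does not use pypath at all. The records, the field names and the `filter_by_level` helper are all hypothetical; they only show the idea of keeping TF-target pairs at chosen levels, and pypath's own method may work differently.

```python
# Hypothetical TF-target records (placeholder names, not real DoRothEA rows).
records = [
    {'tf': 'TF1', 'target': 'GENE_A', 'level': 'A', 'curated': True},
    {'tf': 'TF1', 'target': 'GENE_B', 'level': 'C', 'curated': False},
    {'tf': 'TF2', 'target': 'GENE_C', 'level': 'B', 'curated': True},
    {'tf': 'TF3', 'target': 'GENE_A', 'level': 'D', 'curated': False},
]


def filter_by_level(records, levels=('A', 'B'), only_curated=False):
    """Keep records whose level is in `levels`.

    If `only_curated` is True, additionally require the (hypothetical)
    `curated` flag to be set.
    """
    levels = set(levels)
    return [
        r for r in records
        if r['level'] in levels and (r['curated'] or not only_curated)
    ]


# Only the two highest confidence levels.
high_confidence = filter_by_level(records, levels={'A', 'B'})

# All four levels, but restricted to literature-curated records.
curated_only = filter_by_level(records, levels='ABCD', only_curated=True)
```

A stricter level set gives a smaller but more reliable network, which is the trade-off behind choosing the levels when loading the data.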

As you can see, this resulted in about 4.7k interactions between 2.3k genes. Let's see which resources these come from:

In [4]:
Out[4]:
['trrd_via_tfact',
 'trrust',
 'tred_via_RegNetwork',
 'reviews',
 'IntAct',
 'TFe',
 'jaspar_v2018',
 'kegg',
 'HTRIdb',
 'ARACNe-GTEx',
 'NFIRegulomeDB',
 'oreganno',
 'ReMap',
 'PAZAR',
 'tfact',
 'fantom4',
 'hocomoco_v11']
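Conceptually, a resource list like this is the union of the per-interaction source sets. The sketch below shows that idea in plain Python on a few made-up edges; the `edge_sources` list is an assumption for illustration and is not how pypath stores or reports its data.

```python
from collections import Counter

# Hypothetical per-edge source sets; the resource names are borrowed from the
# list above, but the edges themselves are invented for this example.
edge_sources = [
    {'PAZAR', 'oreganno'},
    {'trrust'},
    {'trrust', 'HTRIdb'},
]

# Union of all source sets gives the distinct resources in the network.
resources = sorted(set().union(*edge_sources))

# Counting how many edges each resource contributes is often the next question.
edges_per_resource = Counter(src for sources in edge_sources for src in sources)
```

Here `resources` holds each name once, while `edges_per_resource` shows that a single interaction can be supported by several resources at the same time.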

The PyPath object contains the network in an igraph.Graph object:

In [6]:
Out[6]:
<igraph.Graph at 0x65f4aa42bb88>

Additional details are available in the edge attributes of this graph object. For example, we can look at the directions of the first edge. The output says that this edge comes from the databases PAZAR and ORegAnno, and that it is a stimulatory interaction:

In [5]:
Directions and signs of interaction between P23769 and P47898

	P23769 ===> P47898 :: PAZAR, oreganno
	P23769 =+=> P47898 :: PAZAR, oreganno

See the literature references and the sources for the same edge:

In [9]:
Out[9]:
'22951020'
In [10]:
Out[10]:
{'PAZAR', 'oreganno'}

There are many ways to query the PyPath object. For example, to see which genes are targets of the estrogen receptor (a TF):

In [22]:
Out[22]:
'NOTCH2NL, CTSD, MMP13, MYC, CHAT, SERPINB9, EDN3, ADORA1, TRH, LGALS8, CDKN1B, PRL, TAC3, NOTCH2, CAD, NBPF1, NBPF4, BLM, NFIB, PELP1, CEBPB, ARID5B, JUNB, TERT, BCL2, TP53, TNIP1, CYP1A1, GREB1, HOXA10, TRIM16, PMAIP1, SEC61B, AXIN2, UGT2B15, E2F1, ESRRA, NBPF15, CCND1, TFAP2C, NEK6, FBLN2, CDH1, HSPB1, CDKN1A, AVP, CD24, NR5A2, MICB, CYP1B1, RUNX2, FOSL1, TGFA, GSN, POFUT1, VEGFA, ZMYND8, FOXP1, ZFHX3, PLAC1, CCT6B, MDM2, CLDN4, CRH, PBX1, OXT, SPATA2, F12, RET, CD86, TFF1, CYP2C19, FOS, MACROD1, BCL9, RARA, POR, NID2, SP1, CRHBP, PGR, KRT19, SERPINE1, JUN, EGFR, KDR, CEACAM3, NQO1, PTMA, AR, CTNNB1, CYP19A1, BTG2, TYMS, SPARC, CAPN2, MTA3, FOXC1, CXCL12, MB, ABCG2, YWHAQ, NRF1'
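The query above amounts to collecting the out-neighbours of one TF in a directed network. A minimal sketch of that lookup in plain Python follows; the edge list and the `targets_as_text` helper are hypothetical and stand in for the graph, not for any pypath method.

```python
from collections import defaultdict

# Hypothetical directed TF -> target edges (placeholder names, not real data).
edges = [
    ('TF1', 'GENE_B'),
    ('TF1', 'GENE_A'),
    ('TF2', 'GENE_A'),
    ('TF1', 'GENE_C'),
]

# Index the targets by their regulator.
targets_of = defaultdict(set)
for tf, target in edges:
    targets_of[tf].add(target)


def targets_as_text(tf):
    """Return the targets of `tf` as a comma-separated string, sorted by name."""
    # .get() avoids creating empty entries for unknown regulators.
    return ', '.join(sorted(targets_of.get(tf, ())))


tf1_targets = targets_as_text('TF1')
```

Building the index once makes repeated lookups cheap, which matters when you query many TFs in a row.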

You can combine other kinds of networks with DoRothEA. For example, you can add a protein-protein interaction network using the activity flow dataset defined in pypath.data_formats:

In [23]:
	:: Loading data from cache previously downloaded from www.uniprot.org
	:: Ready. Resulted `plain text` of type unicode string.                                                                                              
	:: Local file at `/home/denes/.pypath/cache/784f0a43e5831454b1d10db1b9480df7-`.
 > TRIP
	:: Reading from cache: /home/denes/.pypath/cache/trip.edges.pickle
        Processing nodes -- finished: 100%|██████████| 423/423 [00:00<00:00, 66.6kit/s]
        Processing edges -- finished: 100%|██████████| 423/423 [00:00<00:00, 28.0kit/s]
        Processing attributes -- finished: 100%|██████████| 423/423 [00:00<00:00, 2.49kit/s]
 > SPIKE
	:: Reading from cache: /home/denes/.pypath/cache/spike.edges.pickle
        Processing nodes -- finished: 100%|██████████| 3.72k/3.72k [00:00<00:00, 203kit/s]
        Processing edges -- finished: 100%|██████████| 3.72k/3.72k [00:00<00:00, 126kit/s]
        Processing attributes -- finished: 100%|██████████| 3.72k/3.72k [00:01<00:00, 2.64kit/s]
 > SignaLink3
	:: Reading from cache: /home/denes/.pypath/cache/signalink3.edges.pickle
        Processing nodes -- finished: 100%|██████████| 6.94k/6.94k [00:00<00:00, 361kit/s]
        Processing edges -- finished: 100%|██████████| 6.94k/6.94k [00:00<00:00, 98.3kit/s]
        Processing attributes -- finished: 100%|██████████| 6.94k/6.94k [00:02<00:00, 2.81kit/s]
 > Guide2Pharma
	:: Reading from cache: /home/denes/.pypath/cache/guide2pharma.edges.pickle
        Processing nodes -- finished: 100%|██████████| 266/266 [00:00<00:00, 15.5kit/s]
        Processing edges -- finished: 100%|██████████| 266/266 [00:00<00:00, 27.7kit/s]
        Processing attributes -- finished: 100%|██████████| 266/266 [00:00<00:00, 1.29kit/s]
 > CA1
	:: Reading from cache: /home/denes/.pypath/cache/ca1.edges.pickle
        Processing nodes -- finished: 100%|██████████| 1.88k/1.88k [00:00<00:00, 184kit/s]
        Processing edges -- finished: 100%|██████████| 1.88k/1.88k [00:00<00:00, 81.6kit/s]
        Processing attributes -- finished: 100%|██████████| 1.88k/1.88k [00:00<00:00, 3.87kit/s]
 > ARN
	:: Reading from cache: /home/denes/.pypath/cache/arn.edges.pickle
        Processing nodes -- finished: 100%|██████████| 95.0/95.0 [00:00<00:00, 23.0kit/s]
        Processing edges -- finished: 100%|██████████| 95.0/95.0 [00:00<00:00, 15.7kit/s]
        Processing attributes -- finished: 100%|██████████| 95.0/95.0 [00:00<00:00, 993it/s]
 > NRF2ome
	:: Reading from cache: /home/denes/.pypath/cache/nrf2ome.edges.pickle
        Processing nodes -- finished: 100%|██████████| 109/109 [00:00<00:00, 24.0kit/s]
        Processing edges -- finished: 100%|██████████| 109/109 [00:00<00:00, 5.87kit/s]
        Processing attributes -- finished: 100%|██████████| 109/109 [00:00<00:00, 1.08kit/s]
 > Macrophage
	:: Reading from cache: /home/denes/.pypath/cache/macrophage.edges.pickle
        Processing nodes -- finished: 100%|██████████| 4.85k/4.85k [00:00<00:00, 442kit/s]
        Processing edges -- finished: 100%|██████████| 4.85k/4.85k [00:00<00:00, 156kit/s]
        Processing attributes -- finished: 100%|██████████| 4.85k/4.85k [00:00<00:00, 6.86kit/s]
 > DeathDomain
	:: Reading from cache: /home/denes/.pypath/cache/deathdomain.edges.pickle
        Processing nodes -- finished: 100%|██████████| 236/236 [00:00<00:00, 54.9kit/s]
        Processing edges -- finished: 100%|██████████| 236/236 [00:00<00:00, 18.0kit/s]
        Processing attributes -- finished: 100%|██████████| 236/236 [00:00<00:00, 580it/s]  
 > PDZBase
	:: Reading from cache: /home/denes/.pypath/cache/pdzbase.edges.pickle
        Processing nodes -- finished: 100%|██████████| 133/133 [00:00<00:00, 9.23kit/s]
        Processing edges -- finished: 100%|██████████| 133/133 [00:00<00:00, 14.0kit/s]
        Processing attributes -- finished: 100%|██████████| 133/133 [00:00<00:00, 1.02kit/s]
 > Signor
	:: Loading data from cache previously downloaded from signor.uniroma2.it
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/denes/.pypath/cache/a357fe979f74a823bf4a42150a6dcf33-download_entity.php`.
	:: Loading 'genesymbol' to 'swissprot' mapping table
	:: Loading 'genesymbol-syn' to 'swissprot' mapping table
        Processing nodes -- finished: 100%|██████████| 10.1k/10.1k [00:00<00:00, 468kit/s]
        Processing edges -- finished: 100%|██████████| 10.1k/10.1k [00:00<00:00, 166kit/s]
        Processing attributes -- finished: 100%|██████████| 10.1k/10.1k [00:04<00:00, 3.74kit/s]
 > HPMR
        Processing nodes -- finished: 100%|██████████| 579/579 [00:00<00:00, 27.9kit/s]
        Processing edges -- finished: 100%|██████████| 579/579 [00:00<00:00, 30.6kit/s]
        Processing attributes -- finished: 100%|██████████| 579/579 [00:00<00:00, 816it/s]
 > CellPhoneDB
	:: Loading data from cache previously downloaded from www.cellphonedb.org
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/denes/.pypath/cache/142eb923569634ee61ca1d56843de13a-interactions_cellphonedb.csv`.
	:: Loading data from cache previously downloaded from www.cellphonedb.org
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/denes/.pypath/cache/183907e20d3c18bd773b7e085fc3a650-heterodimers.csv`.
        Processing nodes -- finished: 100%|██████████| 148/148 [00:00<00:00, 19.8kit/s]
        Processing edges -- finished: 100%|██████████| 148/148 [00:00<00:00, 30.6kit/s]
        Processing attributes -- finished: 100%|██████████| 148/148 [00:00<00:00, 722it/s] 
 > Ramilowski2015
	:: Loading data from cache previously downloaded from media.nature.com
	:: Ready. Resulted `plain text` of type file object.                                                                                                 
	:: Local file at `/home/denes/.pypath/cache/2a72408fb2700d17cff8c9b48701de70-ncomms8866-s3.xlsx`.
        Processing nodes -- finished: 100%|██████████| 341/341 [00:00<00:00, 89.8kit/s]
        Processing edges -- finished: 100%|██████████| 341/341 [00:00<00:00, 15.4kit/s]
        Processing attributes -- finished: 100%|██████████| 341/341 [00:00<00:00, 822it/s]
 :: Comparing with reference lists... done.

 > 19341 interactions between 5798 nodes
 from 31 resources have been loaded,
 for details see the log: ./log/6epsq.log

Now you have both TF-target and protein-protein interactions in your network. The type edge attribute shows the categories of each interaction. For example, certain pairs of proteins both transcriptionally regulate each other and interact with each other, so their interaction belongs to both the TF and PPI categories:

In [31]:
Out[31]:
['TF', 'PPI']
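To show how one edge can end up with two categories, here is a minimal sketch in plain Python that merges two edge lists and records the category of each contribution. The protein identifiers are placeholders, and keying edges by an unordered pair is an assumption made for this example rather than a description of pypath's internals.

```python
# Hypothetical edge lists (placeholder identifiers, not real UniProt IDs).
tf_target = [('P1', 'P2'), ('P3', 'P4')]
ppi = [('P2', 'P1'), ('P5', 'P6')]

# Map each unordered protein pair to the list of categories it belongs to.
edge_types = {}
for label, pairs in (('TF', tf_target), ('PPI', ppi)):
    for a, b in pairs:
        key = frozenset((a, b))  # same key regardless of direction
        types = edge_types.setdefault(key, [])
        if label not in types:
            types.append(label)

# The pair P1/P2 appears in both lists, so it carries both categories.
both = edge_types[frozenset(('P1', 'P2'))]

# Pairs that have more than one category.
multi_category = [sorted(k) for k, v in edge_types.items() if len(v) > 1]
```

Pairs found in only one list keep a single category, so filtering on the category list is enough to separate the regulatory and the protein-protein parts of a combined network.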