Introduction

This notebook shows how to extract canonical pathways from a pypath network using the pathway annotations attached to its nodes (KEGG, SignaLink, Signor and NetPath).

Analysis

In [1]:
# Show all the plots inside the notebook
%matplotlib inline
In [2]:
# load packages
import pypath
import igraph  # import igraph to use the plot function

import numpy as np
import pandas as pd
import seaborn as sns
/usr/lib/python2.7/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))
In [83]:
pa = pypath.PyPath()

	=== d i s c l a i m e r ===

	All data coming with this module
	either as redistributed copy or downloaded using the
	programmatic interfaces included in the present module
	are available under public domain, are free to use at
	least for academic research or education purposes.
	Please be aware of the licences of all the datasets
	you use in your analysis, and please give appropriate
	credits for the original sources when you publish your
	results. To find out more about data sources please
	look at `pypath.descriptions` and
	`pypath.data_formats.urls`.

	» New session started,
	session ID: 'aijot'
	logfile:'./log/aijot.log'.
In [84]:
pa.init_network(pfile = 'cache/default_network.pickle')
	:: Network loaded from `cache/default_network.pickle`. 6710 nodes, 24833 edges.
	:: Loading 'genesymbol' to 'uniprot' mapping table
In [85]:
pa.load_all_pathways()
	:: Processing KEGG Pathways: working on it, 1/176
	:: Loading 'uniprot-sec' to 'uniprot-pri' mapping table
	:: Loading 'genesymbol' to 'trembl' mapping table
	:: Loading 'genesymbol-syn' to 'uniprot' mapping table
	:: Processing KEGG Pathways: finished, 176/176
	:: Downloading data from Signor: finished, 26/26
In [86]:
kegg_pathways_proteins, kegg_pathways_interactions = pa.get_pathways('kegg')
	:: Processing KEGG Pathways: finished, 176/176
In [7]:
print('Name\tNumber of proteins')
print('----\t------------------')
for i in kegg_pathways_proteins.keys():
    print('{}: {}'.format(i, len(kegg_pathways_proteins[i])))
Name	Number of proteins
----	------------------
Ras signaling pathway: 78
Toxoplasmosis: 50
Chemical carcinogenesis: 15
Antigen processing and presentation: 45
Sphingolipid signaling pathway: 60
Long-term potentiation: 36
Shigellosis: 27
Vibrio cholerae infection: 17
Salmonella infection: 33
Huntington's disease: 13
Type I diabetes mellitus: 6
Proteoglycans in cancer: 130
Apoptosis: 61
Rap1 signaling pathway: 79
ECM-receptor interaction: 15
Leukocyte transendothelial migration: 41
Cell cycle: 50
Morphine addiction: 21
Insulin signaling pathway: 81
Thyroid cancer: 16
Intestinal immune network for IgA production: 31
Hedgehog signaling pathway: 18
AMPK signaling pathway: 60
Insulin resistance: 54
Hippo signaling pathway: 47
Bacterial invasion of epithelial cells: 33
Circadian entrainment: 40
Renin secretion: 33
Wnt signaling pathway: 75
Graft-versus-host disease: 28
Prolactin signaling pathway: 47
Choline metabolism in cancer: 40
Regulation of lipolysis in adipocytes: 29
Hypertrophic cardiomyopathy (HCM): 18
Inflammatory bowel disease (IBD): 47
Complement and coagulation cascades: 67
Epstein-Barr virus infection: 78
Cell adhesion molecules (CAMs): 99
Thyroid hormone signaling pathway: 55
Synaptic vesicle cycle: 8
Proximal tubule bicarbonate reclamation: 6
Cytokine-cytokine receptor interaction: 191
Ovarian steroidogenesis: 24
Influenza A: 67
Phototransduction: 15
NF-kappa B signaling pathway: 56
Fc epsilon RI signaling pathway: 38
Osteoclast differentiation: 71
Alcoholism: 35
Alzheimer's disease: 32
B cell receptor signaling pathway: 49
African trypanosomiasis: 21
Autoimmune thyroid disease: 10
Basal cell carcinoma: 16
Vascular smooth muscle contraction: 41
Neurotrophin signaling pathway: 87
Insulin secretion: 35
Tuberculosis: 88
Renin-angiotensin system: 2
Gastric acid secretion: 24
Endometrial cancer: 28
Dopaminergic synapse: 37
T cell receptor signaling pathway: 69
Adipocytokine signaling pathway: 43
PI3K-Akt signaling pathway: 79
TNF signaling pathway: 58
Aldosterone synthesis and secretion: 27
Chemokine signaling pathway: 57
mTOR signaling pathway: 37
Fc gamma R-mediated phagocytosis: 50
Colorectal cancer: 27
Leishmaniasis: 33
Gap junction: 50
Cardiac muscle contraction: 8
Renal cell carcinoma: 37
Regulation of actin cytoskeleton: 67
Non-small cell lung cancer: 36
Thyroid hormone synthesis: 23
Olfactory transduction: 15
Cytosolic DNA-sensing pathway: 22
VEGF signaling pathway: 36
Inflammatory mediator regulation of TRP channels: 45
HTLV-I infection: 86
Melanoma: 26
Hepatitis B: 93
Bladder cancer: 22
Phosphatidylinositol signaling system: 26
Hepatitis C: 53
Toll-like receptor signaling pathway: 84
Endocrine and other factor-regulated calcium reabsorption: 26
Epithelial cell signaling in Helicobacter pylori infection: 34
Glucagon signaling pathway: 37
HIF-1 signaling pathway: 34
Chronic myeloid leukemia: 40
Neuroactive ligand-receptor interaction: 16
Carbohydrate digestion and absorption: 6
GABAergic synapse: 17
Circadian rhythm: 13
Rheumatoid arthritis: 16
Measles: 58
Allograft rejection: 9
Natural killer cell mediated cytotoxicity: 145
Legionellosis: 41
Type II diabetes mellitus: 30
Transcriptional misregulation in cancer: 16
Dilated cardiomyopathy (DCM): 31
Signaling pathways regulating pluripotency of stem cells: 48
Bile secretion: 22
p53 signaling pathway: 68
Glutamatergic synapse: 41
Serotonergic synapse: 31
Glioma: 38
Vasopressin-regulated water reabsorption: 14
Jak-STAT signaling pathway: 25
RIG-I-like receptor signaling pathway: 41
Pathways in cancer: 164
Arrhythmogenic right ventricular cardiomyopathy (ARVC): 12
Pertussis: 27
Non-alcoholic fatty liver disease (NAFLD): 63
Estrogen signaling pathway: 42
Cocaine addiction: 22
MAPK signaling pathway: 156
Adrenergic signaling in cardiomyocytes: 55
Viral myocarditis: 24
Viral carcinogenesis: 6
Adherens junction: 62
Staphylococcus aureus infection: 22
Systemic lupus erythematosus: 6
Pancreatic secretion: 14
Asthma: 3
Cholinergic synapse: 31
Aldosterone-regulated sodium reabsorption: 18
Acute myeloid leukemia: 38
Axon guidance: 74
Calcium signaling pathway: 32
cGMP-PKG signaling pathway: 57
Focal adhesion: 77
Retrograde endocannabinoid signaling: 12
Melanogenesis: 47
NOD-like receptor signaling pathway: 40
Amyotrophic lateral sclerosis (ALS): 32
Small cell lung cancer: 25
cAMP signaling pathway: 85
Oxytocin signaling pathway: 62
Platelet activation: 66
Prion diseases: 24
Herpes simplex infection: 68
Long-term depression: 30
Amphetamine addiction: 31
Fat digestion and absorption: 6
TGF-beta signaling pathway: 46
Pancreatic cancer: 47
Salivary secretion: 21
Chagas disease (American trypanosomiasis): 44
Notch signaling pathway: 21
FoxO signaling pathway: 89
Prostate cancer: 42
Malaria: 10
Maturity onset diabetes of the young: 31
Pathogenic Escherichia coli infection: 16
GnRH signaling pathway: 50
Taste transduction: 11
Amoebiasis: 17
ErbB signaling pathway: 57
Mineral absorption: 10
Parkinson's disease: 15
Tight junction: 47
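
The listing above comes out in arbitrary dictionary order, which makes it hard to spot the largest and smallest pathways. Below is a minimal sketch of ranking pathways by size. It assumes a dictionary that maps pathway names to counts; `sizes` is a small stand-in filled with a few counts copied from the output above, and `rank_by_size` is a hypothetical helper, not part of pypath.

```python
def rank_by_size(pathway_sizes, top=None):
    """Return (name, size) pairs, largest first; ties are ordered by name."""
    ranked = sorted(pathway_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top is None else ranked[:top]


# Stand-in data: a handful of counts copied from the listing above.
sizes = {
    'Cytokine-cytokine receptor interaction': 191,
    'Pathways in cancer': 164,
    'MAPK signaling pathway': 156,
    'Prostate cancer': 42,
    'Asthma': 3,
    'Renin-angiotensin system': 2,
}

for name, size in rank_by_size(sizes, top=3):
    print('{}\t{}'.format(name, size))
```

With the real data, the input could be built as `{k: len(v) for k, v in kegg_pathways_proteins.items()}`.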
In [8]:
kegg_pathways_proteins['Prostate cancer']
Out[8]:
{'O15111',
 'O15530',
 'O43889',
 'P00533',
 'P01111',
 'P01112',
 'P01133',
 'P04637',
 'P06400',
 'P06493',
 'P10398',
 'P10415',
 'P10745',
 'P14210',
 'P15976',
 'P19838',
 'P24385',
 'P25963',
 'P28482',
 'P29323',
 'P35222',
 'P38936',
 'P42345',
 'P46527',
 'P49841',
 'P55211',
 'P62993',
 'Q00987',
 'Q01094',
 'Q02750',
 'Q07889',
 'Q12778',
 'Q15118',
 'Q6A1A2',
 'Q8WYR1',
 'Q92793',
 'Q92934',
 'Q99801',
 'Q9BXP2',
 'Q9NWQ8',
 'Q9UJU2',
 'Q9Y243'}
In [9]:
set([pa.up(iprot)['label'] for iprot in kegg_pathways_proteins['Prostate cancer'] if iprot in pa.nodDct])
Out[9]:
{'AKT3',
 'ARAF',
 'BAD',
 'BCL2',
 'CASP9',
 'CCND1',
 'CDK1',
 'CDKN1A',
 'CDKN1B',
 'CHUK',
 'CREB3',
 'CREBBP',
 'CTNNB1',
 'E2F1',
 'EGF',
 'EGFR',
 'EPHB2',
 'FOXO1',
 'GATA1',
 'GRB2',
 'GSK3B',
 'HGF',
 'HRAS',
 'LEF1',
 'MAP2K1',
 'MAPK1',
 'MDM2',
 'MTOR',
 'NFKB1',
 'NFKBIA',
 'NKX3-1',
 'NRAS',
 'PAG1',
 'PDK1',
 'PDPK1',
 'PDPK2P',
 'RB1',
 'SOS1',
 'TP53'}
In [10]:
# Load the ipython display and image module
from IPython.display import Image
from IPython.display import display

display(Image('http://www.genome.jp/kegg/pathway/hsa/hsa05215.png'))
In [11]:
pc_proteins = [i for i in kegg_pathways_proteins['Prostate cancer'] if i in pa.nodDct]
pc_subgraph = pa.graph.induced_subgraph(pc_proteins)
In [15]:
plot1 = igraph.plot(pc_subgraph, vertex_label = pc_subgraph.vs['label'], layout = pc_subgraph.layout_auto())
plot1.save('kegg_prostate_cancer.png')
In [14]:
set(pc_subgraph.vs['label'])
Out[14]:
{'AKT3',
 'ARAF',
 'BAD',
 'BCL2',
 'CASP9',
 'CCND1',
 'CDK1',
 'CDKN1A',
 'CDKN1B',
 'CHUK',
 'CREB3',
 'CREBBP',
 'CTNNB1',
 'E2F1',
 'EGF',
 'EGFR',
 'EPHB2',
 'FOXO1',
 'GATA1',
 'GRB2',
 'GSK3B',
 'HGF',
 'HRAS',
 'LEF1',
 'MAP2K1',
 'MAPK1',
 'MDM2',
 'MTOR',
 'NFKB1',
 'NFKBIA',
 'NKX3-1',
 'NRAS',
 'PAG1',
 'PDK1',
 'PDPK1',
 'PDPK2P',
 'RB1',
 'SOS1',
 'TP53'}
In [87]:
# 5 most frequently mutated genes in prostate cancer.
query_nodes = set(['PTEN', 'FOXA1', 'TP53', 'SPOP', 'AR'])
In [88]:
for igene in query_nodes:
    print(igene)
    print(pa.gs(igene)['kegg_pathways'])
    print(pa.gs(igene)['signalink_pathways'])
    print(pa.gs(igene)['signor_pathways'])
    print(pa.gs(igene)['netpath_pathways'])
    print('---')
FOXA1
set([])
set([])
set([])
set([u'Androgen receptor (AR)'])
---
PTEN
set([u'Phosphatidylinositol signaling system', u'Sphingolipid signaling pathway', u'Focal adhesion', u'Tight junction', u'Hepatitis B', u'FoxO signaling pathway', u'p53 signaling pathway'])
set(['HH', 'TNF/Apoptosis', 'WNT', 'RTK', 'IIP'])
set([u'MTOR Signaling', u'Insulin Receptor'])
set([u'Androgen receptor (AR)'])
---
SPOP
set([])
set(['HH', 'JAK/STAT'])
set([])
set([])
---
AR
set([u'ErbB signaling pathway', u'Pathways in cancer', u'Hippo signaling pathway'])
set(['IIP', 'autophagy'])
set([])
set([u'Androgen receptor (AR)', u'Interleukin-6 (IL-6)', u' Transforming growth factor beta (TGF-beta) receptor ', u'Alpha6 Beta4 Integrin'])
---
TP53
set([u'HTLV-I infection', u'Melanoma', u'Bladder cancer', u'Sphingolipid signaling pathway', u'Chronic myeloid leukemia', u'Proteoglycans in cancer', u'Apoptosis', u'Neurotrophin signaling pathway', u'Cell cycle', u'p53 signaling pathway', u'Viral carcinogenesis', u'PI3K-Akt signaling pathway', u'Pathways in cancer', u'Hepatitis B', u'Hepatitis C', u'Wnt signaling pathway', u'Prostate cancer', u'Thyroid hormone signaling pathway', u'Glioma', u'MAPK signaling pathway', u'Epstein-Barr virus infection', u'Measles'])
set(['Notch', 'TNF/Apoptosis', 'IIP', 'RTK', 'autophagy'])
set([u'Mitochondrial Control of Apoptosis', u'P38 Signaling'])
set([u' Transforming growth factor beta (TGF-beta) receptor '])
---

Looking at the pathways each query gene participates in, we can pick a small set of pathways that together covers all five genes. For example: 'Androgen receptor (AR)' from NetPath (FOXA1, PTEN and AR), 'HH' from SignaLink (PTEN and SPOP) and 'Transforming growth factor beta (TGF-beta) receptor' from NetPath (AR and TP53).
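
Picking such a set by eye works for five genes. For longer query lists, one possible approach is a greedy set cover, sketched below. This is an illustration, not something pypath provides: `greedy_cover` is a hypothetical helper, and the `memberships` dictionary is copied by hand from the output above, restricted to the SignaLink and NetPath annotations for brevity. A greedy cover is not guaranteed to be the smallest possible one.

```python
def greedy_cover(memberships):
    """Greedily pick pathways until every annotated gene is covered.

    memberships maps gene -> set of (source, pathway) pairs.
    """
    # Invert the mapping: pathway -> genes annotated with it.
    genes_by_pathway = {}
    for gene, pathways in memberships.items():
        for pathway in pathways:
            genes_by_pathway.setdefault(pathway, set()).add(gene)

    # Genes without any annotation can never be covered, so skip them.
    uncovered = {g for g, p in memberships.items() if p}
    chosen = []
    while uncovered:
        # Take the pathway covering most uncovered genes; ties by sorted key.
        best = min(genes_by_pathway,
                   key=lambda p: (-len(genes_by_pathway[p] & uncovered), p))
        chosen.append(best)
        uncovered -= genes_by_pathway[best]
    return chosen


# Pathway names exactly as printed above (including surrounding spaces).
AR_PW = 'Androgen receptor (AR)'
TGFB = ' Transforming growth factor beta (TGF-beta) receptor '

memberships = {
    'FOXA1': {('netpath', AR_PW)},
    'PTEN': {('signalink', p)
             for p in ('HH', 'TNF/Apoptosis', 'WNT', 'RTK', 'IIP')}
            | {('netpath', AR_PW)},
    'SPOP': {('signalink', 'HH'), ('signalink', 'JAK/STAT')},
    'AR': {('signalink', 'IIP'), ('signalink', 'autophagy'),
           ('netpath', AR_PW), ('netpath', 'Interleukin-6 (IL-6)'),
           ('netpath', TGFB), ('netpath', 'Alpha6 Beta4 Integrin')},
    'TP53': {('signalink', p)
             for p in ('Notch', 'TNF/Apoptosis', 'IIP', 'RTK', 'autophagy')}
            | {('netpath', TGFB)},
}

for source, pathway in greedy_cover(memberships):
    print('{}: {}'.format(source, pathway.strip()))
```

On this hand-copied input the sketch selects the NetPath androgen receptor pathway, the NetPath TGF-beta receptor pathway and the SignaLink 'HH' pathway, the same three used in the cells that follow.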

In [89]:
connector_pathways = {'signalink_pathways': set(['HH']), 
                      'netpath_pathways': set([u'Androgen receptor (AR)', 
                                               u' Transforming growth factor beta (TGF-beta) receptor '])}
In [90]:
filter_func = []
filter_func.append(lambda vertex: 'HH' in vertex['signalink_pathways'])
filter_func.append(lambda vertex: u'Androgen receptor (AR)' in vertex['netpath_pathways'])
filter_func.append(lambda vertex: u' Transforming growth factor beta (TGF-beta) receptor ' in vertex['netpath_pathways'])
In [91]:
connector_node_list = pa.graph.vs.select(lambda vertex: any(i(vertex) for i in filter_func))
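
The three hand-written lambdas repeat what `connector_pathways` already states. As a sketch, a single predicate can be derived from that dictionary instead. `make_pathway_filter` is a hypothetical helper, the vertices are mocked as plain dictionaries so that the example runs without a loaded network, and `XYZ1` is a made-up, unannotated vertex added only to show that it is rejected.

```python
def make_pathway_filter(connector_pathways):
    """Build a predicate that is True for a vertex annotated with any
    of the listed pathways under the matching attribute."""
    def in_any_pathway(vertex):
        # vertex[attr] is expected to be a set of pathway names.
        return any(vertex[attr] & names
                   for attr, names in connector_pathways.items())
    return in_any_pathway


connector_pathways = {
    'signalink_pathways': {'HH'},
    'netpath_pathways': {
        'Androgen receptor (AR)',
        ' Transforming growth factor beta (TGF-beta) receptor ',
    },
}

# Mock vertices: dicts standing in for igraph vertices. SPOP and FOXA1
# carry the annotations printed earlier; XYZ1 is invented for the example.
vertices = [
    {'label': 'SPOP', 'signalink_pathways': {'HH', 'JAK/STAT'},
     'netpath_pathways': set()},
    {'label': 'FOXA1', 'signalink_pathways': set(),
     'netpath_pathways': {'Androgen receptor (AR)'}},
    {'label': 'XYZ1', 'signalink_pathways': {'WNT'},
     'netpath_pathways': set()},
]

keep = make_pathway_filter(connector_pathways)
selected = [v['label'] for v in vertices if keep(v)]
print(selected)
```

Since `select` is already called with a callable above, the same predicate could presumably be used on the real network as `pa.graph.vs.select(make_pathway_filter(connector_pathways))`, provided the pathway attributes are sets on every vertex.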
In [92]:
connector_subgraph = pa.graph.induced_subgraph(connector_node_list)
print('Number of nodes: {}'.format(connector_subgraph.vcount()))
print('Number of edges: {}'.format(connector_subgraph.ecount()))
Number of nodes: 350
Number of edges: 1496
In [31]:
layout = connector_subgraph.layout_fruchterman_reingold(repulserad = connector_subgraph.vcount() ** 2.8, 
    maxiter = 1000, area = connector_subgraph.vcount() ** 2.3)
plot2 = igraph.plot(connector_subgraph, layout = layout, vertex_size = 0.5,
    vertex_border_width = 0, edge_width = 0.2, edge_color = '#33333377', vertex_label_size = 9)
plot2.save('prostate_cancer_pathways.png')
display(Image('prostate_cancer_pathways.png'))
In [94]:
reload(pypath.ig_drawing_vertex)
# reload(pypath.drawing)
pplot3 = pypath.drawing.Plot(connector_subgraph, 'prostate_cancer_pathways.pdf',
    vertex_label = 'label', title_text = 'Pathways connecting prostate cancer related proteins')
pplot3.draw()
	::Calculating fruchterman_reingold layout... (numof nodes/edges: 350/1496) Done in 00 min 01 sec. 
	::Plotting pdf to file pdf/prostate_cancer_pathways.pdf...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-94-11102c856330> in <module>()
      3 pplot3 = pypath.drawing.Plot(connector_subgraph, 'prostate_cancer_pathways.pdf',
      4     vertex_label = 'label', title_text = 'Pathways connecting prostate cancer related proteins')
----> 5 pplot3.draw()
      6 pplot3._has_graph()

/home/denes/Dokumentumok/pw/dev/src/pypath/drawing.pyc in draw(self, return_data, **kwargs)
    301                     vertex_label_dist = self.vertex_label_dist,
    302                     **self.kwargs))
--> 303         self.plots[-1].redraw()
    304         if self.title_text is not None:
    305             self.make_title()

/usr/lib/python2.7/site-packages/igraph/drawing/__init__.pyc in redraw(self, context)
    267                 else:
    268                     ctx.save()
--> 269                 plotter(ctx, bbox, palette, *args, **kwds)
    270                 if opacity < 1.0:
    271                     ctx.pop_group_to_source()

/usr/lib/python2.7/site-packages/igraph/__init__.pyc in __plot__(self, context, bbox, palette, *args, **kwds)
   3150             del kwds["drawer_factory"]
   3151         drawer = drawer_factory(context, bbox)
-> 3152         drawer.draw(self, palette, *args, **kwds)
   3153 
   3154     def __str__(self):

/home/denes/Dokumentumok/pw/dev/src/pypath/ig_drawing.pyc in draw(self, graph, palette, *args, **kwds)
    222 
    223         # Construct the vertex, edge and label drawers
--> 224         vertex_drawer = self.vertex_drawer_factory(context, bbox, palette, layout)
    225         edge_drawer = self.edge_drawer_factory(context, palette)
    226         label_drawer = self.label_drawer_factory(context)

/home/denes/Dokumentumok/pw/dev/src/pypath/ig_drawing_vertex.py in __init__(self, context, bbox, palette, layout)
     65 
     66     def __init__(self, context, bbox, palette, layout):
---> 67         super(DefaultVertexDrawer, self).__init__(self, context, bbox, palette, layout)
     68         self.VisualVertexBuilder = self._construct_visual_vertex_builder()
     69 

TypeError: unbound method __init__() must be called with AbstractCairoVertexDrawer instance as first argument (got DefaultVertexDrawer instance instead)
In [63]:
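# sns.plt was only a re-export of matplotlib.pyplot and is gone from later
# seaborn releases, so call pyplot directly
import matplotlib.pyplot as plt
plt.hist(connector_subgraph.degree(), bins=100)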
Out[63]:
(array([ 55.,  88.,  31.,  45.,  15.,  27.,  12.,  12.,  10.,   9.,   5.,
          8.,   8.,   1.,   2.,   1.,   1.,   0.,   3.,   2.,   0.,   0.,
          0.,   1.,   1.,   2.,   0.,   1.,   1.,   3.,   0.,   2.,   0.,
          0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,   0.,
          1.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
          0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,   0.,   0.,   0.,
          0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
          0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
          0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
          1.]),
 array([   0.  ,    1.54,    3.08,    4.62,    6.16,    7.7 ,    9.24,
          10.78,   12.32,   13.86,   15.4 ,   16.94,   18.48,   20.02,
          21.56,   23.1 ,   24.64,   26.18,   27.72,   29.26,   30.8 ,
          32.34,   33.88,   35.42,   36.96,   38.5 ,   40.04,   41.58,
          43.12,   44.66,   46.2 ,   47.74,   49.28,   50.82,   52.36,
          53.9 ,   55.44,   56.98,   58.52,   60.06,   61.6 ,   63.14,
          64.68,   66.22,   67.76,   69.3 ,   70.84,   72.38,   73.92,
          75.46,   77.  ,   78.54,   80.08,   81.62,   83.16,   84.7 ,
          86.24,   87.78,   89.32,   90.86,   92.4 ,   93.94,   95.48,
          97.02,   98.56,  100.1 ,  101.64,  103.18,  104.72,  106.26,
         107.8 ,  109.34,  110.88,  112.42,  113.96,  115.5 ,  117.04,
         118.58,  120.12,  121.66,  123.2 ,  124.74,  126.28,  127.82,
         129.36,  130.9 ,  132.44,  133.98,  135.52,  137.06,  138.6 ,
         140.14,  141.68,  143.22,  144.76,  146.3 ,  147.84,  149.38,
         150.92,  152.46,  154.  ]),
 <a list of 100 Patch objects>)