Enrichment Analysis Visualizer¶

This appyter creates a variety of visualizations for enrichment analysis results for one selected Enrichr library, and may be run as either a standalone appyter from the Appyter Catalog or programmatically from the Enrichr results page.

For simplicity, the only inputs for this appyter are a gene list and one library. Other parameters are set to default values in the cell below. You can download the notebook, change these parameters, and rerun it if you wish.

The pre-processed libraries used to create the scatter plot and hexagonal canvas visualizations can be found here.

A link to the full analysis results on the Enrichr website can be found at the bottom of this page.

In [1]:
# Scatter Plot Imports
from maayanlab_bioinformatics.enrichment import enrich_crisp
import matplotlib as mpl
import matplotlib.colors as colors
import base64

# Bar Chart Imports
import pandas as pd 
import numpy as np
import json
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import time
from matplotlib.ticker import MaxNLocator
from IPython.display import display, FileLink, Markdown, HTML

# Hexagonal Canvas Imports
import math
import uuid
import urllib.request
from textwrap import dedent
from string import Template
from operator import itemgetter

# Manhattan Plot Imports
import matplotlib.patches as mpatches
import matplotlib.cm as cm

# Bokeh
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import HoverTool, CustomJS, ColumnDataSource, Span
from bokeh.layouts import layout, row, column, gridplot
from bokeh.palettes import all_palettes
output_notebook()
In [2]:
gene_list_input = '''NSUN3
POLRMT
NLRX1
SFXN5
ZC3H12C
SLC25A39
ARSG
DEFB29
NDUFB6
ZFAND1
TMEM77
5730403B10RIK
RP23-195K8.6
TLCD1
PSMC6
SLC30A6
LOC100047292
LRRC40
ORC5L
MPP7
UNC119B
PRKACA
TCN2
PSMC3IP
PCMTD2
ACAA1A
LRRC1
2810432D09RIK
SEPHS2
SAC3D1
TMLHE
LOC623451
TSR2
PLEKHA7
GYS2
ARHGEF12
HIBCH
LYRM2
ZBTB44
ENTPD5
RAB11FIP2
LIPT1
INTU
ANXA13
KLF12
SAT2
GAL3ST2
VAMP8
FKBPL
AQP11
TRAP1
PMPCB
TM7SF3
RBM39
BRI3
KDR
ZFP748
NAP1L1
DHRS1
LRRC56
WDR20A
STXBP2
KLF1
UFC1
CCDC16
9230114K14RIK
RWDD3
2610528K11RIK
ACO1
CABLES1
LOC100047214
YARS2
LYPLA1
KALRN
GYK
ZFP787
ZFP655
RABEPK
ZFP650
4732466D17RIK
EXOSC4
WDR42A
GPHN
2610528J11RIK
1110003E01RIK
MDH1
1200014M14RIK
AW209491
MUT
1700123L14RIK
2610036D13RIK
COX15
TMEM30A
NSMCE4A
TM2D2
RHBDD3
ATXN2
NFS1
3110001I20RIK
BC038156
LOC100047782
2410012H22RIK
RILP
A230062G08RIK
PTTG1IP
RAB1
AFAP1L1
LYRM5
2310026E23RIK
C330002I19RIK
ZFYVE20
POLI
TOMM70A
SLC7A6OS
MAT2B
4932438A13RIK
LRRC8A
SMO
NUPL2
TRPC2
ARSK
D630023B12RIK
MTFR1
5730414N17RIK
SCP2
ZRSR1
NOL7
C330018D20RIK
IFT122
LOC100046168
D730039F16RIK
SCYL1
1700023B02RIK
1700034H14RIK
FBXO8
PAIP1
TMEM186
ATPAF1
LOC100046254
LOC100047604
COQ10A
FN3K
SIPA1L1
SLC25A16
SLC25A40
RPS6KA5
TRIM37
LRRC61
ABHD3
GBE1
PARP16
HSD3B2
ESM1
DNAJC18
DOLPP1
LASS2
WDR34
RFESD
CACNB4
2310042D19RIK
SRR
BPNT1
6530415H11RIK
CLCC1
TFB1M
4632404H12RIK
D4BWG0951E
MED14
ADHFE1
THTPA
CAT
ELL3
AKR7A5
MTMR14
TIMM44
SF1
IPP
IAH1
TRIM23
WDR89
GSTZ1
CRADD
2510006D16RIK
FBXL6
LOC100044400
ZFP106
CD55
0610013E23RIK
AFMID
TMEM86A
ALDH6A1
DALRD3
SMYD4
NME7
FARS2
TASP1
CLDN10
A930005H10RIK
SLC9A6
ADK
RBKS
2210016F16RIK
VWCE
4732435N03RIK
ZFP11
VLDLR
9630013D21RIK
4933407N01RIK
FAHD1
MIPOL1
1810019D21RIK
1810049H13RIK
TFAM
PAICS
1110032A03RIK
LOC100044139
DNAJC19
BC016495
A930041I02RIK
RQCD1
USP34
ZCCHC3
H2AFJ
PHF7
4921508D12RIK
KMO
PRPF18
MCAT
TXNDC4
4921530L18RIK
VPS13B
SCRN3
TOR1A
AI316807
ACBD4
FAH
APOOL
COL4A4
LRRC19
GNMT
NR3C1
SIP1
ASCC1
FECH
ABHD14A
ARHGAP18
2700046G09RIK
YME1L1
GK5
GLO1
SBK1
CISD1
2210011C24RIK
NXT2
NOTUM
ANKRD42
UBE2E1
NDUFV1
SLC33A1
CEP68
RPS6KB1
HYI
ALDH1A3
MYNN
3110048L19RIK
RDH14
PROZ
GORASP1
LOC674449
ZFP775
5430437P03RIK
NPY
ADH5
SYBL1
4930432O21RIK
NAT9
LOC100048387
METTL8
ENY2
2410018G20RIK
PGM2
FGFR4
MOBKL2B
ATAD3A
4932432K03RIK
DHTKD1
UBOX5
A530050D06RIK
ZDHHC5
MGAT1
NUDT6
TPMT
WBSCR18
LOC100041586
CDK5RAP1
4833426J09RIK
MYO6
CPT1A
GADD45GIP1
TMBIM4
2010309E21RIK
ASB9
2610019F03RIK
7530414M10RIK
ATP6V1B2
2310068J16RIK
DDT
KLHDC4
HPN
LIFR
OVOL1
NUDT12
CDAN1
FBXO9
FBXL3
HOXA7
ALDH8A1
3110057O12RIK
ABHD11
PSMB1
ENSMUSG00000074286
CHPT1
OXSM
2310009A05RIK
1700001L05RIK
ZFP148
39509
MRPL9
TMEM80
9030420J04RIK
NAGLU
PLSCR2
AGBL3
PEX1
CNO
NEO1
ASF1A
TNFSF5IP1
PKIG
AI931714
D130020L05RIK
CNTD1
CLEC2H
ZKSCAN1
1810044D09RIK
METTL7A
SIAE
FBXO3
FZD5
TMEM166
TMED4
GPR155
RNF167
SPTLC1
RIOK2
TGDS
PMS1
PITPNC1
PCSK7
4933403G14RIK
EI24
CREBL2
TLN1
MRPL35
2700038C09RIK
UBIE
OSGEPL1
2410166I05RIK
WDR24
AP4S1
LRRC44
B3BP
ITFG1
DMXL1
C1D'''
enrichr_library = 'Aging_Perturbations_from_GEO_down'
In [3]:
# Split the input into one gene symbol per line, dropping surrounding whitespace and blank lines
genes = [x.strip() for x in gene_list_input.split('\n') if x.strip()]
In [4]:
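# Enrichr API function for the Manhattan plot and bar chart
# Takes a gene list and a list of Enrichr library names as input.
# Returns [full results, top-10 results, terms, p-values, adjusted p-values, short ID];
# the two data frames hold the results for the last library in all_libraries.
def Enrichr_API(enrichr_gene_list, all_libraries):

    all_terms = []
    all_pvalues = []
    all_adjusted_pvalues = []

    for library_name in all_libraries: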
        ENRICHR_URL = 'http://maayanlab.cloud/Enrichr/addList'
        genes_str = '\n'.join(enrichr_gene_list)
        description = ''
        payload = {
            'list': (None, genes_str),
            'description': (None, description)
        }

        response = requests.post(ENRICHR_URL, files=payload)
        if not response.ok:
            raise Exception('Error analyzing gene list')

        data = json.loads(response.text)
        time.sleep(0.5)
        ENRICHR_URL = 'http://maayanlab.cloud/Enrichr/enrich'
        query_string = '?userListId=%s&backgroundType=%s'
        user_list_id = data['userListId']
        short_id = data["shortId"]
        gene_set_library = library_name
        response = requests.get(
            ENRICHR_URL + query_string % (user_list_id, gene_set_library)
         )
        if not response.ok:
            raise Exception('Error fetching enrichment results')

        data = json.loads(response.text)

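        # Each result row is a list; positions 1, 2, and 6 are read here as the
        # term name, p-value, and adjusted p-value
        short_results_df = pd.DataFrame(data[library_name][0:10])
        all_terms.append(list(short_results_df[1]))
        all_pvalues.append(list(short_results_df[2]))
        all_adjusted_pvalues.append(list(short_results_df[6]))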
        
        results_df  = pd.DataFrame(data[library_name])
        # adds library name to the data frame so the libraries can be distinguished
        results_df['library'] = library_name.replace('_', '')

    return [results_df, short_results_df, all_terms, all_pvalues, all_adjusted_pvalues, str(short_id)]
In [5]:
# Scatter Plot Parameters
significance_value = 0.05

# Bar Chart Parameters
figure_file_format = ['png', 'svg']
output_file_name = 'Enrichr_results_bar'
color = 'lightskyblue'
final_output_file_names = ['{0}.{1}'.format(output_file_name, file_type) for file_type in figure_file_format]

# Hexagonal Canvas Parameters
canvas_color = 'Blue'
num_hex_colored = 10

# Manhattan Plot Parameters
manhattan_colors = ['#003f5c', '#7a5195', '#ef5675', '#ffa600']

Scatter Plot¶

The scatter plot is organized so that similar gene sets are clustered together. The larger blue points represent significantly enriched terms: the darker the blue, the smaller the p-value and the more significant the term. The gray points are not significant.
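The coloring rule described above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: the helper names `lerp_hex` and `point_style` are hypothetical, and the two blue endpoint colors are assumed stand-ins for the `Blues_r` colormap used in the code cell below.

```python
# Illustrative endpoints for the blue scale; these hex values are assumptions,
# not the exact colors produced by the notebook's 'Blues_r' colormap.
DARK_BLUE = '#08306b'
LIGHT_BLUE = '#c6dbef'
GRAY = '#808080'


def lerp_hex(start, end, t):
    """Linearly interpolate between two '#rrggbb' colors for t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    channels = []
    for i in (1, 3, 5):
        a = int(start[i:i + 2], 16)
        b = int(end[i:i + 2], 16)
        channels.append(round(a + (b - a) * t))
    return '#{:02x}{:02x}{:02x}'.format(*channels)


def point_style(p_value, min_p, threshold=0.05):
    """Return (color, size) for one gene set, following the rule described above."""
    if p_value >= threshold:
        # not significant: small gray point
        return GRAY, 6
    # Scale between the smallest p-value and twice the threshold, similar to
    # Normalize(vmin=min p-value, vmax=2 * significance_value) in the notebook.
    span = 2 * threshold - min_p
    t = (p_value - min_p) / span if span > 0 else 0.0
    return lerp_hex(DARK_BLUE, LIGHT_BLUE, t), 12


for p in (0.0001, 0.01, 0.04, 0.2):
    print(p, point_style(p, min_p=0.0001))
```

Scaling to twice the threshold rather than to the threshold itself keeps the palest significant points visibly blue instead of fading toward white.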

Hovering over points will display the associated gene set name and the p-value. You may have to zoom in using the toolbar next to the plot in order to see details in densely-populated portions. Plots can also be downloaded as an svg using the save function on the toolbar.

For creating and comparing up to 9 scatter plots at once, use the standalone Scatter Plot Visualization Appyter.

In [6]:
# Scatter Plot Functions

def download_library(library_name):
    # Download pre-processed library data
    try:
        df = pd.read_csv('https://raw.githubusercontent.com/MaayanLab/Enrichr-Viz-Appyter/master/Enrichr-Processed-Library-Storage/Scatterplot/Libraries/' + library_name + '.csv')
    except Exception:
        display(Markdown("Failed to retrieve the selected pre-processed library."))
        return -1, -1, -1

    name = df['Name'].tolist()
    gene_list = df['Genes'].tolist()
    library_data = [list(a) for a in zip(name, gene_list)]
    return genes, library_data, df

# Enrichment analysis
def get_library_iter(library_data):
    for member in library_data:
        term = member[0]
        try:
            gene_set = member[1].split(' ')
        except Exception:
            # skip entries whose gene list is not a string (for example, a missing value)
            continue
        yield term, gene_set

def get_enrichment_results(genes, library_data):
    return sorted(enrich_crisp(genes, get_library_iter(library_data), 20000, True), key=lambda r: r[1].pvalue)

def get_pvalue(row, unzipped_results, all_results):
    if row['Name'] in list(unzipped_results[0]):
        index = list(unzipped_results[0]).index(row['Name'])
        return all_results[index][1].pvalue
    else:
        return 1
    
# Call enrichment results and return a plot and dataframe for Scatter Plot
def get_plot(library_name):
    genes, library_data, df = download_library(library_name)

    # library not supported
    if genes == -1:
        return -1 ,-1

    all_results = get_enrichment_results(genes, library_data)
    unzipped_results = list(zip(*all_results))

    if len(all_results) == 0:
        print("There are no enriched terms for your input gene set in the ", library_name, " library.")
        my_colors = ['#808080'] * len(df.index)

        source = ColumnDataSource(
            data=dict(
                x = df['x'],
                y = df['y'],
                gene_set = df['Name'],
                colors = my_colors,
                sizes = [6] * len(df.index)
            )
        )

        hover_emb = HoverTool(names=["df"], tooltips="""
            <div style="margin: 10">
                <div style="margin: 0 auto; width:200px;">
                    <span style="font-size: 12px; font-weight: bold;">Gene Set:</span>
                    <span style="font-size: 12px">@gene_set</span>
                </div>
            </div>
            """)
    else:
        # add p value to the dataframe
        df['p value'] = df.apply (lambda row: get_pvalue(row, unzipped_results, all_results), axis=1)

        # normalize p values for color scaling
        cmap = mpl.cm.get_cmap('Blues_r')
        norm = colors.Normalize(vmin = df['p value'].min(), vmax=significance_value*2)

        my_colors = []
        my_sizes = []
        for index, row in df.iterrows():
            if row['p value'] < significance_value:
                my_colors += [mpl.colors.to_hex(cmap(norm(row['p value'])))]
                my_sizes += [12]
            else:
                my_colors += ['#808080']
                my_sizes += [6]

        source = ColumnDataSource(
                data=dict(
                    x = df['x'],
                    y = df['y'],
                    gene_set = df['Name'],
                    p_value = df['p value'],
                    colors = my_colors,
                    sizes = my_sizes
                )
            )

        hover_emb = HoverTool(names=["df"], tooltips="""
            <div style="margin: 10">
                <div style="margin: 0 auto; width:200px;">
                    <span style="font-size: 12px; font-weight: bold;">Gene Set:</span>
                    <span style="font-size: 12px">@gene_set</span>
                    <span style="font-size: 12px; font-weight: bold;">p-value:</span>
                    <span style="font-size: 12px">@p_value</span>
                </div>
            </div>
            """)

    tools_emb = [hover_emb, 'pan', 'wheel_zoom', 'reset', 'save']

    plot_emb = figure(plot_width=700, plot_height=700, tools=tools_emb)

    # hide axis labels and grid lines
    plot_emb.xaxis.major_tick_line_color = None
    plot_emb.xaxis.minor_tick_line_color = None
    plot_emb.yaxis.major_tick_line_color = None
    plot_emb.yaxis.minor_tick_line_color = None
    plot_emb.xaxis.major_label_text_font_size = '0pt'
    plot_emb.yaxis.major_label_text_font_size = '0pt' 

    plot_emb.circle('x', 'y', size = 'sizes', alpha = 0.7, line_alpha = 0, 
                    line_width = 0.01, source = source, fill_color = 'colors', name = "df")

    plot_emb.output_backend = "svg"
    
    return plot_emb, df    
In [7]:
# Display Scatter Plot
plot, df = get_plot(enrichr_library)
if plot == -1:
    display(Markdown("Unable to create scatter plot visualization."))
else:
    show(plot)

Bar Chart¶

The bar chart shows the top 10 enriched terms in the chosen library, along with their corresponding p-values. Colored bars correspond to terms with significant p-values (<0.05). An asterisk (*) next to a p-value indicates the term also has a significant adjusted p-value (<0.05).
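As a rough sketch of how each bar is derived from a term's p-values, the snippet below computes the bar length, whether the bar is colored, and the label with its optional asterisk. The helper name `bar_annotation` and the example terms are hypothetical, and the number formatting is simplified relative to the plotting cell below.

```python
import math


def bar_annotation(term, p_value, adj_p_value, alpha=0.05):
    """Summarize how one term would appear in the bar chart."""
    # an asterisk marks terms whose adjusted p-value is also significant
    marker = '*' if adj_p_value < alpha else ''
    return {
        'term': term,
        'bar_length': -math.log10(p_value),  # longer bar = smaller p-value
        'colored': p_value < alpha,          # gray bar when not significant
        'label': '{}  {}{:.2e}'.format(term, marker, p_value),
    }


# Toy values for illustration only
for row in [('Term A', 0.001, 0.01), ('Term B', 0.03, 0.2), ('Term C', 0.2, 0.5)]:
    print(bar_annotation(*row))
```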

The bar chart can be downloaded as an image using the links below the figure.

For creating customized bar charts for multiple libraries at once, use the standalone Bar Chart Appyter.

In [8]:
# Bar Chart Functions
# Takes all terms, all p-values, all adjusted p-values, output file names, Enrichr libraries, and the bar color
def enrichr_figure(all_terms, all_pvalues, all_adjusted_pvalues, plot_names, all_libraries, bar_color): 
    # Bar colors
    if bar_color != 'lightgrey':
        bar_color_not_sig = 'lightgrey'
        edgecolor=None
        linewidth=0
    else:
        bar_color_not_sig = 'white'
        edgecolor='black'
        linewidth=1    

    plt.figure(figsize=(24, 12))
    
    i = 0
    bar_colors = [bar_color if (x < 0.05) else bar_color_not_sig for x in all_pvalues[i]]
    fig = sns.barplot(x=np.log10(all_pvalues[i])*-1, y=all_terms[i], palette=bar_colors, edgecolor=edgecolor, linewidth=linewidth)
    fig.axes.get_yaxis().set_visible(False)
    fig.set_title(all_libraries[i].replace('_', ' '), fontsize=26)
    fig.set_xlabel('−log₁₀(p‐value)', fontsize=25)
    fig.xaxis.set_major_locator(MaxNLocator(integer=True))
    fig.tick_params(axis='x', which='major', labelsize=20)
    if max(np.log10(all_pvalues[i])*-1)<1:
        fig.xaxis.set_ticks(np.arange(0, max(np.log10(all_pvalues[i])*-1), 0.1))
    for ii, annot in enumerate(all_terms[i]):
        pvalue_str = np.format_float_scientific(all_pvalues[i][ii], precision=2)
        # an asterisk marks terms whose adjusted p-value is also significant
        if all_adjusted_pvalues[i][ii] < 0.05:
            annot = '  *'.join([annot, pvalue_str])
        else:
            annot = '  '.join([annot, pvalue_str])

        title_start= max(fig.axes.get_xlim())/200
        fig.text(title_start, ii, annot, ha='left', wrap = True, fontsize = 26)

    fig.spines['right'].set_visible(False)
    fig.spines['top'].set_visible(False)

    for plot_name in plot_names:
        plt.savefig(plot_name, bbox_inches = 'tight')
    
    # Show plot 
    plt.show()  
In [9]:
# Display Bar Chart
results = Enrichr_API(genes, [enrichr_library])
enrichr_figure(results[2], results[3], results[4], final_output_file_names, [enrichr_library], color)
# Download Bar Chart
for i, file in enumerate(final_output_file_names):
    display(FileLink(file, result_html_prefix=str('Download ' + figure_file_format[i] + ': ')))
Download png: Enrichr_results_bar.png
Download svg: Enrichr_results_bar.svg

Hexagonal Canvas¶

Each hexagon in the hexagonal canvas plot represents one gene set from the selected library. The hexagons are colored based on the Jaccard similarity index between the input gene list and the gene set represented by the hexagon, with brighter color indicating higher similarity. Hexagons that are grouped together represent similar gene sets.
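For reference, the Jaccard similarity index of two sets is the size of their intersection divided by the size of their union. A minimal sketch, using a hypothetical helper name `jaccard_index` and toy gene symbols:

```python
def jaccard_index(genes_a, genes_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of gene symbols."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        # two empty sets: define the similarity as 0 to avoid dividing by zero
        return 0.0
    return len(a & b) / len(union)


# Toy example: two shared genes out of four distinct genes overall
print(jaccard_index(['A', 'B', 'C'], ['B', 'C', 'D']))
```

Converting the inputs to sets first means duplicate symbols in either list do not inflate the score.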

Hovering over a hexagon will display the name of the gene set and the associated similarity index.

For creating customized hexagonal canvas plots for up to two libraries at once, use the standalone Hexagonal Canvas Appyter.

In [10]:
# Hexagonal Canvas Functions

def library_processing():
    # Downloads library data for the hexagonal canvas
    # Library data is pre-annealed so the canvas will have the most similar gene sets closest together
    raw_library_data = []

    try:
        library_name = enrichr_library
        with urllib.request.urlopen('https://raw.githubusercontent.com/MaayanLab/Enrichr-Viz-Appyter/master/Enrichr-Processed-Library-Storage/Annealing/Annealed-Libraries/' + library_name + '.txt') as f:
            for line in f.readlines():
                raw_library_data.append(line.decode('utf-8').split("\t\t"))
        name = []
        gene_list = []
    except Exception:
        display(Markdown("Failed to retrieve the selected annealed library."))
        return [], -1, -1

    for i in range(len(raw_library_data)):
        name += [raw_library_data[i][0]]
        raw_genes = raw_library_data[i][1].split('\t')
        gene_list += [raw_genes[:-1]]

    library_data = [list(a) for a in zip(name, gene_list)]

    # library_data: a 2D list where the first element of each entry is the name and the second element is the list of genes associated with that name

    jaccard_indices = []
    indices = []

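    input_genes = set(genes)
    for gene_set in library_data:
        library_genes = set(gene_set[1])
        intersection = library_genes & input_genes
        union = library_genes | input_genes
        # Jaccard index: size of the intersection divided by the size of the union
        index = len(intersection)/len(union) if union else 0
        jaccard_indices += [[gene_set[0], index]]
        indices += [round(index, 5)]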

    # determine the dimensions of the canvas
    x_dimension = math.ceil(math.sqrt(len(indices)))
    y_dimension = math.ceil(math.sqrt(len(indices)))

    # zip name, gene_list, and indices into a single list
    anneal_list = list(zip(name, gene_list, indices))

    return anneal_list, x_dimension, y_dimension

def unzip_list(anneal_list):
    unzipped_list = zip(*anneal_list)
    return list(unzipped_list)

# define a list of colors for the hexagonal canvas
def get_color(anneal_list, cut_off_value, x_dimension, y_dimension):

    # A cut_off_value of 2.0 acts as a sentinel: color only the num_hex_colored most similar hexagons
    if cut_off_value == 2.0:
        sort_list = sorted(anneal_list, key=itemgetter(2), reverse=True)
        cut_off_value = sort_list[int(num_hex_colored)-1][2]

    r_value = 0
    g_value = 0
    b_value = 0

    if canvas_color == 'Red':
        r_value = 0.0
        g_value = 0.8
        b_value = 0.8
    if canvas_color == 'Yellow':
        r_value = 0.0
        g_value = 0.3
        b_value = 1.0
    if canvas_color == 'Purple':
        r_value = 0.5
        g_value = 1.0
        b_value = 0.0
    if canvas_color == 'Pink':
        r_value = 0.0
        g_value = 1.0
        b_value = 0.2
    if canvas_color == 'Orange':
        r_value = 0.0
        g_value = 0.45
        b_value = 1.0
    if canvas_color == 'Green':
        r_value = 1.0
        g_value = 0.0
        b_value = 1.0
    if canvas_color == 'Blue':
        r_value = 1.0
        g_value = 0.9
        b_value = 0.0

    color_list = []

    unzipped_anneal_list = unzip_list(anneal_list)

    max_index = max(unzipped_anneal_list[2])

    if max_index != 0:
        scaled_list = [i/max_index for i in unzipped_anneal_list[2]]
    else:
        scaled_list = unzipped_anneal_list[2]

    for i in range(x_dimension*y_dimension):
        if i < len(unzipped_anneal_list[2]) and float(unzipped_anneal_list[2][i]) >= cut_off_value:
            color_list += [mpl.colors.to_hex((1-scaled_list[i]*r_value, 
            1-scaled_list[i]*g_value, 1-scaled_list[i]*b_value))]
        elif i < len(unzipped_anneal_list[2]):
            color_list += [mpl.colors.to_hex((1-scaled_list[i], 
            1-scaled_list[i], 1-scaled_list[i]))]
        else:
            color_list += ["#FFFFFF"]
    return color_list, max_index, cut_off_value

def init_chart():
  chart_id = 'mychart-' + str(uuid.uuid4())
  display(HTML('<script src="/static/components/requirejs/require.js"></script>'))
  display(HTML(Template(dedent('''
  <script>
  require.config({
    paths: {
      'd3': 'https://cdnjs.cloudflare.com/ajax/libs/d3/5.16.0/d3.min',
      'd3-hexbin': 'https://d3js.org/d3-hexbin.v0.2.min',
    },
    shim: {
      'd3-hexbin': ['d3']
    }
  })

  // If we configure mychart via url, we can eliminate this define here
  define($chart_id, ['d3', 'd3-hexbin'], function(d3, d3_hexbin) {
    return function (figure_id, numA, numB, colorList, libraryList, indices) {
      var margin = {top: 50, right: 20, bottom: 20, left: 50},
        width = 850 - margin.left - margin.right,
        height = 350 - margin.top - margin.bottom;

      // append the svg object to the body of the page
      var svG = d3.select('#' + figure_id)
                  .attr("width", width + margin.left + margin.right)
                  .attr("height", height + margin.top + margin.bottom)
                  .append("g")
                  .attr("transform", "translate(" + margin.left + "," + margin.top + ")");
      
      //The number of columns and rows of the heatmap
      var MapColumns = numA,
          MapRows = numB;

      //The maximum radius the hexagons can have to still fit the screen
      var hexRadius = d3.min([width/((MapColumns + 0.5) * Math.sqrt(3)), height/((MapRows + 1/3) * 1.5)]);

      //Calculate the center position of each hexagon
      var points = [];
      for (var i = 0; i < MapRows; i++) {
          for (var j = 0; j < MapColumns; j++) {
              var x = hexRadius * j * Math.sqrt(3)
              //Offset each uneven row by half of a "hex-width" to the right
              if(i%2 === 1) x += (hexRadius * Math.sqrt(3))/2
              var y = hexRadius * i * 1.5
              points.push([x,y])
          }
      }

      //Set the hexagon radius
      var hexbin = d3_hexbin.hexbin().radius(hexRadius);

      svG.append("g")
        .selectAll(".hexagon")
        .data(hexbin(points))
        .enter().append("path")
        .attr("class", "hexagon")
        .attr("d", function (d) {
            return "M" + d.x + "," + d.y + hexbin.hexagon();
        })
        .attr("stroke", "black")
        .attr("stroke-width", "1px")
        .style("fill", function (d,i) { return colorList[i]; })
        .on("mouseover", mover)
        .on("mouseout", mout)
        .append("svg:title")
        .text(function(d,i) { return libraryList[i].concat(" ").concat(indices[i]); });

      // Mouseover function
      function mover(d) {
      d3.select(this)
        .transition().duration(10)  
        .style("fill-opacity", 0.3)
      };

      // Mouseout function
      function mout(d) { 
      d3.select(this)
        .transition().duration(10)
        .style("fill-opacity", 1)
      };

  }

  })
  </script>
  ''')).substitute({ 'chart_id': repr(chart_id) })))
  return chart_id

def Canvas(numA, numB, colorList, libraryList, indices):
  chart_id = init_chart()
  display(HTML(Template(dedent('''
  <svg id=$figure_id></svg>
  <script>
  require([$chart_id], function(mychart) {
    mychart($figure_id, $numA, $numB, $colorList, $libraryList, $indices)
  })
  </script>
  ''')).substitute({
      'chart_id': repr(chart_id),
      'figure_id': repr('fig-' + str(uuid.uuid4())),
      'numA': repr(numA),
      'numB': repr(numB),
      'colorList': repr(colorList),
      'libraryList': repr(libraryList),
      'indices': repr(indices)
  })))
In [11]:
# Display Hexagonal Canvas
anneal_list, x_dimension, y_dimension = library_processing()
if x_dimension < 0:
    display(Markdown("Unable to create hexagonal canvas visualization."))
else:
    color_list, scaling_factor, cut_off_value = get_color(anneal_list, 2.0, x_dimension, y_dimension)
    unzipped_anneal_list = unzip_list(anneal_list)
    Canvas(x_dimension, y_dimension, color_list, list(unzipped_anneal_list[0]), list(unzipped_anneal_list[2]))

Manhattan Plot¶

In the Manhattan plot below, each position along the x-axis denotes a single gene set from the selected library, while the y-axis measures the −log₁₀(p‐value) for each gene set.
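The transformation behind the y-axis can be sketched as follows: each p-value is mapped to −log₁₀(p), and the hover text recovers the p-value by inverting that mapping. The helper name `manhattan_points` and the toy terms are hypothetical; sorting by term name follows what the code cell below appears to do.

```python
import math


def manhattan_points(results):
    """Turn (term, p-value) pairs into (term, -log10 p, formatted p) triples.

    Points are ordered by term name, which fixes their position on the x-axis.
    """
    points = []
    for term, p_value in sorted(results):
        y = -math.log10(p_value)
        # invert the transform to rebuild the p-value shown in the tooltip
        tooltip = '{:.5e}'.format(10 ** -y)
        points.append((term, y, tooltip))
    return points


# Toy values for illustration only
for point in manhattan_points([('term b', 0.01), ('term a', 0.5)]):
    print(point)
```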

Hovering over a point will display the name of the gene set and the associated p-value. You can also zoom, pan, and save the plot as an svg using the toolbar on the right.

For creating customized static and dynamic Manhattan plots to compare multiple libraries at once, use the standalone Manhattan Plot Appyter.

In [12]:
# Manhattan Plot Functions

# Processes Enrichr data for Manhattan plots
def get_data(genes):
    # Process Enrichr data
    sorted_data = pd.DataFrame({"Gene Set": [], "-log(p value)": [], "Library": []})

    # get enrichr results from the library selected
    results_df = Enrichr_API(genes, [enrichr_library])[0]

    all_terms = []
    all_pvalues = []
    library_names = []

    all_terms.append(list(results_df[1]))
    all_pvalues.append(list(results_df[2]))
    library_names.append(list(results_df['library']))

    x=np.log10(all_pvalues[0])*-1
    sorted_terms = list(zip(all_terms[0], x, library_names[0]))
    sorted_terms = sorted(sorted_terms, key = itemgetter(0))
    unzipped_sorted_list = list(zip(*sorted_terms))

    data = pd.DataFrame({"Gene Set": unzipped_sorted_list[0], "-log(p value)": unzipped_sorted_list[1], "Library": unzipped_sorted_list[2]})

    sorted_data = pd.concat([sorted_data, data])

    # group data by library
    groups = sorted_data.groupby("Library")
    return sorted_data, groups

# Create Manhattan Plots
def manhattan(sorted_data):
    # split data frame into smaller data frames by library
    list_of_df = []
    for library_name in [enrichr_library]:
        library_name = library_name.replace('_', '')
        df_new = sorted_data[sorted_data['Library'] == library_name]
        list_of_df += [df_new]

    list_of_xaxis_values = []
    for df in list_of_df:  
        list_of_xaxis_values += df["Gene Set"].values.tolist()

    # define the output figure and the features we want
    p = figure(x_range = list_of_xaxis_values, plot_height=300, plot_width=750, tools='pan, box_zoom, hover, reset, save')

    # loop over all libraries
    r = []
    color_index = 0
    for df in list_of_df:
        if color_index >= len(manhattan_colors):
            color_index = 0 

        # calculate actual p value from -log(p value)
        actual_pvalues = []
        for log_value in df["-log(p value)"].values.tolist():
            actual_pvalues += ["{:.5e}".format(10**(-1*log_value))]

        # define ColumnDataSource with our data for this library
        source = ColumnDataSource(data=dict(
            x = df["Gene Set"].values.tolist(),
            y = df["-log(p value)"].values.tolist(),
            pvalue = actual_pvalues,
        ))
    
        # plot data from this library
        r += [p.circle(x = 'x', y = 'y', size=5, fill_color=manhattan_colors[color_index], line_color = manhattan_colors[color_index], line_width=1, source = source)]
        color_index += 1

    p.background_fill_color = 'white'
    p.xaxis.major_tick_line_color = None 
    p.xaxis.major_label_text_font_size = '0pt'
    p.y_range.start = 0
    p.yaxis.axis_label = '-log10(p value)'

    p.hover.tooltips = [
        ("Gene Set", "@x"),
        ("p value", "@pvalue"),
    ]
    p.output_backend = "svg"
    
    # returns the plot
    return p
In [13]:
# Display Manhattan Plot
sorted_data, groups = get_data(genes)
show(manhattan(sorted_data))

Volcano Plot¶

The volcano plot shows the significance of each gene set from the selected library versus its odds ratio. Each point represents a single gene set; the x-axis measures the odds ratio (from 0 to infinity) calculated for the gene set, while the y-axis gives the −log₁₀(p‐value) of the gene set.
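To make the two axes concrete, here is a hedged sketch of one common way to compute an odds ratio and a one-sided p-value from the overlap between the input list and a gene set, using a 2×2 contingency table and the hypergeometric tail. The notebook itself delegates this to `enrich_crisp`, whose exact computation may differ; the helper name `overlap_stats` and the toy inputs are hypothetical.

```python
from math import comb


def overlap_stats(input_genes, gene_set, n_background):
    """Return (odds ratio, right-tailed hypergeometric p-value) for an overlap."""
    input_genes, gene_set = set(input_genes), set(gene_set)
    a = len(input_genes & gene_set)   # genes in both the input and the gene set
    b = len(gene_set) - a             # genes only in the gene set
    c = len(input_genes) - a          # genes only in the input list
    d = n_background - a - b - c      # background genes in neither

    # sample odds ratio of the 2x2 table; reported as infinity when b * c is zero
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float('inf')

    # probability of an overlap of at least `a` when drawing the input at random
    n, k_set = len(input_genes), len(gene_set)
    tail = sum(
        comb(k_set, k) * comb(n_background - k_set, n - k)
        for k in range(a, min(k_set, n) + 1)
    )
    p_value = tail / comb(n_background, n)
    return odds_ratio, p_value


# Toy example with a background of 10 genes
print(overlap_stats(['A', 'B', 'C'], ['B', 'C', 'D', 'E'], 10))
```

The result depends on the assumed background size, which is why the `n_background` argument is explicit here.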

Larger blue points represent significant terms (p-value < 0.05); smaller gray points represent non-significant terms. The darker the blue color of a point, the more significant it is.

Hovering over points will display the corresponding gene set term, the p-value, and the odds ratio. You may have to zoom in using the toolbar next to the plot in order to see details in densely-populated portions. Plots can also be downloaded as an svg using the save function on the toolbar.

In [14]:
def get_library(lib_name):
    '''
    Returns a dictionary mapping each term from the input library to 
    its associated geneset. 
    '''
    raw_lib_data = []

    with urllib.request.urlopen('https://maayanlab.cloud/Enrichr/geneSetLibrary?mode=text&libraryName=' + lib_name) as f:
        for line in f.readlines():
            raw_lib_data.append(line.decode("utf-8").split("\t\t"))

    name = []
    gene_list = []
    lib_data = {}

    for i in range(len(raw_lib_data)):
        name += [raw_lib_data[i][0]]
        raw_genes = raw_lib_data[i][1].replace('\t', ' ')
        gene_list += [raw_genes[:-1]]
    
    lib_data = {a[0]:a[1].split(' ') for a in zip(name, gene_list)}
    return lib_data


def volcano_plot(library_name, lib):
    '''
    Make volcano plot of odds ratio vs. significance for input library.
    '''
    enrich_results = enrich_crisp(genes, lib, 21000, True)

    res_df = pd.DataFrame(
        [ [
            term, 
            res.pvalue, 
            res.odds_ratio
        ] for (term, res) in enrich_results ], 
        columns=['term', 'pvalue', 'odds_ratio']
    )

    res_df['log_pval'] = np.negative(np.log10(res_df['pvalue']))

    cmap = mpl.cm.get_cmap('Blues_r')
    cnorm = colors.Normalize(vmin = res_df['pvalue'].min(), vmax = 0.1)

    my_colors = []
    my_sizes = []
    for row in res_df.itertuples():
        if row.pvalue < 0.05:
            my_colors += [mpl.colors.to_hex(cmap(cnorm(row.pvalue)))]
            my_sizes += [12]
        else:
            my_colors += ['#808080']
            my_sizes += [6]

    source = ColumnDataSource(
        data=dict(
            x = res_df['odds_ratio'],
            y = res_df['log_pval'],
            gene_set = res_df['term'],
            p_value = res_df['pvalue'],
            odds_r = res_df['odds_ratio'],
            colors = my_colors,
            sizes = my_sizes
        )
    )

    hover_emb = HoverTool(
        names=["res_df"], 
        tooltips="""
        <div style="margin: 10">
            <div style="margin: 0 auto; width:200px;">
                <span style="font-size: 12px; font-weight: bold;">Term:</span>
                <span style="font-size: 12px">@gene_set<br></span>
                <span style="font-size: 12px; font-weight: bold;">P-Value:</span>
                <span style="font-size: 12px">@p_value<br></span>
                <span style="font-size: 12px; font-weight: bold;">Odds Ratio:</span>
                <span style="font-size: 12px">@odds_r<br></span>
            </div>
        </div>
        """
    )

    tools_emb = [hover_emb, 'pan', 'wheel_zoom', 'reset', 'save']

    plot_emb = figure(
        plot_width = 700, 
        plot_height = 700,
        tools=tools_emb
    )

    plot_emb.circle(
        'x', 'y', size = 'sizes', 
        alpha = 0.7, line_alpha = 0, 
        line_width = 0.01, source = source, 
        fill_color = 'colors', name = "res_df"
    )

    plot_emb.xaxis.axis_label = "Odds Ratio"
    plot_emb.yaxis.axis_label = "-log10(p-value)"

    plot_emb.output_backend = "svg"
    
    return plot_emb
In [15]:
lib_data = get_library(enrichr_library)
if lib_data == {}:
    display(Markdown('Failed to access library, please try again later.'))
else:
    # use a separate name so the scatter plot stored in `plot` is not overwritten
    volcano = volcano_plot(enrichr_library, lib_data)
    show(volcano)

Table of significant p-values¶

The table below lists the name, p-value, and q-value (adjusted p-value) of every term in the selected library with a significant p-value (at most 0.05). It can be downloaded as a CSV file.
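The q-values in this table are read from the adjusted p-values returned by Enrichr rather than computed in the notebook. As an illustration of how such values are commonly obtained, the sketch below applies the Benjamini-Hochberg procedure to a list of p-values; that Enrichr uses exactly this procedure is an assumption here, and the helper name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so each adjusted value is
    # no larger than the adjusted value of any larger p-value
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy values for illustration only
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the adjustment scales with the number of terms tested, a term can have a significant p-value and still a q-value well above 0.05, as several rows of the table show.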

In [16]:
# Output a table of significant p-values and q-values
def get_qvalues(df):
    qvals = []
    res_df = pd.DataFrame(results[0]).set_index(1)
    for name in df['Name'].to_list():
        qvals.append(res_df.loc[name][6])
    return qvals

def create_download_link(df, title = "Download CSV file of this table", filename = "data.csv"):  
    csv = df.to_csv(index = False)
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)

if plot != -1 and 'p value' in df.columns:
    sorted_df = df.sort_values(by = ['p value'])
    filtered_df = sorted_df[sorted_df['p value'] <= significance_value].reset_index()
    filtered_df['q value'] = get_qvalues(filtered_df)
    if len(filtered_df) != 0:
        display(HTML(f"<strong>Table of significant p-values for {enrichr_library.replace('_', ' ')}</strong>"))
        display(HTML(filtered_df[['Name', 'p value', 'q value']].to_html(index = False)))
        display(create_download_link(filtered_df[['Name', 'p value', 'q value']]))
Table of significant p-values for Aging Perturbations from GEO down
| Name | p value | q value |
|---|---|---|
| Mouse brown fat 5 months vs 24 months GSE25325 aging:284 | 0.000050 | 0.011067 |
| Rat liver 11 months vs 24 months GSE11097 aging:200 | 0.000088 | 0.011067 |
| Mouse liver 10 months vs 22 months GSE3150 aging:326 | 0.000286 | 0.021188 |
| Rat liver 11 months vs 18 months GSE11097 aging:199 | 0.000334 | 0.021188 |
| Mouse brown fat 5 months vs 24 months GSE25325 aging:282 | 0.001681 | 0.078467 |
| Rat liver 12 months vs 24 months GSE11097 aging:194 | 0.001850 | 0.078467 |
| Mouse liver 6 months vs 26 months GSE20426 aging:378 | 0.002700 | 0.088974 |
| Mouse kidney 6 months vs 14 months GSE15129 aging:319 | 0.002785 | 0.088974 |
| Mouse liver 6 months vs 14 months GSE15129 aging:322 | 0.003612 | 0.102311 |
| Rat liver 6 months vs 24 months GSE11097 aging:198 | 0.004336 | 0.106810 |
| Mouse liver 5 months vs 24 months GSE25325 aging:288 | 0.004602 | 0.106810 |
| Mouse kidney 25 weeks vs 100 weeks GSE41018 aging:399 | 0.006483 | 0.138199 |
| Rat liver 6 months vs 18 months GSE11097 aging:197 | 0.007036 | 0.138535 |
| Rat liver 4 months vs 24 months GSE11097 aging:191 | 0.008270 | 0.151024 |
| Mouse liver 6 months vs 26 months GSE20425 aging:370 | 0.014051 | 0.240273 |
| Mouse liver 6 months vs 26 months GSE20426 aging:376 | 0.016272 | 0.261096 |
| Mouse kidney 6 months vs 14 months GSE15129 aging:318 | 0.019528 | 0.294696 |
| Mouse liver 5 months vs 24 months GSE25325 aging:292 | 0.022467 | 0.321064 |
| Mouse liver 6 months vs 22 months GSE3129 aging:333 | 0.035314 | 0.478661 |
| Rat liver 4 months vs 12 months GSE11097 aging:192 | 0.039147 | 0.504413 |
| Mouse liver 5 months vs 24 months GSE25325 aging:291 | 0.048567 | 0.596796 |
Download CSV file of this table

Link to Enrichr¶

In [17]:
# Get complete enrichment analysis results from Enrichr 
url = 'https://maayanlab.cloud/Enrichr/enrich?dataset=' + results[5]
display(HTML(f'<span><a href="{url}">Access the complete enrichment analysis on the Enrichr website. </a></span>'))
Access the complete enrichment analysis on the Enrichr website.