Date created: Apr 15, 2021
Last modified: Oct 15, 2021
Abstract
This example shows how to generate a Manhattan plot for a given human genome.
Manhattan plot is used to visualize large datasets, with large number of data points, and a significant distribution (variance) for some of these points (features). This is a primary visualization tool in genom-wide association studies (GWAS).
For a basic example of a Manhattan plot used in genomics reasearch see: https://en.wikipedia.org/wiki/Manhattan_plot .
The below example is heavely based on the Vignette (documentation, example) from the R Bioconducotr Sushi
package by Douglas H. Phanstiel, Alan P. Boyle, Carlos L. Araya, and Mike Snyder (https://www.bioconductor.org/packages/release/bioc/vignettes/Sushi/inst/doc/Sushi.pdf).
Sushi
packageIn oder to produce a Manhattan plot with R Bioconductor, one can use the Sushi
package (https://www.bioconductor.org/packages/release/bioc/vignettes/Sushi/inst/doc/Sushi.pdf):
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Sushi")
## Bioconductor version 3.12 (BiocManager 1.30.16), R 4.0.3 (2020-10-10)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'Sushi'
library('Sushi')
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: biomaRt
hg18
chromosomesSushu_hg18_genome
stores the information on the length of human chromosomes as described by NCBI36/hg18 genome build (see: https://www.bioconductor.org/packages/release/bioc/manuals/Sushi/man/Sushi.pdf).
We can display the number of base pairs in each chromosome (in millions, i.e., 1*106):
data(Sushi_hg18_genome)
data.frame(Sushi_hg18_genome$V1, Sushi_hg18_genome$V2/1000000)
## Sushi_hg18_genome.V1 Sushi_hg18_genome.V2.1e.06
## 1 chr1 247.24972
## 2 chr10 135.37474
## 3 chr11 134.45238
## 4 chr12 132.34953
## 5 chr13 114.14298
## 6 chr14 106.36858
## 7 chr15 100.33892
## 8 chr16 88.82725
## 9 chr17 78.77474
## 10 chr18 76.11715
## 11 chr19 63.81165
## 12 chr2 242.95115
## 13 chr20 62.43596
## 14 chr21 46.94432
## 15 chr22 49.69143
## 16 chr3 199.50183
## 17 chr4 191.27306
## 18 chr5 180.85787
## 19 chr6 170.89999
## 20 chr7 158.82142
## 21 chr8 146.27483
## 22 chr9 140.27325
For the reference number of base pairs per chromosome see: https://www.ncbi.nlm.nih.gov/books/NBK22266/.
bed
dataBed data representing human genes with coordinates from NCBI36/hg18 genome build (see: https://www.bioconductor.org/packages/release/bioc/manuals/Sushi/man/Sushi.pdf).
The beginning of the GWAS bed data frame:
data(Sushi_GWAS.bed)
head(Sushi_GWAS.bed)
## chr.hg18 pos.hg18 pos.hg18.1 rsid pval.GC.DBP V6
## 1 chr1 1695996 1695996 rs6603811 0.003110 .
## 2 chr1 1696020 1696020 rs7531583 0.000824 .
## 3 chr1 1698661 1698661 rs12044597 0.001280 .
## 4 chr1 1711339 1711339 rs2272908 0.001510 .
## 5 chr1 1712792 1712792 rs3737628 0.001490 .
## 6 chr1 1736016 1736016 rs12408690 0.004000 .
Number of coordinates (number of rows):
nrow(Sushi_GWAS.bed)
## [1] 32760
Note the names of the columns:
names(Sushi_GWAS.bed)
## [1] "chr.hg18" "pos.hg18" "pos.hg18.1" "rsid" "pval.GC.DBP"
## [6] "V6"
A âgenomeâ parameter is required to set how many base pairs there are in each chromosome. The number of base pairs is reflected in the distance for each chromosome on the X axis.
Please note that p-values (pval.GC.DBP
) are stored in the 5th column of the data frame (i.e., Sushi_GWAS.bed[,5]
).
plotManhattan(bedfile=Sushi_GWAS.bed,
pvalues=Sushi_GWAS.bed[,5],
col=SushiColors(6),
genome=Sushi_hg18_genome,
cex=0.75)
labelgenome(genome=Sushi_hg18_genome,n=4,scale="Mb",
edgeblankfraction=0.20,cex.axis=.5)
axis(side=2,las=2,tcl=.2)
mtext("log10(P)",side=2,line=1.75,cex=1,font=2)
mtext("chromosome",side=1,line=1.75,cex=1,font=2)