Argument | Description |
---|---|
object | A profileplyr object |
group | How the ranges will be grouped. If this is a character string, then it must match a column name of the range metadata, and this column will be used for grouping of any exported deepTools matrix. If this is a GRanges, or GRangesList, then the ranges will be subset based on overlap with these GRanges. If this is a list, each element should contain ether 1) a character vector of genes, and ranges will be subset based on overlap with these genes, as determined by the annotations made by annotateRanges() or annotateRanges_great() functions, or 2) a data frame with the gene symbols as the rownames. Any additional columns of this dataframe will be added to the range metadata. |
GRanges_names | Relevant for ‘GRanges’ mode. These are the names that will be assigned to the ranges that overlap each GRanges object |
levels | This will set the levels of the grouping column set by ‘rowGroupsInUse’ (if the grouping column is not a factor, it will be converted to one). If levels are not provided, they will remain unchanged if the grouping column was already a factor, or will use default leveling (e.g. alphabetical) if grouping column is not already a factor variable. |
include_nonoverlapping | Relevant for ‘GRanges’ mode. This should be indicated (default is TRUE). A logical argument, if FALSE the regions from the original deepTools matrix that do not overlap with the user defined regions will be left out of the returned profileplyr object. |
separateDuplicated | Relevant for ‘GRanges’ mode. This should be indicated (default is FALSE). A logical argument, if TRUE then regions that overlap multiple inputs toe ‘regions’ argument will be separated and made into their own group. All possible combinations of region overlaps will be tested, so it is not recommended to have more than 3 groups if this option is TRUE. If FALSE, then regions that overlap each individual ‘region’ input will be in the output, and if one region overlaps multiple ‘region’ inputs, then it will be duplicated in the output and will show up in the section for each group. |
inherit_groups | A logical whether that groups the exist in the profileplyr object in the ‘object’ argument should be included in the default grouping scheme for the output object of this function. The default is TRUE. If false, only the GRanges or gene list overlap annotation will be used for heatmap grouping. |
mcols(proplyr_object_subset)$newColumn <- rnorm(n = nrow(proplyr_object_subset))
params(proplyr_object_subset)$rowGroupsInUse
## [1] "sgGroup"
switchGroup <- groupBy(proplyr_object_subset,
group = "newColumn")
params(switchGroup)$rowGroupsInUse
## [1] "newColumn"
library(rtracklayer)
GATA4_1 <- import.bed("GSM1505644_Gata4_022113_meso.bed.peak.txt.gz")
GATA4_2 <- import.bed("GSM1505645_Gata4_041914_meso.bed.peak.txt.gz")
GATA4_HC <- GATA4_1[GATA4_1 %over% GATA4_2]
seqlevelsStyle(GATA4_HC) <- "UCSC"
When a profileplyr object is grouped by GRanges a column is added to the range metadata titled “GR_overlap_names” that indicates which GRanges that range overlaps with (or an indication that there was no overlap). The ‘rowGroupsInUse’ slot of the params is automatically changed to this column, and this is reflected in the heatmap on the next slide.
proplyr_object_subset <- groupBy(proplyr_object_subset,
group = GATA4_HC,
GRanges_names = "GATA4_peaks",
include_nonoverlapping = TRUE)
table(mcols(proplyr_object_subset)$GR_overlap_names)
##
## GATA4_peaks no_overlap
## 124 876
generateEnrichedHeatmap(proplyr_object_subset,
include_group_annotation = TRUE,
color_by_sample_group = "chip",
ylim = "chip",
show_heatmap_legend = rep(FALSE, 6)
)
A profileplyr object can also be subset or annotated based on overlap of its gene annotations (from annotateRanges() or annotateRanges_great()) with a list of genes from the user. This argument must be a list(), and the elements of list can either be a character vector of gene symbols or a list of data frames with the gene symbols as the rownames. Additional columns in these data frames will be inculded as columns in the range metadata and can also be used for additional range annotation.
First we will read in two genes lists, one for genes going up in the mesoderm compared to the ESCs, and one for the genes that go down (or up in ESC). This analysis was done with DESeq2 using the following counts tables downloaded from ENCODE: * HUES64 ESC: Replicate 1 and Replicate 2 * Mesoderm: Replicate 1 and Replicate 2
library(dplyr)
UP_in_meso_genes <- read.table("UP_in_meso_genes.txt",
header = FALSE,
stringsAsFactors = FALSE) %>%
.[,1]
head(UP_in_meso_genes)
## [1] "HAPLN1" "COL3A1" "ANXA1" "ITGA1" "LOX" "COL1A1"
DOWN_in_meso_genes <- read.table("DOWN_in_meso_genes.txt",
header = FALSE,
stringsAsFactors = FALSE) %>%
.[,1]
head(UP_in_meso_genes)
## [1] "HAPLN1" "COL3A1" "ANXA1" "ITGA1" "LOX" "COL1A1"
We combine these gene sets into a single list, and again use the groupBy() function to group the ranges. This will create a column in the range metadata shoing these overlaps, and the’rowgroupsInUse’ will by default be switched to this column. Remember that you can always switch this grouping to any column you’d like with the same groupBy() function, as shown above.
# IMPORTANT: The profileplyr object must have been annotated with either annotateRanges() or annotateRanges_great() before subsetting by a gene list
proplyr_object_subset <- annotateRanges_great(proplyr_object_subset,
species = "hg19")
geneList_for_grouping <- list(meso_UP= UP_in_meso_genes,
meso_DOWN = DOWN_in_meso_genes)
proplyr_object_subset <- groupBy(proplyr_object_subset,
group = geneList_for_grouping)
## A column has been added to the range metadata with the column name 'GL_overlap_names' that specifies the GRanges each range overlaps with, but the inherited groups are not included.
mcols(proplyr_object_subset)
## DataFrame with 843 rows and 13 columns
## name score sgGroup giID names
## <character> <numeric> <factor> <factor> <factor>
## giID808 NA 0 K27ac_top10_HUES64.bed giID808 NA
## giID785 NA 0 K27ac_top10_HUES64.bed giID785 NA
## giID736 NA 0 K27ac_top10_HUES64.bed giID736 NA
## giID714 NA 0 K27ac_top10_HUES64.bed giID714 NA
## giID714 NA 0 K27ac_top10_HUES64.bed giID714 NA
## ... ... ... ... ... ...
## giID983 NA 0 K27ac_top10_HUES64.bed giID983 NA
## giID983 NA 0 K27ac_top10_HUES64.bed giID983 NA
## giID999 NA 0 K27ac_top10_HUES64.bed giID999 NA
## giID999 NA 0 K27ac_top10_HUES64.bed giID999 NA
## giID1000 NA 0 K27ac_top10_HUES64.bed giID1000 NA
## newColumn overlap_matrix GR_overlap_names name.1
## <numeric> <matrix> <factor> <character>
## giID808 -0.453881995410842 0:1 no_overlap NA
## giID785 0.776292813554967 0:1 no_overlap NA
## giID736 0.325036775903307 0:1 GATA4_peaks 158
## giID714 -0.0157360582609343 0:1 no_overlap NA
## giID714 -0.0157360582609343 1:0 no_overlap NA
## ... ... ... ... ...
## giID983 0.331649349787874 0:1 no_overlap NA
## giID983 0.331649349787874 1:0 no_overlap NA
## giID999 -0.252095261459177 0:1 no_overlap NA
## giID999 -0.252095261459177 0:1 no_overlap NA
## giID1000 1.2475850575946 0:1 no_overlap NA
## score.1 SYMBOL distanceToTSS GL_overlap_names
## <numeric> <character> <numeric> <ordered>
## giID808 NA PUSL1 6780 meso_DOWN
## giID785 NA RER1 -112760 meso_DOWN
## giID736 59.31 RERE 146478 meso_DOWN
## giID714 NA SLC25A33 -167481 meso_DOWN
## giID714 NA SPSB1 79121 meso_UP
## ... ... ... ... ...
## giID983 NA HCFC1 -690 meso_DOWN
## giID983 NA TMEM187 170 meso_UP
## giID999 NA NLGN4Y -721757 meso_DOWN
## giID999 NA TMSB4Y 98422 meso_DOWN
## giID1000 NA KDM5D 105 meso_DOWN
generateEnrichedHeatmap(proplyr_object_subset,
include_group_annotation = TRUE,
color_by_sample_group = "chip",
ylim = "chip",
show_heatmap_legend = rep(FALSE, 6)
)
The profileplyr object can always be exported to deepTools using the ‘export_deepToolsMat’ function, however, remember that while the grouping will be preserved in deepTools, the extra annotation columns will not.
export_deepToolsMat(proplyr_object_subset,
con = "groupBy_geneList_deeptools",
overwrite = TRUE)
## [1] "groupBy_geneList_deeptools"
##system("plotHeatmap -m groupBy_geneList_deeptools.gz -o groupBy_geneList_deeptools.pdf --yMax 50 5 100 50 5 150 --zMax 100 20 180 100 20 180")
The following table is simply a DESeq2 results table subset for gene symbol and the stat column indicating direction and extent of gene expression changes. By using a data frame with this column, we can then use this information as an extra annotation column in our range heatmap.
meso_vs_esc_stat <- readRDS("meso_vs_esc_stat.RData")
head(meso_vs_esc_stat)
## meso_vs_esc_stat
## HAPLN1 44.41412
## COL3A1 43.80409
## ANXA1 40.33583
## ITGA1 35.59385
## LOX 35.55025
## COL1A1 35.05454
After this grouping, a column will be added to the range metadata called “GL_overlap_names”, in addition to the other columns in the user-supplied data frame.
note that in addition to the ‘GL_overlap_names’ column showing overlap (which isn’t important since we used all genes here), there is a column showing the expression changes for all these genes in meso vs esc
proplyr_object_subset <- groupBy(proplyr_object_subset,
group = list(meso_vs_esc_stat), # note that this must be a list, and you could have more than one data frame here
include_nonoverlapping = FALSE)
#it will be interesting to see the gene expression pattern within the clusters we have used before, so we will cluster the ranges after grouping by gene list
set.seed(0)
proplyr_object_subset <- clusterRanges(proplyr_object_subset,
kmeans_k = 3)
tail(colnames(mcols(proplyr_object_subset)))
## [1] "SYMBOL" "distanceToTSS" "GL_overlap_names"
## [4] "GL_overlap_names.1" "meso_vs_esc_stat" "cluster"
Here we use the gene expression information to generate a heatmap with the expression of genes in mesoderm vs HUES64 cells for each cluster
generateEnrichedHeatmap(proplyr_object_subset,
include_group_annotation = TRUE,
extra_annotation_columns = c("meso_vs_esc_stat"),
color_by_sample_group = "chip",
ylim = "chip",
show_heatmap_legend = rep(FALSE, 6)
)
You’ll notice in the previous figure that the color scale for the expression data is unbalanced so that 0 is not right in the middle, resulting in most ranges aving a red color. And we want a different color for genes that go up and another color for genes that go down, with genes that don’t change being white. The colors of the extra annotation columns can be changed with the ‘extra_anno_color’ argument, which takes a list containing elements taht correspond to each extra annotation heatmap on the right side of the figure.
The list must be the same length as the number of extra columns, and each element can either be NULL (keep default coloring), a character or numeric vector with two or three colors depending on whether you want a two or three color progression, or a function that calls colorRamp(). Below we will generate a function to make a three color scheme, with 0 in the being white. We also make the max and min values the 98th and 2nd percentiles to avoid the extreme values driving the colors.
library(circlize)
meso_vs_esc_statonly <- mcols(proplyr_object_subset)$meso_vs_esc_stat
q_98 <- quantile(meso_vs_esc_statonly, 0.98)
q_01 <- quantile(meso_vs_esc_statonly, 0.02)
meso_vs_esc_stat_color <- (colorRamp2(c(q_01, 0, q_98),
c("blue", "white", "red")))
This function will be used for the gene expression column, and we will change the colors of the GATA4 overlap column as well so that no overlap is white.
generateEnrichedHeatmap(proplyr_object_subset,
include_group_annotation = TRUE,
extra_annotation_columns = c("meso_vs_esc_stat"),
color_by_sample_group = "chip",
ylim = "chip",
show_heatmap_legend = rep(FALSE, 6),
extra_anno_color = list(meso_vs_esc_stat_color)
)