Argument Description
object A profileplyr object
fun the function used to summarize the ranges (e.g. rowMeans or rowMax)
output Must be either “matrix”, “long”, or “object”.
keep_all_mcols If output is ‘long’ and this is set to TRUE, then all metadata columns in the rowRanges will be included in the output. If FALSE (default value), then only the column indicated in the ‘rowGroupsInUse’ slot of the metadata will be included in the output dataframe.
sampleData_columns_for_longPlot If output is set to ‘long’, then this argument can be used to add information stored in sampleData(object) to the summarized data frame. This needs to be a character vector with elements matching column names in sampleData(object).
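
To make the role of the ‘fun’ argument concrete, the sketch below uses base R only and a made-up matrix. It assumes that the summarizing function receives a matrix with one row per range and one column per bin, and returns one value per range; this is inferred from the rowMeans example above rather than confirmed. The helpers row_max and row_median are hypothetical names introduced for illustration.

```r
# Toy signal matrix: 3 ranges (rows) x 4 bins (columns); values are made up.
bins <- matrix(c(1, 2, 3, 4,
                 0, 0, 8, 0,
                 5, 5, 5, 5),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("range", 1:3), paste0("bin", 1:4)))

# Built-in row-wise summary: mean signal across the bins of each range.
row_mean <- rowMeans(bins)

# Hypothetical user-defined alternatives with the same shape contract:
# a matrix goes in, one number per row comes out.
row_max <- function(x) apply(x, 1, max)
row_median <- function(x) apply(x, 1, median)

row_mean
row_max(bins)
row_median(bins)
```

The choice of function changes what the summary emphasizes: the mean is pulled up by a single high bin in range2, while the median ignores it.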

Example

Matrix output for heatmaps

If the ‘output’ argument is set to ‘matrix’, then only a matrix is returned, with one column per sample. Each value is the signal across the bins of one range, summarized with the function given in the ‘fun’ argument. The row names of this matrix are unique identifiers for each range, built from the chromosome, start, end, and group.

proplyr_object_subset_sumMat <- profileplyr::summarize(proplyr_object_subset, 
                                                       fun = rowMeans, 
                                                       output = "matrix") 
proplyr_object_subset_sumMat[1:3, ]
##                                                 K27ac_esc K4me1_esc
## chr21_34696445_34697205_K27ac_top10_HUES64.bed  4.0326237  0.327227
## chr21_36207782_36209962_K27ac_top10_HUES64.bed  0.9740337  2.296254
## chr7_150734773_150737527_K27ac_top10_HUES64.bed 0.5168534  0.308683
##                                                    K4me3_esc K27ac_meso
## chr21_34696445_34697205_K27ac_top10_HUES64.bed  104.73912667  15.517333
## chr21_36207782_36209962_K27ac_top10_HUES64.bed    0.35388878   8.081078
## chr7_150734773_150737527_K27ac_top10_HUES64.bed   0.09848487  11.528448
##                                                 K4me1_meso  K4me3_meso
## chr21_34696445_34697205_K27ac_top10_HUES64.bed   0.2341562 122.2847307
## chr21_36207782_36209962_K27ac_top10_HUES64.bed   3.6760741   0.9602948
## chr7_150734773_150737527_K27ac_top10_HUES64.bed  2.6178709   0.3257963
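
The shape of this output can be reproduced with a small base R sketch. It is not the package's implementation; it only assumes that each sample holds its own range-by-bin matrix and that the row identifiers are the chromosome, start, end, and group joined with underscores, as in the output above. All values and names (ranges, signal, sum_mat, peaks.bed) are made up for illustration.

```r
# Toy range annotation: 3 ranges that all belong to one group (made-up values).
ranges <- data.frame(chr = c("chr21", "chr21", "chr7"),
                     start = c(100L, 500L, 900L),
                     end = c(200L, 600L, 1000L),
                     group = "peaks.bed")

# Unique identifier per range: chromosome, start, end, and group.
range_ids <- paste(ranges$chr, ranges$start, ranges$end, ranges$group, sep = "_")

# One range-by-bin matrix (3 ranges x 4 bins) per sample.
signal <- list(
  sampleA = matrix(1:12, nrow = 3),
  sampleB = matrix(c(0, 2, 4, 0, 2, 4, 0, 2, 4, 0, 2, 4), nrow = 3)
)

# Summarize the bins of every range within each sample, then bind the
# per-sample vectors into a range-by-sample matrix.
sum_mat <- sapply(signal, rowMeans)
rownames(sum_mat) <- range_ids
sum_mat
```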

This matrix can be used directly with other heatmap-generating functions, such as heatmap() or pheatmap().

library(pheatmap)
pheatmap(proplyr_object_subset_sumMat,
         scale = "row", 
         cluster_cols = FALSE, 
         show_rownames = FALSE)
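
The effect of row scaling can be checked by hand on a toy matrix. The sketch below uses base R only and assumes that scale = "row" amounts to centering each row on its mean and dividing by its standard deviation; that is our understanding of the option, not something stated in this document. The matrix m and its values are made up.

```r
# Toy summarized matrix: 2 ranges x 3 samples with very different magnitudes.
m <- matrix(c(1, 2, 3,
              10, 20, 30),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("rangeA", "rangeB"), c("s1", "s2", "s3")))

# scale() works on columns, so transpose, scale, and transpose back
# to obtain a z-score for every row.
z <- t(scale(t(m)))
z["rangeA", ]
z["rangeB", ]
```

After scaling, both rows show the same pattern across samples even though rangeB has ten times the signal, which is why row scaling makes ranges comparable in one heatmap.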

Long output for ggplot

If the ‘output’ argument is set to ‘long’, then the output will be a long data frame that can be used for plotting with ggplot. The grouping column of the range metadata, as specified by ‘params(proplyrObject)$rowGroupsInUse’, is included in the data frame automatically. To include the other range metadata columns as well, set the ‘keep_all_mcols’ argument to TRUE. Columns identifying the range, the sample, and the summarized signal for that range in that sample are also included by default.

proplyr_object_subset_long <- profileplyr::summarize(proplyr_object_subset, 
                                                     fun = rowMeans, 
                                                     output = "long") 
proplyr_object_subset_long[1:3, ]
##                  sgGroup          combined_ranges    Sample    Signal
## 1 K27ac_top10_HUES64.bed  chr21_34696445_34697205 K27ac_esc 4.0326237
## 2 K27ac_top10_HUES64.bed  chr21_36207782_36209962 K27ac_esc 0.9740337
## 3 K27ac_top10_HUES64.bed chr7_150734773_150737527 K27ac_esc 0.5168534
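
The long format holds the same values as the matrix output, with one row per combination of range and sample. The base R sketch below illustrates that relationship on a made-up matrix; it is not how the package builds the data frame, and the grouping column is left out for brevity. The column names follow the output shown above.

```r
# Toy summarized matrix: 2 ranges x 2 samples (made-up values).
sum_mat <- matrix(c(4.0, 0.3,
                    1.0, 2.3),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("chr21_100_200", "chr7_900_1000"),
                                  c("K27ac", "K4me1")))

# Unroll the matrix column by column: every range is repeated once per
# sample, and each sample name is repeated once per range.
long <- data.frame(
  combined_ranges = rep(rownames(sum_mat), times = ncol(sum_mat)),
  Sample = rep(colnames(sum_mat), each = nrow(sum_mat)),
  Signal = as.vector(sum_mat)
)
long
```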

This data frame can then be used directly with ggplot for plotting.

Note: It is often helpful to log-transform the signal so that trends in the quantified signal are easier to see.

library(ggplot2)
ggplot(proplyr_object_subset_long, aes(x = Sample, y = log(Signal))) + 
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
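
One caveat with a plain log transform is that ranges with zero signal become -Inf. A common workaround, shown here as a sketch on made-up values rather than as a recommendation from the package, is to add a pseudocount before taking the log. The pseudocount of 1 and the use of log2 are arbitrary choices for illustration.

```r
# Made-up summarized signal values, including a range with no signal.
signal <- c(0, 0.5, 4, 100)

# A plain log transform maps zero to -Inf.
plain_log <- log(signal)

# Adding a pseudocount of 1 keeps every value finite and maps zero to zero.
pseudo_log <- log2(signal + 1)

plain_log
pseudo_log
```

In the ggplot call above, the same idea would correspond to replacing log(Signal) with log2(Signal + 1) in aes().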

profileplyr object output with summarized matrix

Lastly, if the ‘output’ argument is set to ‘object’, then a profileplyr object containing the summarized matrix will be returned. This allows further grouping or manipulation of the summarized ranges with other profileplyr functions, rather than working with the binned signal used in most of the later examples.

proplyr_object_subset_summ <- profileplyr::summarize(proplyr_object_subset, 
                                                     fun = rowMeans, 
                                                     output = "object")
assays(proplyr_object_subset_summ)[[1]][1:3, ]
##         K27ac_esc K4me1_esc    K4me3_esc K27ac_meso K4me1_meso  K4me3_meso
## giID1   4.0326237  0.327227 104.73912667  15.517333  0.2341562 122.2847307
## giID10  0.9740337  2.296254   0.35388878   8.081078  3.6760741   0.9602948
## giID100 0.5168534  0.308683   0.09848487  11.528448  2.6178709   0.3257963
rowRanges(proplyr_object_subset_summ)[1:3]
## GRanges object with 3 ranges and 5 metadata columns:
##           seqnames              ranges strand |        name     score
##              <Rle>           <IRanges>  <Rle> | <character> <numeric>
##     giID1    chr21   34696445-34697205      + |        <NA>         0
##    giID10    chr21   36207782-36209962      + |        <NA>         0
##   giID100     chr7 150734773-150737527      + |        <NA>         0
##                          sgGroup     giID    names
##                         <factor> <factor> <factor>
##     giID1 K27ac_top10_HUES64.bed    giID1     <NA>
##    giID10 K27ac_top10_HUES64.bed   giID10     <NA>
##   giID100 K27ac_top10_HUES64.bed  giID100     <NA>
##   -------
##   seqinfo: 24 sequences from an unspecified genome; no seqlengths