Converting PharmacoSet Drug Response Data into gDR object
Jermiah Joseph
jermiah.joseph@uhn.ca
Bartosz Czech
bartosz.w.czech@gmail.com
Source: ../../vignettes/ConvertingPharmacoSetToGDR.Rmd
library(PharmacoGx)
#> Loading required package: CoreGx
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#> table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
#>
#> Attaching package: 'PharmacoGx'
#> The following objects are masked from 'package:CoreGx':
#>
#> .parseToRoxygen, amcc, connectivityScore, cosinePerm, gwc, mcc
library(gDRimport)
Overview
The gDRimport package is part of the gDR suite. It helps prepare raw
drug response data for downstream processing, and mainly contains helper
functions for importing, loading, and validating dose-response data from
different scanner sources. In collaboration with the BHKLab, gDRimport
also provides functions that convert a PharmacoGx::PharmacoSet object
into a gDR object.
With this functionality, users familiar with the gDR suite of packages
and methods can use the publicly available, curated datasets from the
PharmacoGx database. The main step in this process is to extract the
drug dose-response data from a PharmacoSet and transform it into a
data.table that can be used as input for
gDRcore::runDrugResponseProcessingPipeline().
Loading a PharmacoSet (PSet)
A user might already have a PharmacoSet loaded in their R session, but to obtain a different PharmacoSet, or to reuse the same script later, we provide a helper function that loads one by name. It helps to keep a user directory in which all PharmacoSets are stored: when this directory is passed to the function as a parameter, the function first checks whether the PSet already exists there, so that a PSet that has already been downloaded is not downloaded again.
pset <- getPSet("Tavor_2020")
pset
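The check-before-download behaviour described above can be sketched in base R. This is only an illustration of the pattern, not the implementation of getPSet: the helper load_cached(), its arguments, and the .rds naming scheme are hypothetical, and a stand-in function replaces the real download.

```r
# Hypothetical helper: reuse a locally stored object if present,
# otherwise fetch it once and store it for later sessions.
load_cached <- function(name, dir, fetch) {
  path <- file.path(dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: nothing is fetched
  }
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  obj <- fetch(name)       # cache miss: fetch once
  saveRDS(obj, path)
  obj
}

# Stand-in for a real download; counts how often it is called.
n_fetches <- 0
fake_fetch <- function(name) {
  n_fetches <<- n_fetches + 1
  list(name = name)
}

# A fresh, unique directory so the first call is always a cache miss.
cache_dir <- tempfile("psets_")
first  <- load_cached("Tavor_2020", cache_dir, fake_fetch)
second <- load_cached("Tavor_2020", cache_dir, fake_fetch)
```

The second call reads the stored file instead of calling the fetch function again, which is the saving the user directory is meant to provide.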
Converting PharmacoSet to data.table for gDR pipeline
A PharmacoSet holds data pertaining to the cell lines (@sample slot),
drugs (@treatment slot), and dose-response experiments
(@treatmentResponse slot). The dose-response data is stored in a
treatmentResponseExperiment object, and the function
gDRimport::convert_pset_to_df extracts this information to build a
data.table that can be used as input to the gDR pipeline.
# Store treatment response data in dt
dt <- convert_pset_to_df(pharmacoset = pset)
str(dt)
#> Classes 'data.table' and 'data.frame': 34516 obs. of 7 variables:
#> $ Barcode : chr "PCM-0103090_1_130695_2018-08-28" "PCM-0103090_1_130695_2018-08-28" "PCM-0103090_1_130695_2018-08-28" "PCM-0103090_1_130695_2018-08-28" ...
#> $ ReadoutValue : num 75.7 63.1 75.9 87.2 78.8 ...
#> $ Concentration : num 0.0021 0.0052 0.01 0.0299 0.0798 ...
#> $ Clid : chr "130695" "130695" "130695" "130695" ...
#> $ DrugName : chr "Ivosidenib" "Ivosidenib" "Ivosidenib" "Ivosidenib" ...
#> $ Duration : num 48 48 48 48 48 48 48 48 48 48 ...
#> $ ReferenceDivisionTime: logi NA NA NA NA NA NA ...
#> - attr(*, ".internal.selfref")=<externalptr>
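Before passing the table on, it can be worth confirming that it has the seven columns shown in the str() output above. The check below is a minimal sketch in base R: check_gdr_input() is a hypothetical helper, not part of gDRimport, and the toy table only mimics the first two rows printed above.

```r
# Column names taken from the str() output above.
expected_cols <- c("Barcode", "ReadoutValue", "Concentration", "Clid",
                   "DrugName", "Duration", "ReferenceDivisionTime")

# Hypothetical helper: stop with an informative message if a column is
# missing or one of the numeric columns has the wrong type.
check_gdr_input <- function(df) {
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  for (col in c("ReadoutValue", "Concentration", "Duration")) {
    if (!is.numeric(df[[col]])) stop(col, " must be numeric")
  }
  invisible(TRUE)
}

# Toy stand-in for the converted table; scalar values are recycled.
toy_dt <- data.frame(
  Barcode = "PCM-0103090_1_130695_2018-08-28",
  ReadoutValue = c(75.733, 63.094),
  Concentration = c(0.0021, 0.0052),
  Clid = "130695",
  DrugName = "Ivosidenib",
  Duration = 48,
  ReferenceDivisionTime = NA
)
ok <- check_gdr_input(toy_dt)
```

Because a data.table is also a data.frame, the same helper can be called on the dt object created above.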
Subsetting to extract relevant information
Most canonical PharmacoSets have data pertaining to many cell lines and
their response to many drugs (drug-combination data is available in
some, but its conversion to gDR is not currently supported). As such, in
the interest of time and resources, it may be useful to subset the data
before providing it as input for the gDR pipeline.
# example subset using only 1 cell line
subset_cl <- dt$Clid[1]
x <- dt[Clid == subset_cl]
x
#> Barcode ReadoutValue Concentration Clid
#> <char> <num> <num> <char>
#> 1: PCM-0103090_1_130695_2018-08-28 75.733 0.0021 130695
#> 2: PCM-0103090_1_130695_2018-08-28 63.094 0.0052 130695
#> 3: PCM-0103090_1_130695_2018-08-28 75.935 0.0100 130695
#> 4: PCM-0103090_1_130695_2018-08-28 87.159 0.0299 130695
#> 5: PCM-0103090_1_130695_2018-08-28 78.766 0.0798 130695
#> ---
#> 589: PCM-0064526_5_130695_2018-08-28 71.707 1.4960 130695
#> 590: PCM-0064526_5_130695_2018-08-28 60.488 3.9900 130695
#> 591: PCM-0064526_5_130695_2018-08-28 25.366 9.9750 130695
#> 592: PCM-0064526_5_130695_2018-08-28 5.976 24.9400 130695
#> 593: PCM-0064526_5_130695_2018-08-28 100.000 0.0000 130695
#> DrugName Duration ReferenceDivisionTime
#> <char> <num> <lgcl>
#> 1: Ivosidenib 48 NA
#> 2: Ivosidenib 48 NA
#> 3: Ivosidenib 48 NA
#> 4: Ivosidenib 48 NA
#> 5: Ivosidenib 48 NA
#> ---
#> 589: Azacitidine 48 NA
#> 590: Azacitidine 48 NA
#> 591: Azacitidine 48 NA
#> 592: Azacitidine 48 NA
#> 593: vehicle 48 NA
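The printed table includes a row with DrugName "vehicle" and a concentration of 0. When subsetting by drug rather than by cell line, the sketch below assumes such rows are untreated controls that should be kept alongside the selected drugs. That assumption and the helper name subset_drugs() are ours, not something the package prescribes. The example uses base R indexing on a toy table so that it runs on its own.

```r
# Toy stand-in for the converted table (values copied from the output above).
toy_resp <- data.frame(
  Clid = "130695",
  DrugName = c("Ivosidenib", "Ivosidenib", "Azacitidine", "vehicle"),
  Concentration = c(0.0021, 0.0052, 9.975, 0),
  ReadoutValue = c(75.733, 63.094, 25.366, 100)
)

# Hypothetical helper: keep the requested drugs plus the control rows.
subset_drugs <- function(df, drugs, control = "vehicle") {
  keep <- df$DrugName %in% c(drugs, control)
  df[keep, ]
}

ivo <- subset_drugs(toy_resp, "Ivosidenib")
```

With the data.table from the previous section, the equivalent filter would be dt[DrugName %in% c("Ivosidenib", "vehicle")].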
Running drug response pipeline with data
The subset data can now be used as input for
gDRcore::runDrugResponseProcessingPipeline(). The output of this
function is a MultiAssayExperiment object, whose assays can be converted
to a data.table with gDRutils::convert_se_assay_to_dt().
# RUN DRUG RESPONSE PROCESSING PIPELINE
se <- gDRcore::runDrugResponseProcessingPipeline(x)
se
# Convert SummarizedExperiment assays to data.table
# Available assays: "RawTreated", "Controls", "Normalized", "Averaged", "Metrics"
str(gDRutils::convert_se_assay_to_dt(se[[1]], "Averaged"))
str(gDRutils::convert_se_assay_to_dt(se[[1]], "Metrics"))
SessionInfo
sessionInfo()
#> R version 4.3.0 (2023-04-21)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] gDRimport_1.5.4 PharmacoGx_3.6.0
#> [3] CoreGx_2.6.1 SummarizedExperiment_1.32.0
#> [5] Biobase_2.62.0 GenomicRanges_1.54.1
#> [7] GenomeInfoDb_1.38.8 IRanges_2.36.0
#> [9] S4Vectors_0.40.2 MatrixGenerics_1.14.0
#> [11] matrixStats_1.4.1 BiocGenerics_0.48.1
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 jsonlite_1.8.9
#> [3] MultiAssayExperiment_1.28.0 magrittr_2.0.3
#> [5] rmarkdown_2.29 fs_1.6.5
#> [7] zlibbioc_1.48.2 ragg_1.2.7
#> [9] vctrs_0.6.5 memoise_2.0.1
#> [11] magicaxis_2.4.5 RCurl_1.98-1.16
#> [13] htmltools_0.5.8.1 S4Arrays_1.2.1
#> [15] plotrix_3.8-4 SparseArray_1.2.4
#> [17] sass_0.4.9 pracma_2.4.4
#> [19] KernSmooth_2.23-20 bslib_0.8.0
#> [21] htmlwidgets_1.6.4 desc_1.4.3
#> [23] plyr_1.8.9 testthat_3.2.1
#> [25] cachem_1.1.0 igraph_2.1.2
#> [27] mime_0.12 lifecycle_1.0.4
#> [29] piano_2.18.0 pkgconfig_2.0.3
#> [31] Matrix_1.6-5 R6_2.5.1
#> [33] fastmap_1.2.0 GenomeInfoDbData_1.2.11
#> [35] shiny_1.9.1 digest_0.6.37
#> [37] colorspace_2.1-1 rprojroot_2.0.4
#> [39] pkgload_1.3.4 textshaping_0.3.7
#> [41] SnowballC_0.7.1 fansi_1.0.6
#> [43] abind_1.4-8 coop_0.6-3
#> [45] compiler_4.3.0 downloader_0.4
#> [47] marray_1.80.0 backports_1.5.0
#> [49] BiocParallel_1.36.0 gDRutils_1.5.3
#> [51] bench_1.1.3 qs_0.27.2
#> [53] gplots_3.2.0 maps_3.4.2.1
#> [55] MASS_7.3-58.4 DelayedArray_0.28.0
#> [57] gtools_3.9.5 caTools_1.18.3
#> [59] tools_4.3.0 NISTunits_1.0.1
#> [61] httpuv_1.6.15 relations_0.6-14
#> [63] glue_1.8.0 promises_1.3.2
#> [65] grid_4.3.0 checkmate_2.3.2
#> [67] cluster_2.1.4 reshape2_1.4.4
#> [69] fgsea_1.28.0 generics_0.1.3
#> [71] gtable_0.3.6 sm_2.2-6.0
#> [73] data.table_1.16.4 RApiSerialize_0.1.4
#> [75] stringfish_0.16.0 utf8_1.2.4
#> [77] XVector_0.42.0 RANN_2.6.2
#> [79] pillar_1.9.0 stringr_1.5.1
#> [81] limma_3.58.1 BumpyMatrix_1.10.0
#> [83] later_1.4.1 dplyr_1.1.4
#> [85] lattice_0.21-8 tidyselect_1.2.1
#> [87] knitr_1.49 xfun_0.49
#> [89] shinydashboard_0.7.2 statmod_1.5.0
#> [91] brio_1.1.4 DT_0.33
#> [93] visNetwork_2.1.2 stringi_1.8.4
#> [95] yaml_2.3.10 boot_1.3-28.1
#> [97] evaluate_1.0.1 codetools_0.2-19
#> [99] lsa_0.73.3 tibble_3.2.1
#> [101] cli_3.6.3 RcppParallel_5.1.7
#> [103] xtable_1.8-4 systemfonts_1.0.5
#> [105] munsell_0.5.1 jquerylib_0.1.4
#> [107] Rcpp_1.0.13-1 mapproj_1.2.11
#> [109] parallel_4.3.0 sets_1.0-25
#> [111] pkgdown_2.0.7 ggplot2_3.5.1
#> [113] assertthat_0.2.1 bitops_1.0-9
#> [115] slam_0.1-55 celestial_1.4.6
#> [117] scales_1.3.0 purrr_1.0.2
#> [119] crayon_1.5.3 rlang_1.1.4
#> [121] cowplot_1.1.3 fastmatch_1.1-4
#> [123] shinyjs_2.1.0