Overview
gDRcore is part of the gDR suite. The package provides a set of tools
to process and analyze drug response data.
Introduction
Data model
The data model is built on the MultiAssayExperiment (MAE) structure. Within an MAE, each SummarizedExperiment (SE) contains a different experiment type (e.g., single-agent or combination treatment). Columns of the MAE are defined by the cell lines and any modifications to them, and are shared across the SEs. Rows are defined by the treatments (e.g., drugs, perturbations) and are specific to each SE. Assays of the SE hold the different levels of data processing (raw, control, normalized, and averaged data, as well as metrics). Each nested element of an assay stores the series itself as a table (a data.table in practice). Not every element needs to contain a series, and series need not have the same number of elements, but the attributes (columns of the table) should be consistent across the SE.
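To make the nesting concrete, the following base R sketch mimics the layout with a list-matrix whose rows are treatments, whose columns are cell lines, and whose cells each hold a small table. It is only an analogy under simplifying assumptions: the drug and cell line names and the readout values are invented, and plain data.frame objects stand in for the classes that gDR actually uses.

```r
# Hypothetical identifiers, for illustration only.
treatments <- c("drug_A", "drug_B")                        # rows of one SE
cell_lines <- c("cellline_X", "cellline_Y", "cellline_Z")  # columns, shared across SEs

# One small table (a "series") per treatment x cell line combination.
make_series <- function(n_conc) {
  data.frame(
    Concentration = 10^seq(-2, by = 1, length.out = n_conc),
    ReadoutValue = seq(100, by = -10, length.out = n_conc)
  )
}
cells <- lapply(
  seq_len(length(treatments) * length(cell_lines)),
  function(i) make_series(n_conc = 3 + i %% 2)
)

# Arrange the tables as a 2 x 3 list-matrix, analogous to one assay.
assay_like <- matrix(
  cells,
  nrow = length(treatments),
  ncol = length(cell_lines),
  dimnames = list(treatments, cell_lines)
)

# Each cell is a table; row counts may differ, column names do not.
series <- assay_like[["drug_B", "cellline_Y"]]
print(series)
print(unique(lapply(assay_like, names)))
```

The point of the sketch is the shape: one table per treatment and cell line pair, with a consistent set of columns across the whole assay.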
Drug processing
For drug response data, the input files need to be merged such that
each measurement (data) is associated with the correct metadata (cell
line properties and treatment definition). Metadata can be added with
the function cleanup_metadata if the right reference databases are in
place.
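As an illustration of what the merge has to achieve, the sketch below joins a table of raw readouts with treatment and cell line metadata using base R merge(). All three tables and their contents are invented for this example, and the code does not reproduce what cleanup_metadata or any other gDR function does internally.

```r
# Invented example tables, for illustration only.
readouts <- data.frame(
  Gnumber = c("G001", "G001", "G002", "G002"),
  clid = c("CL01", "CL02", "CL01", "CL02"),
  Concentration = c(0.1, 0.1, 1, 1),
  Duration = 72,
  ReadoutValue = c(950, 870, 400, 520)
)
drugs <- data.frame(
  Gnumber = c("G001", "G002"),
  DrugName = c("drug_001", "drug_002"),
  drug_moa = c("moa_A", "moa_B")
)
cell_lines <- data.frame(
  clid = c("CL01", "CL02"),
  CellLineName = c("cellline_A", "cellline_B"),
  Tissue = c("breast", "lung"),
  ReferenceDivisionTime = c(26, 30)
)

# Attach treatment metadata, then cell line metadata, to every measurement.
long_table <- merge(readouts, drugs, by = "Gnumber", all.x = TRUE)
long_table <- merge(long_table, cell_lines, by = "clid", all.x = TRUE)

# Every measurement should have found its metadata.
stopifnot(nrow(long_table) == nrow(readouts), !anyNA(long_table))
print(long_table)
```

Using all.x = TRUE keeps measurements whose metadata is missing, so that gaps show up as NA values instead of silently dropping rows.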
Required columns
To process the data through runDrugResponseProcessingPipeline, the
input data must contain the required columns and may also contain
optional columns.
For single-agent experiments, the required columns are:
- Gnumber
- DrugName
- drug_moa
- Concentration
- clid
- CellLineName
- Tissue
- ReferenceDivisionTime
- parental_identifier
- subtype
- Duration
- ReadoutValue
For combination experiments, additional required fields are:
- Gnumber_2
- DrugName_2
- drug_moa_2
- Concentration_2
gDR supports the inclusion of any additional metadata in the long table for the pipeline. The most common optional columns, which are supported by default, are:
- Barcode (or Plate)
- BackgroundValue
- WellRow
- WellColumn
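Before running the pipeline, it can help to check an input table against the required columns. The helper below is a hypothetical convenience function written for this document, not part of gDRcore; it only reports which of the required column names listed in this section are absent from a table.

```r
# Required column names as listed in this section.
required_single_agent <- c(
  "Gnumber", "DrugName", "drug_moa", "Concentration",
  "clid", "CellLineName", "Tissue", "ReferenceDivisionTime",
  "parental_identifier", "subtype", "Duration", "ReadoutValue"
)
required_combination <- c(
  "Gnumber_2", "DrugName_2", "drug_moa_2", "Concentration_2"
)

# Hypothetical helper (not a gDRcore function): returns the missing column names.
find_missing_columns <- function(input, combination = FALSE) {
  required <- required_single_agent
  if (combination) {
    required <- c(required, required_combination)
  }
  setdiff(required, names(input))
}

# Invented one-row table that lacks 'subtype' and all combination columns.
example_input <- data.frame(
  Gnumber = "G001", DrugName = "drug_001", drug_moa = "moa_A",
  Concentration = 0.1, clid = "CL01", CellLineName = "cellline_A",
  Tissue = "breast", ReferenceDivisionTime = 26,
  parental_identifier = "cellline_A", Duration = 72, ReadoutValue = 950
)
print(find_missing_columns(example_input))
print(find_missing_columns(example_input, combination = TRUE))
```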
gDR pipeline
When the data and metadata are merged into a long table, the wrapper
function runDrugResponseProcessingPipeline can be used to generate an
MAE with processed and analyzed data.
In practice, runDrugResponseProcessingPipeline performs the following
steps:
- create_SE: Creates the structure of the MAE and the associated SEs by assigning metadata to the row and column attributes. The assignment is performed in the function split_SE_components (see details below for the assumptions made when building SE structures). create_SE also dispatches the raw data and controls into the right nested tables. Note that data may be duplicated between different SEs to make them self-contained.
- normalize_SE: Normalizes the raw data based on the control. If no pre-treatment control is provided, the GR value is calculated from the cell line division time provided by the reference database. If both pieces of information are missing, GR values cannot be calculated. Additional normalizations can be added as new rows in the nested table.
- average_SE: Averages technical replicates that are stored in the same nested table.
- fit_SE: Fits the dose-response curves and calculates response metrics for each normalization type.
- fit_SE.combinations: Calculates synergy scores for drug combination data and, if the data is appropriate, fits dose-response curves along each of the two drugs and calculates matrix-level metrics (e.g., isobolograms). This is also performed for each normalization type independently.
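The two routes to a GR value mentioned for normalize_SE can be written out as follows. This is a minimal sketch of the commonly used growth rate inhibition (GR) definition, assuming exponential growth of the untreated control. The function names are hypothetical, the numbers are invented, and the actual normalize_SE implementation may differ in details such as capping or rounding.

```r
# GR value from a pre-treatment (time 0) control.
# x: treated readout, x_ctrl: untreated control at endpoint, x_0: readout at time 0.
gr_from_time0 <- function(x, x_ctrl, x_0) {
  2^(log2(x / x_0) / log2(x_ctrl / x_0)) - 1
}

# GR value from a reference division time when no time 0 control exists.
# Assumes the control doubles every `division_time` (same unit as `duration`),
# so that x_ctrl / x_0 = 2^(duration / division_time).
gr_from_division_time <- function(x, x_ctrl, duration, division_time) {
  2^(1 + log2(x / x_ctrl) / (duration / division_time)) - 1
}

# Invented numbers: 72 h assay, 24 h division time, hence 8-fold control growth.
x_0 <- 100
x_ctrl <- 800
x <- 200

gr_a <- gr_from_time0(x, x_ctrl, x_0)
gr_b <- gr_from_division_time(x, x_ctrl, duration = 72, division_time = 24)
print(c(time0 = gr_a, division_time = gr_b))
```

With these numbers both routes give the same value, because the assumed division time matches the observed growth of the control. A treated readout equal to the control gives a GR value of 1, and a readout equal to the time 0 value gives 0.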
The functions used to process the data have parameters for specifying the names of the variables and assays. Additional parameters are available to personalize the processing steps, such as forcing the nesting (or not) of an attribute and specifying attributes that should be considered as technical replicates or not.
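The idea of declaring some attributes as technical replicates can be illustrated with a small base R example. In this sketch, which uses invented data and a hypothetical replicate_columns vector in place of the actual gDRcore parameters, rows that differ only in the replicate attributes are averaged together.

```r
# Invented normalized values measured on two plates (technical replicates).
normalized <- data.frame(
  Gnumber = "G001",
  clid = "CL01",
  Concentration = c(0.1, 0.1, 1, 1, 10, 10),
  Barcode = c("plate_1", "plate_2", "plate_1", "plate_2", "plate_1", "plate_2"),
  RelativeViability = c(0.98, 0.94, 0.60, 0.64, 0.21, 0.19)
)

# Hypothetical settings: which attributes are replicates, which column holds values.
replicate_columns <- "Barcode"
value_column <- "RelativeViability"

# Group by every remaining attribute and average the values within each group.
group_columns <- setdiff(names(normalized), c(replicate_columns, value_column))
averaged <- aggregate(
  normalized[value_column],
  by = normalized[group_columns],
  FUN = mean
)
averaged <- averaged[order(averaged$Concentration), ]
print(averaged)
```

Moving an attribute out of replicate_columns would keep it as a grouping variable, so measurements that differ in that attribute would no longer be pooled.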
Use Cases
Data preprocessing
Please familiarize yourself with the gDRimport package, which contains
a variety of tools to prepare input data for gDRcore.
This example is based on the artificial dataset called data1,
available within the gDRimport package.
gDR requires three types of data that should be used as the raw
input: Template, Manifest, and RawData. More information about these
three types of data can be found in our general documentation.
td <- gDRimport::get_test_data()
The provided dataset needs to be merged into one data.table object to
be able to run the gDR pipeline. This process can be done using two
functions: gDRimport::load_data() and gDRcore::merge_data().
Running gDR pipeline
We provide an all-in-one function that splits data into appropriate
data types, creates the SummarizedExperiment object for each data type,
splits data into treatment and control assays, normalizes, averages,
calculates gDR metrics, and finally, creates the MultiAssayExperiment
object. This function is called runDrugResponseProcessingPipeline.
mae <- runDrugResponseProcessingPipeline(input_df)
mae
#> A MultiAssayExperiment object of 2 listed
#> experiments with user-defined names and respective classes.
#> Containing an ExperimentList class object of length 2:
#> [1] combination: SummarizedExperiment with 2 rows and 6 columns
#> [2] single-agent: SummarizedExperiment with 3 rows and 6 columns
#> Functionality:
#> experiments() - obtain the ExperimentList instance
#> colData() - the primary/phenotype DataFrame
#> sampleMap() - the sample coordination DataFrame
#> `$`, `[`, `[[` - extract colData columns, subset, or experiment
#> *Format() - convert into a long or wide DataFrame
#> assays() - convert ExperimentList to a SimpleList of matrices
#> exportClass() - save data to flat files
We can subset the MultiAssayExperiment to obtain the SummarizedExperiment for a specific data type, for example:
mae[["single-agent"]]
#> class: SummarizedExperiment
#> dim: 3 6
#> metadata(5): identifiers experiment_metadata Keys fit_parameters
#> .internal
#> assays(5): RawTreated Controls Normalized Averaged Metrics
#> rownames(3): G00002_drug_002_moa_A_168 G00004_drug_004_moa_A_168
#> G00011_drug_011_moa_B_168
#> rowData names(4): Gnumber DrugName drug_moa Duration
#> colnames(6): CL00011_cellline_BA_breast_cellline_BA_unknown_26
#> CL00012_cellline_CA_breast_cellline_CA_unknown_30 ...
#> CL00015_cellline_FA_breast_cellline_FA_unknown_42
#> CL00018_cellline_IB_breast_cellline_IB_unknown_54
#> colData names(6): clid CellLineName ... subtype ReferenceDivisionTime
Data extraction
Extraction of the data from either MultiAssayExperiment or
SummarizedExperiment objects into more user-friendly structures, as
well as other data transformations, can be done using gDRutils. We
encourage reading the gDRutils vignette to familiarize yourself with
these functionalities.
SessionInfo
sessionInfo()
#> R version 4.3.0 (2023-04-21)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] gDRcore_1.7.3 gDRtestData_1.7.1 BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] farver_2.1.2 bitops_1.0-9
#> [3] fastmap_1.1.1 RCurl_1.98-1.17
#> [5] BumpyMatrix_1.10.0 TH.data_1.1-3
#> [7] digest_0.6.34 lifecycle_1.0.4
#> [9] gDRutils_1.7.7 survival_3.5-5
#> [11] magrittr_2.0.3 compiler_4.3.0
#> [13] rlang_1.1.6 sass_0.4.8
#> [15] drc_3.0-1 tools_4.3.0
#> [17] plotrix_3.8-4 yaml_2.3.8
#> [19] data.table_1.17.4 knitr_1.45
#> [21] lambda.r_1.2.4 S4Arrays_1.2.1
#> [23] DelayedArray_0.28.0 RColorBrewer_1.1-3
#> [25] abind_1.4-8 multcomp_1.4-28
#> [27] BiocParallel_1.36.0 purrr_1.0.4
#> [29] BiocGenerics_0.48.1 desc_1.4.3
#> [31] grid_4.3.0 stats4_4.3.0
#> [33] scales_1.4.0 MASS_7.3-58.4
#> [35] gtools_3.9.5 MultiAssayExperiment_1.28.0
#> [37] SummarizedExperiment_1.32.0 cli_3.6.5
#> [39] mvtnorm_1.3-3 rmarkdown_2.25
#> [41] crayon_1.5.3 ragg_1.2.7
#> [43] readxl_1.4.3 cachem_1.0.8
#> [45] stringr_1.5.1 splines_4.3.0
#> [47] zlibbioc_1.48.2 gDRimport_1.7.2
#> [49] assertthat_0.2.1 parallel_4.3.0
#> [51] formatR_1.14 BiocManager_1.30.22
#> [53] cellranger_1.1.0 XVector_0.42.0
#> [55] matrixStats_1.5.0 vctrs_0.6.5
#> [57] Matrix_1.6-5 sandwich_3.1-1
#> [59] jsonlite_2.0.0 carData_3.0-5
#> [61] bookdown_0.37 car_3.1-3
#> [63] IRanges_2.36.0 S4Vectors_0.40.2
#> [65] Formula_1.2-5 systemfonts_1.0.5
#> [67] testthat_3.2.1 jquerylib_0.1.4
#> [69] rematch_2.0.0 glue_1.8.0
#> [71] pkgdown_2.0.7 codetools_0.2-19
#> [73] stringi_1.8.7 futile.logger_1.4.3
#> [75] GenomeInfoDb_1.38.8 GenomicRanges_1.54.1
#> [77] tibble_3.2.1 pillar_1.10.2
#> [79] htmltools_0.5.7 brio_1.1.4
#> [81] GenomeInfoDbData_1.2.11 R6_2.6.1
#> [83] textshaping_0.3.7 evaluate_0.23
#> [85] lattice_0.21-8 Biobase_2.62.0
#> [87] futile.options_1.0.1 backports_1.5.0
#> [89] memoise_2.0.1 bslib_0.6.1
#> [91] SparseArray_1.2.4 checkmate_2.3.2
#> [93] xfun_0.42 fs_1.6.3
#> [95] MatrixGenerics_1.14.0 zoo_1.8-14
#> [97] pkgconfig_2.0.3