Introduction
The gDRtestData package includes a curated subset of DepMap Public 24Q4 data, serving multiple purposes:
- test data: validate gDR analysis functions with realistic genomic datasets
- example data: demonstrate gDR package capabilities in documentation and vignettes
- reproducible research: enable reproducible examples across the gDR platform.
This vignette describes the included DepMap datasets, their contents, and how to use them.
Data Source & Citation
Original source: DepMap Portal
Release: DepMap Public 24Q4 (Loaded May 26, 2026)
Citation: DepMap, Broad (2024). DepMap 24Q4 Public. Figshare+. Dataset. https://doi.org/10.25452/figshare.plus.27993248.v1
Data Overview
The DepMap 24Q4 release contains new cell models and data from:
- Whole Genome/Exome Sequencing (Copy Number and Mutation)
- RNA Sequencing (Expression and Fusions)
- Genome-wide CRISPR knockout screens.
The following datasets are included in the gDRtestData package:
| Dataset | Type | Dimensions | Description | Dataset url |
|---|---|---|---|---|
| Models | Metadata | ~1,000 cell lines | Cell line information and annotations | url |
| CRISPRGeneEffect | Functional | ~1,000 × ~18,000 | CRISPR knockout gene effect scores (integrated via Chronos) | url |
| Expression | Omics | ~1,000 × ~19,000 | Gene expression (log2 TPM, protein-coding genes) | url |
| Mutations (Hotspot) | Somatic | ~1,000 × ~3,000 | Binary matrix of hotspot mutations | url |
| Mutations (Damaging) | Somatic | ~1,000 × ~3,000 | Binary matrix of damaging mutations | url |
| OmicsCNGene | CNV | ~1,000 × ~20,000 | Gene-level copy number estimates | url |
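The matrix-style datasets above share one layout: cell lines in rows and gene IDs in columns. The following is a minimal sketch of reading such a file into a numeric matrix with base R. It writes a toy CSV to a temporary file rather than assuming where the files live inside the package, and the `ACH-TOY...` identifiers, the `ModelID` header of the first column, and all values are made up for illustration.

```r
# Toy stand-in for a matrix-style CSV (identifiers and values are invented).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "ModelID,1000,1001,1002",
    "ACH-TOY001,0.1,-0.9,NA",
    "ACH-TOY002,-0.2,0.0,0.3"
  ),
  csv_path
)

# row.names = 1 uses the first column (cell line IDs) as row names.
# check.names = FALSE keeps numeric gene IDs as-is instead of prefixing "X".
df <- read.csv(csv_path, row.names = 1, check.names = FALSE)
mat <- as.matrix(df)

dim(mat)       # cell lines x genes
colnames(mat)  # gene IDs as character strings
```

Keeping the gene IDs as plain strings makes it easier to match columns across the different datasets later on.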
Data Dictionary
Models (Cell Line Metadata)
file name: Model.csv
| Aspect | Details |
|---|---|
| Rows | Individual cell lines |
| Columns | Metadata columns (see below) |
| Values | Cell line annotations and metadata |
| Data Type | Mixed (character, numeric) |
| Interpretation | Comprehensive metadata for each cell line model |
| NA Handling | Missing values indicate information not available for that model |
Column Summary:
- ModelID: Unique cell line identifier
- CCLEName: Cell line name from CCLE database
- CellLineName: Common cell line name
- DepmapModelType, OncotreeLineage, OncotreePrimaryDisease, OncotreeSubtype: Cancer classification (from Oncotree)
- OncotreeCode: Oncotree classification code
- PrimaryOrMetastasis: Tumor site (Primary/Metastatic/Recurrence)
- Age: Age at sampling
- AgeCategory: Age category at time of sampling (Adult/Pediatric/Fetus/Unknown)
- Sex: Sex at sampling (Female/Male/Unknown)
- PatientRace: Patient-reported race
Notes:
- Classification: Oncotree taxonomy for cancer models
- Quality: Authenticated, high-quality cell lines only
- Completeness: Some fields may be NA, indicating that the information is not available for that model
- Use: Primary reference for cell line metadata; join with other datasets via ModelID
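As an illustration of joining via ModelID, here is a minimal sketch with toy data in base R. The cell line identifiers, lineages, and scores are invented, and the `gene_effect` matrix only stands in for any of the matrix-style datasets.

```r
# Toy metadata table (hypothetical identifiers and lineages).
models <- data.frame(
  ModelID = c("ACH-TOY001", "ACH-TOY002", "ACH-TOY003"),
  OncotreeLineage = c("Lung", "Breast", "Lung"),
  stringsAsFactors = FALSE
)

# Toy matrix: cell lines in rows, gene IDs in columns.
gene_effect <- matrix(
  c(-1.1,  0.1,
    -0.2, -0.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ACH-TOY001", "ACH-TOY002"), c("1000", "1001"))
)

# Move the row names into a ModelID column so the two tables share a key.
effect_df <- data.frame(
  ModelID = rownames(gene_effect),
  gene_effect,
  check.names = FALSE
)

# Inner join: only cell lines present in both tables are kept.
annotated <- merge(models, effect_df, by = "ModelID")
annotated
```

An inner join silently drops cell lines that lack data in one of the tables (here `ACH-TOY003`); pass `all.x = TRUE` to `merge()` to keep them with NA values instead.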
Somatic Mutations (Hotspot)
file name: OmicsSomaticMutationsMatrixHotspot.csv
| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs |
| Values | Binary (0/1); 1 = hotspot mutation present |
| Definition | Mutations in known cancer hotspots (COSMIC, OncoKB) |
| Sequencing | Whole exome sequencing (WES) |
Note: Recurrent mutations at known oncogenic positions.
Somatic Mutations (Damaging)
file name: OmicsSomaticMutationsMatrixDamaging.csv
| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs |
| Values | Binary (0/1); 1 = damaging mutation present |
| Definition | Frame-shift, stop-gain, or splice-site mutations |
| Quality | High confidence damaging variants |
Note: Likely loss-of-function mutations (frameshift, nonsense, splice-site).
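Because both mutation matrices are binary with the same orientation, they can be combined and summarised with simple matrix operations. The sketch below uses toy matrices with invented identifiers and assumes both matrices have already been aligned to the same cell lines and genes, which may not hold for the real files.

```r
# Toy binary matrices: rows = cell lines, columns = gene IDs (invented values).
hotspot <- matrix(
  c(1, 0, 0,
    0, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ACH-TOY001", "ACH-TOY002"), c("1000", "1001", "1002"))
)
damaging <- matrix(
  c(0, 1, 0,
    0, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = dimnames(hotspot)
)

# Guard against silently combining misaligned matrices.
stopifnot(identical(dimnames(hotspot), dimnames(damaging)))

# 1 where a gene carries a hotspot OR a damaging mutation, 0 otherwise.
any_mutation <- (hotspot == 1 | damaging == 1) * 1

# Fraction of cell lines with a mutation, per gene.
mutation_frequency <- colMeans(any_mutation, na.rm = TRUE)

# Cell lines mutated in one gene of interest.
mutated_lines <- rownames(any_mutation)[any_mutation[, "1002"] == 1]
```

With real data, intersect the row and column names of the two matrices first and subset both to that intersection before combining them.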
CRISPR Gene Effect
file name: CRISPRGeneEffect.csv
| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs (Entrez format) as column names |
| Values | CRISPR knockout effect scores |
| Scale | Typically about -1 to +1; 0 ≈ no effect, more negative = stronger dependency in that cell line |
| Interpretation | Lower values indicate genes more essential for cell viability |
| NA Handling | Missing values indicate insufficient screen coverage |
Note:
- Method: Genome-wide CRISPR/Cas9 knockout screens
- Scale: Chronos gene effect scores (continuous values, not probabilities of dependency)
- Processing: Already normalized and quality-filtered by DepMap
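Since lower scores indicate stronger essentiality and NA marks insufficient coverage, a common first step is to count how many screened cell lines fall below a cutoff for each gene. The sketch below uses toy values; the cutoff of -0.5 is an assumption chosen for illustration, not a value taken from this package.

```r
# Toy gene effect matrix: rows = cell lines, columns = gene IDs.
gene_effect <- matrix(
  c(-1.2, -0.1,   NA,
    -0.7,  0.2, -0.3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ACH-TOY001", "ACH-TOY002"), c("1000", "1001", "1002"))
)

# Assumed cutoff for calling a cell line dependent on a gene.
dependency_cutoff <- -0.5

# NA scores stay NA here, so they are neither dependent nor non-dependent.
is_dependent <- gene_effect < dependency_cutoff

# Count dependent lines per gene, ignoring lines without a score.
n_dependent <- colSums(is_dependent, na.rm = TRUE)
n_screened <- colSums(!is.na(gene_effect))
fraction_dependent <- n_dependent / n_screened
```

Dividing by the number of screened lines rather than the total number of rows keeps genes with missing scores from looking less essential than they are.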
Gene Expression
file name: OmicsExpressionProteinCodingGenesTPMLogp1.csv
| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs (protein-coding genes only) |
| Values | Expression levels (numeric) |
| Scale | Log2(TPM + 1); already log-transformed |
| Range | Typically 0-20 (log2 scale) |
| Quality | RNA-seq from standardized Broad CCL protocols |
Note:
- Only protein-coding genes included
- Already log2-transformed with a pseudocount of 1, i.e. log2(TPM + 1)
- Row-wise and gene-wise normalization already applied by DepMap
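Because the values are log2(TPM + 1), they should not be log-transformed again, and the original TPM can be recovered as 2^x - 1. A minimal sketch with toy values follows; the identifiers and the "TPM of at least 1" expression threshold are illustrative assumptions.

```r
# Toy expression matrix on the log2(TPM + 1) scale.
log_tpm <- matrix(
  c(0,  1,
    3, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ACH-TOY001", "ACH-TOY002"), c("1000", "1001"))
)

# Invert the transform: TPM = 2^x - 1.
tpm <- 2^log_tpm - 1

# TPM >= 1 is equivalent to log2(TPM + 1) >= 1, so no back-transform is needed
# to apply the (assumed) expression threshold.
is_expressed <- log_tpm >= 1

# Genes passing the threshold in every cell line.
expressed_in_all <- colnames(log_tpm)[colSums(is_expressed) == nrow(log_tpm)]
```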
Copy Number Variation (CNV)
file name: OmicsCNGene.csv
| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs |
| Values | Numeric (continuous); gene-level CN estimates |
| Scale | Log2 ratio relative to diploid reference (typically -2 to +3) |
| Method | SNP microarray or WES-derived CN calling |
| Interpretation | 0 = diploid (2 copies); <0 = deletion; >0 = amplification |
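If the values follow the log2-ratio scale described in the table above (0 = two copies), an approximate absolute copy number is 2 * 2^x. The sketch below applies that conversion to toy values and adds a coarse loss/neutral/gain call. The ±0.3 thresholds are an arbitrary assumption for illustration, and the scale itself is worth confirming against the DepMap release notes before relying on this.

```r
# Toy log2 copy-number ratios for three genes in one cell line.
log2_ratio <- c("1000" = -1, "1001" = 0, "1002" = 1)

# Log2 ratio relative to a diploid reference -> estimated copies per cell.
copies <- 2 * 2^log2_ratio

# Coarse calls using assumed thresholds of -0.3 and +0.3.
cn_call <- cut(
  log2_ratio,
  breaks = c(-Inf, -0.3, 0.3, Inf),
  labels = c("loss", "neutral", "gain")
)

data.frame(gene = names(log2_ratio), copies = copies, call = cn_call)
```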
Important Limitations & Disclaimers
Data Subset: This package includes a curated subset for testing/examples. For comprehensive analyses, download the full DepMap Portal data.
Licensing & Usage: DepMap data is publicly available but has specific usage terms. Verify compliance with your intended use: https://depmap.org/portal/documentation/
Citation: Always cite both DepMap (the original source) and the gDRtestData package.
SessionInfo
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] gDRtestData_1.11.4 BiocStyle_2.40.0
#>
#> loaded via a namespace (and not attached):
#> [1] cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [4] xfun_0.58 otel_0.2.0 textshaping_1.0.5
#> [7] data.table_1.18.4 jsonlite_2.0.0 htmltools_0.5.9
#> [10] ragg_1.5.2 sass_0.4.10 rmarkdown_2.31
#> [13] evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0
#> [16] yaml_2.3.12 lifecycle_1.0.5 bookdown_0.46
#> [19] BiocManager_1.30.27 compiler_4.6.0 fs_2.1.0
#> [22] systemfonts_1.3.2 digest_0.6.39 R6_2.6.1
#> [25] bslib_0.11.0 tools_4.6.0 pkgdown_2.2.0
#> [28] cachem_1.1.0 desc_1.4.3