Divide the columns of an input data.table into treatment metadata, condition metadata, experiment metadata, and assay data for further analysis. This will most commonly be used to identify the different components of a SummarizedExperiment object.
Arguments
- df_
data.table with drug-response data
- nested_keys
character vector of keys to exclude from the row or column metadata, and to instead nest within an element of the matrix. See details.
- combine_on
integer value of 1 or 2, indicating whether unrecognized columns should be combined on row or column respectively. Defaults to 1.
Details
The function returns a named list containing the following elements:
- "treatment_md":
treatment metadata
- "condition_md":
condition metadata
- "data_fields":
all data.table column names corresponding to fields nested within a BumpyMatrix cell
- "experiment_md":
metadata that is constant for all entries of the data.table
- "identifiers_md":
key identifier mappings
The nested_keys argument lets the user specify metadata fields that should not be used to differentiate treatments, and that should instead be incorporated into a nested DataFrame in the BumpyMatrix matrix object.
If any of the nested_keys are constant throughout the whole data.table, they will still be included in the DataFrame of the BumpyMatrix and not in the experiment metadata (experiment_md).
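The interaction between nested keys and constant columns can be illustrated with a minimal base R sketch. It is not the gDRutils implementation: find_constant_columns is a hypothetical helper, and the site column is an invented example of an otherwise unrecognized field. It only shows the idea that a column listed as a nested key is skipped when looking for constant columns.

```r
# Hypothetical helper (not part of gDRutils): return the names of columns
# that hold a single unique value, skipping any column listed in `exclude`.
find_constant_columns <- function(df, exclude = character(0)) {
  candidates <- setdiff(names(df), exclude)
  is_constant <- vapply(
    candidates,
    function(col) length(unique(df[[col]])) == 1L,
    logical(1)
  )
  candidates[is_constant]
}

# Toy table. "site" is an invented column used only for illustration.
df <- data.frame(
  clid = c("CL1", "CL2"),
  Concentration = c(0.1, 1),
  Barcode = c("P1", "P1"),
  site = c("lab1", "lab1"),
  stringsAsFactors = FALSE
)

# Without exclusions, both constant columns are reported.
all_constant <- find_constant_columns(df)

# Treating Barcode as a nested key keeps it out of the constant set,
# even though it has a single value across all rows.
constant_cols <- find_constant_columns(df, exclude = "Barcode")
print(constant_cols)
```

In this sketch Barcode stays available for nesting, while only site would be a candidate for experiment-level metadata.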
Columns within df_ are identified through the following logic:
First, the known data fields and any specified nested_keys are extracted.
Following that, known cell and drug metadata fields are detected, and any remaining columns that represent constant metadata fields across all rows are extracted.
Next, any cell line metadata is heuristically extracted.
Finally, all remaining columns are combined on either the rows or the columns, as specified by combine_on.
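The final step, where leftover columns are attached to either the rows or the columns, can be sketched as follows. This is an illustration under stated assumptions rather than the package's own code: combine_remaining is a hypothetical helper, extra_annotation is an invented column name, and placing Gnumber on the rows and clid on the columns is an assumption of this toy example.

```r
# Hypothetical helper (not part of gDRutils): append leftover columns to the
# row keys when combine_on is 1, or to the column keys when combine_on is 2.
combine_remaining <- function(row_cols, col_cols, remaining, combine_on = 1L) {
  stopifnot(combine_on %in% c(1L, 2L))
  if (combine_on == 1L) {
    row_cols <- c(row_cols, remaining)
  } else {
    col_cols <- c(col_cols, remaining)
  }
  list(row = row_cols, col = col_cols)
}

# "extra_annotation" stands in for any column that no earlier step recognized.
res_rows <- combine_remaining("Gnumber", "clid", "extra_annotation", combine_on = 1L)
res_cols <- combine_remaining("Gnumber", "clid", "extra_annotation", combine_on = 2L)

str(res_rows)
str(res_cols)
```

With combine_on = 1 the unrecognized column joins the row keys; with combine_on = 2 it joins the column keys instead.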
Examples
split_SE_components(data.table::data.table(clid = "CL1", Gnumber = "DrugA"))
#> $condition_md
#> DataFrame with 1 row and 1 column
#>            clid
#>     <character>
#> CL1         CL1
#>
#> $treatment_md
#> DataFrame with 1 row and 1 column
#>           Gnumber
#>       <character>
#> DrugA       DrugA
#>
#> $data_fields
#> character(0)
#>
#> $experiment_md
#> DataFrame with 0 rows and 0 columns
#>
#> $identifiers_md
#> $identifiers_md$duration
#> [1] "Duration"
#>
#> $identifiers_md$cellline
#> [1] "clid"
#>
#> $identifiers_md$cellline_name
#> [1] "CellLineName"
#>
#> $identifiers_md$cellline_tissue
#> [1] "Tissue"
#>
#> $identifiers_md$cellline_ref_div_time
#> [1] "ReferenceDivisionTime"
#>
#> $identifiers_md$cellline_parental_identifier
#> [1] "parental_identifier"
#>
#> $identifiers_md$cellline_subtype
#> [1] "subtype"
#>
#> $identifiers_md$drug
#> [1] "Gnumber"
#>
#> $identifiers_md$drug_name
#> [1] "DrugName"
#>
#> $identifiers_md$drug_moa
#> [1] "drug_moa"
#>
#> $identifiers_md$untreated_tag
#> [1] "vehicle" "untreated"
#>
#> $identifiers_md$masked_tag
#> [1] "masked"
#>
#> $identifiers_md$well_position
#> [1] "WellRow" "WellColumn"
#>
#> $identifiers_md$concentration
#> [1] "Concentration"
#>
#> $identifiers_md$template
#> [1] "Template" "Treatment"
#>
#> $identifiers_md$barcode
#> [1] "Barcode" "Plate"
#>
#> $identifiers_md$drug2
#> [1] "Gnumber_2"
#>
#> $identifiers_md$drug_name2
#> [1] "DrugName_2"
#>
#> $identifiers_md$drug_moa2
#> [1] "drug_moa_2"
#>
#> $identifiers_md$concentration2
#> [1] "Concentration_2"
#>
#> $identifiers_md$drug3
#> [1] "Gnumber_3"
#>
#> $identifiers_md$drug_name3
#> [1] "DrugName_3"
#>
#> $identifiers_md$drug_moa3
#> [1] "drug_moa_3"
#>
#> $identifiers_md$concentration3
#> [1] "Concentration_3"
#>
#> $identifiers_md$data_source
#> [1] "data_source"
#>
#> $identifiers_md$replicate
#> [1] "Replicate"
#>
#> $identifiers_md$normalization_type
#> [1] "normalization_type"
#>
#>