Divide the columns of an input data.table into treatment metadata, condition metadata, experiment metadata, and assay data for further analysis. This will most commonly be used to identify the different components of a SummarizedExperiment object.
Arguments
- df_
 data.table with drug-response data
- nested_keys
 character vector of keys to exclude from the row or column metadata, and to instead nest within an element of the matrix. See details.
- combine_on
 integer value of 1 or 2, indicating whether unrecognized columns should be combined on rows or columns, respectively. Defaults to 1.
Details
Named list containing the following elements:
- "treatment_md":
 treatment metadata
- "condition_md":
 condition metadata
- "data_fields":
 all data.table column names corresponding to fields nested within a BumpyMatrix cell
- "experiment_md":
 metadata that is constant for all entries of the data.table
- "identifiers_md":
 key identifier mappings
The nested_keys argument lets the user specify metadata fields that should not be used
to differentiate the treatments and that should instead be incorporated
into a nested DataFrame in the BumpyMatrix matrix object.
If any of the nested_keys are constant throughout the whole data.table,
they will still be included in the DataFrame of the BumpyMatrix and not in the experiment metadata ("experiment_md").
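The effect of nesting can be pictured with a small base R sketch. It does not call the package; it only illustrates the idea that columns named in nested_keys travel with the assay values inside each cell instead of defining a new treatment. The toy values and the use of split() are assumptions made for illustration.

```r
# Toy drug-response table; column names follow the identifiers on this page,
# the values are invented.
df <- data.frame(
  clid = c("CL1", "CL1", "CL1", "CL1"),
  Gnumber = c("DrugA", "DrugA", "DrugB", "DrugB"),
  Barcode = c("P1", "P2", "P1", "P2"),
  ReadoutValue = c(10, 12, 20, 22),
  stringsAsFactors = FALSE
)

nested_keys <- "Barcode"                       # kept inside each cell
assay_cols <- c(nested_keys, "ReadoutValue")   # columns of the nested table
key_cols <- setdiff(names(df), assay_cols)     # clid, Gnumber

# One nested table per (cell line, drug) combination.
groups <- interaction(df[key_cols], drop = TRUE)
cells <- split(df[assay_cols], groups)
```

Here Barcode does not create extra treatments: both plates for a given cell line and drug end up as two rows of the same nested table. Had Barcode been left out of nested_keys, it would have been treated like any other metadata column.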
Columns within df_ are identified through the following logic:
First, the known data fields and any specified nested_keys are extracted.
Following that, known cell and drug metadata fields are detected,
and any remaining columns that are constant across all rows are extracted as experiment metadata.
Next, any cell line metadata is heuristically extracted.
Finally, all remaining columns are combined on either the rows or the columns, as specified by
combine_on.
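The order of these steps can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: partition_columns() and its arguments data_fields, cell_fields, and drug_fields are hypothetical, the heuristic cell line step is omitted, and the sketch assumes that rows correspond to treatments and columns to conditions. The run column in the usage example is also invented.

```r
# Hypothetical helper that mirrors the order of the steps described above.
partition_columns <- function(df, data_fields, cell_fields, drug_fields,
                              nested_keys = NULL, combine_on = 1L) {
  stopifnot(combine_on %in% c(1L, 2L))
  remaining <- names(df)

  # 1. Known data fields and nested keys go into the nested assay data.
  nested <- intersect(remaining, c(data_fields, nested_keys))
  remaining <- setdiff(remaining, nested)

  # 2. Known cell and drug metadata fields.
  condition <- intersect(remaining, cell_fields)
  treatment <- intersect(remaining, drug_fields)
  remaining <- setdiff(remaining, c(condition, treatment))

  # 3. Columns that are constant across all rows become experiment metadata.
  is_constant <- vapply(
    remaining,
    function(col) length(unique(df[[col]])) == 1L,
    logical(1)
  )
  experiment <- remaining[is_constant]
  remaining <- remaining[!is_constant]

  # 4. Leftover columns are combined on rows (1) or columns (2).
  #    Assumption: rows hold treatments, columns hold conditions.
  if (combine_on == 1L) {
    treatment <- c(treatment, remaining)
  } else {
    condition <- c(condition, remaining)
  }

  list(
    treatment_md = treatment,
    condition_md = condition,
    data_fields = nested,
    experiment_md = experiment
  )
}

df <- data.frame(
  clid = c("CL1", "CL1", "CL2", "CL2"),
  Gnumber = c("DrugA", "DrugB", "DrugA", "DrugB"),
  ReadoutValue = c(0.9, 0.8, 0.5, 0.4),
  Barcode = "P1",            # constant, but requested as a nested key
  data_source = "screen1",   # constant, so it becomes experiment metadata
  run = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

parts <- partition_columns(
  df,
  data_fields = "ReadoutValue",
  cell_fields = "clid",
  drug_fields = "Gnumber",
  nested_keys = "Barcode",
  combine_on = 1L
)
```

Because nested keys are removed in the first step, the constant Barcode column never reaches the constant-column check, which matches the behavior described for nested_keys. The unrecognized run column is the only one whose placement depends on combine_on.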
Examples
split_SE_components(data.table::data.table(clid = "CL1", Gnumber = "DrugA"))
#> $condition_md
#> DataFrame with 1 row and 1 column
#>            clid
#>     <character>
#> CL1         CL1
#> 
#> $treatment_md
#> DataFrame with 1 row and 1 column
#>           Gnumber
#>       <character>
#> DrugA       DrugA
#> 
#> $data_fields
#> character(0)
#> 
#> $experiment_md
#> DataFrame with 0 rows and 0 columns
#> 
#> $identifiers_md
#> $identifiers_md$duration
#> [1] "Duration"
#> 
#> $identifiers_md$cellline
#> [1] "clid"
#> 
#> $identifiers_md$cellline_name
#> [1] "CellLineName"
#> 
#> $identifiers_md$cellline_tissue
#> [1] "Tissue"
#> 
#> $identifiers_md$cellline_ref_div_time
#> [1] "ReferenceDivisionTime"
#> 
#> $identifiers_md$cellline_parental_identifier
#> [1] "parental_identifier"
#> 
#> $identifiers_md$cellline_subtype
#> [1] "subtype"
#> 
#> $identifiers_md$drug
#> [1] "Gnumber"
#> 
#> $identifiers_md$drug_name
#> [1] "DrugName"
#> 
#> $identifiers_md$drug_moa
#> [1] "drug_moa"
#> 
#> $identifiers_md$untreated_tag
#> [1] "vehicle"   "untreated"
#> 
#> $identifiers_md$masked_tag
#> [1] "masked"
#> 
#> $identifiers_md$well_position
#> [1] "WellRow"    "WellColumn"
#> 
#> $identifiers_md$concentration
#> [1] "Concentration"
#> 
#> $identifiers_md$template
#> [1] "Template"  "Treatment"
#> 
#> $identifiers_md$barcode
#> [1] "Barcode" "Plate"  
#> 
#> $identifiers_md$drug2
#> [1] "Gnumber_2"
#> 
#> $identifiers_md$drug_name2
#> [1] "DrugName_2"
#> 
#> $identifiers_md$drug_moa2
#> [1] "drug_moa_2"
#> 
#> $identifiers_md$concentration2
#> [1] "Concentration_2"
#> 
#> $identifiers_md$drug3
#> [1] "Gnumber_3"
#> 
#> $identifiers_md$drug_name3
#> [1] "DrugName_3"
#> 
#> $identifiers_md$drug_moa3
#> [1] "drug_moa_3"
#> 
#> $identifiers_md$concentration3
#> [1] "Concentration_3"
#> 
#> $identifiers_md$data_source
#> [1] "data_source"
#> 
#> $identifiers_md$replicate
#> [1] "Replicate"
#> 
#> $identifiers_md$normalization_type
#> [1] "normalization_type"
#> 
#>