Table 4 Metadata for cell line compendia.
From: Consistently processed RNA sequencing data from 50 sources enriched for pediatric data
Metadata field | Column name | Example values |
---|---|---|
Treehouse dataset identifier | th_dataset_id | “TH03_0028_S03”, “G20461”, “TARGET-52-NAAELV-50A-01R” |
Disease that caused the tumor from which the cell line is derived | disease | one of 171 values, e.g. “medulloblastoma”, “embryonal rhabdomyosarcoma” |
Age in years | age | 0.25–94; NA |
Sex | sex | “female”, “male”, or “unknown” |
Treehouse code for the source of a group of datasets | study_id | “CCLE”, “TARGET”, “TH27”, “THR13” |
DepMap ID (previously Model ID) | depmap_model_id | “ACH-000058”, “ACH-000372”, “NA” |
Identifier used by the source to refer to the cell line | study_dataset_id | “HSC-3”, “KMM-1” |
CCLE ID | ccle_id | “HSC3_UPPER_AERODIGESTIVE_TRACT”, “HS895T_SKIN”, “NA” |
Unique identifier associated with the RNA-Seq dataset | rnaseq_uuid | “f4f9be10-fb95-40db-868f-d7714c9b4203”, “NA” |
RNA-Seq file name | rnaseq_file_name | “G20461.HSC-3.2.bam”, “NA” |
American Type Culture Collection (ATCC) cell line identifier | atcc_identifier | “HTB-24”, “CRL-7598”, “NA” |
Anatomical collection site | tissue | “upper_aerodigestive_tract”, “pancreas”, “NA” |
Corresponding TCGA cohort | corresponding_tcga_cohort | “DLBC”, “ESCA”, “NA” |
Histology | histology | “lymphoid_neoplasm”, “anaplastic astrocytoma”, “NA” |
Organism | organism | “Homo sapiens” |
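
The column names and allowed values listed in the table can be checked programmatically before a metadata file is used downstream. The following is a minimal sketch, not the authors' tooling. It assumes the metadata is distributed as a tab-separated file with one row per dataset and that missing values are written as the literal string "NA". The function names (`load_metadata`, `validate_row`, `parse_age`) are hypothetical, and the two sample rows are synthetic: they are assembled from the example values in the table and do not represent real records.

```python
import csv
import io

# Column names taken from the table above.
EXPECTED_COLUMNS = [
    "th_dataset_id", "disease", "age", "sex", "study_id",
    "depmap_model_id", "study_dataset_id", "ccle_id", "rnaseq_uuid",
    "rnaseq_file_name", "atcc_identifier", "tissue",
    "corresponding_tcga_cohort", "histology", "organism",
]
ALLOWED_SEX = {"female", "male", "unknown"}
MISSING = "NA"  # assumption: missing values are the literal string "NA"


def parse_age(raw):
    """Return the age in years as a float, or None when it is missing."""
    return None if raw == MISSING else float(raw)


def validate_row(row):
    """Return a list of problems found in one metadata row (empty if none)."""
    absent = [c for c in EXPECTED_COLUMNS if c not in row]
    if absent:
        return [f"missing columns: {absent}"]
    problems = []
    if row["sex"] not in ALLOWED_SEX:
        problems.append(f"unexpected sex value: {row['sex']!r}")
    try:
        age = parse_age(row["age"])
    except ValueError:
        problems.append(f"age is not numeric or NA: {row['age']!r}")
    else:
        if age is not None and age < 0:
            problems.append(f"negative age: {age}")
    return problems


def load_metadata(handle):
    """Read tab-separated metadata into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Synthetic rows built from the table's example values (not real records).
_row_a = ["G20461", "medulloblastoma", "NA", "unknown", "CCLE",
          "ACH-000058", "HSC-3", "HSC3_UPPER_AERODIGESTIVE_TRACT",
          "f4f9be10-fb95-40db-868f-d7714c9b4203", "G20461.HSC-3.2.bam",
          "HTB-24", "upper_aerodigestive_tract", "ESCA",
          "lymphoid_neoplasm", "Homo sapiens"]
_row_b = ["TH03_0028_S03", "embryonal rhabdomyosarcoma", "0.25", "female",
          "TH27", "NA", "KMM-1", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
          "Homo sapiens"]
SAMPLE_TSV = "\n".join(
    "\t".join(fields) for fields in (EXPECTED_COLUMNS, _row_a, _row_b)
) + "\n"

rows = load_metadata(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["th_dataset_id"], validate_row(row))
```

Treating "NA" as missing only at the point of use (as in `parse_age`) keeps the identifier columns as plain strings, which avoids accidental type coercion of values such as "G20461".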