Table 4 Metadata for cell line compendia.

From: Consistently processed RNA sequencing data from 50 sources enriched for pediatric data

| Metadata field | Column name | Example values |
| --- | --- | --- |
| Treehouse dataset identifier | th_dataset_id | “TH03_0028_S03”, “G20461”, “TARGET-52-NAAELV-50A-01R” |
| Disease that caused the tumor from which the cell line is derived | disease | one of 171 values, e.g. “medulloblastoma”, “embryonal rhabdomyosarcoma” |
| Age in years | age | 0.25–94; NA |
| Sex | sex | “female”, “male”, or “unknown” |
| Treehouse code for the source of a group of datasets | study_id | “CCLE”, “TARGET”, “TH27”, “THR13” |
| DepMap ID (previously Model ID) | depmap_model_id | “ACH-000058”, “ACH-000372”, “NA” |
| Identifier used by the source to refer to the cell line | study_dataset_id | “HSC-3”, “KMM-1” |
| CCLE ID | ccle_id | “HSC3_UPPER_AERODIGESTIVE_TRACT”, “HS895T_SKIN”, “NA” |
| Unique identifier associated with the RNA-Seq dataset | rnaseq_uuid | “f4f9be10-fb95-40db-868f-d7714c9b4203”, “NA” |
| RNA-Seq file name | rnaseq_file_name | “G20461.HSC-3.2.bam”, “NA” |
| American Type Culture Collection (ATCC) cell line identifier | atcc_identifier | “HTB-24”, “CRL-7598”, “NA” |
| Anatomical collection site | tissue | “upper_aerodigestive_tract”, “pancreas”, “NA” |
| Corresponding TCGA cohort | corresponding_tcga_cohort | “DLBC”, “ESCA”, “NA” |
| Histology | histology | “lymphoid_neoplasm”, “anaplastic astrocytoma”, “NA” |
| Organism | organism | “Homo sapiens” |
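
As an illustration of how the columns in Table 4 might be checked after loading, the following minimal sketch reads a metadata table and validates one row. It rests on several assumptions that the table itself does not state: that the metadata is distributed as a tab-separated file with one column per field, that missing values are written as the literal string "NA", and that `age` is otherwise numeric. The helper names `parse_age` and `check_row` are hypothetical, and the single example row combines example values from the table arbitrarily; it is not a real record.

```python
import csv
import io

# Column names as listed in Table 4.
EXPECTED_COLUMNS = [
    "th_dataset_id", "disease", "age", "sex", "study_id",
    "depmap_model_id", "study_dataset_id", "ccle_id", "rnaseq_uuid",
    "rnaseq_file_name", "atcc_identifier", "tissue",
    "corresponding_tcga_cohort", "histology", "organism",
]

# Permitted values for the sex column, per Table 4.
ALLOWED_SEX = {"female", "male", "unknown"}


def parse_age(value):
    """Return age in years as a float, or None when the value is "NA" (assumed missing marker)."""
    return None if value == "NA" else float(value)


def check_row(row):
    """Return a list of human-readable problems found in one metadata row (hypothetical helper)."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in row]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if row.get("sex") not in ALLOWED_SEX:
        problems.append("unexpected sex value: %r" % row.get("sex"))
    try:
        parse_age(row.get("age", "NA"))
    except ValueError:
        problems.append("age is neither numeric nor NA: %r" % row.get("age"))
    return problems


# Illustrative one-row table held in memory; a real file would be opened with open(path, newline="").
header = "\t".join(EXPECTED_COLUMNS)
values = "\t".join([
    "G20461", "medulloblastoma", "NA", "unknown", "CCLE",
    "ACH-000058", "HSC-3", "HSC3_UPPER_AERODIGESTIVE_TRACT", "NA",
    "G20461.HSC-3.2.bam", "NA", "upper_aerodigestive_tract",
    "NA", "NA", "Homo sapiens",
])
rows = list(csv.DictReader(io.StringIO(header + "\n" + values + "\n"), delimiter="\t"))
problems = check_row(rows[0])
```

Keeping "NA" as a string until a column is explicitly parsed avoids silently coercing identifier columns such as `depmap_model_id`, where "NA" marks the absence of a cross-reference, not a numeric gap.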