National Institutes of Health The Cancer Genome Atlas National Cancer Institute National Human Genome Research Institute

Data Levels and Data Types

The table below shows the relationship of TCGA data types to data levels as well as information on important metadata.

Please see the TCGA Data Primer for a detailed guide on TCGA data types.

*Red indicates data in the controlled-access data tier; the label "(Controlled-access)" also marks such data in the tables below. Accessing data in this tier requires user authorization. Please see the Access Tiers page for more information.
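
As a rough illustration of how the level and access-tier information in the tables below could be represented in code, the following Python sketch models a few table cells as records and keeps only the open-access ones. The `DataProduct` class and its field names are hypothetical, introduced only for this example; the three sample entries are transcribed from the Copy Number (SNP) row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """One level of one data type (hypothetical structure for illustration)."""
    data_type: str     # "Data Type Name" column, e.g. "CNV (SNP Array)"
    level: int         # TCGA data level: 1, 2, or 3
    file_types: tuple  # file extensions listed for that level
    controlled: bool   # True if the cell is marked "(Controlled-access)"


# Entries transcribed from the Copy Number (SNP) row of the table.
PRODUCTS = [
    DataProduct("CNV (SNP Array)", 1, (".CEL", ".idat", ".txt"), True),
    DataProduct("CNV (SNP Array)", 2, (".txt",), True),
    DataProduct("CNV (SNP Array)", 3, (".txt",), False),
]


def open_access(products):
    """Return only the products that need no user authorization."""
    return [p for p in products if not p.controlled]


if __name__ == "__main__":
    for p in open_access(PRODUCTS):
        print(f"{p.data_type} Level {p.level}: {', '.join(p.file_types)}")
```

A flat list of records like this keeps the access tier attached to each level, which matters because a single data type can mix controlled-access and open-access levels.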

Relationship of Data Levels to Data Types

Clinical Data

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
Clinical data All Clinical Available clinical information for each participant (may include demographic information, treatment information, survival data, etc)

File type: .xml
Compiled patient clinical information for each cancer study

File type: tab-delimited "biotab" (.txt)
n/a The BCR data dictionary describes the clinical and biospecimen data elements in TCGA. Additional information can be found in the NCI Clinical Data Elements (CDE) Browser. Data Matrix: Select 'Clinical' for Data Type

Bulk Download: Select 'Complete Clinical Set' for Data Type

File Search: Select 'Clinical' for Data Category
Biospecimen data All Clinical Information on how samples from each participant were processed by the Biospecimen Core Resource Center (BCR)

File type: .xml
Compiled patient biospecimen information for each cancer study

File type: tab-delimited "biotab" (.txt)
n/a The BCR data dictionary describes the clinical and biospecimen data elements in TCGA. Additional information can be found in the NCI Clinical Data Elements (CDE) Browser. Data Matrix: Select 'Clinical' for Data Type

Bulk Download: Select 'Complete Clinical Set' for Data Type

File Search: Select 'Clinical' for Data Category
Pathology Reports All Pathology Reports Pathology reports for a subset of participants

File type: .pdf
n/a n/a Available reports are listed in the biospecimen biotab and xml files Bulk Download: Select 'Pathology Reports' for Platform.

Pathology reports cannot be retrieved via the Data Matrix.

File Search: Select 'Clinical' for Data Category.
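
The tab-delimited "biotab" files can be read with any standard delimited-text parser. The following is a minimal Python sketch, assuming the first line holds the column names. The column names and values in the sample are made up for illustration, and real biotab files may carry additional header rows that would need to be skipped.

```python
import csv
import io


def read_biotab(handle):
    """Parse a tab-delimited file into a list of dicts keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Hypothetical sample; not real TCGA data or real TCGA column names.
SAMPLE = (
    "patient_id\tgender\tvital_status\n"
    "P-0001\tFEMALE\tAlive\n"
    "P-0002\tMALE\tDead\n"
)

if __name__ == "__main__":
    rows = read_biotab(io.StringIO(SAMPLE))
    for row in rows:
        print(row["patient_id"], row["vital_status"])
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory sample.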

Images

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
Diagnostic image All Diagnostic image Tissue images used to diagnose participant

File type: .svs (image viewer)
n/a n/a Available images are listed in the biospecimen biotab and xml files Bulk Download: Select 'Diagnostic Images' for Platform. Images cannot be retrieved via the Data Matrix or File Search.
Tissue image All Tissue Slide Images Images of tissue samples from each participant that were used for TCGA analyses

File type: .svs (image viewer)
n/a n/a Available images are listed in the biospecimen biotab and xml files Bulk Download: Select 'Tissue Slide Images' for Data Type.

Images cannot be retrieved via the Data Matrix.

File Search: Select 'Clinical' for Data Category.
Radiological image (available at The Cancer Imaging Archive) See The Cancer Imaging Archive wiki for list of available images n/a Pre-surgical radiological imaging from a subset of participants (e.g., MRI, CT, PET)

File type: DICOM (.dcm)
n/a n/a n/a See The Cancer Imaging Archive site.

Microsatellite Instability (MSI)

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
COAD, READ, UCEC Fragment Analysis Results Markers indicating presence or absence of an MSI shift, allele homozygosity/heterozygosity, and loss of heterozygosity (LOH) observed in the tumor sample for each participant

File types: fragment analysis trace file (.fsa) and tab-delimited (.txt) file summarizing the trace file

(Controlled-access)
n/a Classifications of microsatellite instability detected for each participant's tumor sample

File type: auxiliary.xml
Level 1 data are submitted as part of a standard MAGE-TAB archive

Level 3 data are contained in the BCR clinical data archives (see above)
Level 1: Data Matrix & Bulk Download: Select 'Fragment Analysis Results' for Data Type

File Search: Select 'Other' for Data Category

DNA Sequencing

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
Whole exome sequence (available at the Cancer Genomics Hub) All n/a Whole exome sequence for both tumor and normal sample for each participant

File type: binary alignment file (.bam)

(Controlled-access)
n/a n/a Experimental protocol, including primer information, is contained in the metadata .xml file associated with each .bam file on CGHub See CGHub site
Whole genome sequence (available at the Cancer Genomics Hub) All n/a Whole genome sequence for both tumor and normal sample for select participants

File type: binary alignment file (.bam)

(Controlled-access)
n/a n/a Experimental protocol, including primer information, is contained in the metadata .xml file associated with each .bam file on CGHub See CGHub site
Sequence traces (may be available at the NCBI Trace Archive) GBM, OV Trace-Sample Relationship Raw sequence output from older sequencing technologies

File type: sequence chromatogram format (.scf)
n/a n/a Trace-sample relationship (.tr) files map NCBI trace IDs to TCGA biospecimen barcodes See NCBI Trace Archive
Mutations All Somatic Mutations (coding, splice site, and validated non-coding somatic variants) Whole exome sequence - see above Somatic mutation calls for each participant

File type: mutation annotation file (.somatic.maf)
n/a The mutation data do not have a standard MAGE-TAB archive associated with them yet. The latest mutation file specifications are available on the wiki. Data Matrix & Bulk Download: Select 'Somatic Mutations' for Data Type

File Search: Select 'DNA Mutations' for Data Category

Publication MAF Search
Protected Mutations (including germline variants and unvalidated non-coding somatic variants) Whole genome and exome sequence - see above Somatic and germline mutation calls for each participant

File types: variant call file (.vcf), mutation annotation file (.protected.maf)

(Controlled-access)
n/a The mutation data do not have a standard MAGE-TAB archive associated with them yet. The latest mutation file specifications are available on the wiki. Data Matrix & Bulk Download: Select 'Protected Mutations' for Data Type

File Search: Select 'DNA Mutations' for Data Category
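
Because a mutation annotation file is tab-delimited, a quick per-gene tally needs only the standard library. The Python sketch below assumes a `Hugo_Symbol` column as the gene identifier and skips lines beginning with `#`, which is how version headers are commonly written; check the current file specification on the wiki before relying on either assumption. The sample rows are invented.

```python
import csv
import io
from collections import Counter


def count_mutations_by_gene(handle, gene_column="Hugo_Symbol"):
    """Count rows per gene in a tab-delimited MAF-like file."""
    # Drop comment lines (assumed to start with '#') before parsing.
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return Counter(row[gene_column] for row in reader)


# Invented sample rows; not real TCGA mutation calls.
SAMPLE = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "TP53\tMissense_Mutation\tSAMPLE-A\n"
    "TP53\tNonsense_Mutation\tSAMPLE-B\n"
    "KRAS\tMissense_Mutation\tSAMPLE-A\n"
)

if __name__ == "__main__":
    counts = count_mutations_by_gene(io.StringIO(SAMPLE))
    for gene, n in counts.most_common():
        print(gene, n)
```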

miRNA Sequencing

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
miRNA sequence (available at the Cancer Genomics Hub) All except GBM n/a miRNA sequence for each participant's tumor sample

File type: binary alignment file (.bam)

(Controlled-access)
n/a n/a Experimental protocol, including primer information, is contained in the metadata .xml file associated with each .bam file on CGHub See CGHub site
miRNA All except GBM miRNASeq miRNA sequence for each participant's tumor sample - see above n/a The calculated expression for all reads aligning to a particular miRNA, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive Data Matrix & Bulk Download: Select 'miRNASeq' for Data Type

File Search: Select 'miRNA Expression' for Data Category
Isoform All except GBM miRNASeq miRNA sequence for each participant's tumor sample - see above n/a The calculated expression for each individual miRNA sequence isoform observed, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive Data Matrix & Bulk Download: Select 'miRNASeq' for Data Type

File Search: Select 'miRNA Expression' for Data Category

Protein Expression

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
All Expression - Protein High resolution images of protein array slides (up to 1000 participant tumor samples per slide) and raw signals per slide

File types: .tiff (image viewer) for images and tab-delimited (.txt) for signals
Dilution curves for each sample

File type: tab-delimited (.txt)
Normalized protein expression for each gene, per sample

File type: tab-delimited (.txt)
Array design files, antibody annotations, and the experimental protocol are included in the MAGE-TAB archive Data Matrix & Bulk Download: Select 'Expression-Protein' for Data Type

File Search: Select 'Protein Expression' for Data Category

mRNA Sequencing

mRNA sequencing data come in two versions, V1 and V2. Version 2 differs from Version 1 in the algorithm used to generate the data.
Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
mRNA sequence (available at the Cancer Genomics Hub) All n/a mRNA sequence for each participant's tumor sample

File types: binary alignment file (.bam) and sequence reads (.fastq)

(Controlled-access)
n/a n/a Experimental protocol, including primer information, is contained in the metadata .xml file associated with each .bam file on CGHub See CGHub site
Exon All RNASeqV1, RNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The calculated expression signal of a particular composite exon of a gene, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive Data Matrix & Bulk Download: Select 'RNASeqV1/RNASeqV2' for Data Type

File Search: Select 'mRNA Expression' for Data Category. Only results from the latest platform, RNASeqV2, are returned under this category.
Gene All RNASeqV1, RNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The calculated expression signal of a gene, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive Data Matrix & Bulk Download: Select 'RNASeqV1/RNASeqV2' for Data Type

File Search: Select 'mRNA Expression' for Data Category. Only results from the latest platform, RNASeqV2, are returned under this category.
Splice Junction All RNASeqV1, RNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The calculated expression signal of a particular composite splice junction of a gene, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive Data Matrix & Bulk Download: Select 'RNASeqV1/RNASeqV2' for Data Type

File Search: Select 'mRNA Expression' for Data Category. Only results from the latest platform, RNASeqV2, are returned under this category.
Isoform All RNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The normalized expression signal of individual isoforms (transcripts), per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive Data Matrix & Bulk Download: Select 'RNASeqV1/RNASeqV2' for Data Type

File Search: Select 'mRNA Expression' for Data Category. Only results from the latest platform, RNASeqV2, are returned under this category.

Total RNA Sequencing

This is a pilot project on a limited number of tumor samples for each of the applicable cancer types.
Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
mRNA sequence (available at the Cancer Genomics Hub) Applicable to some tumor types TotalRNASeqV2 mRNA sequence for each participant's tumor sample

File types: binary alignment file (.bam) and sequence reads (.fastq)

(Controlled-access)
n/a n/a Experimental protocol, including primer information, is contained in the metadata .xml file associated with each .bam file on CGHub See CGHub site
Exon Applicable to some tumor types TotalRNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The calculated expression signal of a particular composite exon of a gene, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive See CGHub site for Total RNA sequence data.

Data Matrix & Bulk Download: Select 'TotalRNASeqV2' for Data Type

File Search: Select 'Other' for Data Category.
Gene Applicable to some tumor types TotalRNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The calculated expression signal of a gene, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive See CGHub site for Total RNA sequence data.

Data Matrix & Bulk Download: Select 'TotalRNASeqV2' for Data Type

File Search: Select 'Other' for Data Category.
Splice Junction Applicable to some tumor types TotalRNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The calculated expression signal of a particular composite splice junction of a gene, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive See CGHub site for Total RNA sequence data.

Data Matrix & Bulk Download: Select 'TotalRNASeqV2' for Data Type

File Search: Select 'Other' for Data Category.
Isoform Applicable to some tumor types TotalRNASeqV2 mRNA sequence for each participant's tumor sample - see above n/a The normalized expression signal of individual isoforms (transcripts), per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive See CGHub site for Total RNA sequence data.

Data Matrix & Bulk Download: Select 'TotalRNASeqV2' for Data Type

File Search: Select 'Other' for Data Category.

Array-based Expression

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
Gene BRCA, COAD, GBM, KIRC, KIRP, LAML, LGG, LUAD, LUSC, OV, READ, UCEC Expression - Gene Raw signals per probe for each participant's tumor sample

File type: tab-delimited (.txt)
Normalized signals per probe or probe set for each participant's tumor sample

File type: tab-delimited (.txt)
Expression calls for genes, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the MAGE-TAB archive

Probe information is contained in the Array design files for each platform
Data Matrix & Bulk Download: Select 'Expression-Gene' for Data Type

File Search: Select 'Other' for Data Category
Exon GBM, OV, LUSC Expression - Exon Raw signals per probe for each participant's tumor sample

File type: binary (.CEL)

(Controlled-access)
Normalized signals per probe or probe set for each participant's tumor sample

File type: tab-delimited (.txt)

(Controlled-access)
Expression calls for exons/variants, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the MAGE-TAB archive

Probe information is contained in the Array design files for each platform
Data Matrix & Bulk Download: Select 'Expression-Gene' for Data Type

File Search: Select 'Other' for Data Category
miRNA GBM, OV Expression - miRNA Raw signals per probe for each participant's tumor sample

File type: tab-delimited (.txt)
Normalized signals per probe or probe set for each participant's tumor sample

File type: tab-delimited (.txt)
Expression calls for miRNAs, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the MAGE-TAB archive

Probe information is contained in the Array design files for each platform
Data Matrix & Bulk Download: Select 'Expression-Gene' for Data Type

File Search: Select 'Other' for Data Category

DNA Methylation

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
Bisulfite sequencing Applicable to some tumor types Methylation - Bisulfite Sequencing Whole genome sequence after bisulfite treatment for each tumor sample

File type: binary alignment file (.bam) at CGHub (Controlled-access)
Methylation and mutation calls for each sample

File type: variant call file (.vcf).

These VCF files are slightly different from the standard TCGA VCF format.

(Controlled-access)
Whole genome methylation calls for each CpG site, per sample

File type: tab-delimited (.bed)
Experimental protocol, including calculation methods, is included in the MAGE-TAB archive Data Matrix & Bulk Download: Select 'Methylation-Bisulfite Sequencing' for Data Type

File Search: Select 'Other' for Data Category
Array based All Array-based DNA Methylation Raw signal intensities of probes for each participant's tumor sample

File types: tab-delimited (.txt) and binary (.idat)
Calculated beta values per sample

File type: tab-delimited (.txt)
Calculated beta values mapped to genome, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the MAGE-TAB archive

Probe information is contained in the Array design files for each platform
Data Matrix & Bulk Download: Select 'DNA Methylation' for Data Type

File Search: Select 'DNA Methylation' for Data Category
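
For array-based methylation, a beta value is conventionally the fraction of total probe signal that comes from the methylated channel, beta = M / (M + U), so it lies between 0 and 1. The Python sketch below applies that conventional definition. It is not taken from the TCGA protocol, so consult the MAGE-TAB archive for the calculation actually used. The function name, the optional `offset` parameter, and the sample intensities are all illustrative.

```python
def beta_value(methylated, unmethylated, offset=0.0):
    """Return M / (M + U + offset), or None when the denominator is zero."""
    denominator = methylated + unmethylated + offset
    if denominator == 0:
        return None  # no signal in either channel
    return methylated / denominator


if __name__ == "__main__":
    # Illustrative (methylated, unmethylated) intensity pairs.
    for m, u in [(900.0, 100.0), (50.0, 950.0), (0.0, 0.0)]:
        print(m, u, beta_value(m, u))
```

Returning `None` for a probe with no signal, rather than 0, keeps "unmethylated" distinguishable from "not measured" in downstream analysis.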

Copy Number

Data Subtype Cancer Types Applicable Data Type Name Level 1 Level 2 Level 3 Important Metadata How to Retrieve Data Files
SNP All CNV (SNP Array) Raw data for each participant's tumor sample

File types: binary (.CEL), binary (.idat), and tab-delimited (.txt)

(Controlled-access)
Unnormalized SNP, copy number, and LOH data, per sample

File type: tab-delimited (.txt)

(Controlled-access)
Normalized copy number and purity/ploidy data, per sample

File type: tab-delimited (.txt)
Experimental protocol, including calculation methods, is included in the MAGE-TAB archive

Probe information is contained in the Array design files for each platform
Data Matrix & Bulk Download: Select 'CNV (SNP)' for Data Type

File Search: Select 'DNA Copy Number' for Data Category
Array GBM, OV, LUSC CNV (CN Array) Raw signals per probe for each participant's tumor sample

File type: tab-delimited (.txt)
Normalized signals for copy number alterations of aggregated regions, per probe or probe set, per tumor sample

File type: tab-delimited (.tsv and .mat)
Copy number alterations for aggregated/segmented regions, per sample

File type: tab-delimited (.tsv and .txt)
Experimental protocol, including calculation methods, is included in the MAGE-TAB archive

Probe information is contained in the Array design files for each platform
Data Matrix & Bulk Download: Select 'CNV (Array)' for Data Type

File Search: Select 'DNA Copy Number' for Data Category
Low-Pass DNA Sequencing Applicable to some tumor types CNV (Low Pass DNASeq) Low pass, whole genome sequence of both tumor and normal samples for each participant and analysis of differences in read counts between the tumor and normal sample. Available at CGHub.

File type: binary alignment file (.bam)

(Controlled-access)
DNA variants for each participant

File type: variant call file (.vcf)

(Controlled-access)
Regions with differences in genome coverage (number of reads) between normal and tumor samples for each participant

File type: tab-delimited (.tsv)
Experimental protocol, including calculation methods, is included in the DESCRIPTION file of the MAGE-TAB archive Data Matrix & Bulk Download: Select 'CNV (Low-Pass DNA Seq)' for Data Type

File Search: Select 'Other' for Data Category
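
One common way to express a coverage difference between tumor and normal samples is a log2 ratio of depth-normalized read counts per genomic bin. The Python sketch below illustrates that general idea only. It is not the method used to produce the TCGA Level 3 files, and the function name, the counts, and the pseudocount are hypothetical.

```python
import math


def log2_ratios(tumor_counts, normal_counts, pseudocount=1.0):
    """Per-bin log2(tumor/normal) after scaling each sample by its total reads."""
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("tumor and normal must cover the same bins")
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    if tumor_total == 0 or normal_total == 0:
        raise ValueError("each sample needs at least one read")
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        # The pseudocount avoids division by zero and log of zero in empty bins.
        t_frac = (t + pseudocount) / tumor_total
        n_frac = (n + pseudocount) / normal_total
        ratios.append(math.log2(t_frac / n_frac))
    return ratios


if __name__ == "__main__":
    # Hypothetical read counts in three bins; the middle bin is gained in tumor.
    tumor = [100, 300, 100]
    normal = [100, 100, 100]
    for i, r in enumerate(log2_ratios(tumor, normal)):
        print(f"bin {i}: log2 ratio = {r:.3f}")
```

Positive values suggest relatively more tumor coverage in a bin and negative values relatively less; scaling by total reads means the ratios are relative to the sample-wide average rather than absolute copy numbers.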