Comprehensive molecular portraits of human breast tumors

Nature, September 27, 2012 [doi:10.1038/nature11412]


We analyzed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, mRNA arrays, microRNA sequencing and reverse phase protein arrays. Our ability to integrate information across platforms provided key insights into previously-defined gene expression subtypes and demonstrated the existence of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular heterogeneity. Somatic mutations in only three genes (TP53, PIK3CA and GATA3) occurred at > 10% incidence across all breast cancers; however, there were numerous subtype-associated and novel gene mutations including the enrichment of specific mutations in GATA3, PIK3CA and MAP3K1 with the Luminal A subtype. We identified two novel protein expression-defined subgroups, possibly contributed by stromal/microenvironmental elements, and integrated analyses identified specific signaling pathways dominant in each molecular subtype including a HER2/p-HER2/HER1/p-HER1 signature within the HER2-Enriched expression subtype. Comparison of Basal-like breast tumors with high-grade Serous Ovarian tumors showed many molecular commonalities, suggesting a related etiology and similar therapeutic opportunities. The biologic finding of the four main breast cancer subtypes caused by different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable plasticity and heterogeneity occurs within, and not across, these major biologic subtypes of breast cancer.

Associated Data Files

These data represent a data freeze from November 11, 2011. Please note that more recent data are available via the TCGA Data Portal.

Sample Lists - below are links to cumulative lists of samples for the publication

Final Full BRCA Sample Summary
Individual Analysis Sample Summaries

Mutations - Level 2 MAF archives containing exome-based somatic mutations

File used in manuscript

  • Somatic MAF archive [tar.gz] (controlled access)
  • Somatic MAF archive md5 (controlled access)

Mutations - Publicly accessible MAF archives

The mutations in this MAF archive are a subset of those in the controlled access archive. They include only mutations that have been verified as somatic; mutations that could not be verified as somatic have been removed.

  • Somatic MAF archive [tar.gz] (public access)
  • Somatic MAF archive md5 (public access)

Mutations - Germline MAF File (controlled access)

This file contains germline mutations and is access controlled. If you need to obtain access, see this link for instructions

RNA Expression

Data Matrix Files
  • Full expression data set consisting of 522 primary tumors, 3 metastatic tumors, and 22 tumor-adjacent normal samples. Data were median centered by gene.
  • Data freeze with data on 5 of 6 platforms (mRNA, miRNA, methylation, copy number, and whole exome sequencing): 463 primary tumors and 3 metastatic tumors. Data were median centered by gene.
  • Data freeze with data on all 6 platforms (mRNA, miRNA, methylation, copy number, protein, and whole exome sequencing): 348 primary tumors. Data were median centered by gene.
  • BRCA.547.PAM50.SigClust.Subtypes.txt - PAM50 and SigClust subtype assignments.
  • Level 3 Data Archives
  • Level 2 Data Archives
  • Level 1 Data Archives
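
The median centering mentioned for the RNA expression data matrix files can be illustrated with a minimal sketch. It assumes a matrix held as a mapping from gene name to a list of per-sample values, with missing values stored as None; the function name and the toy gene names are hypothetical, and this is not the pipeline used to produce the published files.

```python
from statistics import median


def median_center_by_gene(matrix):
    """Subtract each gene's median across samples from that gene's values.

    `matrix` maps a gene name to a list of expression values, one per sample.
    Missing values (None) are skipped when computing the median and kept as None.
    """
    centered = {}
    for gene, values in matrix.items():
        observed = [v for v in values if v is not None]
        if not observed:
            # Nothing to center: copy the row unchanged.
            centered[gene] = list(values)
            continue
        gene_median = median(observed)
        centered[gene] = [None if v is None else v - gene_median for v in values]
    return centered


# Toy input (hypothetical genes and values, not taken from the BRCA files).
example = {"GENE_A": [1.0, 2.0, 6.0], "GENE_B": [0.5, None, 1.5]}
centered = median_center_by_gene(example)
```

After centering, each gene has a median of zero across samples, so values read as expression relative to the typical sample for that gene.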

SNP and Copy Number

miRNA Expression

Data Matrix Files

miRNA expression, precursor miRNA genes, normalized to reads per million mapped miRNAs
  • BRCA.780.precursor.txt - all samples
  • BRCA.466.precursor.txt - data freeze 5 platforms
  • BRCA.348.precursor.txt - data freeze 6 platforms

miRNA expression, mature/star miRNA strands, normalized to reads per million mapped miRNAs
  • BRCA.780.mimat.txt - all samples
  • BRCA.466.mimat.txt - data freeze 5 platforms
  • BRCA.348.mimat.txt - data freeze 6 platforms

  • Filtered Data set and Results for NMF Clustering
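
The "reads per million mapped miRNAs" normalization named for the miRNA data matrix files can be sketched as follows. This is an illustration under assumptions: the function name, the per-sample dictionary of raw read counts, and the toy miRNA names are hypothetical, and the exact set of reads counted as "mapped miRNAs" in the published files is not specified here.

```python
def reads_per_million(counts):
    """Scale one sample's raw miRNA read counts to reads per million.

    `counts` maps a miRNA name to the number of reads assigned to it in a
    single sample. The denominator is the total of those counts, which is
    an assumption about what "mapped miRNAs" covers.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped miRNA reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Toy sample (hypothetical names and counts).
sample_counts = {"mir-a": 250, "mir-b": 750}
sample_rpm = reads_per_million(sample_counts)
```

Because every sample is scaled to the same total, values become comparable across samples with different sequencing depths.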


Reverse Phase Protein Array (RPPA)

Data Sets
  • rppa-171Ab-403samp.gct
  • rppaData-403Samp-171Ab-notTrimmed.txt
  • rppaData-403Samp-171Ab-Trimmed.txt - Data scaled by antibody; extreme values were trimmed (1.5% lowest and 1.5% highest) to increase the contrast of the expression.
  • RPPA Subtype Calls
  • Level 3 Data Archives
  • Level 2 Data Archives
  • Level 1 Data Archives
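
The trimming described for the RPPA "Trimmed" data set can be sketched as below. The page does not say how the trimming was carried out, so this sketch assumes it means clipping one antibody's values to the 1.5th and 98.5th percentiles (values beyond those bounds are set to the bound, not removed). The function name is hypothetical, and the scaling by antibody is not reproduced.

```python
from statistics import quantiles


def clip_extremes(values):
    """Clip values to the 1.5th and 98.5th percentiles of the input.

    Assumption: "trimmed" means clipping, so the number of values is
    unchanged. Requires at least two values.
    """
    # With n=200 the cut points sit at 0.5% steps; index 2 is the 1.5th
    # percentile and index 196 is the 98.5th percentile.
    cuts = quantiles(values, n=200, method="inclusive")
    low, high = cuts[2], cuts[196]
    return [min(max(v, low), high) for v in values]


# Toy values for a single antibody across 201 samples.
antibody_values = [float(x) for x in range(201)]
clipped = clip_extremes(antibody_values)
```

Clipping keeps a few extreme measurements from dominating the color scale when the data are displayed as a heatmap, which matches the stated goal of increasing contrast.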

Exome Sequence BAM File References

  • BRCA.CGhub.bam.files.txt - BAM files at CGHub
  • Aliquot barcodes for files not yet at CGHub

Clinical - below are links to clinical data for the publication.

Firehose Pipeline Archives

  • 2012012400.tar (6.5G) (controlled access)
  • 2012012400.tar.md5 (controlled access)
  • 2012011000.tar (6.5G) (controlled access)
  • 2012011000.tar.md5 (controlled access)
  • 2011102600.tar (3.13G) (controlled access)
  • 2011102600.tar.md5 (controlled access)
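
Each archive above is listed with a companion md5 file, which can be used to check that a download is complete. A minimal sketch of that check follows; it assumes the md5 file begins with the hexadecimal digest, optionally followed by a file name (the common md5sum layout), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(archive_path, md5_path):
    """Compare an archive's digest with the first token of its md5 file.

    Assumption: the md5 file starts with the hex digest (md5sum layout).
    """
    expected = Path(md5_path).read_text().split()[0].lower()
    return md5_of_file(archive_path) == expected
```

For example, `matches_md5_file("2012012400.tar", "2012012400.tar.md5")` would return True only if the downloaded archive matches the published checksum.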

Views of the Data

TCGA Data Portal

  • BRCA Data Access Matrix - Paste the BRCA sample list described in Sample Lists above to filter the Matrix for this publication.

Analytical Tools

  • Institute for Systems Biology
  • Memorial Sloan Kettering Cancer Center
  • MD Anderson

Additional Information