The Pathologist
Proteome Mapping: A Word of Caution

By Michael Tress, 10/03/2014

When the two proteome maps appeared in Nature, the numbers certainly raised some eyebrows. My colleagues and I are part of the GENCODE consortium, which is annotating the human genome, so we are very interested in large-scale proteomics information. We were also in the process of publishing our own analysis (1), and we were surprised by what these papers were reporting. How had they managed to find protein products for more genes than any previous experiment of this kind, several thousand more genes than the entire combined efforts of the worldwide human genome project, all without any kind of technological breakthrough?

When we looked at our data, we noticed we had not identified any peptides for olfactory receptors (ORs). Further, other databases, such as PeptideAtlas and PRIDE-Q (2), which I consider to contain high quality data, also identified very few ORs. We therefore reasoned that a study which identifies multiple ORs (Pandey’s group found 108, Kuster’s 200) is likely to be unreliable. We decided to investigate. We carried out a quality test on the ORs the groups had found, and this produced some concerning results. For example, the Pandey data shows that ORs are most highly expressed in the liver (3). For us, this confirmed what we had suspected – the data was not reliable.

We carried out a reanalysis of the peptides detected in both experiments, and found reliable evidence for between 7,500 and 8,000 of the genes they identified. The Pandey group’s data was entirely their own, published previously in the Journal of Proteomics Research. The Kuster group carried out comparable experiments on a similar number of tissues (using CellZome technology), but in their paper they also included results from a reanalysis of the spectra from previously published large-scale experiments. However, they did not provide the results of their reanalyses, so we could only analyze the CellZome data, which makes up 25 percent (roughly 4,500 peptides) of the Kuster data (although the CellZome data alone identifies genes for 36 ORs).

The Pandey group reported 17,296 genes and the Kuster group over 18,000. I personally believe that the Mann group (4, 5) identified as many protein-coding genes as, if not more than, the Pandey group and Kuster’s CellZome experiments. We carried out a comparable analysis of these experiments at the same time as the proteome map data, and after filtering the peptides we found that the various studies had identified 8,050 genes (Nagaraj et al.), 8,929 (Geiger et al.), 7,972 (Kuster CellZome) and 7,458 (Pandey). This led us to conclude that the two proteome maps contained questionable data.

Our analysis identified many factors which I think contributed to this data inflation: the inclusion of poor quality spectra, using a single peptide to identify multiple genes, confusion between leucine and isoleucine, the use of two search engines to increase the peptide coverage rather than to increase the reliability of the peptide spectrum matches, and the combination of multiple experiments (which ratchets up false positive rates). Some of the problems we identified only affected one of the two data sets and some affected both (3).
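
To make three of these factors concrete, here is a minimal Python sketch. It is an illustration only, not the filtering pipeline used in our analysis: the function names, the toy peptides and the gene labels are all hypothetical, and the false positive formula assumes that experiments are independent, which is a simplification.

```python
from collections import defaultdict


def normalize_il(peptide: str) -> str:
    """Collapse isoleucine (I) onto leucine (L).

    The two residues have the same mass, so peptides that differ only
    by I/L are treated here as one and the same sequence.
    """
    return peptide.upper().replace("I", "L")


def gene_discriminating_peptides(peptide_to_genes):
    """Return {peptide: gene} for peptides that point to exactly one gene
    once I/L variants have been merged."""
    merged = defaultdict(set)
    for peptide, genes in peptide_to_genes.items():
        merged[normalize_il(peptide)].update(genes)
    return {pep: next(iter(genes))
            for pep, genes in merged.items() if len(genes) == 1}


def genes_with_support(peptide_to_genes, min_peptides=1):
    """Genes backed by at least `min_peptides` discriminating peptides."""
    counts = defaultdict(int)
    for gene in gene_discriminating_peptides(peptide_to_genes).values():
        counts[gene] += 1
    return {gene for gene, n in counts.items() if n >= min_peptides}


def combined_false_positive_rate(per_experiment_rate, n_experiments):
    """Chance that a false identification turns up in at least one of
    n experiments, assuming the experiments are independent."""
    return 1.0 - (1.0 - per_experiment_rate) ** n_experiments


# Hypothetical toy data: peptide sequence -> genes it could come from.
toy_matches = {
    "AELIK": {"GENE_A"},
    "AELLK": {"GENE_B"},            # differs from AELIK only by I vs L
    "GSTPR": {"GENE_C", "GENE_D"},  # one peptide shared by two genes
    "VNDWK": {"GENE_E"},
}

naive_genes = set().union(*toy_matches.values())
supported_genes = genes_with_support(toy_matches)
print(len(naive_genes), "genes counted naively")
print(len(supported_genes), "genes with discriminating evidence")
print(combined_false_positive_rate(0.01, 10))
```

In this toy set, counting every gene that any peptide could map to gives five genes, while only one gene is left once I/L variants are merged and shared peptides are set aside. The last function shows why pooling many experiments needs care: a small per-experiment error rate grows as more experiments are combined.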

These two studies stand out because they analyzed a wide range of human tissues, rather than cell lines. It’s possible that research groups carrying out tissue-specific experiments will use this data as a gold standard, and even now will be writing proposals based on it. This concerns me because I think, at best, this data will not aid good scientific research. At worst, I suspect using this data could be a poor use of time and resources. In situations like these, the onus is on the authors to provide information that is as accurate as possible. Large-scale evidence for cross-tissue peptide expression would be a real step forward for proteomics. However, the information provided by these draft proteome maps cannot be used without first filtering out large amounts of possibly unreliable data.

References

  1. I. Ezkurdia et al., “Multiple Evidence Strands Suggest That There May Be As Few As 19,000 Human Protein-Coding Genes”, Human. Mol. Genet. (2014) [Epub ahead of print].
  2. J.A. Vizcaino et al., “The Proteomics Identifications (PRIDE) Database and Associated Tools: Status in 2013”, Nucleic Acids Res., 41 (D1), D1063-D1069 (2013).
  3. I. Ezkurdia et al., “Analyzing the First Drafts of the Human Proteome”, J. Prot. Res., 13, 3854-55 (2014).
  4. T. Geiger et al., “Comparative Proteomic Analysis of Eleven Common Cell Lines Reveals Ubiquitous but Varying Expression of most Proteins”, Mol. Cell. Prot., 11(3), 1-11 (2012) doi: 10.1074/mcp.M111.014050.
  5. N. Nagaraj et al., “Deep Proteome and Transcriptome Mapping of a Human Cancer Cell Line”, Mol. Syst. Biol., 7, 548 (2011).

About the Author(s)

Michael Tress

Michael Tress is a staff scientist at the Spanish Cancer Research Centre (CNIO), Madrid, Spain.

Copyright © 2025 Texere Publishing Limited (trading as Conexiant), with registered number 08113419 whose registered office is at Booths No. 1, Booths Park, Chelford Road, Knutsford, England, WA16 8GS.