The Pathologist
Translational Proteomics: Solving the Reproducibility Riddle

Data analysis in proteomics is not fit for purpose – here’s how we can get on track

By David Chiang, 09/13/2018

Proteomics, with its unlimited potential for biomedicine, has so far fallen short. I believe the reason is simple: sophisticated big data are being processed by simplistic bioinformatics on underpowered computers. Novices are dazzled by thousands of proteins characterized at the push of a button. But experts find that it is mostly common proteins that are correctly identified, that much of the quantitation is suspect, and, critically, that it is hard to tell whether any given identification is correct.

How can we improve the utility of proteomics for identifying important low-abundance proteins? The trick is to borrow data analysis methods from numerical data mining in physics, not from abstract statistics.

Let’s say we run a pneumonia sample on a mass spectrometer to identify pathogens from their proteins. We process a gigabyte file of 50,000 raw spectra with a fast PC program that identifies and quantifies peptides and proteins from 20 percent of the spectra at 1 percent error. When analysis is so easy, who needs hypotheses or an understanding of the data? We just need “better” software, defined as faster, cheaper, and reporting more proteins. Of course, this assumes that 1 percent error is good enough, that a self-estimated error is always robust, and that quantity means quality. All three assumptions are obviously incorrect.

As an analogy, imagine a novel space telescope with revolutionary accuracy, which eases data analysis. No cosmologist would acquire ad hoc imaging data and then shop for prototype software that identifies the most stars for publication, sight unseen. This unscientific approach would find thousands of bright stars but yield irreproducible discoveries of faint ones.

Content-rich physical data are heterogeneous, with varying signal-to-noise ratios. Deep data require exponentially more computing to scrub mathematically. Experts can best interpret tricky data, but it is impossible to uncover one-in-a-million breakthrough data points from particle colliders, telescopes, and now mass spectrometers without computers. Such data require a semi-interactive divide-and-conquer approach: servers run overnight “what if” scripts to isolate interesting pockets of data for interactive analysis.

For clinical research, 1 percent error is hopelessly imprecise. In infection research, where 99 percent of detected proteins are human, a 1 percent false discovery rate (FDR) could mean no pathogen information, because the expected number of false identifications is as large as the entire non-human fraction. For every 10,000 peptides identified, 100 are incorrectly assigned, which can corrupt the quantitation of as many as 100 proteins. Clinical research requires 100 percent accuracy for a few low-abundance proteins, not 99 percent accuracy across thousands of irrelevant abundant ones. It requires a precision paradigm centered on raw data, not on probability models.
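The arithmetic behind the 1 percent FDR argument can be checked with a short sketch. The function names below are hypothetical, and the worst-case bound assumes that every false identification happens to fall in the minority (pathogen) class. That assumption is an illustration of the argument, not a claim about how errors are actually distributed.

```python
def expected_false_ids(n_identified: int, fdr: float) -> float:
    """Expected number of incorrect identifications at a given FDR."""
    return n_identified * fdr


def worst_case_false_share(n_identified: int, fdr: float,
                           minority_fraction: float) -> float:
    """Upper bound on the share of minority-class IDs that could be false.

    Assumption (illustrative only): every false ID lands in the
    minority class, e.g. the non-human identifications.
    """
    false_ids = expected_false_ids(n_identified, fdr)
    minority_ids = n_identified * minority_fraction
    if minority_ids <= 0:
        raise ValueError("minority_fraction must yield at least one ID")
    return min(1.0, false_ids / minority_ids)


if __name__ == "__main__":
    n = 10_000  # peptides identified, as in the text
    print(expected_false_ids(n, 0.01))            # false IDs at 1 percent FDR
    print(worst_case_false_share(n, 0.01, 0.01))  # 1 percent of IDs non-human
```

With 10,000 identifications, 1 percent FDR, and 1 percent non-human identifications, the bound reaches 1: in the worst case every pathogen identification could be one of the false ones.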

In conventional proteomics, data interpretation is outsourced to calculations few understand. Researchers choose a subjective search engine, rely on subjective probabilities to judge peptide IDs, depend on Bayesian inference to aggregate peptide IDs into protein identifications, and evaluate the quality of results with a single error estimate.

A precise and rigorous abstraction requires three changes. First, simplify protein inference by representing each protein with its longest identified peptide (ideally long enough to be protein-unique). Second, filter peptide IDs using only physical mass data, not model-based parameters such as search scores. Finally, demote the search engine from its central role to that of a mere “educated guesser” of peptide ID hypotheses, which are then mass-filtered.

For example, in infection research, we develop a hypothesis, acquire data, and then interpret the data. The experimental goal is to identify and characterize at least one critical peptide from its noisy spectrum. Importantly, this analysis can be manually validated by an expert. We may hypothesize a certain pathogen, design a data-independent acquisition (DIA) experiment to maximize the odds of finding certain protein-identifying peptides, then do perhaps a dozen runs to try to capture the literally one-in-a-million spectra relevant to our hypothesis.

Deep research is inherently a numbers game; new technologies just help increase the odds. In my view, the narrative that omics means hypothesis-free science is fundamentally flawed. The role of computers and artificial intelligence is to assist, not to replace, scientists who formulate hypotheses and interpret data.
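As a rough illustration of the mass-filtering and longest-peptide ideas described above, the sketch below treats search-engine output as a list of candidate peptide IDs, keeps only those whose observed precursor mass agrees with the theoretical mass within a parts-per-million tolerance, and then represents each protein by its longest surviving peptide. The `Candidate` type, the function names, the 5 ppm default, and the use of precursor mass alone are all assumptions made for this example; the article does not specify which mass measurements or tolerances are used.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical peptide ID proposed by a search engine."""
    protein: str
    peptide: str
    theoretical_mass: float  # Da, computed from the sequence
    observed_mass: float     # Da, measured by the instrument


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def mass_filter(candidates: Iterable[Candidate],
                tol_ppm: float = 5.0) -> List[Candidate]:
    """Keep candidates whose mass error is within tol_ppm.

    No search score is consulted; only measured mass is used.
    """
    return [c for c in candidates
            if abs(ppm_error(c.observed_mass, c.theoretical_mass)) <= tol_ppm]


def longest_peptide_per_protein(
        candidates: Iterable[Candidate]) -> Dict[str, Candidate]:
    """Represent each protein by its longest candidate peptide."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        current = best.get(c.protein)
        if current is None or len(c.peptide) > len(current.peptide):
            best[c.protein] = c
    return best


if __name__ == "__main__":
    # Made-up masses for illustration; not real measurements.
    guesses = [
        Candidate("P1", "PEPTIDEK", 1000.0, 1000.003),
        Candidate("P1", "SAMPLEPEPTIDEK", 1500.0, 1500.03),
        Candidate("P2", "AAAAK", 500.0, 500.001),
    ]
    for protein, c in longest_peptide_per_protein(mass_filter(guesses)).items():
        print(protein, c.peptide)
```

Here the search engine only supplies hypotheses. Whether one is kept depends on the measured mass alone, so an expert can check each surviving identification directly against the raw spectrum.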

About the Author(s)

David Chiang

David Chiang is Chairman of Sage-N Research, Inc., Milpitas, USA.


Copyright © 2025 Texere Publishing Limited (trading as Conexiant), with registered number 08113419 whose registered office is at Booths No. 1, Booths Park, Chelford Road, Knutsford, England, WA16 8GS.