A multi-institutional study has combined spatial transcriptomics, pseudotime modeling, and machine learning to identify diagnostic biomarkers for prostate cancer that are measurable using standard clinical methods. The research, led by scientists at the Karolinska Institute and partners, sought to address a persistent challenge in oncology: the lack of reliable, accessible biomarkers for early cancer detection.
The team analyzed spatial transcriptomic data from prostate tissue and applied pseudotime analysis, a computational method that orders cells or tissue spots along an inferred progression trajectory rather than along measured time. This helped identify genes linked to cancer development. Across more than 19,000 transcriptomic spots, 45 genes were consistently associated with both pseudotime and cancer grade.
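To make the selection step concrete, the sketch below keeps genes whose expression tracks both pseudotime and grade. It is an illustration only: the article does not say which statistic or cutoff the authors used, so the Spearman rank correlation, the 0.6 threshold, the gene names, and the toy values are all assumptions.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def select_genes(expression, pseudotime, grade, threshold=0.6):
    """Keep genes correlated (in absolute value) with BOTH pseudotime and grade.

    The threshold is a hypothetical cutoff, not a value from the study.
    """
    selected = []
    for gene, values in expression.items():
        r_time = spearman(values, pseudotime)
        r_grade = spearman(values, grade)
        if abs(r_time) >= threshold and abs(r_grade) >= threshold:
            selected.append(gene)
    return sorted(selected)


# Toy data for six spots (all values invented for illustration).
pseudotime = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
grade = [0, 0, 1, 1, 2, 2]  # hypothetical coding: 0 = benign, higher = higher grade
expression = {
    "GENE_A": [1, 2, 3, 4, 5, 6],  # rises along the trajectory
    "GENE_B": [3, 1, 4, 1, 5, 2],  # no clear trend
    "GENE_C": [6, 5, 4, 3, 2, 1],  # falls along the trajectory
}

selected = select_genes(expression, pseudotime, grade)
print(selected)
```

In this toy setup, genes that rise or fall steadily along the trajectory pass both filters, while the gene with no clear trend is dropped.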
Candidate biomarkers were evaluated across multiple datasets, including single-cell RNA sequencing, immunohistochemistry (IHC), bulk RNA-seq, and proteomics from blood, tissue, and urine samples. These markers demonstrated consistent differential expression between cancerous and noncancerous samples. For example, SPON2 protein expression, examined via IHC, was low in normal and hyperplastic prostate tissues but increased in high-grade tumors.
To test the diagnostic utility of these markers, the researchers developed predictive models using protein expression data. A urine-based model, using proteins derived from the candidate genes, achieved an area under the curve (AUC) of 0.92, outperforming both serum PSA (AUC 0.63) and randomly selected proteins (AUC 0.88). In contrast, models using plasma proteomics yielded more modest diagnostic performance (AUC 0.69), suggesting that urine may be a more informative biofluid for prostate cancer diagnostics.
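For readers unfamiliar with the metric, an AUC of this kind is commonly read as the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen non-cancer sample, so 0.5 is chance and 1.0 is perfect separation. The sketch below computes it directly from that definition. It assumes the reported values are ROC AUCs; the scores and labels are invented toy data, not results from the study.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy cohort: 1 = cancer, 0 = no cancer (hypothetical values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
panel_score = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.1]  # made-up model output
psa_level = [4.1, 2.0, 6.5, 3.0, 3.5, 2.5, 5.0, 1.0]     # made-up PSA values

panel_auc = roc_auc(panel_score, labels)
psa_auc = roc_auc(psa_level, labels)
print(panel_auc, psa_auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration. Larger datasets would normally use a rank-based implementation from a statistics library.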
The urine-based models also showed potential for cancer grading, achieving a higher correlation with Gleason grade groups than PSA alone. This highlights a possible role for the identified biomarkers not only in detection but also in stratification of disease severity.
Unlike in purely data-driven approaches, the biomarkers were selected for their mechanistic link to disease progression, including associations with copy number aberrations, hallmark cancer pathways, and known drug targets. This biological grounding may enhance clinical confidence in diagnostic decisions.
Study limitations acknowledged by the authors include modest sample sizes in some validation datasets and potential variability in marker expression. They call for prospective studies to further assess the diagnostic performance of the candidate biomarkers, particularly in urine-based assays. All datasets and code have been made publicly available to facilitate independent validation and adaptation of the workflow for other cancer types.