Researchers at the University of Tokyo and RIKEN have developed a new computational tool to analyze complex microbiome data, aiming to better identify links between microbes and metabolites that may influence health and disease. The method, called VBayesMM, is a variational Bayesian neural network designed to improve prediction accuracy and interpretability in multiomics datasets that combine genomic and metabolomic information.
Microbiome studies often rely on integrating diverse data types, such as DNA sequencing and mass spectrometry, to understand how gut microorganisms affect metabolic processes. However, existing analytical methods struggle with the high dimensionality of these datasets. Traditional statistical tools, including sparse partial least squares regression, can oversimplify relationships, while earlier neural network models such as MMvec may fail to capture uncertainty in their predictions.
VBayesMM addresses these challenges by using a Bayesian inference framework with a “spike-and-slab” prior, a sparsity-inducing statistical approach that retains the most relevant microbial species while shrinking the contributions of the rest toward zero. This enables the model to estimate co-occurrence probabilities between microbes and metabolites and to quantify the uncertainty of those relationships.
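To give a feel for what a spike-and-slab prior does, here is a minimal Python sketch. It is a generic textbook illustration, not the VBayesMM implementation: the function name, the point-mass spike at exactly zero, the Gaussian slab, and the 5 percent inclusion probability are all assumptions chosen for the example.

```python
import random


def sample_spike_and_slab(n_weights, inclusion_prob, slab_sd, rng):
    """Draw weights from a point-mass spike-and-slab prior (illustrative only).

    Each weight is "switched on" with probability inclusion_prob and then
    drawn from a zero-mean Gaussian slab; otherwise it sits in the spike
    and is exactly zero.
    """
    weights = []
    for _ in range(n_weights):
        if rng.random() < inclusion_prob:
            # Slab: a wide distribution for the few taxa assumed to matter.
            weights.append(rng.gauss(0.0, slab_sd))
        else:
            # Spike: the taxon is treated as irrelevant.
            weights.append(0.0)
    return weights


# Hypothetical setting: 1,000 microbial taxa, about 5 percent assumed relevant.
rng = random.Random(0)
weights = sample_spike_and_slab(1000, 0.05, 1.0, rng)
n_active = sum(1 for w in weights if w != 0.0)
print(f"{n_active} of {len(weights)} weights are non-zero")
```

In a full Bayesian model the inclusion indicators are inferred from the data rather than sampled once, so the posterior inclusion probability of each taxon doubles as a measure of how confident the model is that the taxon matters.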
The researchers tested the model on four public datasets from mouse and human studies, including samples related to obstructive sleep apnea, high-fat diet, gastric cancer, and colorectal cancer. These datasets combined 16S rRNA or shotgun metagenomic sequencing with mass spectrometry–based metabolomic profiling. VBayesMM’s performance was compared with that of MMvec, MiMeNet, and sparse PLS approaches using symmetric mean absolute percentage error (SMAPE) as a measure of accuracy.
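For readers unfamiliar with SMAPE, the short Python sketch below shows one common way to compute it. Several variants of the formula exist and the article does not say which one the authors used, so the definition here (absolute error divided by the mean of the absolute actual and predicted values, expressed as a percentage) is an assumption, as are the function name and the example numbers.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Uses the variant whose values range from 0 to 200. Pairs where both
    values are zero contribute no error, to avoid dividing by zero.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for a, p in zip(actual, predicted):
        denominator = (abs(a) + abs(p)) / 2.0
        if denominator > 0.0:
            total += abs(p - a) / denominator
    return 100.0 * total / len(actual)


# Made-up metabolite abundances, purely for illustration.
measured = [100.0, 200.0]
estimated = [110.0, 180.0]
print(f"SMAPE: {smape(measured, estimated):.2f}%")
```

Because the error is scaled by the size of the values being compared, SMAPE lets predictions for low-abundance and high-abundance metabolites be averaged on a comparable footing; lower values mean better predictions.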
Across all datasets, VBayesMM showed consistently lower SMAPE values, indicating better predictive performance. In the sleep apnea dataset, for instance, the model achieved a SMAPE of around 35 percent, compared with 47 percent for MMvec and 69 percent for the traditional sPLS method. The model also remained stable in large-scale analyses, such as colorectal cancer data with more than 57,000 microbial taxa, though at the cost of longer computation times.
The model identified core microbial groups that contribute most strongly to metabolite variations, including Lachnospiraceae and Ruminococcaceae, which are known to influence bile acid metabolism and inflammatory pathways. These findings suggest that VBayesMM can help researchers isolate biologically meaningful microbial–metabolite relationships while filtering out uninformative taxa.
While the model’s authors note that further refinement is needed to improve its efficiency in unbalanced datasets, they emphasize that VBayesMM offers a scalable, open-source framework for integrating large microbiome datasets. By quantifying predictive uncertainty and focusing on key microbial contributors, the method may support more precise diagnostics and experimental design in microbiome research, providing pathologists and laboratory scientists with a reproducible tool for exploring microbial–metabolic interactions in health and disease.
