Study evaluates large multimodal models (LMMs) for diagnosing lung cancer from CT scans.
Models tested include GPT-4V, LLaVA-1.5, and BiomedCLIP.
GPT-4V achieved the highest agreement, at 75 percent.
Accuracy for NCCN risk assessment was under 50 percent.
LLaVA-1.5 produced the most consistent descriptive results.
Models exhibited hallucinations and alignment issues.
Further improvements will require collaboration with radiology professionals.
Which Generative AI Model Is the Best Diagnostician?
Comparative evaluation highlights current limitations and potential of LMMs in chest CT diagnostics
08/11/2025