PET scans of the torso, with relevant false positives marked by red dotted circles and true positives marked in green

LMU University Hospital: Artificial Intelligence for TNM Staging in NSCLC – How Reliable Are AI-Based Segmentations?

The recent study “Artificial intelligence for TNM staging in NSCLC – a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT” provides a critical evaluation of the clinical value of artificial intelligence (AI)–based segmentation in non-small cell lung cancer (NSCLC).

While most publications in this field focus on technical performance metrics, this translational study investigates whether seemingly strong segmentation results translate into clinically meaningful outcomes: accurate lesion detection, correct TNM/UICC classification, and ultimately informed treatment decisions.

Study Design and Methodology

This retrospective, single-center study analyzed [¹⁸F]FDG PET/CT scans from 306 treatment-naïve patients with newly diagnosed NSCLC.

  • Reference Standard: Manual lesion segmentations generated in consensus by two hybrid imaging experts
    • Reporting and staging performed using the CE-certified structured reporting platform mint Lesion
    • TNM classification according to the 9th edition of the TNM staging system, incorporating multidisciplinary tumor board decisions. TNM/UICC classification was semi-automatically performed using mint Lesion, reviewed by experts, and deemed accurate
       
  • AI Comparison: Lesion segmentations generated using the best-performing algorithm from the autoPET III Challenge
    • Segmentation outputs were classified according to the 9th edition TNM system

The rule-based structured segmentation framework in mint Lesion enables reliable semi-automated TNM and UICC classification. In addition, mint Lesion provided the technical foundation of the study by enabling manual lesion segmentation and structured data export for downstream analysis.

Key Study Findings

Technical Segmentation Performance
  • Mean Dice Similarity Coefficient (DSC): 0.64
  • Systematic volumetric overestimation by the AI algorithm
    (mean volume difference: +56.1 mL compared with manual segmentation)
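For readers less familiar with these two metrics, the following is a minimal sketch of how a Dice Similarity Coefficient and a signed volume difference can be computed for one pair of segmentations. The masks, the voxel volume, and the function names are hypothetical and chosen for illustration; this is not the evaluation code used in the study.

```python
# Illustrative sketch only: toy masks, not study data.

def dice_coefficient(reference, prediction):
    """Return DSC = 2 * |A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    if not reference and not prediction:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = len(reference & prediction)
    return 2.0 * overlap / (len(reference) + len(prediction))


def volume_difference_ml(reference, prediction, voxel_volume_ml):
    """Return predicted minus reference volume in mL (positive = overestimation)."""
    return (len(prediction) - len(reference)) * voxel_volume_ml


# Hypothetical single-slice masks: manual 4x4 voxels, AI 5x4 voxels shifted by one column.
manual_mask = {(x, y, 0) for x in range(0, 4) for y in range(4)}
ai_mask = {(x, y, 0) for x in range(1, 6) for y in range(4)}

dsc = dice_coefficient(manual_mask, ai_mask)
diff = volume_difference_ml(manual_mask, ai_mask, voxel_volume_ml=0.5)
print(f"DSC: {dsc:.2f}, volume difference: {diff:+.1f} mL")
```

In this toy case the AI mask overlaps 12 of the 16 manual voxels but contains 20 voxels in total, so the DSC is 24/36 ≈ 0.67 and the volume difference is positive, which is the same direction of error (overestimation) reported above.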
Lesion Detection
  • Very high lesion-level sensitivity: 95.8%
    • T category: 96.7%
    • N category: 95.9%
    • M category: 94.8%
Precision and Sources of Error
  • Moderate precision in the M category (PPV: 73.7%)
  • Most frequent error source: false-positive distant metastases
  • 70.4% of false-positive M lesions represented clinically relevant but benign or non-oncologic findings, including:
    • Degenerative musculoskeletal changes
    • Inflammatory processes such as pneumonia
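The gap between high sensitivity and moderate precision follows directly from how the two measures are defined. The sketch below uses hypothetical lesion counts, not the study's counts, to show how false positives lower the positive predictive value (PPV) while leaving sensitivity untouched.

```python
# Illustrative sketch only: the lesion counts below are hypothetical.

def sensitivity(true_positives, false_negatives):
    """Share of reference lesions that were detected: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no reference lesions to evaluate")
    return true_positives / total


def positive_predictive_value(true_positives, false_positives):
    """Share of predicted lesions that are real: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no predicted lesions to evaluate")
    return true_positives / total


# Hypothetical counts for one category: 90 detected, 10 missed, 30 false alarms.
tp, fn, fp = 90, 10, 30
print(f"Sensitivity: {sensitivity(tp, fn):.1%}")
print(f"PPV: {positive_predictive_value(tp, fp):.1%}")
```

With these example counts, sensitivity is 90.0% while PPV is 75.0%: an algorithm can find nearly every true lesion and still flag many findings that are not tumor.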
Impact on Clinical Staging
  • UICC stage concordance with the reference standard in only 67.6% of patients
  • Upstaging observed in 88 of 306 cases
  • Primary drivers of staging discrepancies:
    • False-positive M lesions
    • Undersegmentation in the hilar region
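Patient-level staging agreement can be tallied by comparing each AI-derived stage with the reference stage. The sketch below is a simplified illustration: it assumes four coarse stage groups (I to IV) rather than the full UICC substages, and the patient lists are hypothetical.

```python
# Illustrative sketch only: coarse stage groups and made-up patients.
from collections import Counter

# Assumed ordinal ranking of simplified stage groups (substages omitted).
STAGE_RANK = {"I": 1, "II": 2, "III": 3, "IV": 4}


def compare_staging(reference_stages, predicted_stages):
    """Count concordant, upstaged, and downstaged patients."""
    if len(reference_stages) != len(predicted_stages):
        raise ValueError("both lists must describe the same patients")
    counts = Counter(concordant=0, upstaged=0, downstaged=0)
    for ref, pred in zip(reference_stages, predicted_stages):
        if STAGE_RANK[pred] > STAGE_RANK[ref]:
            counts["upstaged"] += 1
        elif STAGE_RANK[pred] < STAGE_RANK[ref]:
            counts["downstaged"] += 1
        else:
            counts["concordant"] += 1
    return dict(counts)


# Hypothetical cohort of five patients.
reference = ["I", "II", "III", "IV", "II"]
predicted = ["I", "IV", "III", "IV", "I"]

result = compare_staging(reference, predicted)
concordance = result["concordant"] / len(reference)
print(result, f"concordance: {concordance:.1%}")
```

Applied to the figures reported above, 88 upstaged patients out of 306 correspond to roughly 28.8% of the cohort.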

Clinical Interpretation and Conclusions

The study concludes that, despite excellent lesion detection sensitivity (95.8%), the best-performing autoPET III algorithm achieved only 67.6% concordance in UICC staging, indicating substantial limitations for autonomous clinical use.

The clinical relevance of segmentation errors varied considerably. False-positive lesions that led to upstaging were the main cause of staging discrepancies.

Key takeaway: High technical performance does not necessarily equate to clinical reliability.

Accordingly, the authors clearly recommend:

  • AI as a tool for workflow optimization and decision support—not as a replacement
  • Mandatory expert oversight, particularly for:
    • M-stage predictions
    • Complex cases involving multiple lesions

Conclusion

This study underscores the importance of task-oriented AI evaluation that extends beyond conventional segmentation metrics. For the safe clinical integration of AI in oncologic hybrid imaging, structured reporting, transparent workflows, and physician expertise remain indispensable.

For a comprehensive understanding of the methodology, detailed error analysis, and clinical implications, readers are encouraged to consult the original publication.

This work was supported by the German Federal Ministry for Research, Technology and Space Affairs (Bundesministerium für Forschung, Technologie und Raumfahrt, BMFTR) under the ‘DataXperiment’ funding initiative (project ID FKZ 01KD2431).

Heimer, Maurice M. et al. 2025. “Artificial intelligence for TNM staging in NSCLC: a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT.” European Journal of Nuclear Medicine and Molecular Imaging. https://doi.org/10.1007/s00259-025-07677-2
