[Figure: PET scans of the torso, with relevant false positives highlighted by red dotted circles and true positives in green.]

LMU University Hospital: Artificial Intelligence for TNM Staging in NSCLC – How Reliable Are AI-Based Segmentations?

The recent study “Artificial intelligence for TNM staging in NSCLC – a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT” provides a critical evaluation of the clinical value of artificial intelligence (AI)–based segmentation in non-small cell lung cancer (NSCLC).

Whereas most publications in this field focus on technical performance metrics, this translational study investigates whether seemingly strong segmentation results translate into clinically meaningful outcomes: accurate lesion detection, correct TNM/UICC classification, and, ultimately, informed treatment decisions.

Study Design and Methodology

This retrospective, single-center study analyzed [¹⁸F]FDG PET/CT scans from 306 treatment-naïve patients with newly diagnosed NSCLC.

  • Reference Standard: Manual lesion segmentations generated in consensus by two hybrid imaging experts
    • Reporting and staging performed using the CE-certified structured reporting platform mint Lesion
    • TNM classification according to the 9th edition of the TNM staging system, incorporating multidisciplinary tumor board decisions. TNM/UICC classification was semi-automatically performed using mint Lesion, reviewed by experts, and deemed accurate
       
  • AI Comparison: Lesion segmentations generated using the best-performing algorithm from the autoPET III Challenge
    • Segmentation outputs were classified according to the 9th edition TNM system

The rule-based structured segmentation framework in mint Lesion enables reliable semi-automated TNM and UICC classification. In addition, mint Lesion provided the technical foundation of the study by enabling manual lesion segmentation and structured data export for downstream analysis.

Key Study Findings

Technical Segmentation Performance
  • Mean Dice Similarity Coefficient (DSC): 0.64
  • Systematic volumetric overestimation by the AI algorithm
    (mean volume difference: +56.1 mL compared with manual segmentation)
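
As a rough illustration of the two technical metrics above, the following minimal sketch computes a Dice Similarity Coefficient and a signed volume difference for a toy pair of segmentations. It is not the study's evaluation code: the function names, the representation of a mask as a set of voxel coordinates, and the 64 mm³ voxel volume are all assumptions made for this example.

```python
from itertools import product


def dice_coefficient(pred_voxels, ref_voxels):
    """Dice Similarity Coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not pred_voxels and not ref_voxels:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    overlap = len(pred_voxels & ref_voxels)
    return 2.0 * overlap / (len(pred_voxels) + len(ref_voxels))


def volume_difference_ml(pred_voxels, ref_voxels, voxel_volume_mm3):
    """Signed volume difference (prediction minus reference) in millilitres."""
    return (len(pred_voxels) - len(ref_voxels)) * voxel_volume_mm3 / 1000.0


# Toy masks (hypothetical): the "AI" mask contains the reference mask
# plus one extra slice, i.e. it oversegments.
ref = set(product(range(1, 3), range(1, 3), range(1, 3)))   # 8 voxels
pred = set(product(range(1, 3), range(1, 3), range(1, 4)))  # 12 voxels

dsc = dice_coefficient(pred, ref)
diff_ml = volume_difference_ml(pred, ref, voxel_volume_mm3=64.0)
print(f"DSC = {dsc:.2f}, volume difference = {diff_ml:+.3f} mL")
```

A positive volume difference means the predicted mask is larger than the reference, which is the direction of the systematic overestimation reported above.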
Lesion Detection
  • Very high lesion-level sensitivity: 95.8%
    • T category: 96.7%
    • N category: 95.9%
    • M category: 94.8%
Precision and Sources of Error
  • Moderate precision in the M category (PPV: 73.7%)
  • Most frequent error source: false-positive distant metastases
  • 70.4% of false-positive M lesions represented clinically relevant but benign or non-oncologic findings, including:
    • Degenerative musculoskeletal changes
    • Inflammatory processes such as pneumonia
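
The detection figures above are lesion-level ratios: sensitivity relates true positives to all reference lesions, and PPV relates true positives to all lesions flagged by the algorithm. The sketch below shows the arithmetic on hypothetical counts; none of the numbers in the `counts` dictionary come from the study, and the helper names are assumptions for this example.

```python
def sensitivity(tp, fn):
    """Share of reference lesions that were detected: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else float("nan")


def ppv(tp, fp):
    """Share of detected lesions that are real: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else float("nan")


# Hypothetical lesion counts per category, NOT taken from the study.
counts = {
    "T": {"tp": 45, "fn": 5, "fp": 5},
    "M": {"tp": 30, "fn": 2, "fp": 10},
}

metrics = {
    category: {
        "sensitivity": sensitivity(c["tp"], c["fn"]),
        "ppv": ppv(c["tp"], c["fp"]),
    }
    for category, c in counts.items()
}

for category, m in metrics.items():
    print(f"{category}: sensitivity={m['sensitivity']:.1%}, PPV={m['ppv']:.1%}")
```

In the toy M category, sensitivity stays high while PPV drops, because false positives enter only the PPV denominator. This is the same pattern the study reports for distant metastases.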
Impact on Clinical Staging
  • UICC stage concordance with the reference standard in only 67.6% of patients
  • Upstaging observed in 88 of 306 cases
  • Primary drivers of staging discrepancies:
    • False-positive M lesions
    • Undersegmentation in the hilar region
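
Patient-level staging concordance can be tallied by comparing the AI-derived stage with the reference stage for each patient. The sketch below is a simplified illustration, not the study's method: it uses only the four main stage groups (substages are omitted), and the patient lists and function name are hypothetical.

```python
from collections import Counter

# Simplified ordering of main stage groups; substages are left out (assumption).
STAGE_ORDER = ["I", "II", "III", "IV"]
RANK = {stage: i for i, stage in enumerate(STAGE_ORDER)}


def compare_staging(reference, predicted):
    """Count concordant, upstaged, and downstaged patients."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tally = Counter(concordant=0, upstaged=0, downstaged=0)
    for ref_stage, pred_stage in zip(reference, predicted):
        if RANK[pred_stage] == RANK[ref_stage]:
            tally["concordant"] += 1
        elif RANK[pred_stage] > RANK[ref_stage]:
            tally["upstaged"] += 1
        else:
            tally["downstaged"] += 1
    return tally


# Hypothetical stages for five patients, NOT study data.
reference = ["I", "II", "III", "IV", "II"]
predicted = ["I", "IV", "III", "IV", "I"]

tally = compare_staging(reference, predicted)
concordance = tally["concordant"] / len(reference)
print(dict(tally), f"concordance = {concordance:.1%}")
```

Applied to the reported figures, 88 upstaged patients out of 306 correspond to roughly 28.8% of the cohort.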

Clinical Interpretation and Conclusions

The study concludes that, despite excellent lesion detection sensitivity (95.8%), the best-performing autoPET III algorithm achieved only 67.6% concordance in UICC staging, indicating substantial limitations for autonomous clinical use.

The clinical relevance of segmentation errors varied considerably, with false-positive lesions leading to upstaging identified as the main cause of staging discrepancies.

Key takeaway: High technical performance does not necessarily equate to clinical reliability.

Accordingly, the authors clearly recommend:

  • AI as a tool for workflow optimization and decision support—not as a replacement
  • Mandatory expert oversight, particularly for:
    • M-stage predictions
    • Complex cases involving multiple lesions

Conclusion

This study underscores the importance of task-oriented AI evaluation that extends beyond conventional segmentation metrics. For the safe clinical integration of AI in oncologic hybrid imaging, structured reporting, transparent workflows, and physician expertise remain indispensable.

For a comprehensive understanding of the methodology, detailed error analysis, and clinical implications, readers are encouraged to consult the original publication.

 

This work was supported by the German Federal Ministry of Research, Technology and Space (Bundesministerium für Forschung, Technologie und Raumfahrt, BMFTR) under the ‘DataXperiment’ funding initiative (project ID FKZ 01KD2431).

 

Heimer, Maurice M. et al. 2025. “Artificial intelligence for TNM staging in NSCLC: a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT.” European Journal of Nuclear Medicine and Molecular Imaging. https://doi.org/10.1007/s00259-025-07677-2.
