LMU University Hospital: Artificial Intelligence for TNM Staging in NSCLC – How Reliable Are AI-Based Segmentations?

PET scans of the torso: clinically relevant false positives are highlighted by red dotted circles, true positives in green.

12/2025


The recent study “Artificial intelligence for TNM staging in NSCLC – a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT” provides a critical evaluation of the clinical value of artificial intelligence (AI)–based segmentation in non-small cell lung cancer (NSCLC).

While most publications in this field focus on technical performance metrics, this translational study investigates whether seemingly strong segmentation results translate into clinically meaningful outcomes: accurate lesion detection, correct TNM/UICC classification and, ultimately, well-informed treatment decisions.

Study Design and Methodology

This retrospective, single-center study analyzed [¹⁸F]FDG PET/CT scans from 306 treatment-naïve patients with newly diagnosed NSCLC.

Reference Standard: Manual lesion segmentations generated in consensus by two hybrid imaging experts
- Reporting and staging performed using the CE-certified structured reporting platform mint Lesion
- TNM classification according to the 9th edition of the TNM staging system, incorporating multidisciplinary tumor board decisions. TNM/UICC classification was performed semi-automatically in mint Lesion, then reviewed and confirmed by the experts.
AI Comparison: Lesion segmentations generated using the best-performing algorithm from the autoPET III Challenge
- The AI segmentation outputs were classified according to the 9th edition of the TNM system

The rule-based structured segmentation framework in mint Lesion enables reliable semi-automated TNM and UICC classification. In addition, mint Lesion provided the technical foundation of the study by enabling manual lesion segmentation and structured data export for downstream analysis.
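To make the idea of a rule-based classification step concrete, the following Python sketch derives a simplified N category from a list of FDG-avid lymph node lesions. It is a hypothetical illustration, not the logic implemented in mint Lesion: the `NodalLesion` structure, the region labels and `n_category` are assumptions, only four nodal regions are modeled, and many edge cases of the 9th edition definitions are ignored. It does reflect the 9th edition split of N2 into N2a (one N2 station involved) and N2b (several N2 stations involved).

```python
from dataclasses import dataclass

# Hypothetical region labels for this sketch; real systems use full nodal station maps.
KNOWN_REGIONS = {"hilar", "mediastinal", "subcarinal", "supraclavicular"}


@dataclass(frozen=True)
class NodalLesion:
    region: str        # one of KNOWN_REGIONS
    station: str       # nodal station label, e.g. "10R" (hilar) or "4R" (mediastinal)
    ipsilateral: bool  # True if on the same side as the primary tumor


def n_category(lesions: list[NodalLesion]) -> str:
    """Return a simplified N category (N0, N1, N2a, N2b or N3) for FDG-avid nodes."""
    n2_stations = set()
    has_n1 = False
    for lesion in lesions:
        if lesion.region not in KNOWN_REGIONS:
            raise ValueError(f"unknown nodal region: {lesion.region}")
        if lesion.region == "supraclavicular":
            return "N3"  # supraclavicular nodes count as N3 on either side
        if lesion.region in ("hilar", "mediastinal") and not lesion.ipsilateral:
            return "N3"  # contralateral hilar or mediastinal involvement
        if lesion.region in ("mediastinal", "subcarinal"):
            n2_stations.add(lesion.station)  # ipsilateral mediastinal or subcarinal node
        else:
            has_n1 = True  # remaining case: ipsilateral hilar node
    if n2_stations:
        # 9th edition: N2a = one N2 station involved, N2b = several
        return "N2b" if len(n2_stations) > 1 else "N2a"
    return "N1" if has_n1 else "N0"


print(n_category([NodalLesion("hilar", "10R", True)]))  # N1
print(n_category([]))  # N0, e.g. if that hilar node is missed
```

With rules like these, missing a single ipsilateral hilar node is enough to change N1 to N0, which is why undersegmentation in the hilar region can propagate directly into the stage.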

Key Study Findings

Technical Segmentation Performance

Mean Dice Similarity Coefficient (DSC): 0.64
Systematic volumetric overestimation by the AI algorithm (mean volume difference: +56.1 mL compared with manual segmentation)
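Both metrics can be illustrated with a minimal Python sketch on binary masks. The function names, the flat 0/1 list representation and the 4 mm voxel spacing are assumptions chosen for readability; real PET/CT pipelines work on 3-D arrays, and the sketch says nothing about whether the study computed the DSC per lesion or per patient.

```python
def dice(reference: list[int], prediction: list[int]) -> float:
    """Dice similarity coefficient of two binary masks given as flat 0/1 lists."""
    if len(reference) != len(prediction):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for r, p in zip(reference, prediction) if r and p)
    total = sum(reference) + sum(prediction)
    return 1.0 if total == 0 else 2.0 * overlap / total


def volume_difference_ml(reference: list[int], prediction: list[int],
                         spacing_mm: tuple[float, float, float]) -> float:
    """Predicted minus reference volume in mL (1 mL = 1000 mm^3)."""
    voxel_ml = spacing_mm[0] * spacing_mm[1] * spacing_mm[2] / 1000.0
    return (sum(prediction) - sum(reference)) * voxel_ml


# Toy 1-D "masks": the prediction covers the whole reference plus two extra voxels.
reference = [0, 1, 1, 1, 0, 0]
prediction = [0, 1, 1, 1, 1, 1]
print(dice(reference, prediction))  # 0.75
print(volume_difference_ml(reference, prediction, (4.0, 4.0, 4.0)))  # positive: overestimation
```

Because the extra voxels enter the denominator, systematic over-segmentation lowers the DSC even when every reference voxel is covered, and it shows up as a positive volume difference.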

Lesion Detection

Very high lesion-level sensitivity: 95.8%
- T category: 96.7%
- N category: 95.9%
- M category: 94.8%
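Lesion-level sensitivity is the share of reference lesions that the algorithm finds, TP / (TP + FN), here broken down by TNM category. A minimal Python sketch, assuming lesions are represented as sets of voxel indices and that a reference lesion counts as detected if any AI lesion overlaps it (both are assumptions for illustration, not necessarily the matching rule used in the study):

```python
from collections import defaultdict


def lesion_sensitivity(reference_lesions, ai_lesions):
    """Per-category and overall lesion-level sensitivity.

    reference_lesions: list of (category, voxel_ids) with category "T", "N" or "M"
    ai_lesions: list of voxel_ids (sets) produced by the algorithm
    """
    total = defaultdict(int)
    detected = defaultdict(int)
    for category, voxels in reference_lesions:
        hit = any(voxels & ai for ai in ai_lesions)  # assumed any-overlap matching
        for key in (category, "all"):
            total[key] += 1
            detected[key] += hit  # bool adds as 0 or 1
    return {key: detected[key] / total[key] for key in total}


reference = [("T", {1, 2, 3}), ("N", {10, 11}), ("M", {20}), ("M", {30})]
ai = [{2, 3, 4}, {11}, {30, 31}]
print(lesion_sensitivity(reference, ai))  # {'T': 1.0, 'all': 0.75, 'N': 1.0, 'M': 0.5}
```

Extra AI lesions do not affect this number at all, which is why sensitivity alone can look excellent while precision suffers.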

Precision and Sources of Error

Moderate precision in the M category (PPV: 73.7%)
Most frequent error source: false-positive distant metastases
70.4% of false-positive M lesions represented clinically relevant but benign or non-oncologic findings, including:
- Degenerative musculoskeletal changes
- Inflammatory processes such as pneumonia
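Precision (positive predictive value, TP / (TP + FP)) looks at the algorithm's output instead: of all lesions it labels as, for example, distant metastases, how many are confirmed by the reference standard. The Python sketch below assumes the same set-of-voxels representation and any-overlap rule, plus a hypothetical reviewer label on each unmatched AI lesion so that false positives can be grouped by cause, as in the error analysis above. The toy data are illustrative and not taken from the study.

```python
from collections import Counter


def ppv_with_fp_causes(ai_lesions, reference_lesions):
    """Positive predictive value TP / (TP + FP) plus a tally of false-positive causes.

    ai_lesions: list of (voxel_ids, reviewer_label); the label is a hypothetical
        expert annotation such as "degenerative", used only for false positives
    reference_lesions: list of voxel_ids from the manual reference standard
    """
    true_positives = 0
    fp_causes = Counter()
    for voxels, reviewer_label in ai_lesions:
        if any(voxels & ref for ref in reference_lesions):
            true_positives += 1
        else:
            fp_causes[reviewer_label] += 1
    ppv = true_positives / len(ai_lesions) if ai_lesions else float("nan")
    return ppv, fp_causes


reference_m = [{20}, {30}]
ai_m = [({20, 21}, None), ({30}, None), ({50}, "degenerative"), ({60}, "pneumonia")]
ppv, causes = ppv_with_fp_causes(ai_m, reference_m)
print(ppv)     # 0.5
print(causes)  # Counter({'degenerative': 1, 'pneumonia': 1})
```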

Impact on Clinical Staging

UICC stage concordance with the reference standard in only 67.6% of patients
Upstaging observed in 88 of 306 cases
Primary drivers of staging discrepancies:
- False-positive M lesions
- Undersegmentation in the hilar region
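Stage-level agreement can be tallied by ordering the UICC stage groups and comparing the AI-derived stage with the reference stage for each patient. The Python sketch below is illustrative: the stage list is a simplified ordering of the 9th edition stage groups, and `compare_staging` and the toy cohort are hypothetical. It also shows why false-positive M lesions weigh so heavily: under the 9th edition, an M1a or M1b finding places a patient in stage IVA regardless of T and N.

```python
# UICC stage groups in ascending order (simplified: stage 0 and occult carcinoma omitted).
STAGES = ["IA1", "IA2", "IA3", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC", "IVA", "IVB"]
RANK = {stage: i for i, stage in enumerate(STAGES)}


def compare_staging(pairs):
    """Count concordant, upstaged and downstaged patients.

    pairs: iterable of (reference_stage, ai_stage), one tuple per patient
    """
    counts = {"concordant": 0, "upstaged": 0, "downstaged": 0}
    for reference_stage, ai_stage in pairs:
        delta = RANK[ai_stage] - RANK[reference_stage]
        if delta == 0:
            counts["concordant"] += 1
        elif delta > 0:
            counts["upstaged"] += 1
        else:
            counts["downstaged"] += 1
    return counts


# Toy cohort: in the second patient a single false-positive distant lesion
# (e.g. a degenerative bone finding read as M1b) moves stage IIB to IVA.
toy = [("IIA", "IIA"), ("IIB", "IVA"), ("IIIA", "IIB")]
counts = compare_staging(toy)
print(counts)  # {'concordant': 1, 'upstaged': 1, 'downstaged': 1}
print(counts["concordant"] / len(toy))  # concordance rate for the toy cohort
```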

Clinical Interpretation and Conclusions

The study concludes that, despite excellent lesion detection sensitivity (95.8%), the best-performing autoPET III algorithm achieved only 67.6% concordance in UICC staging, indicating substantial limitations for autonomous clinical use.

The clinical relevance of segmentation errors varied considerably, with false-positive lesions leading to upstaging identified as the main cause of staging discrepancies.

Key takeaway: High technical performance does not necessarily equate to clinical reliability.

Accordingly, the authors clearly recommend:

AI as a tool for workflow optimization and decision support, not as a replacement for the physician
Mandatory expert oversight, particularly for:
- M-stage predictions
- Complex cases involving multiple lesions
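In a decision-support workflow, this recommendation could translate into a simple routing rule that still sends every case to an expert but highlights where extra scrutiny is warranted. A minimal Python sketch; `AIStagingResult`, `review_focus` and the lesion-count threshold are hypothetical and not part of the study or of mint Lesion:

```python
from dataclasses import dataclass

# Hypothetical cut-off for a "multi-lesion" case; the study does not define one.
MULTI_LESION_THRESHOLD = 3


@dataclass
class AIStagingResult:
    m_category: str    # e.g. "M0", "M1a", "M1b", "M1c1", "M1c2"
    lesion_count: int  # number of lesions segmented by the algorithm


def review_focus(result: AIStagingResult) -> str:
    """Suggest where the expert reader should concentrate.

    Every case is still reviewed by an expert; this only flags the situations
    singled out for particular scrutiny.
    """
    if result.m_category != "M0":
        return "focused: verify every M lesion"
    if result.lesion_count >= MULTI_LESION_THRESHOLD:
        return "focused: complex multi-lesion case"
    return "standard review"


print(review_focus(AIStagingResult("M1b", 1)))  # focused: verify every M lesion
```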

Conclusion

This study underscores the importance of task-oriented AI evaluation that extends beyond conventional segmentation metrics. For the safe clinical integration of AI in oncologic hybrid imaging, structured reporting, transparent workflows, and physician expertise remain indispensable.

For a comprehensive understanding of the methodology, detailed error analysis, and clinical implications, readers are encouraged to consult the original publication.

This work was supported by the German Federal Ministry for Research, Technology and Space Affairs (Bundesministerium für Forschung, Technologie und Raumfahrt, BMFTR) under the ‘DataXperiment’ funding initiative (project ID FKZ 01KD2431).

Heimer, Maurice M., et al. 2025. "Artificial intelligence for TNM staging in NSCLC: a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT." European Journal of Nuclear Medicine and Molecular Imaging. https://doi.org/10.1007/s00259-025-07677-2


