Figure: Representative prediction pitfalls in cases with high DSC. PET scans of the torso, with relevant false positives marked by red dotted circles and true positives in green.

LMU University Hospital: Artificial Intelligence for TNM Staging in NSCLC – How Reliable Are AI-Based Segmentations?

The recent study “Artificial intelligence for TNM staging in NSCLC – a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT” provides a critical evaluation of the clinical value of artificial intelligence (AI)–based segmentation in non-small cell lung cancer (NSCLC).

Most publications in this field focus on technical performance metrics. This translational study instead investigates whether seemingly strong segmentation results translate into clinically meaningful outcomes: accurate lesion detection, correct TNM/UICC classification, and, ultimately, informed treatment decisions.

Study Design and Methodology

This retrospective, single-center study analyzed [¹⁸F]FDG PET/CT scans from 306 treatment-naïve patients with newly diagnosed NSCLC.

  • Reference Standard: Manual lesion segmentations generated in consensus by two hybrid imaging experts

    • Reporting and staging performed using the CE-certified structured reporting platform mint Lesion
    • TNM classification according to the 9th edition of the TNM staging system, incorporating multidisciplinary tumor board decisions. TNM/UICC classification was performed semi-automatically in mint Lesion, then reviewed by experts and confirmed as accurate
  • AI Comparison: Lesion segmentations generated using the best-performing algorithm from the autoPET III Challenge

    • Segmentation outputs were classified according to the 9th edition TNM system

The rule-based structured segmentation framework in mint Lesion enables reliable semi-automated TNM and UICC classification. In addition, mint Lesion provided the technical foundation of the study by enabling manual lesion segmentation and structured data export for downstream analysis.

Key Study Findings

Technical Segmentation Performance

  • Mean Dice Similarity Coefficient (DSC): 0.64
  • Systematic volumetric overestimation by the AI algorithm
    (mean volume difference: +56.1 mL compared with manual segmentation)
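
For readers unfamiliar with the two metrics above, the following is a minimal Python sketch of how a Dice Similarity Coefficient and a signed volume difference can be computed from two binary masks. It is not the study's evaluation code: the function names, the flat-list mask representation, the voxel volume, and the toy values are all assumptions made for illustration.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient of two equally sized flat binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def volume_difference_ml(pred, ref, voxel_volume_ml):
    """Signed volume difference (prediction minus reference) in millilitres."""
    n_pred = sum(1 for p in pred if p)
    n_ref = sum(1 for r in ref if r)
    return (n_pred - n_ref) * voxel_volume_ml


# Toy 1-D "masks" (illustrative values, not study data).
reference = [0, 1, 1, 1, 0, 0, 0, 0]
prediction = [0, 1, 1, 1, 1, 1, 0, 0]

dsc = dice_coefficient(prediction, reference)
diff = volume_difference_ml(prediction, reference, voxel_volume_ml=0.5)
print(f"DSC = {dsc:.2f}, volume difference = {diff:+.1f} mL")
```

In this toy case the prediction covers the whole reference region plus two extra voxels, so the overlap is complete while the volume is overestimated. A positive volume difference therefore indicates oversegmentation relative to the reference.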

Lesion Detection

  • Very high lesion-level sensitivity: 95.8%
    • T category: 96.7%
    • N category: 95.9%
    • M category: 94.8%

Precision and Sources of Error

  • Moderate precision in the M category (PPV: 73.7%)
  • Most frequent error source: false-positive distant metastases
  • 70.4% of false-positive M lesions represented clinically relevant but benign or non-oncologic findings, including:
    • Degenerative musculoskeletal changes
    • Inflammatory processes such as pneumonia
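
The relationship between high sensitivity and only moderate precision can be made concrete with a small Python sketch. The helper names and the lesion counts below are hypothetical and chosen only to illustrate the definitions; they are not the study's counts.

```python
def sensitivity(true_positives, false_negatives):
    """Share of reference lesions that the algorithm detected."""
    denominator = true_positives + false_negatives
    return true_positives / denominator if denominator else float("nan")


def positive_predictive_value(true_positives, false_positives):
    """Share of predicted lesions that correspond to a reference lesion."""
    denominator = true_positives + false_positives
    return true_positives / denominator if denominator else float("nan")


# Hypothetical lesion-level counts (illustrative, not study data).
tp, fn, fp = 90, 10, 30

sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(f"sensitivity = {sens:.1%}, PPV = {ppv:.1%}")
```

Sensitivity depends only on missed lesions, whereas PPV depends only on false alarms. An algorithm can therefore find nearly every true lesion and still produce many false-positive findings, which is the pattern reported for the M category.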

Impact on Clinical Staging

  • UICC stage concordance with the reference standard in only 67.6% of patients
  • Upstaging observed in 88 of 306 cases
  • Primary drivers of staging discrepancies:
    • False-positive M lesions
    • Undersegmentation in the hilar region
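
A patient-level staging comparison of this kind can be sketched in Python as follows. This is an illustrative simplification rather than the study's method: it assumes only the four major UICC stage groups (no substages), and the function name and example stage lists are hypothetical.

```python
from collections import Counter

# Simplified ordinal scale (assumption): major UICC stage groups, no substages.
STAGE_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4}


def compare_staging(ai_stages, reference_stages):
    """Count concordant, upstaged, and downstaged patients."""
    if len(ai_stages) != len(reference_stages):
        raise ValueError("both lists must contain one stage per patient")
    outcomes = Counter()
    for ai, ref in zip(ai_stages, reference_stages):
        delta = STAGE_ORDER[ai] - STAGE_ORDER[ref]
        if delta == 0:
            outcomes["concordant"] += 1
        elif delta > 0:
            outcomes["upstaged"] += 1
        else:
            outcomes["downstaged"] += 1
    return outcomes


# Hypothetical stages for six patients (illustrative, not study data).
reference_stages = ["I", "II", "III", "III", "IV", "II"]
ai_stages = ["I", "IV", "III", "IV", "IV", "I"]

summary = compare_staging(ai_stages, reference_stages)
concordance_rate = summary["concordant"] / len(reference_stages)
print(dict(summary), f"concordance = {concordance_rate:.1%}")

# The reported 88 upstaged cases out of 306 correspond to this share:
upstaged_share = 88 / 306
print(f"upstaged share = {upstaged_share:.1%}")
```

A single false-positive distant metastasis is enough to move a patient into stage IV on such a scale, which illustrates how a small number of lesion-level errors can change the patient-level result.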

Clinical Interpretation and Conclusions

The study concludes that, despite excellent lesion detection sensitivity (95.8%), the best-performing autoPET III algorithm achieved only 67.6% concordance in UICC staging, indicating substantial limitations for autonomous clinical use.

The clinical relevance of segmentation errors varied considerably. False-positive lesions that led to upstaging were identified as the main cause of staging discrepancies.

Key takeaway: High technical performance does not necessarily equate to clinical reliability.

Accordingly, the authors clearly recommend:

  • AI as a tool for workflow optimization and decision support, not as a replacement for physician expertise
  • Mandatory expert oversight, particularly for:
    • M-stage predictions
    • Complex cases involving multiple lesions

Conclusion

This study underscores the importance of task-oriented AI evaluation that extends beyond conventional segmentation metrics. For the safe clinical integration of AI in oncologic hybrid imaging, structured reporting, transparent workflows, and physician expertise remain indispensable.

For a comprehensive understanding of the methodology, detailed error analysis, and clinical implications, readers are encouraged to consult the original publication.

 

This work was supported by the German Federal Ministry for Research, Technology and Space Affairs (Bundesministerium für Forschung, Technologie und Raumfahrt, BMFTR) under the ‘DataXperiment’ funding initiative (project ID FKZ 01KD2431).

 

Heimer, Maurice M. et al. 2025. “Artificial intelligence for TNM staging in NSCLC: a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT.” European Journal of Nuclear Medicine and Molecular Imaging. https://doi.org/10.1007/s00259-025-07677-2
