🔬 Dataset¶
📋 Task 1: Single-staging whole-body PET/CT¶
ℹ️ Information¶
The FDG cohort comprises 1,014 studies: 501 studies of patients diagnosed with histologically proven malignant melanoma, lymphoma, or lung cancer, and 513 studies of negative control patients. The PSMA cohort includes pre- and/or post-therapeutic PET/CT images of male individuals with prostate carcinoma, encompassing images with (537) and without (60) PSMA-avid tumor lesions. Notably, the training datasets exhibit distinct age distributions: the FDG UKT cohort comprises 570 studies of male patients (mean age: 60; std: 16) and 444 studies of female patients (mean age: 58; std: 16), whereas the PSMA MUC cohort tends to be older, with 378 male patients (mean age: 71; std: 8). Imaging conditions also differ between the FDG UKT and PSMA MUC cohorts, particularly in the types and number of PET/CT scanners used for acquisition. The PSMA MUC dataset was acquired using three different scanner types (Siemens Biograph 64-4R TruePoint, Siemens Biograph mCT Flow 20, and GE Discovery 690), whereas the FDG UKT dataset was acquired using a single scanner (Siemens Biograph mCT).
📥 Download¶
We provide the merged data as NIfTI in nnUNet format, which can be downloaded from fdat (120 GB).
The download contains the resampled FDG and PSMA data as NIfTI files. It also contains the files obtained by running the nnUNet fingerprint extractor and a splits file, which we use to design and train our baselines.
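Because one patient can contribute several studies, it can be worth checking that a splits file keeps all studies of a patient in the same fold. The following is a minimal sketch, not part of the official baselines. It assumes the usual nnUNet layout of a splits file (a JSON list with one entry per fold, each holding a "train" and a "val" list of case identifiers) and that case identifiers follow the `tracer_patient_study` pattern shown in the data structure section. The function names `patient_id`, `leaking_patients`, and `load_splits` are hypothetical helpers introduced for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path


def patient_id(case_id: str) -> str:
    """Return '<tracer>_<patient>' from a case ID such as 'tracer_patient1_study1'.

    Assumption: the first two underscore-separated fields are tracer and
    patient; everything after the second underscore is the study.
    Raises ValueError if the ID has fewer than three fields.
    """
    tracer, patient, _study = case_id.split("_", 2)
    return f"{tracer}_{patient}"


def leaking_patients(splits: list[dict]) -> list[set[str]]:
    """For each fold, return the patients that occur in both 'train' and 'val'."""
    leaks = []
    for fold in splits:
        train = {patient_id(case) for case in fold["train"]}
        val = {patient_id(case) for case in fold["val"]}
        leaks.append(train & val)
    return leaks


def load_splits(path: str | Path) -> list[dict]:
    """Load a splits file (assumed to be a JSON list of folds)."""
    with open(path) as f:
        return json.load(f)
```

A call such as `leaking_patients(load_splits("splits_final.json"))` would return one set per fold; an empty set means no patient is shared between training and validation cases in that fold.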
🎥 PET/CT acquisition protocol¶
FDG dataset: Patients fasted for at least 6 h prior to the injection of approximately 350 MBq 18F-FDG. Whole-body PET/CT images were acquired using a Biograph mCT PET/CT scanner (Siemens Healthcare GmbH, Erlangen, Germany), with acquisition initiated approximately 60 min after intravenous tracer administration. Diagnostic CT scans of the neck, thorax, abdomen, and pelvis (200 reference mAs; 120 kV) were acquired 90 s after intravenous injection of a contrast agent (90-120 ml Ultravist 370, Bayer AG) or without contrast agent (in case of existing contraindications). PET images were reconstructed iteratively (three iterations, 21 subsets) with Gaussian post-reconstruction smoothing (2 mm full width at half-maximum). Slice thickness on contrast-enhanced CT was 2 or 3 mm.
PSMA dataset: Examinations were acquired on different PET/CT scanners (Siemens Biograph 64-4R TruePoint, Siemens Biograph mCT Flow 20, and GE Discovery 690). The imaging protocol mainly consisted of a diagnostic CT scan from the skull base to the mid-thigh using the following scan parameters: reference tube current-exposure time product of 143 mAs (mean); tube voltage of 100 kV or 120 kV for most cases; slice thickness of 3 mm for Biograph 64 and Biograph mCT, and 2.5 mm for GE Discovery 690 (except for 3 cases with 5 mm). Intravenous contrast enhancement was used in most studies (571), except for patients with contraindications (26). The whole-body PSMA-PET scan was acquired on average around 74 minutes after intravenous injection of 246 MBq 18F-PSMA (mean, 369 studies) or 214 MBq 68Ga-PSMA (mean, 228 studies), respectively. The PET data was reconstructed with attenuation correction derived from the corresponding CT data. For GE Discovery 690 the reconstruction employed a VPFX algorithm with voxel size 2.73 mm × 2.73 mm × 3.27 mm, for Siemens Biograph mCT Flow 20 a PSF+TOF algorithm (2 iterations, 21 subsets) with voxel size 4.07 mm × 4.07 mm × 3.00 mm, and for Siemens Biograph 64-4R TruePoint a PSF algorithm (3 iterations, 21 subsets) with voxel size 4.07 mm × 4.07 mm × 5.00 mm.
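To make the differences between the three PSMA reconstructions concrete, the short sketch below computes the volume of a single reconstructed PET voxel per scanner and how many such voxels fit into 1 ml. The voxel sizes are copied from the protocol description above; the dictionary and function names are hypothetical and only serve this illustration.

```python
# Reconstructed PET voxel sizes in mm, as stated in the acquisition protocol.
VOXEL_SIZES_MM = {
    "GE Discovery 690": (2.73, 2.73, 3.27),
    "Siemens Biograph mCT Flow 20": (4.07, 4.07, 3.00),
    "Siemens Biograph 64-4R TruePoint": (4.07, 4.07, 5.00),
}


def voxel_volume_mm3(size_mm):
    """Volume of one voxel in cubic millimetres."""
    x, y, z = size_mm
    return x * y * z


def voxels_per_ml(size_mm):
    """Number of voxels that make up 1 ml (1 ml = 1000 mm^3)."""
    return 1000.0 / voxel_volume_mm3(size_mm)


if __name__ == "__main__":
    for scanner, size in VOXEL_SIZES_MM.items():
        print(f"{scanner}: {voxel_volume_mm3(size):.2f} mm^3 per voxel, "
              f"{voxels_per_ml(size):.1f} voxels per ml")
```

A lesion of the same physical size is therefore represented by a noticeably different number of voxels depending on the scanner, which is worth keeping in mind when interpreting voxel-based statistics across the cohort.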
⌛ Training and test cohort¶
Training cases: 1,014 FDG studies (900 patients) and 597 PSMA studies (378 patients)
Test cases (final evaluation): 200 studies (50 FDG LMU, 50 FDG UKT, 50 PSMA LMU, 50 PSMA UKT)
Test cases (preliminary evaluation): 5 studies
A case (training or test) consists of one 3D whole-body PET volume (FDG or PSMA), one corresponding 3D whole-body CT volume, one 3D binary mask of manually segmented tumor lesions on PET with the same size as the PET volume, and a simulated human click. CT and PET were acquired simultaneously on a single PET/CT scanner in one session; thus PET and CT are anatomically aligned up to minor shifts due to physiological motion. A pre-processing script for resampling the PET and CT to the same matrix size will be provided. In addition, the human interaction in the form of foreground (lesion) and background clicks is pre-simulated (for training and test). The pre-simulated clicks for training are provided on GitHub together with a script for further (parametrized) click simulations.
Training set¶
FDG training data consists of 1,014 studies acquired at the University Hospital Tübingen and is made publicly available on TCIA in DICOM format and on fdat in NIfTI format.
PSMA training data consists of 597 studies acquired at the LMU University Hospital Munich and will be made publicly available on TCIA in DICOM format.
The combined PSMA and FDG data is available on fdat in NIfTI format.
If you use this data, please cite:
Gatidis S, Kuestner T. A whole-body FDG-PET/CT dataset with manually annotated tumor lesions (FDG-PET-CT-Lesions) [Dataset]. The Cancer Imaging Archive, 2022. DOI: 10.7937/gkr0-xv29
Jeblick K, et al. A whole-body PSMA-PET/CT dataset with manually annotated tumor lesions (PSMA-PET-CT-Lesions) (Version 1) [Dataset]. The Cancer Imaging Archive, 2024. DOI: 10.7937/r7ep-3x37
Preliminary test set¶
For the self-evaluation of participating pipelines, we provide access to a preliminary test set. The preliminary test set does not reflect the final test set. Algorithm optimization on the preliminary test set will not yield satisfactory results on the final test set!
The access to this preliminary set is restricted and only possible through the docker containers submitted to the challenge, and only available for a limited time during the competition. The purpose of this is that participants can check the implementation and sanity of their approaches.
Final test set¶
The final test set consists of 200 studies, containing 50 FDG LMU, 50 FDG UKT, 50 PSMA LMU, 50 PSMA UKT studies. We will not disclose further details of test data as we aim to avoid fine-tuning of algorithms to the test data domain.
🗃️ Data structure¶
|--- imagesTr
     |--- tracer_patient1_study1_0000.nii.gz (CT image resampled to PET)
     |--- tracer_patient1_study1_0001.nii.gz (PET image in SUV)
     |--- ...
|--- labelsTr
     |--- tracer_patient1_study1.nii.gz (manual annotations of tumor lesions)
|--- dataset.json (nnUNet dataset description)
|--- dataset_fingerprint.json (nnUNet dataset fingerprint)
|--- splits_final.json (reference 5fold split)
|--- psma_metadata.csv (metadata csv for psma)
|--- fdg_metadata.csv (original metadata csv for fdg)
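Given this layout, the CT, PET, and label files of a case can be matched by their shared case identifier. The sketch below is one way to do this with the standard library only; it relies on the `_0000` (CT) and `_0001` (PET) suffixes listed above, and the function name `collect_cases` is a hypothetical helper rather than part of the provided code.

```python
from __future__ import annotations

from pathlib import Path

SUFFIX = ".nii.gz"


def collect_cases(root: str | Path) -> dict[str, dict[str, Path]]:
    """Map each case ID to its CT, PET, and label file.

    Case IDs are taken from the label file names in 'labelsTr'; the matching
    images are expected in 'imagesTr' with channel suffixes _0000 (CT) and
    _0001 (PET). Raises FileNotFoundError if an image is missing.
    """
    root = Path(root)
    cases: dict[str, dict[str, Path]] = {}
    for label in sorted((root / "labelsTr").glob(f"*{SUFFIX}")):
        case_id = label.name[: -len(SUFFIX)]
        ct = root / "imagesTr" / f"{case_id}_0000{SUFFIX}"
        pet = root / "imagesTr" / f"{case_id}_0001{SUFFIX}"
        missing = [p.name for p in (ct, pet) if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"{case_id}: missing {missing}")
        cases[case_id] = {"ct": ct, "pet": pet, "label": label}
    return cases
```

Failing early on a missing channel avoids silently training on incomplete cases after a partial download.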
⚙️ Data pre-processing¶
Please note that the submission and evaluation interfaces provided by grand-challenge work with .mha data. Hence, you will need to read the test images in your submission from an .mha file. We already provide interfaces and code for this in the baseline algorithms.
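For orientation, an .mha (MetaImage) file starts with a plain-text header of `Key = Value` lines, conventionally ending with `ElementDataFile = LOCAL`, followed by the voxel data in the same file. The sketch below only parses that header with the standard library to show what the format looks like; it is not the interface used by the baselines, and the function names are hypothetical. For actually loading voxel data (including compressed files), an established reader such as SimpleITK's `ReadImage` is the more sensible choice.

```python
from __future__ import annotations


def read_mha_header(path) -> dict[str, str]:
    """Parse the text header of a MetaImage (.mha) file into a dict of strings.

    Reading stops at the 'ElementDataFile' entry, after which the binary
    voxel data begins, or at the first line without '='.
    """
    header: dict[str, str] = {}
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            if "=" not in line:
                break  # unexpected content: stop instead of reading voxel data
            key, value = (part.strip() for part in line.split("=", 1))
            header[key] = value
            if key == "ElementDataFile":
                break
    return header


def image_shape(header: dict[str, str]) -> tuple[int, ...]:
    """Return the image dimensions from the 'DimSize' header entry."""
    return tuple(int(n) for n in header["DimSize"].split())
```

Inspecting `DimSize` and `ElementSpacing` this way can serve as a quick sanity check that a test image has the matrix size and spacing a pipeline expects before any heavier processing runs.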
✒ Annotation¶
FDG PET/CT training and test data from UKT was annotated by a radiologist with 10 years of experience in hybrid imaging and experience in machine learning research. FDG PET/CT test data from LMU was annotated by a radiologist with 8 years of experience in hybrid imaging. PSMA PET/CT training and test data from LMU as well as PSMA PET/CT test data from UKT was annotated by a single reader and reviewed by a radiologist with 5 years of experience in hybrid imaging.
The following annotation protocol was defined:
Step 1: Identification of tracer-avid tumor lesions by visual assessment of PET and CT information together with the clinical examination reports.
Step 2: Manual free-hand segmentation of identified lesions in axial slices.
📋 Task 2: Longitudinal CT screening¶
ℹ️ Information¶
The cohort consists of melanoma patients undergoing longitudinal CT screening examinations in an oncologic context for diagnosis, staging, or therapy response assessment. The CT cohort comprises whole-body imaging in >300 patients (female: 170, mean age: 64y, std age: 15y) of two imaging timepoints: baseline staging, and follow-up scans after therapy treatment. Training data was acquired at a single site (UKT).
📥 Download¶
Database release starting in mid-April
🎥 CT acquisition protocol¶
Patients were scanned with the in-house whole-body staging protocol, covering a scan field from the skull base to the middle of the femur, with patients lying in a supine position, arms raised above the head. Scanning was performed during the portal-venous phase after administration of body-weight-adapted contrast medium through the cubital vein. Attenuation-based tube current modulation (CARE Dose, reference mAs 240) and a tube voltage of 120 kV were applied. The following scan parameters were used:
SOMATOM Force: collimation 128 × 0.6 mm, rotation time 0.5 s, pitch 0.6
Sensation64: collimation 64 × 0.6 mm, rotation time 0.5 s, pitch 0.6
SOMATOM Definition Flash: collimation 128 × 0.6 mm, rotation time 0.5 s, pitch 1.0
SOMATOM Definition AS: collimation 64 × 0.6 mm, rotation time 0.5 s, pitch 0.6
Biograph128: collimation 128 × 0.6 mm, rotation time 0.5 s, pitch 0.8
Slice thickness as well as increment were set to 3 mm. A medium smooth kernel was used for image reconstruction.
⌛ Training and test cohort¶
Training cases: >300 studies (>300 patients)
Test cases (final evaluation): 140 studies (70 UKT, 70 UM Mainz)
Test cases (preliminary evaluation): 5 studies
A case (training or test) consists of one 3D CT volume and one 3D binary mask of manually segmented tumor lesions on the CT volume for each of two imaging sessions/time points: baseline and follow-up after therapy. The human interaction in the form of a center lesion click in the follow-up scan is pre-simulated. The pre-simulated clicks for training are provided on GitHub together with a script for further (parametrized) click simulations.
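To illustrate what a center lesion click could look like, the sketch below picks, for one lesion, the lesion voxel closest to the lesion's centroid. This is an assumption made for illustration and not the challenge's click simulation script, whose exact definition of "center" may differ. The lesion is passed as a plain list of voxel coordinates, and `center_click` is a hypothetical name.

```python
from __future__ import annotations

from typing import List, Tuple

Voxel = Tuple[int, int, int]


def center_click(lesion_voxels: List[Voxel]) -> Voxel:
    """Return the lesion voxel closest to the lesion's centroid.

    Snapping to an actual lesion voxel keeps the click inside the mask even
    when the centroid itself falls outside it (e.g. ring-shaped lesions).
    """
    if not lesion_voxels:
        raise ValueError("lesion has no voxels")
    n = len(lesion_voxels)
    centroid = tuple(sum(v[axis] for v in lesion_voxels) / n for axis in range(3))

    def squared_distance(voxel: Voxel) -> float:
        return sum((voxel[axis] - centroid[axis]) ** 2 for axis in range(3))

    return min(lesion_voxels, key=squared_distance)
```

For a lesion with several equally close voxels, `min` returns the first one in the input order, so a deterministic voxel ordering gives reproducible clicks.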
Training set¶
Annotated longitudinal CT of two imaging time points in >300 studies was acquired at the University Hospital Tübingen and is made publicly available on fdat in NIfTI format. Database release starting in mid-April!
Preliminary test set¶
For the self-evaluation of participating pipelines, we provide access to a preliminary test set. The preliminary test set does not reflect the final test set. Algorithm optimization on the preliminary test set will not yield satisfactory results on the final test set!
The access to this preliminary set is restricted and only possible through the docker containers submitted to the challenge, and only available for a limited time during the competition. The purpose of this is that participants can check the implementation and sanity of their approaches.
Final test set¶
Test data will be drawn in part (50%) from the same sources and distributions as the training data. The other part will be drawn from another center: University Hospital Mainz (50%). At this moment we will not disclose details of the test data as we aim to avoid fine-tuning of algorithms to the test data domain. The distribution of test data will be made public after the challenge deadline.
🗃️ Data structure¶
⚙️ Data pre-processing¶
Please note that the submission and evaluation interfaces provided by grand-challenge work with .mha data. Hence, you will need to read the test images in your submission from an .mha file. We already provide interfaces and code for this in the baseline algorithms.
✒ Annotation¶
All data were manually annotated by two experienced radiologists. To this end, tumor lesions were manually segmented on the CT image data using dedicated software.
The following annotation protocol was defined:
Step 1: Identification of tumor lesions by visual assessment of CT information together with the clinical examination reports.
Step 2: Manual free-hand segmentation of identified lesions in axial slices.
Step 3: Baseline and follow-up segmentations are viewed side-by-side to mark the matching lesions.