¹ Technical University Munich, Germany
² Helmholtz Munich, Germany
³ Munich Center of Machine Learning (MCML), Germany
⁴ Universitat de Barcelona, Spain
⁵ King’s College London, UK

Progressive Growing of Patch Size: Resource-Efficient Curriculum Learning for Dense Prediction Tasks

Stefan M. Fischer ¹˒²˒³, Lina Felsner ¹˒², Richard Osuala ¹˒²˒⁴, Johannes Kiechle ¹˒²˒³, Daniel M. Lang ¹˒², Jan C. Peeken ¹˒², Julia A. Schnabel ¹˒²˒³˒⁵

Abstract

In this work, we introduce Progressive Growing of Patch Size, a resource-efficient implicit curriculum learning approach for dense prediction tasks. Our curriculum approach is defined by growing the patch size during model training, which gradually increases the task’s difficulty. We integrated our curriculum into the nnU-Net framework and evaluated the methodology on all 10 tasks of the Medical Segmentation Decathlon. With our approach, we are able to substantially reduce runtime, computational costs, and CO₂ emissions of network training compared to classical constant patch size training. In our experiments, the curriculum approach resulted in improved convergence. We are able to outperform standard nnU-Net training, which is trained with constant patch size, in terms of Dice Score on 7 out of 10 MSD tasks while only spending roughly 50% of the original training runtime. To the best of our knowledge, our Progressive Growing of Patch Size is the first successful employment of a sample-length curriculum in the form of patch size in the field of computer vision. Our code is publicly available at https://github.com/compai-lab/2024-miccai-fischer.

Keywords: Segmentation · nnU-Net · Curriculum Learning · Resource Efficiency · Medical Segmentation Decathlon

1 Introduction

Automatic medical image segmentation is dominated by deep learning based methods. Most research focuses on the development of new architectural concepts, introducing convolution-based [15, 7], transformer-based [3, 13] or hybrid approaches [5, 4] to improve downstream segmentation performance. In contrast, few works in the medical imaging domain focus on the actual training process of networks, and models are still trained mainly by random training data sampling. Inspired by humans, Bengio et al. [2] introduced the concept of curriculum learning, teaching models to first solve simple tasks and, subsequent to that, concentrate on harder tasks. They showed that ordering samples from easy-to-hard speeds up convergence and can, therefore, also improve performance compared to training with random sampling. Those training techniques can reduce the high costs of model training in terms of time, energy, and carbon footprint [17].

Different approaches have been used in the image processing field to establish such a curriculum sample ordering. Human annotations or expert knowledge can be included as problem-specific measures [2, 10, 19]. Task difficulty can also be directly linked to sample class membership, which was utilized for fracture classification [10]. Another frequently applied measure in the medical domain is the inter-rater expert agreement, which considerably increases annotation costs [10, 19].

A more universal approach is to rate the task difficulty automatically or to incorporate processes that synthetically increase the task difficulty. Sample difficulty can be estimated directly from the network’s training loss on each sample. Jesson et al. [9] utilized hard-negative mining and an adaptive oversampling scheme for lung cancer segmentation. Missing modalities in MRI processing were the motivation for Havaei et al. [6] to train a brain tumor segmentation model that is robust against such missing data scenarios. In their curriculum learning approach, they randomly drop MRI channels with increasing probability. In the field of natural image generation, Karras et al. [11] defined an implicit curriculum by progressively growing their image generator and discriminator layer-wise during training. With each additional layer, the output resolution is doubled, resulting in more difficult tasks. This increased convergence speed and reduced training runtime while achieving higher model performance compared to standard training. Zhao et al. [22] directly transferred this idea to semantic segmentation for their task of cervical nuclei segmentation using a U-net architecture, which grows starting from the bottleneck.

Task difficulty can also be defined directly by sample length, which is already used in the natural language processing field in the form of sentence length [18, 21, 14]. In contrast, in the computer vision domain, sample length, which would correspond to image size or patch size, has, to the best of our knowledge, not been used yet. Image size is used indirectly in the Progressive Growing of GANs and its adaptation to segmentation, but there the changes in image size result from the addition of network layers [11, 22]. In contrast to such a growing-image-resolution curriculum, which would start segmenting a lesion from a low-resolution version of the full image, we assume that starting with segmenting a foreground object within a smaller region-of-interest patch is easier. This was our motivation to build our curriculum on image size instead of image resolution. In this work, we successfully apply the curriculum approach of growing patch size. Our main contributions in this work are:

1. Introduction of the Progressive Growing of Patch Size curriculum for semantic segmentation and integration into the nnU-Net framework
2. Empirical verification of reduced computational costs and improved convergence compared to classical constant patch size training
3. Validation of the robustness of our approach on the 10 tasks of the Medical Segmentation Decathlon (MSD) [1]

2 Methods and Materials

2.1 Progressive Growing of Patch Size as Curriculum

We establish our curriculum by changing the patch size of the input volume. We assume training on smaller patches is an easier task than full images or volumes, as the frequency of foreground pixels decreases in larger patches. Furthermore, growing the patch size also results in an increased global context that can be inferred from that patch.

For our approach, we start training the network with a minimal patch size and then linearly increase the patch size to a maximal patch size during training, training at each patch size for the same number of iterations. During the model inference, the maximal possible patch size is used. A sketch of our curriculum is shown in Fig. 1.

Figure 1: Proposed Progressive Growing of Patch Size curriculum. Our curriculum is illustrated for the lung cancer segmentation with cancer bounding boxes in yellow. A fully convolutional network is able to handle inputs of different sizes. Training with our curriculum starts by training the network with minimal patch size and, with training progress, increasing the patch size until the final patch size is reached. The ratio between foreground and background voxels is bigger for small patch sizes and decreases for bigger patch sizes. In contrast, the global context that can be inferred from the patch is growing with the patch size. For inference, the maximal patch size is used.
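The schedule described above can be sketched as a mapping from the training iteration to a patch size stage. This is a minimal illustration rather than the nnU-Net integration: the function name `stage_for_iteration`, the example patch sizes, and the choice to split training into contiguous blocks of nearly equal length are assumptions made for this sketch.

```python
def stage_for_iteration(iteration: int, total_iterations: int, num_stages: int) -> int:
    """Return the index of the patch size stage used at a given iteration.

    Assumption: training is split into `num_stages` contiguous blocks of
    (nearly) equal length, one block per patch size, smallest patch first.
    """
    if not 0 <= iteration < total_iterations:
        raise ValueError("iteration must lie in [0, total_iterations)")
    return min(iteration * num_stages // total_iterations, num_stages - 1)


# Hypothetical patch sizes, ordered from the minimal to the maximal patch size.
stages = [(32, 32, 32), (32, 64, 32), (32, 64, 64), (48, 64, 64)]
total_iterations = 1000

for it in (0, 249, 250, 999):
    patch_size = stages[stage_for_iteration(it, total_iterations, len(stages))]
    print(f"iteration {it}: patch size {patch_size}")

# Inference always uses the maximal patch size, i.e. the last stage.
inference_patch_size = stages[-1]
```

Because the stage index only depends on the iteration counter, such a mapping can be evaluated inside a data loader without any additional state.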

2.2 Implementation with nnU-Net

Fully convolutional networks are able to process inputs of varying sizes, which is the basis for our approach. While the GPU memory restricts the maximal patch size we can set, the network architecture itself restricts the minimal patch sizes that can be processed. Starting from the smallest possible patch size, we increase the patch size in the smallest possible steps to keep the similarity between different patch size stages as close as possible.
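One way to enumerate such a sequence of patch sizes is to grow a single axis per stage in round-robin order. The sketch below is an assumption-laden illustration: the function `patch_size_stages`, the per-axis `step` (assumed to be the smallest increment the architecture allows along that axis), and the `start_axis` parameter are hypothetical and not taken from the nnU-Net code base.

```python
from typing import List, Sequence, Tuple


def patch_size_stages(
    min_patch: Sequence[int],
    max_patch: Sequence[int],
    step: Sequence[int],
    start_axis: int = 0,
) -> List[Tuple[int, ...]]:
    """List patch sizes from `min_patch` to `max_patch`, growing one axis per stage.

    Assumption: axes are visited in round-robin order and each visit enlarges
    the axis by its `step`, clipped to the maximal size of that axis.
    """
    if any(lo > hi for lo, hi in zip(min_patch, max_patch)):
        raise ValueError("min_patch must not exceed max_patch on any axis")
    if any(s <= 0 for s in step):
        raise ValueError("step must be positive on every axis")

    current = list(min_patch)
    stages = [tuple(current)]
    n_axes = len(current)
    axis = start_axis % n_axes
    while tuple(current) != tuple(max_patch):
        # Pick the next axis, in round-robin order, that can still grow.
        for offset in range(n_axes):
            a = (axis + offset) % n_axes
            if current[a] < max_patch[a]:
                current[a] = min(current[a] + step[a], max_patch[a])
                axis = (a + 1) % n_axes
                break
        stages.append(tuple(current))
    return stages


# Illustrative values only: minimal size, maximal size, and step per axis.
example = patch_size_stages((32, 32, 32), (80, 192, 160), (16, 32, 32), start_axis=1)
for index, patch in enumerate(example, start=1):
    print(f"{index}. stage: {patch}")
```

Growing only one axis at a time keeps consecutive stages as similar as possible, which is the property the paragraph above asks for.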

For our experiments, we use the current state-of-the-art medical segmentation network nnU-Net, which follows the U-net architecture [15] and is a fully self-configuring pipeline, as detailed in [7]. The experiments were performed on the 3D patch-based segmentation version of nnU-Net (3d_fullres), which has shown, on average, the best performances in the MSD in [8]. Models trained with the standard training of the nnU-Net are referred to as models trained with the Constant Patch Size Training Scheme (CPS). For our Progressive Growing of Patch Size Training Scheme (PGPS), we only change the patch size compared to CPS.

2.3 Utilizing Smaller Patch Sizes for Bigger Batch Sizes

Training a network with a smaller patch size reduces GPU memory consumption. Thus, PGPS has lower GPU memory consumption during most of the training compared to CPS. This enables the use of larger batch sizes in the lower patch size phases of PGPS at low GPU memory costs. With a higher batch size, we expect an increase in performance and convergence speed. In addition, we enforce a foreground-background ratio of one, which is the effective default ratio of nnU-Net when it uses a batch size of two, a consequence of its maximal patch size heuristic.

3 Experiments and Results

3.1 Lung Cancer Segmentation

3.1.1 Experiments

We compare nnU-Net instances trained via CPS (default nnU-Net) and nnU-Net instances trained via our PGPS on the MSD task of lung cancer segmentation. To evaluate the effect of growing patch size, we also train models with Random Patch Size Sampling (RPSS). For that, at each training iteration, a random patch size of all patch sizes used for PGPS is picked, and a training iteration with the chosen patch size is performed. At testing time, all configurations use the same patch size, which is the nnU-Net default patch size.
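The difference between the three training schemes comes down to how the patch size is chosen at each training iteration. The following sketch contrasts them under simplifying assumptions: the function `pick_patch_size`, the scheme labels as strings, and the example stages are hypothetical, and the PGPS branch assumes contiguous stage blocks of nearly equal length.

```python
import random
from typing import List, Tuple

Patch = Tuple[int, int, int]


def pick_patch_size(
    scheme: str,
    iteration: int,
    total_iterations: int,
    stages: List[Patch],
    rng: random.Random,
) -> Patch:
    """Choose the training patch size for one iteration under a given scheme."""
    if scheme == "CPS":
        # Constant patch size: always the maximal patch size.
        return stages[-1]
    if scheme == "PGPS":
        # Progressive growing: stage index increases with training progress.
        index = min(iteration * len(stages) // total_iterations, len(stages) - 1)
        return stages[index]
    if scheme == "RPSS":
        # Random sampling: same set of patch sizes as PGPS, but without ordering.
        return rng.choice(stages)
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical stages, ordered from smallest to largest.
stages = [(32, 32, 32), (32, 64, 32), (32, 64, 64)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

for scheme in ("CPS", "PGPS", "RPSS"):
    picks = [pick_patch_size(scheme, it, 6, stages, rng) for it in range(6)]
    print(scheme, picks)
```

Since RPSS draws from the same set of patch sizes as PGPS, comparing the two isolates the effect of the easy-to-hard ordering from the effect of merely training on smaller patches.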

To evaluate the convergence properties of the three training schemes, we train models with different numbers of training iterations per epoch. The nnU-Net default of 250 iterations per epoch corresponds to the 100% scenario. We also train models with only 10%, 25%, and 50% of the iterations per epoch. For each model training, we track the average training runtime and compute the mean Dice Score and the total number of voxels shown to the network during training.

Performance with respect to training iterations, iterated voxels, and overall training runtime is plotted in Fig. 2. Exact values are given in Supplemental Table 2. The performance of CPS training with 100% iterations per epoch is taken from [8].

Table 1: Performance of Constant Patch Size Training Scheme (CPS), Progressive Growing of Patch Size Training Scheme (PGPS), and PGPS with increased batch size (PGPS+) on Medical Segmentation Decathlon. CPS refers to standard nnU-Net training. (Dice Score: Evaluated in 5-fold Cross-Validation as in [8]; P-value: One-sided paired T-test on validation Dice Scores of all samples against CPS (* refers to significant difference [P-value < 0.05]); Runtime: Training runtime per fold on one NVIDIA A100-SXM4-80GB GPU; Voxels Shown: total number of voxels iterated during training normalized to CPS training; CO₂-eq: mean CO₂-equivalent for training a single nnU-Net instance)

MSD Task	Scheme	Dice Score	P-value	Runtime	Voxels Shown	CO₂-eq
Brain	CPS	0.7411	-	27.53 h	100.00 %	10.44 kg
	PGPS	0.7412	0.5972	11.29 h	38.08 %	4.31 kg
	PGPS+	0.7421	0.1173	13.83 h	51.78 %	5.21 kg
Heart	CPS	0.9328	-	15.04 h	100.00 %	6.24 kg
	PGPS	0.9321	0.2567	6.44 h	35.35 %	2.64 kg
	PGPS+	0.9328	0.4763	7.06 h	39.35 %	2.89 kg
Liver	CPS	0.7971	-	11.09 h	100.00 %	4.70 kg
	PGPS	0.7891	0.0992	5.14 h	38.08 %	2.14 kg
	PGPS+	0.7938	0.2369	6.70 h	51.78 %	2.77 kg
Hippocampus	CPS	0.8891	-	2.14 h	100.00 %	0.90 kg
	PGPS	0.8911	0.0073*	1.37 h	35.05 %	0.54 kg
	PGPS+	0.8907	0.0343*	1.42 h	35.23 %	0.56 kg
Prostate	CPS	0.7537	-	10.14 h	100.00 %	4.33 kg
	PGPS	0.7566	0.4021	5.18 h	30.32 %	2.11 kg
	PGPS+	0.7531	0.4679	5.75 h	36.33 %	2.31 kg
Lung	CPS	0.7211	-	13.55 h	100.00 %	5.59 kg
	PGPS	0.7263	0.3163	5.70 h	35.35 %	2.35 kg
	PGPS+	0.7333	0.1484	6.13 h	39.35 %	2.55 kg
Pancreas	CPS	0.6745	-	11.01 h	100.00 %	4.68 kg
	PGPS	0.6824	0.0681	5.11 h	33.93 %	2.07 kg
	PGPS+	0.6822	0.0738	5.19 h	35.92 %	2.30 kg
Hepatic Vessel	CPS	0.6837	-	15.12 h	100.00 %	6.06 kg
	PGPS	0.6782	0.0210*	6.18 h	36.19 %	2.51 kg
	PGPS+	0.6871	0.0633	6.65 h	40.36 %	2.72 kg
Spleen	CPS	0.9638	-	11.24 h	100.00 %	4.76 kg
	PGPS	0.9621	0.3803	5.22 h	36.30 %	2.15 kg
	PGPS+	0.9654	0.4147	5.37 h	45.81 %	2.31 kg
Colon	CPS	0.4553	-	10.94 h	100.00 %	3.77 kg
	PGPS	0.4925	0.0046*	4.36 h	32.21 %	1.69 kg
	PGPS+	0.4967	0.0087*	4.50 h	34.55 %	1.77 kg

3.1.2 Results

Focusing on the number of training iterations per epoch (Fig. 2 [left]), PGPS models outperformed RPSS and CPS models in Dice Score for each trained configuration. RPSS is outperformed by CPS and PGPS for all training configurations except for 50%, where it outperforms CPS. When comparing the same models regarding training efficiency, in terms of voxels shown during training (Fig. 2 [center]) and training runtime (Fig. 2 [right]), PGPS and RPSS are drastically more efficient. The number of training voxels shown for PGPS and RPSS is reduced for each configuration to only about 35.4% of the amount shown in the corresponding CPS training. Training runtime reflects the trend of the voxels shown. Training with 100% iterations per epoch runs for only 5.70 hours with PGPS instead of 13.55 hours with CPS.
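The reduction in shown voxels follows directly from the patch size schedule, which a short calculation can make concrete. The sketch below uses the lung patch sizes listed in Supplemental Table 3 and rests on several assumptions that are not stated in the text: a constant batch size (so it cancels out), the nnU-Net default of 1000 training epochs, an equal whole number of epochs per stage, and any leftover epochs being spent at the maximal patch size. The function name `relative_voxels_shown` is hypothetical.

```python
from math import prod
from typing import List, Sequence, Tuple


def relative_voxels_shown(
    stages: Sequence[Tuple[int, ...]], epochs_per_stage: Sequence[int]
) -> float:
    """Voxels shown with growing patch size, relative to constant maximal patch size.

    Assumption: batch size and iterations per epoch are identical in every
    stage, so both cancel out of the ratio.
    """
    shown = sum(epochs * prod(patch) for patch, epochs in zip(stages, epochs_per_stage))
    baseline = sum(epochs_per_stage) * prod(stages[-1])
    return shown / baseline


# Lung patch sizes (width, height, depth) per stage, from Supplemental Table 3.
LUNG_STAGES: List[Tuple[int, int, int]] = [
    (32, 32, 32), (32, 64, 32), (32, 64, 64), (48, 64, 64), (48, 96, 64),
    (48, 96, 96), (64, 96, 96), (64, 128, 96), (64, 128, 128), (80, 128, 128),
    (80, 160, 128), (80, 160, 160), (80, 192, 160),
]

total_epochs = 1000  # assumed: nnU-Net default training length
base = total_epochs // len(LUNG_STAGES)
epochs_per_stage = [base] * len(LUNG_STAGES)
# Assumed: leftover epochs are trained at the maximal patch size.
epochs_per_stage[-1] += total_epochs - base * len(LUNG_STAGES)

fraction = relative_voxels_shown(LUNG_STAGES, epochs_per_stage)
print(f"voxels shown relative to CPS: {fraction:.2%}")
```

Under these assumptions the ratio comes out at roughly 35%, in line with the value reported for the lung task; a different allocation of epochs to stages would shift it slightly.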

3.2 Medical Segmentation Decathlon

3.2.1 Experiments

To evaluate the robustness of PGPS on a variety of medical segmentation tasks, we tested our curriculum learning approach on all ten tasks of the MSD. Detailed descriptions of the tasks are given in [1]. Various anatomies have to be segmented, including tumors, vessels, and healthy organs on single or multi-modality input, covering CT and MRI images. The MSD covers binary and multi-class segmentation tasks and dataset sample sizes ranging from 30 to 750 samples. Dice Score, runtime, and total number of voxels shown during training are computed over a 5-fold cross-validation as in [8]. In addition, we adopt CodeCarbon [16] to track the CO₂-equivalents of our training process. Dice Scores of CPS models are taken from [8]. Because the CPS models were therefore not retrained, we extrapolated the CPS runtimes from the PGPS runs by multiplying the average measured time for one epoch at the maximal patch size by the number of training epochs. The nnU-Net model architectures and training hyperparameters used for each MSD task can be found in the Appendix of [8]. Concrete patch sizes used for PGPS are given in Supplemental Table 3. Furthermore, we compute P-values for paired one-sided T-tests to assess whether the performance differs significantly. For that, we pair the validation Dice Scores of each sample over all cross-validation splits for CPS and PGPS.

Besides training with PGPS, we also repeat experiments with increased batch size. In our approach, we increase the batch size of the current patch size stage to the extent that the total number of voxels in an input tensor is equal to or lower than that of the following patch size stage. In this way, we avoid drops in the number of input voxels per iteration. We refer to this adaption of PGPS as PGPS+ in the later text. The concrete patch and batch sizes used for PGPS and PGPS+ are given in the Supplemental Table 3.
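The batch size rule for PGPS+ can be read as a backward pass over the stages, starting from the final stage with its default batch size. The following sketch implements one such reading and is not the authors' code: the function name `pgps_plus_batch_sizes`, the greedy choice of the largest admissible batch size, and the example stages are assumptions. The batch sizes actually used are those listed in Supplemental Table 3.

```python
from math import prod
from typing import List, Sequence, Tuple


def pgps_plus_batch_sizes(
    patch_stages: Sequence[Tuple[int, ...]], final_batch_size: int
) -> List[int]:
    """Assign a batch size to every patch size stage.

    Assumption: working backwards from the final stage, each stage gets the
    largest batch size for which its input tensor holds no more voxels than
    the input tensor of the following stage, and never less than the final
    batch size.
    """
    batch_sizes = [final_batch_size] * len(patch_stages)
    for i in range(len(patch_stages) - 2, -1, -1):
        next_voxels = batch_sizes[i + 1] * prod(patch_stages[i + 1])
        batch_sizes[i] = max(final_batch_size, next_voxels // prod(patch_stages[i]))
    return batch_sizes


# Hypothetical short schedule with a final batch size of two.
stages = [(32, 32, 32), (32, 64, 32), (32, 64, 64), (48, 64, 64)]
batch_sizes = pgps_plus_batch_sizes(stages, final_batch_size=2)

for patch, batch in zip(stages, batch_sizes):
    print(f"batch {batch} x patch {patch} = {batch * prod(patch)} voxels per iteration")
```

With this rule the number of voxels per iteration never decreases from one stage to the next, which matches the stated goal of avoiding drops in the number of input voxels.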

3.2.2 Results

Results on the MSD tasks are given in Table 1. In Fig. 3, we plotted the fold-wise Dice Score differences between our new training curricula and standard CPS. On averaged Dice Scores over the 5-fold cross-validation, the PGPS models outperform CPS models in 6 out of 10 tasks. In the hippocampus and the colon cancer task, PGPS outperforms CPS significantly on the sample-based T-Test (P-value $<$ 0.05), while PGPS is outperformed significantly by CPS on the hepatic vessel task. When PGPS outperformed CPS, it achieved an average fold-wise increase of $1.8\%\pm 2.9\%$ in Dice Score compared to CPS, while for tasks CPS outperformed PGPS, there was only an average fold-wise difference of $0.5\%\pm 0.4\%$ in performance. Full training of the nnU-Net with our PGPS curriculum reduces the runtime on average to $46.09\%\pm 6.8\%$ compared to CPS (default nnU-Net training) by only iterating $35.09\%\pm 2.30\%$ of the CPS’ seen voxels and reducing CO₂ equivalents to $48.22\%\pm 9.34\%$ .

With the additional adjustment of batch size for PGPS+, we are able to outperform CPS in 7 out of 10 MSD tasks on the mean of fold-wise Dice Scores. Sample-based significant performance increases were observed for the hippocampus, pancreas, and colon cancer tasks. When PGPS+ outperformed CPS, it achieved an average increase of $1.7\%\pm 3.1\%$ in fold-wise average Dice Score compared to CPS, while when it lost, there was only an average difference of $0.2\%\pm 0.2\%$ . Comparing the performance differences between each fold, our PGPS+ even outperforms CPS in 8 out of 10 tasks, which can be seen in Fig. 3. The model training only takes up an average of $50.59\%\pm 7.6\%$ of the original time of CPS, while iterated voxels are reduced to $41.05\%\pm 6.20\%$ and CO₂ equivalents to $50.59\%\pm 5.55\%$ .

4 Discussion and Conclusion

We introduced a novel curriculum learning approach based on increasing the patch size during training and compared it with constant patch size training.

Our curriculum approach showed improved convergence in the experiments: it significantly outperforms classical CPS by a large margin on colon cancer, the most difficult MSD task, and shows improved or comparable performance on the other tasks. Better convergence is a proven property of optimal curricula [20]. In the lung cancer experiment, we showed that the performance gain per training voxel is drastically increased compared to CPS. This was observed for all training configurations, which differ in the length of network training. The beneficial impact of our curriculum is based on the ordering of patches by growing size, which we assume results in an efficient easy-to-hard order. Furthermore, PGPS also outperforms random ordering (RPSS), validating the impact of the patch size ordering. We hypothesize that, with growing patch size, we successively ask the network more difficult questions. In contrast, Li et al. [12] used only two different patch sizes during training as a comparison method to their proposed curriculum. Their results showed a performance drop of their patch-to-whole training compared to CPS training. We assume that our approach benefits from using the maximal number of patch size stages.

Moreover, our curriculum can utilize larger batch sizes within the same GPU hardware restrictions. In our experiments, this resulted in better or comparable performance for PGPS+ compared to classical CPS training, as higher batch sizes generally can lead to better convergence and performance.

Besides the improved convergence, our proposed curriculum also drastically reduced the training runtime and CO₂ emissions to roughly half of the original values for PGPS and PGPS+. This is a result of the cheaper operations due to smaller input sizes. Minimal training runtime is defined by the total number of seen voxels during training and depends highly on the technical implementation.

We acknowledge that this is a proof of concept work, relying on the hyperparameters of nnU-Net. More experiments are needed to explore the different effects of using another minimal patch size, number of patch size stages, batch size, normalization, and training scheduler. Furthermore, our approach is not limited to fully convolutional networks but could also be applied to transformers [3, 13] and hybrid model architectures [4, 5] as well as object detection tasks.

Overall, the proposed Progressive Growing of Patch Size is a resource-efficient curriculum strategy that drastically reduces the training runtime and furthermore can also lead to improved convergence. The performance gain per voxel is substantially higher than for classical constant patch size training, supporting our hypothesis of better convergence. Improving the efficiency of network training is particularly important considering the growing carbon footprint of training deep learning models [17].

References

[1] Antonelli, M., Reinke, A., Bakas, S., Farahani, K., Kopp-Schneider, A., Landman, B.A., Litjens, G., Menze, B., Ronneberger, O., Summers, R.M., et al.: The medical segmentation decathlon. Nature Communications 13(1), 4128 (2022). https://doi.org/10.1038/s41467-022-30695-9
[2] Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the International Conference on Machine Learning. pp. 41–48. PMLR (2009)
[3] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
[4] Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D.: Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. In: Brainlesion Workshop at International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 272–284. Springer (2021)
[5] Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H.R., Xu, D.: UNETR: Transformers for 3D medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 574–584 (2022)
[6] Havaei, M., Guizard, N., Chapados, N., Bengio, Y.: HeMIS: Hetero-modal image segmentation. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 469–477. Springer (2016)
[7] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
[8] Isensee, F., Jäger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: Automated design of deep learning methods for biomedical image segmentation. arXiv preprint arXiv:1904.08128 (2019)
[9] Jesson, A., Guizard, N., Ghalehjegh, S.H., Goblot, D., Soudan, F., Chapados, N.: Cased: Curriculum adaptive sampling for extreme data imbalance. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 639–646. Springer (2017)
[10] Jiménez-Sánchez, A., Mateus, D., Kirchhoff, S., Kirchhoff, C., Biberthaler, P., Navab, N., González Ballester, M.A., Piella, G.: Medical-based deep curriculum learning for improved fracture classification. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 694–702. Springer (2019)
[11] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)
[12] Li, H., Liu, X., Boumaraf, S., Liu, W., Gong, X., Ma, X.: A new three-stage curriculum learning approach for deep network based liver tumor segmentation. In: Proceedings of International Joint Conference on Neural Networks. pp. 1–6. IEEE (2020)
[13] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10012–10022 (2021)
[14] Platanios, E.A., Stretcu, O., Neubig, G., Poczos, B., Mitchell, T.M.: Competence-based curriculum learning for neural machine translation. arXiv preprint arXiv:1903.09848 (2019)
[15] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
[16] Schmidt, V., Goyal, K., Joshi, A., Feld, B., Conell, L., Laskaris, N., Blank, D., Wilson, J., Friedler, S., Luccioni, S.: CodeCarbon: Estimate and track carbon emissions from machine learning computing (2021). https://doi.org/10.5281/zenodo.4658424, v2.3.4
[17] Selvan, R., Bhagwat, N., Wolff Anthony, L.F., Kanding, B., Dam, E.B.: Carbon footprint of selecting and training deep learning models for medical image analysis. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 506–516. Springer (2022)
[18] Spitkovsky, V.I., Alshawi, H., Jurafsky, D.: From baby steps to leapfrog: How “less is more” in unsupervised dependency parsing. In: Proceedings of Annual Conference of the North American Chapter of the Association for Computational Linguistics. pp. 751–759. Association for Computational Linguistics (2010)
[19] Wei, J., Suriawinata, A., Ren, B., Liu, X., Lisovsky, M., Vaickus, L., Brown, C., Baker, M., Nasir-Moin, M., Tomita, N., Torresani, L., Wei, J., Hassanpour, S.: Learn like a pathologist: Curriculum learning by annotator agreement for histopathology image classification. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 2473–2483 (2021)
[20] Weinshall, D., Cohen, G., Amir, D.: Curriculum learning by transfer learning: Theory and experiments with deep networks. In: Proceedings of International Conference on Machine Learning. pp. 5238–5246. PMLR (2018)
[21] Zaremba, W., Sutskever, I.: Learning to execute. arXiv preprint arXiv:1410.4615 (2014)
[22] Zhao, J., Dai, L., Zhang, M., Yu, F., Li, M., Li, H., Wang, W., Zhang, L.: PGU-net+: Progressive growing of U-net+ for automated cervical nuclei segmentation. In: Multiscale Multimodal Medical Imaging Workshop at International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 51–58. Springer (2020)

Supplemental Table 2: Performance of Constant Patch Size Training Scheme (CPS) and Progressive Growing of Patch Size Training Scheme (PGPS) and Random Patch Size Sampling Training Scheme (RPSS) on Task06 Lung Cancer of Medical Segmentation Decathlon for different numbers of training iterations. (Iterations Per Epoch: Number of training iterations per epoch normalized to default training iterations of nnU-Net; Dice Score: Evaluated in 5-fold Cross-Validation; Runtime: Training runtime per fold on one NVIDIA A100-SXM4-80GB GPU; Voxels Shown: Total number of voxels iterated during training normalized to CPS training)

Iterations Per Epoch	Training Scheme	Dice Score	Runtime	Voxels Shown
10%	CPS	0.6073	2.49 h	10.0 %
	RPSS	0.5971	1.72 h	$\sim$ 3.5 %
	PGPS	0.6608	1.69 h	3.5 %
25%	CPS	0.6435	4.22 h	25.0 %
	RPSS	0.6386	2.37 h	$\sim$ 8.8 %
	PGPS	0.6753	2.35 h	8.8 %
50%	CPS	0.6833	7.40 h	50.0 %
	RPSS	0.7005	3.52 h	$\sim$ 17.7 %
	PGPS	0.7009	3.43 h	17.7 %
100%	CPS	0.7211	13.55 h	100.0 %
	RPSS	0.7173	5.69 h	$\sim$ 35.4 %
	PGPS	0.7263	5.70 h	35.4 %

Supplemental Table 3: All training input tensor sizes for nnU-Net model training with Progressive Growing of Patch Size Curriculum on the Medical Segmentation Decathlon tasks. The input is given as batch size * width * height * depth. These are the values used for PGPS+. The raw PGPS uses a constant batch size for all patch size stages, which is here given for the final patch size. The maximum patch size during model training is also used for inference.

	Brain	Heart	Liver	Hippocampus	Prostate
1. Stage	24*64*32*32	24*32*32*32	24*64*32*32	24*16*8*8	24*8*64*64
2. Stage	12*64*64*32	12*32*64*32	12*64*64*32	12*16*16*8	12*8*128*64
3. Stage	6*64*64*64	6*32*64*64	6*64*64*64	9*16*16*16	6*8*128*128
4. Stage	4*96*64*64	4*48*64*64	4*96*64*64	9*24*16*16	4*12*128*128
5. Stage	3*96*96*64	3*48*96*64	3*96*96*64	9*24*24*16	3*12*192*128
6. Stage	2*96*96*96	2*48*96*96	2*96*96*96	9*24*24*24	2*12*192*192
7. Stage	2*128*96*96	2*64*96*96	2*128*96*96	9*32*24*24	2*16*192*192
8. Stage	2*128*128*96	2*64*128*96	2*128*128*96	9*32*32*24	2*16*256*192
9. Stage	2*128*128*128	2*64*128*128	2*128*128*128	9*32*32*32	2*16*256*256
10. Stage	-	2*80*128*128	-	9*40*32*32	2*20*256*256
11. Stage	-	2*80*160*128	-	9*40*40*32	2*20*320*256
12. Stage	-	2*80*160*160	-	9*40*40*40	-
13. Stage	-	2*80*192*160	-	9*40*48*40	-
14. Stage	-	-	-	9*40*56*40	-
15. Stage	-	-	-	-	-

	Lung	Pancreas	Hepatic Vessel	Spleen	Colon
1. Stage	24*32*32*32	24*16*32*32	24*32*32*32	24*32*32*32	24*16*32*32
2. Stage	12*32*64*32	12*16*64*32	12*32*64*32	12*32*64*32	12*16*64*32
3. Stage	6*32*64*64	6*16*64*64	6*32*64*64	6*32*64*64	6*16*64*64
4. Stage	4*48*64*64	4*24*64*64	4*48*64*64	4*48*64*64	4*24*64*64
5. Stage	3*48*96*64	3*24*96*64	3*48*96*64	3*48*96*64	3*24*96*64
6. Stage	2*48*96*96	2*24*96*96	2*48*96*96	2*48*96*96	2*24*96*96
7. Stage	2*64*96*96	2*32*96*96	2*64*96*96	2*64*96*96	2*32*96*96
8. Stage	2*64*128*96	2*32*128*96	2*64*128*96	2*64*128*96	2*32*128*96
9. Stage	2*64*128*128	2*32*128*128	2*64*128*128	2*64*128*128	2*32*128*128
10. Stage	2*80*128*128	2*40*128*128	2*64*160*128	2*64*160*128	2*40*128*128
11. Stage	2*80*160*128	2*40*160*128	2*64*160*160	2*64*160*160	2*40*160*128
12. Stage	2*80*160*160	2*40*160*160	2*64*192*160	2*64*192*160	2*40*160*160
13. Stage	2*80*192*160	2*40*192*160	2*64*192*192	-	2*48*160*160
14. Stage	-	2*40*192*192	-	-	2*48*192*160
15. Stage	-	2*40*224*192	-	-	2*56*192*160
