RabbitCT

Introduction

RabbitCT is an open benchmarking platform for comparing cone-beam CT backprojection performance across different architectures. It uses a clinical, high-resolution C-arm CT dataset of a real rabbit and provides a benchmark interface, multiple algorithm variants, and image quality measures for standardized evaluation.

Dwarf Classification

RabbitCT implements the Signal Processing dwarf from the Berkeley taxonomy. The core computational pattern is 3D cone-beam backprojection: for each projection image, every voxel in a 3D volume is projected onto the detector plane using a 3x4 projection matrix, the projection data is sampled via bilinear interpolation, and the weighted value is accumulated into the volume. This pattern combines regular iteration over a structured 3D grid with gather-type memory access into the 2D projection images.
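The steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the RabbitCT source: the function names `bilinear` and `backproject`, the flat row-major layouts, the `origin`/`spacing` parameters, and the choice to treat out-of-range detector pixels as zero are all assumptions made for the example. The `1/w²` distance weight is likewise an assumption (it is the weight commonly used in FDK-style backprojection); the text above only says that a weighted value is accumulated.

```python
import math


def bilinear(image, width, height, u, v):
    """Sample a row-major 2D image at (u, v); pixels outside the image count as 0."""
    iu, iv = math.floor(u), math.floor(v)
    fu, fv = u - iu, v - iv

    def pixel(x, y):
        if 0 <= x < width and 0 <= y < height:
            return image[y * width + x]
        return 0.0

    top = (1.0 - fu) * pixel(iu, iv) + fu * pixel(iu + 1, iv)
    bottom = (1.0 - fu) * pixel(iu, iv + 1) + fu * pixel(iu + 1, iv + 1)
    return (1.0 - fv) * top + fv * bottom


def backproject(volume, n, origin, spacing, matrix, image, width, height):
    """Accumulate one projection image into an n*n*n volume (flat list, x fastest)."""
    for k in range(n):
        z = origin + k * spacing
        for j in range(n):
            y = origin + j * spacing
            for i in range(n):
                x = origin + i * spacing
                # Project the voxel centre with the 3x4 matrix (homogeneous coordinates).
                u = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2] * z + matrix[0][3]
                v = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2] * z + matrix[1][3]
                w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2] * z + matrix[2][3]
                if w == 0.0:
                    continue  # degenerate projection, skip this voxel
                # Gather from the projection image, then apply the assumed 1/w^2 weight.
                sample = bilinear(image, width, height, u / w, v / w)
                volume[(k * n + j) * n + i] += sample / (w * w)


# Toy setup: a 4x4 detector with pixel value x + 10*y and a projection that
# maps a voxel at (x, y, z) straight to detector position (x, y) with w = 1.
width = height = 4
image = [float(x + 10 * y) for y in range(height) for x in range(width)]
matrix = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
n = 2
volume = [0.0] * (n * n * n)
backproject(volume, n, 0.5, 1.0, matrix, image, width, height)
print(volume[0], volume[7])  # prints "5.5 16.5"
```

The innermost loop runs along x, so consecutive voxels are written contiguously while the reads from `image` follow wherever the projection lands. That is the combination of regular grid iteration and gather-type access described above.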

Key Features

- Algorithm variants with increasing optimization level:
  - LolaBunny: Straightforward reference implementation
  - LolaOMP: OpenMP parallelization with line-range clipping
  - LolaOPT: Zero-padded projections (no bounds checking) and collapsed OpenMP scheduling
  - LolaISPC: ISPC-vectorized kernel for data-parallel SIMD
  - LolaASM: Hand-written SIMD assembly kernels (SSE, AVX, AVX-512, NEON) with cache-blocking
- Architectures: x86-64 (SSE, AVX, AVX-512) and AARCH64 (NEON)
- Problem sizes: 128, 256, 512, and 1024 voxel edge length
- Quality metrics: RMSE, MSE, max absolute error, PSNR against reference volumes
- Output: PGM slice images, raw float volumes for ParaView/ImageJ
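The quality metrics listed above follow standard definitions, which can be sketched as below. The function name `quality_metrics` and the `peak` parameter are hypothetical. In particular, the peak value that RabbitCT uses for PSNR is not stated here, so the caller has to supply it.

```python
import math


def quality_metrics(result, reference, peak):
    """Compare two equally sized volumes given as flat sequences of floats."""
    if len(result) != len(reference) or len(result) == 0:
        raise ValueError("volumes must be non-empty and of equal size")
    diffs = [r - g for r, g in zip(result, reference)]
    mse = sum(d * d for d in diffs) / len(diffs)
    rmse = math.sqrt(mse)
    max_abs = max(abs(d) for d in diffs)
    # PSNR is undefined for identical volumes; report infinity in that case.
    psnr = math.inf if mse == 0.0 else 10.0 * math.log10(peak * peak / mse)
    return {"mse": mse, "rmse": rmse, "max_abs": max_abs, "psnr": psnr}


# Example: one voxel out of four differs by 2; `peak` is an assumed value range.
metrics = quality_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0], peak=10.0)
print(metrics)
```

RMSE and MSE summarize the average deviation, while the maximum absolute error exposes isolated outliers that an average would hide, which is presumably why both kinds of measure are reported.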

Getting Started

Clone the RabbitCT repository and follow the build and usage instructions in the README. The input dataset (~2.9 GB) can be downloaded using the provided download-input.sh script.

Performance Characteristics

The optimized variants of the backprojection kernel are compute-bound, whereas the reference implementation tends toward being memory-bandwidth bound because of its irregular accesses into the projection images. The gather pattern of the bilinear interpolation limits spatial locality and stresses the cache subsystem. Vectorization is the primary optimization lever: the SIMD assembly variants (LolaASM) process multiple voxels per instruction using SSE, AVX, AVX-512, or NEON, while LolaISPC achieves portable vectorization through ISPC’s foreach construct. Line-range clipping, which skips voxels whose projection falls outside the detector, significantly reduces redundant work for smaller problem sizes. Key tuning dimensions are the SIMD instruction set, the cache-blocking strategy, the OpenMP thread count and affinity, and the problem size.
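One way to picture line-range clipping: along a single voxel line the homogeneous detector coordinates u, v, and w change linearly with the voxel index, so the condition that the projection lands on the detector reduces to a handful of linear inequalities in that index. The sketch below illustrates this idea and is not the RabbitCT implementation. The names `narrow` and `clip_line`, the parameterization by a start value and per-voxel increment, and the assumption that w stays positive along the line are all hypothetical.

```python
import math


def narrow(a, b, lo, hi):
    """Shrink the integer range [lo, hi] to the indices i with a*i + b >= 0."""
    if a == 0:
        return (lo, hi) if b >= 0 else (lo, lo - 1)  # unchanged or empty
    bound = -b / a
    if a > 0:
        return max(lo, math.ceil(bound)), hi
    return lo, min(hi, math.floor(bound))


def clip_line(n, u0, du, v0, dv, w0, dw, width, height):
    """Return (first, last) voxel index on an n-voxel line whose projection
    hits the detector, or None if no voxel does. Assumes w > 0 on the line,
    so 0 <= u/w <= width-1 can be multiplied through by w."""
    constraints = [
        (du, u0),                                        # u >= 0
        ((width - 1) * dw - du, (width - 1) * w0 - u0),  # u <= (width-1) * w
        (dv, v0),                                        # v >= 0
        ((height - 1) * dw - dv, (height - 1) * w0 - v0),  # v <= (height-1) * w
    ]
    lo, hi = 0, n - 1
    for a, b in constraints:
        lo, hi = narrow(a, b, lo, hi)
        if lo > hi:
            return None
    return lo, hi


# A 10-voxel line whose projection moves one detector pixel per voxel,
# starting 3 pixels left of a 5-pixel-wide detector.
print(clip_line(10, -3.0, 1.0, 0.0, 0.0, 1.0, 0.0, width=5, height=1))  # prints "(3, 7)"
```

The inner loop then only needs to visit the returned index range. Because the bounds come from floating-point division, a production kernel would likely combine such clipping with a safety margin or with zero-padded projections so that a boundary voxel rounded to the wrong side cannot read out of bounds.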

Citations

Hofmann, J., Treibig, J., Hager, G., & Wellein, G.: Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips. In Proceedings of WPMVP '14 (Workshop on Programming Models for SIMD/Vector Processing), February 15–19, 2014, Orlando, FL, USA. arXiv:1401.7494. https://arxiv.org/abs/1401.7494

Treibig, J., Hager, G., Hofmann, H. G., Hornegger, J., & Wellein, G.: Pushing the limits for medical image reconstruction on recent standard multicore processors. The International Journal of High Performance Computing Applications, 27(2), 162–177. https://doi.org/10.1177/1094342012442424. arXiv:1104.5243. https://arxiv.org/abs/1104.5243

Rohkohl, C., Keck, B., Hofmann, H. G., & Hornegger, J.: RabbitCT – an open platform for benchmarking 3D cone-beam reconstruction algorithms. Medical Physics, 36(9), 3940–3944, 2009. https://doi.org/10.1118/1.3180956

Credits & License

RabbitCT was originally a collaboration between the Department of Neuroradiology and the Pattern Recognition Lab at the Friedrich-Alexander-Universität Erlangen-Nürnberg. It is maintained by the Erlangen National High Performance Computing Center (NHR@FAU).

Licensed under the MIT License.