Macro simd_reduce

macro_rules! simd_reduce {
    (
        $values:expr,
        2, 8,                    // 2 accumulators, 8-element chunks
        $init_expr:expr,         // Accumulator initialization
        $combine_expr:expr,      // Vector combination function
        $extract_expr:expr,      // Result extraction
        $scalar_init:expr,       // Scalar initial value
        $scalar_combine:expr     // Scalar combination for remainder
    ) => { ... };
    (
        $values:expr,
        1, 4,
        $init_expr:expr,
        $combine_expr:expr,
        $extract_expr:expr,
        $scalar_init:expr,
        $scalar_combine:expr
    ) => { ... };
    (
        $values:expr,
        4, 16,
        $init_expr:expr,
        $combine_expr:expr,
        $extract_expr:expr,
        $scalar_init:expr,
        $scalar_combine:expr
    ) => { ... };
}

Generate SIMD reduction operations with configurable accumulators and chunk sizes

This macro eliminates manual loop unrolling for common reductions such as sum, min, and max. It generates unrolled SIMD code for the chunked part of the input and handles the remaining elements with a scalar loop.

§Parameters

  • $values: Input slice of f64 values
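  • $accumulator_count: Number of SIMD accumulators (1, 2, or 4); matched as a literal
  • $chunk_size: Elements per chunk (4, 8, or 16); matched as a literal. Only the pairs 1, 4; 2, 8; and 4, 16 shown in the signature above are accepted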
  • $init_expr: Expression to initialize accumulators (e.g., f64x4::ZERO)
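  • $combine_expr: Expression, typically a closure, that folds a loaded vector into an accumulator (e.g., |acc, vec| acc + vec)
  • $extract_expr: Expression, typically a closure, that reduces the accumulator(s) to the final scalar result; in the 2-accumulator example below it takes both accumulators
  • $scalar_init: Initial value for scalar remainder processing
  • $scalar_combine: Expression, typically a closure, that folds one remainder element into the scalar accumulator (e.g., |acc, val| acc + val)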
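
To make the roles of these parameters concrete, here is a minimal plain-Rust model of what the `2, 8` arm plausibly does. It is a sketch, not the macro's actual expansion: `Lanes`, `add_lanes`, `reduce_2x8_model`, and `sum_model` are hypothetical names, a `[f64; 4]` array stands in for a 4-lane SIMD vector, and merging the vector result with the scalar tail via the scalar combine step is an assumption.

```rust
/// Hypothetical stand-in for a 4-lane f64 SIMD vector (e.g. `f64x4`).
type Lanes = [f64; 4];

/// Lane-wise addition of two stand-in vectors.
fn add_lanes(a: Lanes, b: Lanes) -> Lanes {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

/// Model of the `2, 8` arm: two accumulators, 8 elements per chunk.
/// The closure parameters mirror the macro parameters of the same names.
fn reduce_2x8_model(
    values: &[f64],
    init: Lanes,
    combine: impl Fn(Lanes, Lanes) -> Lanes,
    extract: impl Fn(Lanes, Lanes) -> f64,
    scalar_init: f64,
    scalar_combine: impl Fn(f64, f64) -> f64,
) -> f64 {
    // Two independent accumulators, both starting from the init value.
    let mut acc1 = init;
    let mut acc2 = init;

    let chunks = values.chunks_exact(8);
    let remainder = chunks.remainder();
    for chunk in chunks {
        // Each 8-element chunk feeds one 4-lane vector into each accumulator.
        let lo: Lanes = [chunk[0], chunk[1], chunk[2], chunk[3]];
        let hi: Lanes = [chunk[4], chunk[5], chunk[6], chunk[7]];
        acc1 = combine(acc1, lo);
        acc2 = combine(acc2, hi);
    }

    // Elements that do not fill a whole chunk are reduced one at a time.
    let tail = remainder
        .iter()
        .fold(scalar_init, |acc, &v| scalar_combine(acc, v));

    // Assumption: the vector result and the scalar tail are merged with the
    // scalar combine step.
    scalar_combine(extract(acc1, acc2), tail)
}

/// Sum reduction expressed through the model.
fn sum_model(values: &[f64]) -> f64 {
    reduce_2x8_model(
        values,
        [0.0; 4],
        add_lanes,
        |a, b| add_lanes(a, b).iter().sum::<f64>(),
        0.0,
        |acc, v| acc + v,
    )
}
```

Two accumulators give the CPU two independent dependency chains per iteration, which is the usual motivation for unrolling a reduction this way. Because the lanes are summed in a different order than a sequential loop, floating-point results can differ slightly from a plain left-to-right sum.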

§Example Usage

let sum = simd_reduce!(
    values, 2, 8,                    // 2 accumulators, 8-element chunks
    f64x4::ZERO,                     // Initialize with zeros
    |acc, vec| acc + vec,            // Add vectors to accumulator
    |acc1, acc2| {                   // Extract and combine results
        let total = acc1 + acc2;
        let arr = total.as_array_ref();
        arr[0] + arr[1] + arr[2] + arr[3]
    },
    0.0,                             // Scalar initial value
    |acc, val| acc + val             // Scalar combination
);
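
For reductions other than sum, the initial values need to be the identity of the operation, otherwise they leak into the result. The sketch below models a minimum reduction in the shape of the `1, 4` arm using plain Rust. It does not call the macro: `Lanes`, `min_lanes`, and `min_1x4_model` are hypothetical names, and a `[f64; 4]` array again stands in for the SIMD vector type.

```rust
/// Hypothetical stand-in for a 4-lane f64 SIMD vector (e.g. `f64x4`).
type Lanes = [f64; 4];

/// Lane-wise minimum of two stand-in vectors.
fn min_lanes(a: Lanes, b: Lanes) -> Lanes {
    [a[0].min(b[0]), a[1].min(b[1]), a[2].min(b[2]), a[3].min(b[3])]
}

/// Model of a `1, 4` minimum reduction: one accumulator, 4-element chunks.
fn min_1x4_model(values: &[f64]) -> f64 {
    // +infinity is the identity for `min`: any finite input replaces it.
    // Starting from 0.0 (the identity for sum) would give a wrong answer
    // for all-positive inputs.
    let mut acc: Lanes = [f64::INFINITY; 4];

    let chunks = values.chunks_exact(4);
    let remainder = chunks.remainder();
    for chunk in chunks {
        acc = min_lanes(acc, [chunk[0], chunk[1], chunk[2], chunk[3]]);
    }

    // Horizontal reduction of the accumulator lanes to one scalar.
    let vector_min = acc.iter().copied().fold(f64::INFINITY, f64::min);

    // Fold the leftover elements into the result one at a time.
    remainder.iter().copied().fold(vector_min, f64::min)
}
```

With this choice of identity, an empty input yields `f64::INFINITY`. Callers that need a different convention for empty slices would have to check for that case themselves.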