This document defines x86 extensions for accelerating compute-intensive workloads, initially focusing on matrix multiplication kernels and the reduced-precision data formats important to ML workloads.
The ACE extensions define matrix multiplication primitives that augment AVX and scalar code with new capabilities, adding:
- ACE register state, including tile and block scale registers
- Data processing operations that consume AVX register input and operate on tile register state
- Data move operations that transfer data between ACE register state and AVX registers
- State and operations for system management
ACE tightly integrates AVX vectors with ACE tile registers, combining high-compute-density tile processing operations with the comprehensive data processing features of AVX.
In addition to matrix acceleration, several dedicated format conversion operations are provided under the AVX10 framework.
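As a rough mental model of how vector inputs can feed tile state, the following scalar C sketch accumulates a product of two input vectors into a small accumulator tile and then reads one tile row back out. It is illustrative only: the 4x4 tile shape, the float element type, the choice of an outer-product accumulate, and the names `tile_t`, `tile_zero`, `tile_outer_product_accumulate`, and `tile_read_row` are all assumptions made for this example and are not taken from the ACE definition. Block scale registers and system management state are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile dimensions, chosen only to keep the example small. */
#define TILE_ROWS 4
#define TILE_COLS 4

/* Stand-in for one accumulator tile register (assumed layout). */
typedef struct {
    float acc[TILE_ROWS][TILE_COLS];
} tile_t;

/* Clear every element of the tile. */
static void tile_zero(tile_t *t)
{
    for (size_t i = 0; i < TILE_ROWS; i++)
        for (size_t j = 0; j < TILE_COLS; j++)
            t->acc[i][j] = 0.0f;
}

/*
 * Model of a data processing operation: two vector inputs (standing in
 * for AVX registers) update the tile in place as acc[i][j] += a[i] * b[j].
 * The outer-product form is an assumption for illustration.
 */
static void tile_outer_product_accumulate(tile_t *t,
                                          const float a[TILE_ROWS],
                                          const float b[TILE_COLS])
{
    for (size_t i = 0; i < TILE_ROWS; i++)
        for (size_t j = 0; j < TILE_COLS; j++)
            t->acc[i][j] += a[i] * b[j];
}

/*
 * Model of a data move operation: copy one tile row into a vector-sized
 * buffer (standing in for an AVX register).
 */
static void tile_read_row(const tile_t *t, size_t row, float out[TILE_COLS])
{
    assert(row < TILE_ROWS);
    for (size_t j = 0; j < TILE_COLS; j++)
        out[j] = t->acc[row][j];
}
```

In this model, a matrix multiplication kernel would call the accumulate step once per step along the inner dimension, keeping the partial sums in the tile, and move results back to vector registers only when the accumulation is complete.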