Tools

HIR

HIR is an intermediate representation for hardware design. Implemented as a dialect in MLIR, it is built to enable automatic optimization of hardware designs and subsequent lowering to SystemVerilog. As a part of the MLIR infrastructure, it shares many compiler optimization passes with software compilers (such as constant propagation and inlining).

Pluto/Pluto+

Pluto/Pluto+ is a source-to-source parallelization and optimization tool based on the polyhedral compiler framework. It automatically optimizes affine loop nests (sequences of imperfectly nested loops with regular data access patterns) for parallelism and locality using affine transformations. It can target both shared-memory multicore architectures (by generating code with OpenMP parallel pragmas) and distributed-memory architectures (by generating message-passing MPI code). Pluto/Pluto+ is used extensively for advanced experimentation with loop optimization and parallelization, for optimizing scientific stencil computations, and in university courses that teach loop transformations.
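As a rough illustration of the kind of locality transformation described above, the sketch below tiles a matrix-multiply loop nest by hand. It is written in Python only for readability; it is neither Pluto's output nor its algorithm, and the function names and the tile size are hypothetical choices made for this example.

```python
def matmul_naive(A, B, n):
    """Reference affine loop nest: C[i][j] += A[i][k] * B[k][j]."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, n, tile=4):
    """Same computation after rectangular tiling of all three loops.

    The tile size is an arbitrary illustrative value, not a tuned one.
    Different (it, jt) tiles write disjoint parts of C, so in a C version
    the outer tile loop is where an OpenMP parallel pragma could go.
    """
    C = [[0] * n for _ in range(n)]
    for it in range(0, n, tile):
        for jt in range(0, n, tile):
            for kt in range(0, n, tile):
                # Intra-tile loops; min() handles partial tiles at the edges.
                for i in range(it, min(it + tile, n)):
                    for j in range(jt, min(jt + tile, n)):
                        for k in range(kt, min(kt + tile, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    n = 6
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i - j for j in range(n)] for i in range(n)]
    print(matmul_naive(A, B, n) == matmul_tiled(A, B, n))
```

Integer inputs are used so that the tiled and untiled versions can be compared exactly; with floating-point data the reordered additions could differ in the last bits.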

PolyMage

PolyMage is a domain-specific language and compiler for automatic parallelization and optimization of image processing pipelines. PolyMage takes an image processing pipeline expressed by the user in a high-level language (embedded in Python) and generates a C++ implementation of the pipeline, optimized using the polyhedral framework as the intermediate representation. It uses OpenCV for image I/O handling, islpy/ISL for integer set operations, cgen for AST code generation, and OpenMP to mark parallel loops. PolyMage uses an asymmetric overlapped tiling technique (overlapped tiling extended for heterogeneous accesses and non-constant dependence vectors) to exploit locality and parallelism simultaneously. It uses a model-driven approach to automatically fuse image processing pipeline stages for tiling, and employs a built-in autotuner to find the best-performing code within a small, well-defined search space.
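The basic idea behind overlapped tiling can be sketched on a two-stage 1D blur. This is a simplified, symmetric version written for illustration only: it does not use the PolyMage language, does not reproduce the asymmetric technique mentioned above, and all function names and the tile size are hypothetical.

```python
def blur_reference(a):
    """Two 3-point sum stages computed one full stage at a time."""
    n = len(a)
    b = [0] * n
    for i in range(1, n - 1):
        b[i] = a[i - 1] + a[i] + a[i + 1]
    c = [0] * n
    for i in range(2, n - 2):
        c[i] = b[i - 1] + b[i] + b[i + 1]
    return c


def blur_overlapped_tiles(a, tile=4):
    """Both stages fused and computed tile by tile.

    Each output tile [lo, hi) recomputes the slice of the first stage it
    needs, including one extra element on each side. Neighbouring tiles
    therefore overlap and do some redundant work, but no tile depends on
    another, so the tile loop could run in parallel.
    """
    n = len(a)
    c = [0] * n
    for lo in range(2, n - 2, tile):
        hi = min(lo + tile, n - 2)
        b_lo = lo - 1  # first-stage region is [lo - 1, hi + 1)
        b_tile = [a[i - 1] + a[i] + a[i + 1] for i in range(b_lo, hi + 1)]
        for i in range(lo, hi):
            j = i - b_lo  # position of b[i] inside the scratch buffer
            c[i] = b_tile[j - 1] + b_tile[j] + b_tile[j + 1]
    return c


if __name__ == "__main__":
    signal = [x * x for x in range(11)]
    print(blur_reference(signal) == blur_overlapped_tiles(signal))
```

The scratch buffer `b_tile` stays small regardless of the input size, which is where the locality benefit comes from; the price is the recomputation in the overlap regions.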

SMO

SMO is a storage optimization tool for regular loop nests. The input to SMO is a specification of the set of conflicting array indices: two indices are said to be in conflict if the corresponding array elements are simultaneously live. A specified conflict can therefore be intra-array or inter-array. The output is a modulo storage mapping, obtained using our technique, for each array written in the regular loop nest. When only one statement is involved, the global conflict set specification defines the set of conflicts associated with the array space written by that statement.
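To make the notion of a modulo storage mapping concrete, the sketch below applies one by hand to a small time-iterated stencil. It does not show SMO's input format or the mapping SMO would compute; the stencil, the function names, and the choice of modulus 2 are assumptions made for this example. Here row t only reads row t - 1, so rows t and t - 2 are never simultaneously live and the mapping A[t][i] -> buf[t mod 2][i] is valid.

```python
def stencil_full(init, steps):
    """Stores every time step: (steps + 1) * n cells."""
    n = len(init)
    A = [list(init)] + [[0] * n for _ in range(steps)]
    for t in range(1, steps + 1):
        A[t][0] = A[t - 1][0]          # boundary cells are copied
        A[t][n - 1] = A[t - 1][n - 1]
        for i in range(1, n - 1):
            A[t][i] = A[t - 1][i - 1] + A[t - 1][i] + A[t - 1][i + 1]
    return A[steps]


def stencil_modulo(init, steps):
    """Same computation with A[t][i] stored at buf[t % 2][i]: 2 * n cells."""
    n = len(init)
    buf = [list(init), [0] * n]
    for t in range(1, steps + 1):
        cur, prev = buf[t % 2], buf[(t - 1) % 2]
        cur[0] = prev[0]
        cur[n - 1] = prev[n - 1]
        for i in range(1, n - 1):
            cur[i] = prev[i - 1] + prev[i] + prev[i + 1]
    return list(buf[steps % 2])


if __name__ == "__main__":
    start = [1, 2, 3, 4, 5]
    print(stencil_full(start, 3) == stencil_modulo(start, 3))
```

A tighter mapping would also have to account for conflicts between elements within a row, which is the kind of information a conflict set specification carries.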

TreeBeard

TreeBeard is an optimizing compiler for decision tree inference. It generates an optimized, user-callable inference function from a high-level model description (for example, XGBoost JSON). TreeBeard combines several novel optimizations at various abstraction levels to mitigate architectural bottlenecks and enable SIMD vectorization of tree walks. TreeBeard is implemented using the MLIR compiler infrastructure. Code generated by TreeBeard is significantly faster than state-of-the-art systems.
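For readers unfamiliar with the term, a tree walk is the basic unoptimized operation that such a compiler starts from. The sketch below shows one over a tiny hand-written tree stored in parallel arrays. The array layout, the tree itself, and the function names are hypothetical; this is not TreeBeard's model format or its generated code.

```python
# Hypothetical 5-node tree in array form. FEATURE == -1 marks a leaf.
FEATURE = [0, 1, -1, -1, -1]
THRESHOLD = [0.5, 2.0, 0.0, 0.0, 0.0]
LEFT = [1, 3, -1, -1, -1]
RIGHT = [2, 4, -1, -1, -1]
VALUE = [0.0, 0.0, 10.0, 20.0, 30.0]


def predict(row):
    """Walk from the root to a leaf, going left when the feature is below the threshold."""
    node = 0
    while FEATURE[node] != -1:
        if row[FEATURE[node]] < THRESHOLD[node]:
            node = LEFT[node]
        else:
            node = RIGHT[node]
    return VALUE[node]


def predict_batch(rows):
    """Scalar loop over input rows, one independent walk per row."""
    return [predict(row) for row in rows]


if __name__ == "__main__":
    print(predict_batch([[0.0, 1.0], [0.0, 3.0], [1.0, 0.0]]))
```

Each step of the walk is a data-dependent branch followed by a dependent load, which makes this naive form a natural target for the kind of optimization and vectorization described above.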

GPU codegen for tensor cores

GPU code generation for tensor cores, available “as is” under the Apache License 2.0.