PolyMage is a domain-specific language and optimizing code generator for automatic optimization of image processing pipelines, being developed at the Compilers and Programming Systems for High-Performance AI Lab, Indian Institute of Science. PolyMage takes an image processing pipeline expressed by the user in a high-level language (embedded in Python) and generates an optimized parallelized C++ implementation of the pipeline.

PolyMage’s optimizer primarily relies on the polyhedral framework to automatically perform a sequence of transformations for parallelization and locality optimization on multicore processors. The transformations include ways to enable fusion and tiling across image processing stages, using an asymmetric overlapped tiling technique.

Download

PolyMage is available as free and open source software under the permissive Apache License version 2.0 through its git repository on bitbucket.org here.

PolyMage optimized and base line code for the benchmarks used in its ASPLOS’15 paper can be found in a separate repository here. These generated codes are in the public domain; however, if you plan to reuse these, please consider including a note stating that these were generated by PolyMage, and cite the polymage-benchmarks github repo or this web page. Do keep in mind that the optimization capabilities of PolyMage’s compiler/code generator have changed significantly since its release right after ASPLOS 2015.
PolyMage generated code typically performs best when compiled with Intel’s C/C++ compilers.

Examples

Harris Corner Detection

R, C = Parameter(Int, "R"), Parameter(Int, "C")
I = Image(Float, "I", [R+2, C+2])

x, y = Variable("x"), Variable("y")
row, col = Interval(0, R+1, 1), Interval(0, C+1, 1)

c = Condition(x, '>=', 1) & Condition(x, '<=', R) & \
    Condition(y, '>=', 1) & Condition(y, '<=', C)

cb = Condition(x, '>=', 2) & Condition(x, '<=', R-1) & \
     Condition(y, '>=', 2) & Condition(y, '<=', C-1)

Iy = Function(varDom = ([x, y], [row, col]), Float, "Iy")
Iy.defn = [ Case(c, Stencil(I(x, y), 1.0/12, \
                                   [[-1,-2,-1], \
                                    [ 0, 0, 0], \
                                    [ 1, 2, 1]])) ]

Ix = Function(varDom = ([x, y], [row, col]), Float, "Ix")
Ix.defn = [ Case(c, Stencil(I(x, y), 1.0/12, \
                                   [[-1, 0, 1], \
                                    [-2, 0, 2], \
                                    [-1, 0, 1]])) ]

Ixx = Function(varDom = ([x, y], [row, col]), Float, "Ixx")
Ixx.defn = [ Case(c, Ix(x, y) * Ix(x, y)) ]

Iyy = Function(varDom = ([x, y], [row, col]), Float, "Iyy")
Iyy.defn = [ Case(c, Iy(x, y) * Iy(x, y)) ]

Ixy = Function(varDom = ([x, y], [row, col]), Float, "Ixy")
Ixy.defn = [ Case(c, Ix(x, y) * Iy(x, y)) ]

Sxx = Function(varDom = ([x, y], [row, col]), Float, "Sxx")
Syy = Function(varDom = ([x, y], [row, col]), Float, "Syy")
Sxy = Function(varDom = ([x, y], [row, col]), Float, "Sxy")
for pair in [(Sxx, Ixx), (Syy, Iyy), (Sxy, Ixy)]:
    pair[0].defn = [ Case(cb, Stencil(pair[1], 1, \
                                       [[1, 1, 1], \
                                        [1, 1, 1], \
                                        [1, 1, 1]])) ]

Det = Function(varDom = ([x, y], [row, col]), Float, "det")
d = Sxx(x, y) * Syy(x, y) - Sxy(x, y) * Sxy(x, y)
Det.defn = [ Case(cb, d) ]

Trace = Function(varDom = ([x, y], [row, col]), Float, "trace")
Trace.defn = [ Case(cb, Sxx(x, y) + Syy(x, y)) ]

Harris = Function(varDom = ([x, y], [row, col]), Float, "harris")
coarsity = Det(x, y) - 0.04 * Trace(x, y) * Trace(x, y)
Harris.defn = [ Case(cb, coarsity) ]

Pyramid Blending

Performance

Benchmark	Stages	Image size	PolyMage (1 core)	PolyMage (16 cores)	OpenCV
Harris Corner Dectetion	11	6400 * 6400	233.79 ms	18.69 ms	810.24 ms
Pyramid Blending	44	3 * 2048 * 2048	196.99 ms	21.91 ms	197.28 ms
Unsharp Mask	4	3 * 2048 * 2048	165.40 ms	14.85 ms	349.57 ms
Local Laplacian	99	3 * 2560 * 1536	274.40 ms	32.35 ms	*
Camera Pipeline	32	2528 * 1920	67.87 ms	5.86 ms	*
Bilateral Grid	7	2560 * 1536	89.76 ms	8.47 ms	*
Multiscale Interpolation	49	3 * 2560 * 1536	101.70 ms	18.18 ms	*

Machine configuration: Dual socket 8-core Xeon E5 2680 (SandyBridge) 2.70 GHz processors with 64 GB of non-ECC RAM running Linux 3.8.0-38 (64-bit)

Compiler: Intel C/C++ compiler 14.0.1 with flags “-O3 -xhost”

PolyMage (unoptimized) is code generated with only scalar optimizations enabled.

PolyMage (optimized) is auto-tuned code generated with grouping, tiling, storage optimization and vectorization enabled.

Halide (tuned) schedules for Local Laplacian Filters, Camera Pipeline, Bilateral Grid, and Multiscale Interpolation are optimized schedules available with the Halide distribution, and were further tuned for our target machine.

Harris Corner Detection

Multiscale Interpolation

Bilateral Grid

Pyramid Blending

Local Laplacian Filters

Camera Pipeline

Publications

PolyMage: Automatic Optimization for Image Processing Pipelines (PDF, slides)
Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
20^th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2015), Mar 2015.
An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
Abhinav Jangda, Uday Bondhugula
ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2018 (to appear).
Artifact evaluated (reusable and available).

PolyMage

Download

Examples

Harris Corner Detection

Pyramid Blending

Performance

Harris Corner Detection

Multiscale Interpolation

Bilateral Grid

Pyramid Blending

Local Laplacian Filters

Camera Pipeline

Publications

Explore

Get in touch

Location