Title: Building Support for Quantized Inference into TVM

Advisors: Luis Ceze and Arvind Krishnamurthy

Abstract: State-of-the-art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the cost of growing model sizes and computational complexity. Deploying these models on low-power and mobile devices poses a challenge due to their limited compute capabilities and strict energy budgets. One solution that has generated significant research interest is deploying quantized models that operate on low-precision inputs and weights of fewer than eight bits, trading off accuracy for performance.

In this talk I will discuss how we build support for efficient quantized inference into TVM, an end-to-end deep learning framework for generating optimized code. We add support for describing quantized layers that are parameterized by data layout and bit precision, and implement a library of quantized operators that express bitpacking and bit-serial computation. I then present an extensive case study on optimizing quantized convolutions using these operators for a low-power ARM Cortex-A53.
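To give a flavor of the bit-serial approach mentioned above, the sketch below shows how a dot product of low-precision unsigned vectors can be computed from bitpacked bit-planes using AND and popcount. This is a minimal illustration of the general technique, not TVM's actual operator implementation; the function names and the Python-level packing are assumptions for exposition (real implementations pack into machine words and use hardware popcount instructions).

```python
def bitplanes(values, precision):
    """Decompose unsigned low-precision ints into bit-planes,
    packing plane p of all elements into a single integer (element i -> bit i)."""
    planes = []
    for p in range(precision):
        word = 0
        for i, v in enumerate(values):
            word |= ((v >> p) & 1) << i
        planes.append(word)
    return planes

def bitserial_dot(x, w, x_prec, w_prec):
    """Dot product of unsigned vectors via popcounts on ANDed bit-planes:
    dot(x, w) = sum_{i,j} 2^(i+j) * popcount(x_plane[i] & w_plane[j])."""
    xp = bitplanes(x, x_prec)
    wp = bitplanes(w, w_prec)
    acc = 0
    for i, xi in enumerate(xp):
        for j, wj in enumerate(wp):
            acc += bin(xi & wj).count("1") << (i + j)
    return acc

# Example: 2-bit activations and 2-bit weights.
# bitserial_dot([1, 2, 3], [3, 2, 1], 2, 2) == 1*3 + 2*2 + 3*1 == 10
```

The cost of this scheme scales with the product of the two bit precisions, which is why it pays off only at very low precision.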

Place: CSE 615
When: Wednesday, May 9, 2018 - 12:30 to 13:30