Dynamic Quantization for Energy Efficient Deep Learning


Abstract

A method performed by a deep neural network (DNN) includes receiving, at a layer of the DNN during an inference stage, a layer input comprising content associated with a DNN input received at the DNN. The method also includes quantizing one or more parameters of a plurality of parameters associated with the layer based on the content of the layer input. The method further includes performing a task corresponding to the DNN input, the task being performed with the one or more quantized parameters.
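
The three steps in the abstract (receive a layer input, quantize the layer's parameters based on that input, perform the task with the quantized parameters) can be illustrated with a minimal NumPy sketch. This is not the claimed method: the abstract does not say how the content of the layer input selects the quantization, so the rule used here (a bit-width chosen from the mean absolute activation) is purely an assumption, as are the names `choose_bits`, `quantize_uniform`, and `dynamic_quantized_layer`, the threshold, and the 4-bit and 8-bit settings.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Symmetric uniform quantization of w, simulated in floating point."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()                   # nothing to quantize
    scale = max_abs / levels
    return np.round(w / scale) * scale    # snap each weight to the nearest level


def choose_bits(x, threshold=1.0, low_bits=4, high_bits=8):
    """Hypothetical content-dependent rule (an assumption, not from the source):
    use more bits when the layer input has a large mean absolute activation."""
    return high_bits if np.mean(np.abs(x)) > threshold else low_bits


def dynamic_quantized_layer(x, w, b):
    """Dense layer with ReLU whose weights are quantized per input at inference."""
    bits = choose_bits(x)                 # decision depends on the layer input
    w_q = quantize_uniform(w, bits)       # quantize the layer's parameters
    y = np.maximum(x @ w_q + b, 0.0)      # perform the layer's computation
    return y, bits
```

In this sketch, two inputs of the same shape can lead the same layer to run at different precisions, which is the sense in which the quantization is dynamic. A real system would need integer kernels or hardware support to turn the lower bit-width into an actual energy saving; simulating quantization in floating point, as done here, only shows the numerical effect.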

Type
Publication
U.S. Patent App.
Randy Ardywibowo
Ph.D.

I am interested in reinforcement learning, language agents & reasoning, sampling techniques, and contextual bandits.