A new accelerator chip called "Hiddenite" that can achieve state-of-the-art accuracy in computing sparse hidden neural networks with a low computational burden has now been developed by Tokyo Tech researchers. By employing the proposed on-chip model construction, a combination of weight generation and supermask expansion, the Hiddenite chip significantly reduces external memory access for improved computational efficiency.
Deep neural networks (DNNs) are a complex piece of machine learning architecture for AI that require numerous parameters to be learned in order to predict outputs. However, DNNs can be "pruned," reducing the computational burden and model size. A few years ago, the lottery ticket hypothesis took the machine learning world by storm. The hypothesis states that a randomly initialized DNN contains subnetworks that, after training, achieve an accuracy equal to that of the original DNN. The larger the network, the more "lottery tickets" there are for successful optimization. These lottery tickets thus allow "pruned" sparse neural networks to achieve accuracy comparable to more complex, "dense" networks, thereby reducing the overall computational burden and power consumption.
One technique for finding such subnetworks is the hidden neural network (HNN) algorithm, which applies AND logic (where the output is high only when all inputs are high) to the initialized random weights and a binary mask called a "supermask" (Fig. 1). The supermask, defined by the top-k% highest scores, marks unselected and selected connections as 0 and 1, respectively. The HNN helps to reduce the computational cost from the software side. However, the computation of neural networks also requires improvements in the hardware components.
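The supermask mechanism described above can be sketched in a few lines of NumPy. This is only an illustration of the top-k% selection idea, not the chip's or the paper's actual implementation; all names, shapes, and values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Frozen random weights: in an HNN these are never trained, only masked.
weights = rng.standard_normal((4, 8))

# Learnable scores (random here for illustration); HNN training
# updates only these scores, never the weights themselves.
scores = rng.standard_normal((4, 8))

def supermask(scores, k_percent):
    """Binary mask keeping the top-k% highest-scoring connections."""
    k = int(np.ceil(scores.size * k_percent / 100))
    threshold = np.sort(scores, axis=None)[-k]  # k-th largest score
    return (scores >= threshold).astype(np.float64)

# AND-like selection: a connection survives only if its weight exists
# AND its mask bit is 1; masked-out weights contribute 0.
mask = supermask(scores, k_percent=25)
effective = weights * mask
```

With 25% of a 4x8 layer kept, 8 of the 32 random weights remain active; the rest are zeroed out, which is what makes the resulting network sparse.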
Traditional DNN accelerators offer high performance, but they do not account for the power consumption caused by external memory access. Now, researchers led by Tokyo Institute of Technology (Tokyo Tech) professors Jaehoon Yu and Masato Motomura have developed a new accelerator chip called "Hiddenite," which can compute hidden neural networks with drastically improved power consumption.
"Reducing external memory access is the key to reducing power consumption. Currently, achieving high inference accuracy requires large models. But this increases the external memory access needed to load the model parameters. Our main motivation behind the development of Hiddenite was to reduce this external memory access," explains Prof. Motomura. Their study will be featured in the upcoming International Solid-State Circuits Conference (ISSCC) 2022, a prestigious international conference showcasing the pinnacle of achievement in integrated circuits.
"Hiddenite" stands for Hidden Neural Network Inference Tensor Engine and is the first HNN inference chip. The Hiddenite architecture (Fig. 2) offers threefold benefits to reduce external memory access and achieve high energy efficiency. The first is on-chip weight generation, which re-generates the weights using a random number generator. This eliminates the need to access external memory to store and fetch the weights. The second benefit is "on-chip supermask expansion," which reduces the number of supermasks that need to be loaded by the accelerator. The third improvement offered by the Hiddenite chip is a high-density four-dimensional (4D) parallel processor that maximizes data re-use during the computational process, thereby improving efficiency.
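The on-chip weight generation idea can be illustrated with a short Python sketch. The key observation is that pseudo-random weights are fully determined by a seed, so only the seed, not the weight tensor, needs to be stored. This is a hedged software analogy: the actual chip uses a hardware random number generator, and the seed and shape below are made up:

```python
import numpy as np

SEED = 2022  # hypothetical seed: the only "weight data" that must be kept

def generate_weights(seed, shape):
    """Re-generate identical pseudo-random weights on demand,
    standing in for Hiddenite's on-chip weight generator."""
    return np.random.default_rng(seed).standard_normal(shape)

# Two independent "re-generations" yield bit-identical weights, so the
# full weight tensor never has to be fetched from external memory.
w1 = generate_weights(SEED, (4, 8))
w2 = generate_weights(SEED, (4, 8))
```

Because `w1` and `w2` are identical, any compute unit holding the seed can reproduce the weights locally, which is what removes the external-memory traffic for weight loading.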
"The first two factors are what set the Hiddenite chip apart from existing DNN inference accelerators," says Prof. Motomura. "Moreover, we also introduced a new training method for hidden neural networks, called 'score distillation,' in which the conventional knowledge distillation weights are distilled into the scores, because hidden neural networks never update the weights. The accuracy with score distillation is comparable to that of the binary model, while being half the size of the binary model."
Based on the Hiddenite architecture, the team has designed, fabricated, and measured a prototype chip with Taiwan Semiconductor Manufacturing Company's (TSMC) 40nm process (Fig. 3). The chip is only 3 mm x 3 mm and handles 4,096 MAC (multiply-and-accumulate) operations at once. It achieves a state-of-the-art computational efficiency of up to 34.8 trillion or tera operations per second (TOPS) per watt of power, while reducing the amount of model transfer to half that of binarized networks.
These findings and their successful demonstration in a real silicon chip will certainly lead to another shift in the world of machine learning, paving the way for faster, more efficient and ultimately more environmentally friendly computing.
More information: Hiddenite: A 4K-PE Hidden Network Inference 4D-Tensor Engine Exploiting On-Chip Model Construction Achieving 34.8-to-16.0 TOPS/W for CIFAR-100 and ImageNet, Session 15.4 (ML Processors), Live Q&A Feb. 23, 9:00 AM PST, International Solid-State Circuits Conference 2022 (ISSCC 2022). www.isscc.org/
Provided by Tokyo Institute of Technology
Citation: Hiddenite: A new AI processor for reduced computational power consumption based on a cutting-edge neural network theory (2022, February 18), retrieved March 29, 2022 from https://techxplore.com/news/2022-02-hiddenite-ai-processor-power-consumption.html
This document is subject to copyright. No part may be reproduced without written permission, except for any fair use for the purpose of personal study or research. The content is provided for information purposes only.