• Re: Innervator: Hardware Acceleration for Neural Networks

    From Fereydoun Memarzanjany@3:633/280.2 to All on Wed Aug 7 15:02:10 2024
    Pasted below is an overview/abstract, and you will find more information
    (including a paper, demo video, statistics, slides, and source code) at
    the following GitHub repository:

    https://github.com/Thraetaona/Innervator

    ------------------------------------------------------------------------
    Artificial intelligence ("AI") is deployed in various applications, from
    noise cancellation to image recognition, but AI-based products often
    come with high hardware and electricity costs; this makes them
    inaccessible for consumer devices and small-scale edge electronics.
    Inspired by biological brains, deep neural networks ("DNNs") are modeled
    using mathematical formulae, yet general-purpose processors treat
    otherwise-parallelizable AI algorithms as step-by-step sequential logic.
    In contrast, programmable logic devices ("PLDs") can be customized to
    the specific parameters of a trained DNN, thereby ensuring data-tailored
    computation and algorithmic parallelism at the register-transfer level.
    Furthermore, a subgroup of PLDs, field-programmable gate arrays
    ("FPGAs"), are dynamically reconfigurable. So, to improve AI runtime
    performance, I designed and open-sourced my hardware compiler:
    Innervator. Written entirely in VHDL-2008, Innervator takes any DNN's
    metadata and parameters (e.g., number of layers, neurons per layer, and
    their weights/biases), generating its synthesizable FPGA hardware
    description with the appropriate pipelining and batch processing.
    Innervator is entirely portable and vendor-independent. As a proof of
    concept, I used Innervator to implement a sample 8x8-pixel handwritten
    digit-recognizing neural network in a low-cost AMD Xilinx Artix-7(TM)
    FPGA @ 100 MHz. With 3 pipeline stages and 2 batches at about 67% LUT
    utilization, the network achieved ~7.12 GOP/s, predicting the output in
    630 ns while drawing under 0.25 W. In comparison, an Intel(R) Core(TM)
    i7-12700H CPU @ 4.70 GHz would take 40,000-60,000 ns at 45 to 115 W.
    Ultimately, Innervator's hardware-accelerated approach bridges the
    inherent mismatch between current AI algorithms and the general-purpose
    digital hardware they run on.
    ------------------------------------------------------------------------
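
    The latency and power figures quoted in the abstract can be turned
    into a rough speedup and energy-per-prediction comparison. The sketch
    below is plain arithmetic on those numbers only. It assumes that each
    device draws its quoted power for the full duration of one prediction
    (the post does not say this), and it pairs the low ends and the high
    ends of the CPU ranges to get loose bounds. All names in the code are
    illustrative and are not taken from the Innervator sources.

```python
# Back-of-the-envelope comparison using only figures quoted in the abstract.
# Assumption (not stated in the post): each device draws its quoted power for
# the whole duration of one prediction, so energy = latency * power.

FPGA_LATENCY_NS = 630.0
FPGA_POWER_W = 0.25                    # quoted as "under 0.25 W": upper bound
CPU_LATENCY_NS = (40_000.0, 60_000.0)  # quoted range, low and high end
CPU_POWER_W = (45.0, 115.0)            # quoted range, low and high end


def energy_nj(latency_ns: float, power_w: float) -> float:
    """Energy in nanojoules (1 ns * 1 W = 1 nJ)."""
    return latency_ns * power_w


def compare() -> dict:
    """Return latency speedup and energy bounds, FPGA versus CPU."""
    fpga_energy = energy_nj(FPGA_LATENCY_NS, FPGA_POWER_W)
    # Low latency with low power, high latency with high power: loose bounds.
    cpu_energy = tuple(
        energy_nj(t, p) for t, p in zip(CPU_LATENCY_NS, CPU_POWER_W)
    )
    return {
        "speedup": tuple(t / FPGA_LATENCY_NS for t in CPU_LATENCY_NS),
        "fpga_energy_nj": fpga_energy,
        "cpu_energy_nj": cpu_energy,
        "energy_ratio": tuple(e / fpga_energy for e in cpu_energy),
    }


if __name__ == "__main__":
    for key, value in compare().items():
        print(key, value)
```

    Under these assumptions the FPGA is roughly 63x to 95x faster per
    prediction and uses at most about 157.5 nJ, against roughly 1.8 mJ to
    6.9 mJ on the CPU. Treat these as order-of-magnitude estimates, since
    the power figures are not measured energy per prediction.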

    (Forgot to cross-post to c.a.fpga and c.a.embedded; adding them now.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)