24CH10039 AGV Task 4
Introduction to FPGA
FPGAs, or Field Programmable Gate Arrays, have become increasingly important in recent times. Like CPUs and GPUs they are general-purpose compute devices, but unlike them their hardware can be reconfigured, which lets them perform many calculations simultaneously in custom logic.
A neural network involves many matrix multiplications, and FPGAs can perform them in far fewer clock cycles than CPUs and GPUs, which makes them better suited to this workload.
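To make that contrast concrete, below is a minimal C sketch of the multiply-accumulate (MAC) pattern at the heart of matrix multiplication. The vector width and the HLS unroll directive mentioned in the comment are illustrative assumptions, not details from the paper.

/* Minimal C sketch of the multiply-accumulate (MAC) pattern an FPGA
 * parallelizes. On a CPU this loop runs sequentially; on an FPGA,
 * HLS-style tools can unroll it so all N multiplications happen in
 * the same clock cycle. N and the data here are illustrative. */
#include <stdio.h>

#define N 8  /* illustrative vector width */

float dot(const float x[N], const float w[N]) {
    float acc = 0.0f;
    /* #pragma HLS UNROLL -- on an FPGA, a directive like this would
       instantiate N parallel multipliers feeding an adder tree */
    for (int i = 0; i < N; i++)
        acc += x[i] * w[i];
    return acc;
}

int main(void) {
    float x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float w[N] = {0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f};
    printf("dot = %f\n", dot(x, w));  /* prints dot = 18.000000 */
    return 0;
}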
Some key advantages of FPGAs mentioned in the paper include:
1. Higher energy efficiency.
2. Parallel processing capabilities.
3. Real-time computation performance.
4. Flexibility in implementing custom algorithms.
Major companies have already started using FPGAs in their AI systems, such as Microsoft with the Bing search engine and Baidu with its speech recognition applications.
The model covers forward propagation in detail: input layer to hidden layer, hidden layer to hidden layer, and hidden layer to output layer. During the first stage, input vectors are multiplied by the
first hidden layer weight matrix using the multiply-add bank, which consists of many parallel
multiplication and accumulation units that can be adjusted. The results are then stored in memory
before being passed through activation functions implemented using lookup tables. This approach
allows different activation functions like sigmoid, ReLU, or tanh to be used by simply loading different
parameters into the lookup table. The output from the activation function is stored in another RAM
module before being processed by the next layer. For multi-layer networks, these components can be
reused to process each subsequent layer, making the architecture very flexible and efficient.
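The following C sketch mirrors that dataflow in software: a MAC loop stands in for the multiply-add bank, an array stands in for the intermediate RAM, and a precomputed table implements the activation. The layer sizes, table resolution, input range, and the choice of sigmoid are assumptions for illustration, not parameters from the paper.

/* Hedged sketch of one forward-propagation stage: multiply-add bank,
 * intermediate storage, then a lookup-table activation. Reloading the
 * table swaps the activation function without changing the datapath. */
#include <math.h>
#include <stdio.h>

#define IN  4            /* illustrative layer sizes */
#define OUT 3
#define LUT_SIZE 256     /* assumed table resolution */
#define LUT_MIN (-8.0f)  /* assumed input range covered by the table */
#define LUT_MAX  (8.0f)

static float act_lut[LUT_SIZE];

/* Fill the table with sigmoid samples; tanh or ReLU samples could be
   loaded instead, as the text describes. */
void load_sigmoid_lut(void) {
    for (int i = 0; i < LUT_SIZE; i++) {
        float z = LUT_MIN + (LUT_MAX - LUT_MIN) * i / (LUT_SIZE - 1);
        act_lut[i] = 1.0f / (1.0f + expf(-z));
    }
}

float lut_activate(float z) {
    if (z <= LUT_MIN) return act_lut[0];
    if (z >= LUT_MAX) return act_lut[LUT_SIZE - 1];
    int idx = (int)((z - LUT_MIN) / (LUT_MAX - LUT_MIN) * (LUT_SIZE - 1));
    return act_lut[idx];
}

/* One layer: the inner MAC loop is what a real design parallelizes;
   pre[] plays the role of the result RAM between the two stages. */
void forward_layer(const float x[IN], const float W[OUT][IN],
                   const float b[OUT], float y[OUT]) {
    float pre[OUT];
    for (int j = 0; j < OUT; j++) {
        float acc = b[j];
        for (int i = 0; i < IN; i++)   /* parallel MACs on the FPGA */
            acc += W[j][i] * x[i];
        pre[j] = acc;
    }
    for (int j = 0; j < OUT; j++)
        y[j] = lut_activate(pre[j]);   /* output RAM feeds the next layer */
}

int main(void) {
    load_sigmoid_lut();
    float x[IN] = {1, 0, -1, 2};
    float W[OUT][IN] = {{0.1f, 0.2f, 0.3f, 0.4f},
                        {0.5f, -0.5f, 0.5f, -0.5f},
                        {0.0f, 0.0f, 0.0f, 1.0f}};
    float b[OUT] = {0, 0, 0};
    float y[OUT];
    forward_layer(x, W, b, y);
    for (int j = 0; j < OUT; j++)
        printf("y[%d] = %f\n", j, y[j]);
    return 0;
}

For a multi-layer network, the same forward_layer routine is simply invoked once per layer, which matches the paper's point about reusing the same hardware components for each subsequent layer.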
The backward propagation process implements the learning algorithm that adjusts the network
weights. The paper uses cross-entropy as the loss function to calculate error derivatives for the
backpropagation algorithm. The core calculation involves multiplying the error (the difference
between predicted and actual values) by the derivative of the activation function.
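A minimal sketch of that calculation follows, assuming a sigmoid activation: with cross-entropy loss the output delta then simplifies to prediction minus target, and the hidden-layer step multiplies the backpropagated error by the sigmoid derivative a*(1-a). The layer sizes and values are illustrative assumptions.

/* Hedged sketch of the core backward-propagation step the text
 * describes: error multiplied by the activation derivative. */
#include <stdio.h>

#define HID 3
#define OUT 2

void backward_step(const float h[HID],       /* hidden activations */
                   const float y[OUT],       /* predicted outputs */
                   const float t[OUT],       /* target values */
                   const float W2[OUT][HID], /* hidden->output weights */
                   float dW2[OUT][HID],      /* weight gradients out */
                   float dh[HID])            /* hidden deltas out */
{
    float dy[OUT];
    for (int k = 0; k < OUT; k++)
        dy[k] = y[k] - t[k];          /* cross-entropy + sigmoid delta */

    for (int k = 0; k < OUT; k++)
        for (int j = 0; j < HID; j++)
            dW2[k][j] = dy[k] * h[j]; /* gradient for each weight */

    for (int j = 0; j < HID; j++) {
        float e = 0.0f;
        for (int k = 0; k < OUT; k++)
            e += W2[k][j] * dy[k];    /* error propagated backward */
        dh[j] = e * h[j] * (1.0f - h[j]); /* times sigmoid derivative */
    }
}

int main(void) {
    float h[HID] = {0.2f, 0.7f, 0.5f};
    float y[OUT] = {0.8f, 0.3f};
    float t[OUT] = {1.0f, 0.0f};
    float W2[OUT][HID] = {{0.1f, -0.2f, 0.3f}, {0.4f, 0.5f, -0.6f}};
    float dW2[OUT][HID], dh[HID];
    backward_step(h, y, t, W2, dW2, dh);
    printf("dh[0] = %f\n", dh[0]);
    return 0;
}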
The design is implemented on the Xilinx ZU9CG FPGA SoC platform, which offers 2520 DSP slices and 32 Mb of on-chip memory. For larger networks, multiple FPGAs can be clustered. Additionally, the paper mentions the possibility of deploying deep learning frameworks such as TensorFlow directly on the 64-bit FPGA SoC platform, with the frameworks calling FPGA hardware resources directly.