README.md: 26 additions & 6 deletions

- [Overview](#overview)
- [Architecture](#architecture)
- [ISA](#isa)
- [Thread](#thread)
- [Memory](#memory)
- [Kernels](#kernels)
- [Simulation](#simulation)

### Register Files

Each thread has its own dedicated register file. The register files hold the data that each thread performs computations on, which enables the single-instruction, multiple-data (SIMD) pattern.

Importantly, each register file contains a few read-only registers holding data about the block and thread currently being executed, so the same kernel can operate on different data depending on the local thread ID.
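
The sketch below is a minimal Python behavioral model of the idea above. It is not the Verilog design. Every thread executes the same steps against its own register file, and only the read-only ID registers differ from thread to thread. The model makes two assumptions for illustration:

- Those registers hold the block index, the block size and the thread index.
- A kernel derives its element index as `blockIdx * blockDim + threadIdx`.

`ThreadRegisters` and `run_lockstep` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadRegisters:
    """Hypothetical per-thread register file: general registers plus ID registers."""
    block_idx: int   # treated as read-only: which block this thread belongs to
    block_dim: int   # treated as read-only: number of threads per block
    thread_idx: int  # treated as read-only: this thread's index within its block
    general: list = field(default_factory=lambda: [0] * 13)  # R0-R12, read/write


def run_lockstep(block_idx: int, block_dim: int, a: list, b: list) -> list:
    """Run the same instruction sequence on every thread of one block.

    Each thread computes i = blockIdx * blockDim + threadIdx, then
    C[i] = A[i] + B[i]. Only the ID registers differ per thread, so the
    same "program" touches different data in each thread.
    """
    threads = [ThreadRegisters(block_idx, block_dim, t) for t in range(block_dim)]
    c = {}
    # Step 1, identical for all threads: compute the element index into R0.
    for regs in threads:
        regs.general[0] = regs.block_idx * regs.block_dim + regs.thread_idx
    # Step 2, identical for all threads: load, add, and store using that index.
    for regs in threads:
        i = regs.general[0]
        regs.general[1] = a[i] + b[i]
        c[i] = regs.general[1]
    return [c[i] for i in sorted(c)]


A = [0, 1, 2, 3, 4, 5, 6, 7]
B = [10, 20, 30, 40, 50, 60, 70, 80]
print(run_lockstep(0, 4, A, B))  # [10, 21, 32, 43]  (indices 0-3)
print(run_lockstep(1, 4, A, B))  # [54, 65, 76, 87]  (indices 4-7)
```

Splitting the inputs into two blocks of 4 is only an example. The point is that identical instructions produce different memory accesses in each thread.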

### ALUs

Dedicated arithmetic-logic unit for each thread to perform computations.

tiny-gpu implements a simple 11-instruction ISA built to support proof-of-concept kernels like matrix addition & matrix multiplication (implemented further down this page).

For these purposes, it supports the following instructions:

- `BRnzp` - Branch instruction to jump to another line of program memory if the NZP register matches the `nzp` condition in the instruction.
- `CMP` - Compare the value of two registers and store the result in the NZP register to use for a later `BRnzp` instruction.
- `RET` - Signal that the current thread has reached the end of execution.
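
To make the `CMP`/`BRnzp` pairing concrete, here is a minimal Python sketch. It assumes an LC-3-style convention, which the `nzp` naming suggests:

- `CMP` sets exactly one of three flags (negative, zero, positive) from the sign of `rs - rt`.
- `BRnzp` branches when any flag set in the instruction's `nzp` mask is also set in the NZP register.

The bit layout, the "next PC is `pc + 1`" rule and the function names are all assumptions for illustration. None of them come from the tiny-gpu source.

```python
# Hypothetical bit layout for the 3-bit NZP register (an assumption).
N, Z, P = 0b100, 0b010, 0b001


def cmp(rs_value: int, rt_value: int) -> int:
    """Return the NZP flags for rs - rt; exactly one bit is set."""
    diff = rs_value - rt_value
    if diff < 0:
        return N
    if diff == 0:
        return Z
    return P


def brnzp(nzp_register: int, nzp_condition: int, pc: int, target: int) -> int:
    """Return the next PC: jump to target if any condition bit matches.

    Assumes the fall-through instruction lives at pc + 1.
    """
    if nzp_register & nzp_condition:
        return target
    return pc + 1


print(brnzp(cmp(3, 4), N, pc=10, target=2))      # 3 - 4 < 0 sets N: taken -> 2
print(brnzp(cmp(4, 4), N, pc=10, target=2))      # 4 - 4 == 0 sets Z: not taken -> 11
print(brnzp(cmp(4, 4), N | Z, pc=10, target=2))  # "n or z" condition: taken -> 2
```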

Each register is specified by 4 bits, meaning there are 16 registers in total. The first 13 registers, `R0`-`R12`, are free registers that support read/write. The last 3 registers, `R13`-`R15`, are special read-only registers that supply the `%blockIdx`, `%blockDim`, and `%threadIdx` values critical to SIMD, since kernels use them to work out which data each thread should operate on.
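
Here is a minimal Python sketch of the register addressing described above:

- A 4-bit field selects one of 16 registers.
- `R0`-`R12` accept writes.
- `R13`-`R15` hold the special values.

Two parts of the sketch are assumptions. The first is the mapping of `%blockIdx`, `%blockDim` and `%threadIdx` to `R13`, `R14` and `R15` (in the order listed). The second is raising an error on a write to a read-only register, where real hardware may simply ignore the write. `RegisterFile` is a hypothetical name.

```python
class RegisterFile:
    """Hypothetical model of one thread's 16-entry register file."""

    NUM_REGISTERS = 16    # a 4-bit register specifier gives 2**4 registers
    FIRST_READ_ONLY = 13  # R13-R15 are read-only

    def __init__(self, block_idx: int, block_dim: int, thread_idx: int):
        self._regs = [0] * self.NUM_REGISTERS
        # Assumed mapping: R13 = %blockIdx, R14 = %blockDim, R15 = %threadIdx.
        self._regs[13] = block_idx
        self._regs[14] = block_dim
        self._regs[15] = thread_idx

    @staticmethod
    def decode(field: int) -> int:
        """Extract a register index from the low 4 bits of an instruction field."""
        return field & 0xF

    def read(self, index: int) -> int:
        if not 0 <= index < self.NUM_REGISTERS:
            raise IndexError(f"register index {index} does not fit in 4 bits")
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        # Only R0-R12 are writable; this model rejects anything else.
        if not 0 <= index < self.FIRST_READ_ONLY:
            raise PermissionError(f"R{index} is read-only or out of range")
        self._regs[index] = value


rf = RegisterFile(block_idx=1, block_dim=4, thread_idx=2)
rf.write(0, 42)
print(rf.read(0))                # 42
print(rf.read(rf.decode(0x3F)))  # low 4 bits are 0xF, so R15 (%threadIdx) = 2
```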

# Thread

# Kernels

RET ; end of kernel
```

# Simulation

tiny-gpu is set up to simulate the execution of both of the above kernels using `iverilog` and `cocotb`.

Running `make test_matadd` or `make test_matmul` runs the specified kernel and outputs a log file containing the complete execution trace of the kernel from start to finish, as well as the initial and final states of data memory.

The `matadd` kernel adds two 1x8 matrices across 8 threads running on 2 cores, and the `matmul` kernel multiplies two 2x2 matrices across 4 threads.
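
The final data-memory state in the log can be checked against a simple reference model. The Python sketch below makes several assumptions:

- Each thread computes one output element at index `i = blockIdx * blockDim + threadIdx`.
- Matrices are stored row-major.
- For `matadd`, the 8 threads are split into 2 blocks of 4, one per core.

These details and the names `matadd_reference` and `matmul_reference` are for illustration only. They do not reproduce the repository's cocotb tests.

```python
def matadd_reference(a, b, num_blocks=2, block_dim=4):
    """Expected C = A + B for 1xN inputs, one element per thread."""
    c = [0] * (num_blocks * block_dim)
    for block_idx in range(num_blocks):  # assumed: one block per core
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx
            c[i] = a[i] + b[i]
    return c


def matmul_reference(a, b, n=2):
    """Expected C = A x B for row-major n x n matrices, one element per thread."""
    c = [0] * (n * n)
    for i in range(n * n):  # n * n threads, e.g. 4 for 2x2
        row, col = divmod(i, n)
        c[i] = sum(a[row * n + k] * b[k * n + col] for k in range(n))
    return c


print(matadd_reference(list(range(8)), list(range(8))))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(matmul_reference([1, 2, 3, 4], [1, 2, 3, 4]))      # [7, 10, 15, 22]
```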