Commit 7f33ea6

Update README.md
1 parent 12c9771 commit 7f33ea6

1 file changed: 26 additions & 6 deletions


README.md

```diff
@@ -5,7 +5,7 @@ A minimal GPU implementation in Verilog optimized for learning about how GPUs wo
 - [Overview]()
 - [Architecture](#architecture)
 - [ISA](#isa)
-- [SIMD](#simd)
+- [Thread](#thread)
 - [Memory](#memory)
 - [Kernels](#kernels)
 - [Simulation](#simulation)
```
```diff
@@ -111,6 +111,10 @@ Decodes the fetched instruction into control signals for thread execution.
 
 ### Register Files
 
+Each thread has its own dedicated register file. The register files hold the data that each thread performs computations on, which enables the single-instruction multiple-data (SIMD) pattern.
+
+Importantly, each register file contains a few read-only registers holding data about the block & thread currently being executed, which lets the same kernel operate on different data based on the local thread ID.
+
 ### ALUs
 
 Dedicated arithmetic-logic unit for each thread to perform computations.
```
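To make the SIMD idea concrete, here is a small Python model (not the project's Verilog) in which every thread owns a register file and one instruction is applied to all of them in lockstep. It assumes the layout described in the ISA section of this README: 16 registers, with `%blockIdx`, `%blockDim`, and `%threadIdx` in the last three slots, assumed here to be `R13`, `R14`, `R15` in that order. The helper names `make_register_files` and `simd_op` are hypothetical.

```python
# Illustrative Python model (not tiny-gpu's Verilog): each thread owns a
# register file, and one instruction is applied to all of them in lockstep.
from typing import Callable, List

NUM_REGISTERS = 16
# Assumption: the three read-only registers occupy R13-R15 in this order.
BLOCK_IDX, BLOCK_DIM, THREAD_IDX = 13, 14, 15


def make_register_files(block_idx: int, block_dim: int) -> List[List[int]]:
    """Build one register file per thread in a block (hypothetical helper)."""
    files = []
    for thread_idx in range(block_dim):
        regs = [0] * NUM_REGISTERS
        regs[BLOCK_IDX] = block_idx
        regs[BLOCK_DIM] = block_dim
        regs[THREAD_IDX] = thread_idx
        files.append(regs)
    return files


def simd_op(files: List[List[int]], op: Callable[[int, int], int],
            rd: int, rs: int, rt: int) -> None:
    """Execute one register-register instruction on every thread's registers."""
    for regs in files:
        regs[rd] = op(regs[rs], regs[rt])


# Same two instructions for every thread, different data per thread:
# R0 = %blockIdx * %blockDim, then R0 = R0 + %threadIdx
files = make_register_files(block_idx=1, block_dim=4)
simd_op(files, lambda a, b: a * b, 0, BLOCK_IDX, BLOCK_DIM)
simd_op(files, lambda a, b: a + b, 0, 0, THREAD_IDX)
print([regs[0] for regs in files])  # [4, 5, 6, 7]
```

Every thread ends up with a different `R0` even though all of them executed identical instructions; only the contents of their read-only registers differ.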
```diff
@@ -131,11 +135,23 @@ In real GPUs, individual threads can branch to different PCs, causing **branch d
 
 ![ISA](/docs/images/isa.png)
 
-# SIMD
+tiny-gpu implements a simple 11-instruction ISA built to enable simple proof-of-concept kernels like matrix addition & matrix multiplication (implementation further down on this page).
 
-![Thread](/docs/images/thread.png)
+For these purposes, it supports the following instructions:
 
-# Memory
+- `BRnzp` - Branch instruction to jump to another line of program memory if the NZP register matches the `nzp` condition encoded in the instruction.
+- `CMP` - Compare the values of two registers and store the result in the NZP register for use by a later `BRnzp` instruction.
+- `ADD`, `SUB`, `MUL`, `DIV` - Basic arithmetic operations to enable tensor math.
+- `LDR` - Load data from global memory.
+- `STR` - Store data into global memory.
+- `CONST` - Load a constant value into a register.
+- `RET` - Signal that the current thread has reached the end of execution.
+
+Each register is specified by 4 bits, meaning that there are 16 total registers. The first 13 registers, `R0` - `R12`, are free registers that support read/write. The last 3 registers are special read-only registers that supply the `%blockIdx`, `%blockDim`, and `%threadIdx` values critical to SIMD.
+
+# Thread
+
+![Thread](/docs/images/thread.png)
 
 # Kernels
 
```
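The `CMP`/`BRnzp` pairing and the 4-bit register fields from the instruction list in this hunk are easiest to see in a tiny interpreter. This Python sketch runs a single thread and rests on several assumptions: `CMP rs, rt` sets exactly one of the N, Z, P bits from the sign of `rs - rt`; `BRnzp` jumps when any bit in the instruction's mask is also set in the NZP register; `DIV` is integer division; and branch targets are absolute instruction indices. The tuple program format and the names `run_thread` and `check_reg` are hypothetical, and `LDR`/`STR` are omitted because they need a memory model.

```python
# Illustrative single-thread interpreter for part of the tiny-gpu ISA.
# Program format, NZP bit order, and DIV semantics are assumptions.
from typing import List, Tuple

N, Z, P = 0b100, 0b010, 0b001   # assumed NZP bit layout
READ_ONLY = {13, 14, 15}        # %blockIdx, %blockDim, %threadIdx


def check_reg(r: int, write: bool = False) -> int:
    """Validate a 4-bit register field and enforce read-only registers."""
    if not 0 <= r < 16:
        raise ValueError(f"R{r} does not fit in a 4-bit field")
    if write and r in READ_ONLY:
        raise PermissionError(f"R{r} is read-only")
    return r


def run_thread(program: List[Tuple], regs: List[int]) -> List[int]:
    """Execute one thread until RET (LDR/STR omitted: no memory model)."""
    pc, nzp = 0, 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "RET":
            return regs
        elif op == "CONST":
            rd, imm = args
            regs[check_reg(rd, write=True)] = imm
        elif op in ("ADD", "SUB", "MUL", "DIV"):
            rd, rs, rt = args
            a, b = regs[check_reg(rs)], regs[check_reg(rt)]
            if op == "ADD":
                value = a + b
            elif op == "SUB":
                value = a - b
            elif op == "MUL":
                value = a * b
            else:
                value = a // b  # assumed integer division
            regs[check_reg(rd, write=True)] = value
        elif op == "CMP":
            rs, rt = args
            diff = regs[check_reg(rs)] - regs[check_reg(rt)]
            nzp = N if diff < 0 else Z if diff == 0 else P
        elif op == "BRnzp":
            mask, target = args
            if nzp & mask:  # branch if any requested condition holds
                pc = target
        else:
            raise ValueError(f"unsupported instruction {op}")


# Count R0 up to R1 = 3: loop back to the ADD while R0 - R1 is negative.
program = [
    ("CONST", 0, 0),
    ("CONST", 1, 3),
    ("CONST", 2, 1),
    ("ADD", 0, 0, 2),
    ("CMP", 0, 1),
    ("BRnzp", N, 3),
    ("RET",),
]
print(run_thread(program, [0] * 16)[0])  # 3
```

Writing to `R15` (for example `("CONST", 15, 7)`) raises `PermissionError` in this model, mirroring the read-only rule for the last three registers.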
````diff
@@ -219,6 +235,10 @@ STR R9, R8 ; store C[i] in global memory
 RET ; end of kernel
 ```
 
-# Code
-
 # Simulation
+
+tiny-gpu is set up to simulate the execution of both of the above kernels using `iverilog` and `cocotb`.
+
+Running `make test_matadd` or `make test_matmul` runs the specified kernel and outputs a log file with the complete execution trace of the kernel from start to finish, as well as the initial and final states of data memory.
+
+The `matadd` kernel adds two 1x8 matrices across 8 threads running on 2 cores, and the `matmul` kernel multiplies two 2x2 matrices across 4 threads.
````
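Both test kernels are small enough to check by hand. This Python sketch computes reference results that a final data-memory dump could be compared against, and shows one plausible thread-to-element mapping. It assumes that the 8 `matadd` threads are split into 2 blocks of 4 (one block per core), that `matmul` thread `i` computes `C[i // 2][i % 2]`, and it uses made-up input values; none of these details come from the actual test files, and `matadd_reference`/`matmul_reference` are hypothetical names.

```python
# Reference results for the two test kernels, using hypothetical inputs.
# Assumptions: matadd splits 8 threads into 2 blocks of 4 (one per core);
# matmul thread i computes C[i // 2][i % 2] for 2x2 matrices.
from typing import List, Tuple


def matadd_reference(a: List[int], b: List[int], threads_per_block: int = 4
                     ) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return C = A + B and the (block, thread) pair that computes each C[i]."""
    c, owners = [], []
    for i in range(len(a)):  # one thread per element
        block_idx, thread_idx = divmod(i, threads_per_block)
        owners.append((block_idx, thread_idx))
        c.append(a[i] + b[i])
    return c, owners


def matmul_reference(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Return C = A x B, computing one output element per thread."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n * n):  # thread i -> row i // n, column i % n
        row, col = divmod(i, n)
        c[row][col] = sum(a[row][k] * b[k][col] for k in range(n))
    return c


c, owners = matadd_reference([0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7])
print(c)          # [0, 2, 4, 6, 8, 10, 12, 14]
print(owners[5])  # (1, 1): block 1 (assumed core 1), thread 1
print(matmul_reference([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # [[7, 10], [15, 22]]
```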
