
yeahimthatguy
Posts: 36
Joined: Thu Apr 08, 2021 6:14 pm

[Example] WS2812B from scratch using C & Assembly

Mon Oct 25, 2021 1:31 am

Yeah, I know there are official PIO examples that work flawlessly. Yes, I know you can do DMA tricks and other wizardry to make this work. I just got annoyed by how needlessly complicated the implementations of this simple protocol kept getting, so I decided to write my own, going back to the caveman days with some basic C and basic Assembly.

I tried to leave comments everywhere describing things. This code can easily be ported to other microcontrollers: you just need to redo the math and work out the Assembly instructions and their cycle counts for your target, especially for microcontrollers clocked below 100MHz. It can also be edited to work with the other WS series parts if their timings differ heavily.
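The "math" for a port is mostly the cycle count of the delay routines in cycle_delay.S. Assuming Cortex-M0+ instruction timings and no flash wait states, each call costs MOVS (1 cycle) + N * SUBS (1) + (N - 1) taken BNE (2 each) + one final BNE (1) + BX LR (2) = 3N + 2 cycles. A small sketch for recomputing N at another clock; the helper names and the `overhead_cycles` allowance (for the C-side work between delays, which you would have to measure, e.g. with a scope) are hypothetical:

```c
#include <stdint.h>

/* Cycles spent in one cycle_delay_* routine loaded with N (N >= 1):
   MOVS (1) + N * SUBS (1) + (N - 1) taken BNE (2) + final BNE (1) + BX LR (2) */
static uint32_t delay_cycles(uint32_t n)
{
    return 3u * n + 2u;
}

/* The same delay in nanoseconds at a CPU clock given in MHz (rounded down) */
static uint32_t delay_ns(uint32_t n, uint32_t clk_mhz)
{
    return delay_cycles(n) * 1000u / clk_mhz;
}

/* Largest N whose delay plus an assumed C-side overhead stays at or under target_ns.
   Never returns less than 1: loading 0 would make SUBS wrap and loop ~2^32 times. */
static uint32_t loop_count_for(uint32_t target_ns, uint32_t clk_mhz, uint32_t overhead_cycles)
{
    uint32_t budget = target_ns * clk_mhz / 1000u; /* target expressed in cycles */
    if (budget < overhead_cycles + delay_cycles(1u))
        return 1u;
    return (budget - overhead_cycles - 2u) / 3u;
}
```

For example, at 125MHz with no overhead allowance, a 400ns T0H comes out at N = 16, exactly 50 cycles. N also has to fit the 8-bit immediate of MOVS (1 to 255).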

Tested with 210 LEDs; the demo code below draws about 500mA with all 210. It fades all the LEDs in and out in white while a red LED runs back and forth across the strip.

3 basic functions:
set_led(0, 255, 255, 255); /* Sets LED 0 to white */
set_all(255, 255, 255); /* Sets all LEDs to white */
send_led_data(); /* Sends the data array to the LEDs, should wait at least 50us between sends */
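Because send_led_data() bit-bangs the whole strip with interrupts disabled, its duration grows linearly with the LED count. A back-of-the-envelope sketch, assuming the datasheet's nominal 1.25us per bit plus the 50us reset (the real bit time in this code differs somewhat because of the chosen delay values and the C-side overhead); the function names are hypothetical:

```c
#include <stdint.h>

#define WS2812B_BITS_PER_LED 24u   /* 8 bits each of G, R, B */
#define WS2812B_BIT_NS       1250u /* nominal TH + TL per bit from the datasheet */
#define WS2812B_RESET_US     50u   /* minimum low time the LEDs need to latch */

/* Approximate microseconds for one full send plus the reset gap */
static uint32_t frame_time_us(uint32_t led_count)
{
    uint64_t data_ns = (uint64_t)led_count * WS2812B_BITS_PER_LED * WS2812B_BIT_NS;
    return (uint32_t)(data_ns / 1000u) + WS2812B_RESET_US;
}

/* Upper bound on the refresh rate in Hz if the loop did nothing else */
static uint32_t max_refresh_hz(uint32_t led_count)
{
    return 1000000u / frame_time_us(led_count);
}
```

For the 210-LED demo this works out to about 6.35ms per frame, so interrupts stay off for over 6ms per call and the strip can refresh at most about 157 times a second.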

main.c


/* Libraries */
#include "pico/stdlib.h"

/* For WS2812B */
/* Datasheet used: https://cdn-shop.adafruit.com/datasheets/WS2812B.pdf */

/* WARNING: Due to WS2812B being 5V logic and the RP2040 being 3.3V, a level shifter is needed */
/* You will not fry the pins or anything if you accidentally hook it up directly, it just may not register (reliably) */
/* This is because WS2812B registers 0.7*VDD as the minimum for logic high, 0.7*5V => 3.5V, which is above what the RP2040 can output */

/* Assembler functions */
/* If you place this in a .cpp file make sure to change /extern/ to /extern "C"/ */
/* eg; extern "C" void function(); */
extern void cycle_delay_t0h();
extern void cycle_delay_t0l();
extern void cycle_delay_t1h();
extern void cycle_delay_t1l();
extern uint32_t disable_and_save_interrupts();			/* Used for interrupt disabling */
extern void enable_and_restore_interrupts(uint32_t);	/* Used for interrupt enabling */

/* The GPIO pin for the LED data */
#define LED_PIN 14

/* Number of individual LEDs */
/* Memory-wise you can have up to roughly 80,000 LEDs (3 bytes each, ~240KB of the 264KB SRAM), though one frame would then take about 2.4s to send */
#define LED_NUM 210

/* Leave alone, only defined to hammer it into the compiler's head */
#define LED_DATA_SIZE 3
#define LED_BYTE_SIZE (LED_NUM * LED_DATA_SIZE) /* Parenthesized so the macro expands safely inside expressions */

/* We are not here to waste any memory, 3 bytes per LED */
uint8_t led_data[LED_BYTE_SIZE];

/* Sets a specific LED to a certain color */
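/* LEDs start at 0; out-of-range indexes are ignored */
void set_led(uint32_t led, uint8_t r, uint8_t g, uint8_t b)
{
	if (led >= LED_NUM) /* Avoid writing past the end of led_data */
		return;
	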
	led_data[led * LED_DATA_SIZE] = g;		/* Green */
	led_data[(led * LED_DATA_SIZE) + 1] = r;	/* Red */
	led_data[(led * LED_DATA_SIZE) + 2] = b;	/* Blue */
}

/* Sets all the LEDs to a certain color */
void set_all(uint8_t r, uint8_t g, uint8_t b)
{
	for (uint32_t i = 0; i < LED_BYTE_SIZE; i += LED_DATA_SIZE)
	{
		led_data[i] = g;		/* Green */
		led_data[i + 1] = r;		/* Red */
		led_data[i + 2] = b;		/* Blue */
	}
}

/* Sends the data to the LEDs */
void send_led_data()
{	
	/* Disable all interrupts and save the mask */
	uint32_t interrupt_mask = disable_and_save_interrupts();
	
	/* Get the pin bit */
	uint32_t pin = 1UL << LED_PIN;
	
	/* Declared outside to force optimization if compiler gets any funny ideas */
	uint8_t red = 0;
	uint8_t green = 0;
	uint8_t blue = 0;
	uint32_t i = 0;
	int8_t j = 0;
	
	for (i = 0; i < LED_BYTE_SIZE; i += LED_DATA_SIZE)
	{
		/* Send order is green, red, blue because someone messed up big time */
		    
		/* Look up values once, a micro optimization, assume compiler is dumb as a brick */
		green = led_data[i];
		red = led_data[i + 1];
		blue = led_data[i + 2];
		    
		for (j = 7; j >= 0; j--) /* Handle the 8 green bits */
		{
			/* Get Nth bit */
			if (((green >> j) & 1) == 1) /* The bit is 1 */
			{
				sio_hw->gpio_set = pin; /* This sets the specific pin to high */
				cycle_delay_t1h();		/* Delay by datasheet amount (800ns give or take) */
				sio_hw->gpio_clr = pin; /* This sets the specific pin to low */
				cycle_delay_t1l();		/* Delay by datasheet amount (450ns give or take) */
			}
			else /* The bit is 0 */
			{
				sio_hw->gpio_set = pin;
				cycle_delay_t0h();
				sio_hw->gpio_clr = pin;
				cycle_delay_t0l();
			}
		}
		    
		for (j = 7; j >= 0; j--) /* Handle the 8 red bits */
		{
			if (((red >> j) & 1) == 1)
			{
				sio_hw->gpio_set = pin;
				cycle_delay_t1h();
				sio_hw->gpio_clr = pin;
				cycle_delay_t1l();
			}
			else
			{
				sio_hw->gpio_set = pin;
				cycle_delay_t0h();
				sio_hw->gpio_clr = pin;
				cycle_delay_t0l();
			}
		}
		    
		for (j = 7; j >= 0; j--) /* Handle the 8 blue bits */
		{
			if (((blue >> j) & 1) == 1)
			{
				sio_hw->gpio_set = pin;
				cycle_delay_t1h();
				sio_hw->gpio_clr = pin;
				cycle_delay_t1l();
			}
			else
			{
				sio_hw->gpio_set = pin;
				cycle_delay_t0h();
				sio_hw->gpio_clr = pin;
				cycle_delay_t0l();
			}
		}
	}
	    
	/* Set the level low to indicate a reset is happening */
	sio_hw->gpio_clr = pin;
	
	/* Enable the interrupts that got disabled */
	enable_and_restore_interrupts(interrupt_mask);
	
	/* Make sure to wait at least 50us (the WS2812B reset time) before calling this function again */
}

int main()
{
	/* System init */
	gpio_init(LED_PIN);
	gpio_set_dir(LED_PIN, GPIO_OUT);
	gpio_put(LED_PIN, false); /* Important to start low to tell the LEDs that it's time for new data */
	
	/* 100MHz is a clean number and used to calculate the cycle delays */
	set_sys_clock_khz(100000, true);
	
	/* Wait a bit to ensure clock is running and force LEDs to reset*/
	sleep_ms(10);
	
	/* Used for example */
	int32_t led = 0;
	uint8_t led_dir = 1;
	uint8_t dim_value = 1;
	uint8_t dim_dir = 1;
	
	uint32_t timer = 2; /* Update the LED colors once every 'timer' passes of the main loop, basically a speed control, higher is slower */
	uint32_t timer_val = 0; /* Counts loop passes since the last update */
	while (true)
	{
	    /* Go crazy */
	    
	    /* Only update the LED colors once every 'timer' passes */
	    if (timer_val >= timer)
	    {
		    /*-- I'm using this to dim the LEDs on and off with the color white --*/
	    
		    if (dim_value >= 26) /* Start dimming down */
		    {
			    dim_value = 25;
			    dim_dir = 0;
		    }
		    else if(dim_value == 0) /* Start dimming up */
		    {
			    dim_dir = 1;
		    }
	    
		    if (dim_dir)
			    dim_value++;
		    else
			    dim_value--;
	    
		    /* Set LED data to dimmed white */
		    set_all(dim_value, dim_value, dim_value);
	    
		    /*---------------------------------------------------------------------*/
	    
		    /*-- I'm using this to race a red LED back and forth across the strip --*/
	    
		    if (led < 0) /* Reached end, go back */
		    {
			    led = 0;
			    led_dir = 1;
		    }
		    else if(led >= LED_NUM - 1) /* Reached other end, go back */
		    {
			    led = LED_NUM - 1;
			    led_dir = 0;
		    }
	    
		    /* Set new position */
		    set_led(led, 100, 0, 0); /* Red */
	    
		    /* Move LED for next iteration */
		    if (led_dir)
			    led++;
		    else
			    led--;
	    
		    /*-----------------------------------------------------------------------*/
		    
		    timer_val = 0; /* Reset update cycle */
	    }
	    
	    
	    /* Send out the color data to the LEDs */
	    send_led_data();
	    
	    /* Refresh rate for LEDs */
	    /* It is hard to estimate exactly, but roughly: 1ms sleep + ~6.3ms of LED data transfer (210 LEDs * 24 bits * ~1.25us per bit) + a little logic => ~7.5ms per loop */
	    /* Which is roughly 130Hz, but again this is a rough guess */
	    /* Also note that this is different from the update rate, which is how fast you are updating your LED colors in code */
	    sleep_ms(1);
	    timer_val++;
	    
	    /* A wait like the sleep_ms() above is important: the LEDs need the line held low for more than 50us to see a reset */
	}
	
	return 0;
}
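Per the comments in send_led_data(), the bits go out LED by LED, green then red then blue, most significant bit first. A host-side sketch of that ordering in plain C (no Pico SDK), handy for checking the buffer logic on a PC before touching hardware; pack_grb() mirrors set_led() and wire_bit() is a hypothetical helper:

```c
#include <stdint.h>

#define LED_DATA_SIZE 3 /* bytes per LED, stored G, R, B like main.c */

/* Store one LED the way set_led() does: green first, then red, then blue */
static void pack_grb(uint8_t *buf, uint32_t led, uint8_t r, uint8_t g, uint8_t b)
{
    buf[led * LED_DATA_SIZE] = g;
    buf[led * LED_DATA_SIZE + 1] = r;
    buf[led * LED_DATA_SIZE + 2] = b;
}

/* Value (0 or 1) of the n-th bit on the data line. Because the buffer is already
   in G, R, B order, bit n is simply bit (7 - n % 8) of byte n / 8: MSB first. */
static int wire_bit(const uint8_t *buf, uint32_t n)
{
    uint32_t byte_index = n / 8u;
    uint32_t shift = 7u - (n % 8u);
    return (buf[byte_index] >> shift) & 1;
}
```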
cycle_delay.S (capital S extension is important, not lower case)


.syntax unified

.global cycle_delay_t0h
.global cycle_delay_t0l
.global cycle_delay_t1h
.global cycle_delay_t1l

.global disable_and_save_interrupts
.global enable_and_restore_interrupts

@(Assuming 100MHz CPU clock)

@(Remember that C side has some delays due to arithmetic and register logic in between delays so we must be under the required timings)

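@(Each instruction cycle is 10ns => 1 / 100MHz = 10ns)
@(Cycles per call: MOVS (1) + N * SUBS (1) + (N - 1) taken BNE (2 each) + final BNE (1) + BX LR (2) = 3N + 2)
@(So the actual ns delay in the functions below is [10 + [10 * N] + [20 * [N - 1]] + 10 + 20] = 30N + 20, where N is the number fed into R0)
@(N must be 1-255: MOVS only takes an 8-bit immediate, and 0 would wrap around to ~4 billion loops)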

@(These functions are just used to waste cycles)

cycle_delay_t0h:			@[400ns needed] [350ns actual]
	MOVS R0, #11			@1 cycle (change this up or down if timings don't work)
1:	SUBS R0, R0, #1			@1 cycle (subtract from R0 and check below if 0 yet)
	BNE 1b					@2 cycles when taken, 1 when it falls through (if not zero yet, branch back to local label)
	BX LR					@2 cycles (bounce back to C)
							 

cycle_delay_t0l:			@[850ns needed] [800ns actual]
	MOVS R0, #26			@(change this up or down if timings don't work)
2:	SUBS R0, R0, #1
	BNE 2b
	BX LR					

cycle_delay_t1h:			@[800ns needed] [770ns actual]
	MOVS R0, #25			@(change this up or down if timings don't work)
3:	SUBS R0, R0, #1
	BNE 3b
	BX LR					

cycle_delay_t1l:			@[450ns needed] [410ns actual]
	MOVS R0, #13			@(change this up or down if timings don't work)
4:	SUBS R0, R0, #1
	BNE 4b
	BX LR

@Used to disable all interrupts
disable_and_save_interrupts:
	MRS R0, PRIMASK
	CPSID i					@(PRIMASK is the only interrupt mask on the Cortex-M0+, so only the i flag applies)
	BX LR

@Resume interrupts
enable_and_restore_interrupts:
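	MSR PRIMASK, R0			@(Restoring the saved mask only re-enables interrupts if they were enabled before; an unconditional CPSIE here would defeat the save)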
	BX LR
	
And in the CMakeLists.txt file you just add the assembler file like this:


add_executable(program
        main.c
        cycle_delay.S)
        

And that's it, it should work.

WestfW
Posts: 284
Joined: Tue Nov 01, 2011 9:56 pm

Re: [Example] WS2812B from scratch using C & Assembly

Wed Oct 27, 2021 9:12 am

What did you do to account for the slow qspi flash and non-deterministic caching behavior?

dthacher
Posts: 1333
Joined: Sun Jun 06, 2021 12:07 am

Re: [Example] WS2812B from scratch using C & Assembly

Wed Oct 27, 2021 10:05 am

WestfW wrote: What did you do to account for the slow qspi flash and non-deterministic caching behavior?
I am guessing XIP may work fairly well here. Most of his code is fairly compact and does not involve large library calls. He could have used RAM functions.

XIP from the external QSPI flash is slow compared to the internal flash of other microcontrollers, and dual core can poke this pretty hard. However, the striped multi-port SRAM would solve this without the use of two flash memories. The XIP cache also helps out some, and small, tight loops work better with it. The XIP cache is fairly large: 16KB, which is several thousand Thumb instructions.

Some microcontrollers, like the PIC32MX795, have an 80MHz core running on 30-50MHz flash. This requires a small 16-instruction cache to spread the wide, slow flash onto the narrow, fast system bus. Issues can occur here, however, and the same thing appears in other 32-bit controllers from ST and TI. For linear execution you're fine, as prefetch can help, but with far branches you're in trouble. Small tight loops should also work. The PIC32 also supports RAM functions, but the instruction and data buses end up fighting over the single-port SRAM.

Compared to slow 8/16/32-bit microcontrollers, these can really have poor determinism, like a microprocessor does. Interrupts can become very difficult to do properly and are potentially pointless, although it may be possible to place them in RAM. The RP2040 has a lot of horsepower to work with, but you have to know how to use it, and even then you can potentially be kind of screwed.

The Cortex-M0+ is potentially hiding some of the XIP issues. However, it was likely chosen for area reasons, to enable dual cores, which is potentially much friendlier and more powerful. On the flip side, for memory-intensive operations, dual Cortex-M0+ cores can be much worse than, or at best on par with, a Cortex-M3.

The RP2040 is very strong on determinism in the IO and memory department, as long as you have enough SRAM. I love the PIC32s, but some things in there require more thought. The CPU almost always has priority over DMA, so DMA can be stalled by a chain of CPU memory operations, though that is not a very realistic case. On the RP2040 this is unlikely to ever happen or matter, thanks to the FIFOs and the striped multi-port SRAM.

I am going to ignore any issues with IPC between cores, which are prevented by the FIFO. I am also ignoring the option of avoiding bit banging, and all the fun that brings, by using PIO.

yeahimthatguy
Posts: 36
Joined: Thu Apr 08, 2021 6:14 pm

Re: [Example] WS2812B from scratch using C & Assembly

Wed Oct 27, 2021 3:49 pm

WestfW wrote:
Wed Oct 27, 2021 9:12 am
What did you do to account for the slow qspi flash and non-deterministic caching behavior?
I am not sure what you mean. The data buffer sits in RAM and the data transfer loop is tight. Are you referring to the for loops and to calling assembly functions from within C?

matherp
Posts: 465
Joined: Tue May 02, 2017 10:54 am

Re: [Example] WS2812B from scratch using C & Assembly

Wed Oct 27, 2021 6:51 pm

Fully agree that this doesn't need over-engineering. I support all three variants of the WS2812 in my port of MMBasic for the Pico.

I just use the systick timer for the timing. The routine is set to run from RAM and I disable interrupts while the transmission is working. Works perfectly on the 8x8 LED arrays which is as big as I've tested so far.

My code is in https://github.com/UKTailwind/PicoMite/ ... External.c if anyone is interested
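A minimal sketch of the arithmetic behind the SysTick approach, assuming SysTick is clocked from the processor clock (CLKSOURCE = 1 in SYST_CSR) and free-running with the maximum 24-bit reload value 0xFFFFFF. The helper names are hypothetical and this is not the PicoMite code. On the Pico you would read the counter from `systick_hw->cvr` (hardware/structs/systick.h) in the busy-wait; here the read is passed in as a function so the logic compiles and can be checked on a PC:

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu /* SysTick's current value register is 24 bits wide */

/* Convert a delay in nanoseconds to SysTick ticks at the given CPU clock (Hz) */
static uint32_t ns_to_ticks(uint32_t ns, uint32_t clk_hz)
{
    return (uint32_t)(((uint64_t)ns * clk_hz) / 1000000000u);
}

/* Ticks elapsed between two counter reads. SysTick counts DOWN, so elapsed is
   start - now; masking to 24 bits keeps the result right across one wrap
   (assumes the reload value is 0xFFFFFF). */
static uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return (start - now) & SYSTICK_MASK;
}

/* Busy-wait until 'ticks' have passed (ticks must be below 2^24).
   read_counter stands in for reading systick_hw->cvr on the Pico. */
static void wait_ticks(uint32_t ticks, uint32_t (*read_counter)(void))
{
    uint32_t start = read_counter();
    while (ticks_elapsed(start, read_counter()) < ticks)
    {
        /* spin */
    }
}
```

Unlike counted instruction loops, the tick counts come straight from the clock frequency, so changing the CPU clock only changes the `clk_hz` argument.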

yeahimthatguy
Posts: 36
Joined: Thu Apr 08, 2021 6:14 pm

Re: [Example] WS2812B from scratch using C & Assembly

Wed Oct 27, 2021 7:50 pm

matherp wrote:
Wed Oct 27, 2021 6:51 pm
Fully agree that this doesn't need over-engineering. I support all three variants of the WS2812 in my port of MMBasic for the Pico.

I just use the systick timer for the timing. The routine is set to run from RAM and I disable interrupts while the transmission is working. Works perfectly on the 8x8 LED arrays which is as big as I've tested so far.

My code is in https://github.com/UKTailwind/PicoMite/ ... External.c if anyone is interested
Oh yeah, SysTick is definitely a good idea; it can easily be adapted to different CPU clocks, which beats trying to time assembly instructions. I forgot SysTick can tick faster than the normal (1MHz) timer on the RP2040.

Now I'm jealous I didn't do that instead.
