Skip to content

TinyPico D4 possible memory overrun #7288

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RetiredWizard opened this issue Dec 1, 2022 · 19 comments
Closed

TinyPico D4 possible memory overrun #7288

RetiredWizard opened this issue Dec 1, 2022 · 19 comments

Comments

@RetiredWizard
Copy link

CircuitPython version

Adafruit CircuitPython 8.0.0-beta.4-58-gc8390a791 on 2022-11-30; TinyPICO with ESP32-PICO-D4

Code/REPL

Delete everything from the TinyPico flash and copy the attached PyDOS.py file.

import PyDOS

At the /> prompt type: dir

Behavior

Starting Py-DOS...
Warning wild card length set to: 11

(4071120)/>dir
0 16
<> <>
Traceback (most recent call last):
File "", line 1, in
File "PyDOS.py", line 1208, in
File "PyDOS.py", line 573, in PyDOS
File "PyDOS.py", line 307, in prDir
TypeError: unsupported types for sub: '', 'int'

PyDOS.zip

Description

I believe the Unexpected Maker TinyPico D4 CircuitPython build has some sort of memory overrun problem.

I've only been able to really see the memory error on larger programs 1000+ lines of code. It seems that some of the program objects are being clobbered as the program runs. For example if I set a variable to a text string, later on in the program I get an error indicating "TypeError: can't convert 'dict' object to str implicitly" and if I print out the variable it's no longer the string variable but a dict from some other part of the program. If I set the variable to the string again just before the line indicated by the error the program continues but has a similar error a litter further down. Other large programs I test behave similarly. Moderatly smaller programs 400-500 lines seem to either run or simply hang (or hang the serial port).

All of the programs I'm seeing issues on are exactly the same versions that I've run on 10 or more different boards.

The errors also seem to be related to how many files are stored in the Flash filesystem??? If I put just a few files on the the flash either the errors occur at different locations or the program simply runs.

Since I"m guessing this is a memory error, I've tried a couple simple scripts to gobble up memory and see if I can reproduce the variable clobbering but so far they have only run the board out of memory,

I've spent several hours trying to get something to demonstrate the error that doesn't involve 1000 lines of code and I'll keep working on that front but I've attached a zipped copy of PyDOS.py that demonstrates the issue.

On line 192 at the start of the prDIR function I print out two variables, swBits and swAllB, between line 192 and 305 there is a single function definition only, and then on line 306 I print out swBits and swAllB again but their values have changed.

Additional information

I ran this quickly by @UnexpectedMaker on Discord but without an issue and some code that reproduced the issue he didn't think there was much that could be looked at.

@RetiredWizard
Copy link
Author

I'm using the TinyPICO V2

@dhalbert
Copy link
Collaborator

dhalbert commented Dec 2, 2022

Do you have other similar boards you could try this on to see if it is board-specific?

@dhalbert dhalbert added this to the 8.x.x milestone Dec 2, 2022
@RetiredWizard
Copy link
Author

The TinyPICO V2 is currently my only ESP32-D4 board.

I don't actually know if there is another D4 board that supports circuitpython. I have run this same code on ESP32, ESP32-S2, ESP32-S3 and ESP32C3 boards running CP without this issue occurring.

I believe the QTPY ESP32-Pico uses a similar microprocessor that I haven't tested yet, I'll go ahead and order one of those. If anyone knows of another D4 board, let me know :)

@RetiredWizard
Copy link
Author

Just as a swipe in the dark I decided to look at the boards/unexpectedmaker_tinypico/sdkconfig file and I set the CONFIG_SPIRAM_SIZE parameter first to 2097152, then 524288 and finally to 2048. Each time when the board started gc.mem_free() returned 4095360. I also tried using the sdkconfig file from the qtpy pico board but that build didn't startup on the tinyPICO.

@jepler
Copy link

jepler commented Dec 2, 2022

We retrieve espidf's own idea of the amount of psram present, ultimately via esp_spiram_get_size(), but through multiple layers of code. It may be that our version of esp-idf is improperly detecting the PSRAM (the early boot messages from esp-idf would be helpful here) or there's something wrong with how the data is passed among the layers. For instance, the way that ESP32 (only) starts at SOC_EXTRAM_DATA_LOW, and others end at SOC_EXTRAM_DATA_HIGH, seems weird, but it matched what I saw in esp-idf.

    #ifdef CONFIG_SPIRAM
    {
        intptr_t heap_start = common_hal_espidf_get_psram_start();
        intptr_t heap_end = common_hal_espidf_get_psram_end();
        size_t spiram_size = heap_end - heap_start;
        if (spiram_size > 0) {
            heap = (uint32_t *)heap_start;
            heap_size = (heap_end - heap_start) / sizeof(uint32_t);
        } else {
            ESP_LOGE(TAG, "CONFIG_SPIRAM enabled but no spiram heap available");        }
    }
    #endif
intptr_t common_hal_espidf_get_psram_start(void) {
    #ifdef CONFIG_SPIRAM
    if (esp_spiram_is_initialized()) {
        #ifdef CONFIG_IDF_TARGET_ESP32                        
        return SOC_EXTRAM_DATA_LOW;
        #else
        return SOC_EXTRAM_DATA_HIGH - psram_size_usable();
        #endif
    }   
    #endif
    return 0;
}    

intptr_t common_hal_espidf_get_psram_end(void) {
    #ifdef CONFIG_SPIRAM
    if (esp_spiram_is_initialized()) {
        return common_hal_espidf_get_psram_start() + psram_size_usable();
    }
    #endif
    return 0;
}
static size_t psram_size_usable(void) {                                         
    #ifdef CONFIG_SPIRAM                                                        
    /* PSRAM chip may be larger than the size we can map into address space */
    size_t s = MIN(esp_spiram_get_size(), SOC_EXTRAM_DATA_SIZE);
    return s - esp_himem_reserved_area_size();
    #else
    return 0;
    #endif
}

@RetiredWizard
Copy link
Author

This is the first things that come through on the Thonny console:

ets Jun 8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 188777542, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:12756
ho 0 tail 12 room 4
load:0x40080400,len:3024
entry 0x400805dc
Serial console setup

Device is busy or does not respond. Your options:

  • wait until it completes current work;
  • use Ctrl+C to interrupt current work;
  • reset the device and try again;
  • check connection properties;
  • make sure the device has suitable MicroPython / CircuitPython / firmware;
  • make sure the device is not in bootloader mode.

Adafruit CircuitPython 8.0.0-beta.4-58-gc8390a791 on 2022-11-30; TinyPICO with ESP32-PICO-D4
Board ID:unexpectedmaker_tinypico
UID:0520191A8983

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

Code done running.

Press any key to enter the REPL. Use CTRL-D to reload.

@RetiredWizard
Copy link
Author

Interesting side note, when I started customizing the sdkconfig file the CP flash filesystem was initialized so I recopied my code down to it. Before getting the boot messages I posted in the previous comment, I restored the Adafruit build from circuitpython.org and the flash filesystem was also restored to the same state it was in prior to my loading the custom images. That probably makes perfect sense to someone that understands how the flash is handled but just in case I thought I'd mention it....

@RetiredWizard
Copy link
Author

I don't know if this will help, but I commented out the CONFIG_ESP32_SPIRAM_SUPPORT=y line in sdkconfig and added CIRCUITPY_ESP32_CAMERA = 0 to the mpconfigboard.mk file and gc.mem_free() now reports 106192 but the PyDOS code now seems to run without the variables being clobbered.

@UnexpectedMaker
Copy link

Sorry folks, been AWOL with production line issues for the last few weeks :(

I'll have another look over my TinyPICO board settings compared to MicroPython. Maybe I missed something, or set something wrong, though I'm pretty sure I checked them before doing the PR.

I'll also look into how MP includes the Flash/PSRAM cache issue IDF fixes, as they need to be included to have both Flash and PSRAM working together, and PSRAM to work properly.

I'll get onto this as soon as I can @RetiredWizard

@RetiredWizard
Copy link
Author

My QT Py ESP32 Pico was delivered today. It does not seem to have this same memory issue.

@UnexpectedMaker
Copy link

My QT Py ESP32 Pico was delivered today. It does not seem to have this same memory issue.

That's using a different ESP32 chip and different PSRAM. So different settings and it doesn't require any cache fixes for IDF.

@RetiredWizard
Copy link
Author

RetiredWizard commented Jan 15, 2023

If I build without the SPI RAM everything works great (except there's not much RAM 😁). The Micropython build also works fine so I've been comparing the sdkconfig files and one thing I've noticed is that CircuitPython configures the CONFIG_SPIRAM_SPEED_40M=y parameter regardless of whether I set CONFIG_SPIRAM_SPEED_80M=y in the boards/unexpectedmaker_tinypico/sdkconfig or not.

The Micropython build looks to me like it's using the 80M setting so I'd like to try a build with that setting but haven't figured out how yet.

EDIT: I managed to get the 80M setting but still no luck running the larger python scripts.

@RetiredWizard
Copy link
Author

I've hacked away at this on and off for a week or so without any breakthroughs. My current theory is that the CircuitPython variable structure (sorry, I know nothing about the internals) is somehow getting misplaced or corrupted (or perhaps a variable pointer) when the SPI RAM is used.

Everything on the board seems to work fine with the SPI RAM, but when larger python programs are run, sometime during their execution when CP accesses a variable it actually retrieves a different variable's contents. If I build CP without the SPI RAM the same larger programs will run without any variable corruptions.

@RetiredWizard
Copy link
Author

I ran the test code from #2687 with SPI RAM enable and disabled and got the same results as described in the issue. It's not entirely clear to me from the open issue if this is really an issue or not but it did highlight the id() function for me and since both the working and not working memory models on this board behave the same way I'm thinking #2687 probably isn't a factor here.

I did try using the id() function to get a better feel of what's going on. I ended up chopping my main application up until I got a crash that resulted in a python backtrace rather than freezing and then started inserting print statements. I suspect the freezes occurred when a function object was corrupted and python attempted to transfer program control to garbage memory locations.

This code segement:

    def prDir(dirPath,swBits):
        if swBits & (swAllB-int('010110',2)):
            print("Illegal switch, Command Format: DIR[/p][/w][/s] [path][file]")
            return

        savDir = os.getcwd()

        print('absPath:',id(absolutePath))
        print('slh:',slh,id(slh),'dirPath:',dirPath,id(dirPath),'savDir:',savDir,id(savDir))
        #fullPath = absolutePath(dirPath,savDir)
        fullPath = None
        print("fullPath:",fullPath,id(fullPath),"slh:",slh,id(slh),"dirPath:",dirPath,id(dirPath))

prints:

absPath: 1065450992
slh: / 102 dirPath: 102 205 savDir: slh: 25222

and then freezes apparently on the 'fullPath = None` statement.

If I modify the print statement by removing the text literals:

    def prDir(dirPath,swBits):
        if swBits & (swAllB-int('010110',2)):
            print("Illegal switch, Command Format: DIR[/p][/w][/s] [path][file]")
            return

        savDir = os.getcwd()

        print('absPath:',id(absolutePath))
        print(slh,id(slh),dirPath,id(dirPath),savDir,id(savDir))
        #fullPath = absolutePath(dirPath,savDir)
        fullPath = None
        print("fullPath:",fullPath,id(fullPath),"slh:",slh,id(slh),"dirPath:",dirPath,id(dirPath))
        fullPath = (savDir+dirPath if dirPath[0] != slh else dirPath)

The following output occurs:

absPath: 1065450736
/ 102 / 102 / 102
fullPath: <function> 1061198672 slh: / 102 dirPath: 1061198672 2122397345
Traceback (most recent call last):
  File "code.py", line 1, in <module>
  File "PyDOS.py", line 1227, in <module>
  File "PyDOS.py", line 596, in PyDOS
  File "PyDOS.py", line 321, in prDir
TypeError: 'int' object isn't subscriptable

In both cases it appears to me that the program has reached a point where it can no longer add new variables to the variable structure. In the first example I created new literals in the print statement and after that it seems objects could no longer be properly identified. In the second case the same thing happens after the fullPath variable is created.

Since all this weirdness only happens when using the SPI RAM I'm thinking it must have something to do with where the data structures are being located (possibly based on the available RAM???).

@dhalbert dhalbert modified the milestones: 8.x.x, Long term Apr 14, 2023
@RetiredWizard
Copy link
Author

With Scott's recent memory work on 9.x, I decided to take another look at this since I believe it's some sort of memory management bug, however the issue still seems to occur using the latest build from main.

I was able to build a debug build by disabling WiFi which actually ended up creating a much smaller image (about 500K free flash, rather than 50K free flash for the standard build). That build still showed the issue but it took much more activity (memory use) before the problem occurred.

I was able to grab the ESP Debug output immediately after power up as well:

Serial console setup
I (2130) espidf: Reserving 0 bytes of psram
I (2180) gpio: GPIO[0]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2190) gpio: GPIO[2]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2200) gpio: GPIO[4]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2210) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2220) gpio: GPIO[9]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2230) gpio: GPIO[12]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2240) gpio: GPIO[13]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2250) gpio: GPIO[14]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2260) gpio: GPIO[15]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2270) gpio: GPIO[18]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2280) gpio: GPIO[19]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2290) gpio: GPIO[20]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2300) gpio: GPIO[21]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2310) gpio: GPIO[22]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2320) gpio: GPIO[23]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2330) gpio: GPIO[27]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2340) gpio: GPIO[32]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2350) gpio: GPIO[33]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2360) gpio: GPIO[34]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2370) gpio: GPIO[35]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2380) gpio: GPIO[36]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2390) gpio: GPIO[37]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2400) gpio: GPIO[38]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (2410) gpio: GPIO[39]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
E (2420) timer_group: timer_deinit(343): HW TIMER NEVER INIT ERROR
E (2430) timer_group: timer_deinit(343): HW TIMER NEVER INIT ERROR
E (2430) timer_group: timer_deinit(343): HW TIMER NEVER INIT ERROR
E (2440) timer_group: timer_deinit(343): HW TIMER NEVER INIT ERROR
Adafruit CircuitPython 9.0.0-alpha.1-68-ga4b8afb54f-dirty on 2023-10-16; TinyPICO with ESP32-PICO-D4
Board ID:unexpectedmaker_tinypico
UID:0520191A8983
boot.py output:

I'll test this again once 9.x is released.

@tannewt tannewt added the esp32 label Oct 16, 2023
@UnexpectedMaker
Copy link

@RetiredWizard Curious I just had another look at the 8.2 branch compared to MP and the only thing different that stands out (but that I don't think it the issue) is CP is not setting CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL to anything. MP sets it to CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=8192

That setting will try to put smaller allocations in internal SRAM instead of PSRAM - which might cause less fragmentation in PSRAM. You could try building with that added to the TinyPICO settings.

Also, MP 1.21 introduced a new way to deal with allocating and tracking free space in PSRAM (maybe memory in general) and I believe the plan is to merge MP 1.21 into 9.x now that 1.20 has been merged.

But the fact that you previously tried MP way back beef these new memory management parts were added, and didn't see the issue in MP, only in CP doesn't give me much confidence that the new 1.21 changes will help.

It's likely just a missing or different SDKCONFIG setting somewhere that's causing this... but it might not be a PSRAM specific one, so hard to find.

@RetiredWizard
Copy link
Author

Thanks @UnexpectedMaker!

I threw CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=8192 in boards/unexpectedmaker_tinypico/sdkconfig but that didn't appear to have any impact.

I also just loaded the latest v1.22 preview Micropython again and it still seems to work fine.

@UnexpectedMaker
Copy link

Yeah, I didn't think it would help - but bummer anyway, sorry.

@RetiredWizard
Copy link
Author

I believe this has been fixed by #8553

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants