RuntimeError: Could not build empty array #392
Interesting. Your code works fine for me on Linux, even for an array of more than 4 petabytes (in about 10 s).
As can be seen, only 12 GB of system memory is used. My code:
FWIW, the main bottleneck here is allocating memory for the sparse storage. You may want to use contiguous storage instead; that way you will be able to create up to 38 petabytes in less than a second (consuming just 58 MB of RAM):

```python
import numpy as np
import blosc2

print("Blosc2 version:", blosc2.__version__)
a = blosc2.empty(shape=(17707749 * 1_000, 768 * 800), dtype=np.float32, contiguous=True)
# Storing to disk is contiguous by default (this should take around 240 bytes on disk)
# a = blosc2.empty(shape=(17707749 * 1_000, 768 * 800), dtype=np.float32, urlpath="a.b2nd", mode="w")
print("a.info:\n", a.info)
```
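As a quick sanity check on the sizes quoted above, the snippet below computes the nominal (uncompressed) size that these shapes represent. It is plain arithmetic that assumes only NumPy; nothing is allocated, and the helper name `nominal_nbytes` is hypothetical. For the shape in the example it works out to roughly 43.5 PB in decimal units, or about 38.7 PiB in binary units, which suggests the "38 Petabytes" figure is in binary units. The original shape from the report, (17707749, 768) as float32, is about 54.4 GB.

```python
import numpy as np


def nominal_nbytes(shape, dtype):
    """Uncompressed size in bytes of an array with the given shape and dtype (hypothetical helper)."""
    n = 1
    for dim in shape:
        # Python ints have arbitrary precision, so this product cannot overflow.
        n *= dim
    return n * np.dtype(dtype).itemsize


big = nominal_nbytes((17707749 * 1_000, 768 * 800), np.float32)
orig = nominal_nbytes((17707749, 768), np.float32)

print(f"big array:      {big:,} bytes = {big / 10**15:.1f} PB = {big / 2**50:.1f} PiB")
print(f"original array: {orig:,} bytes = {orig / 10**9:.1f} GB = {orig / 2**30:.1f} GiB")
```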
The shapes you tried work for me too, but shape
Ok. I cannot reproduce this on a couple of Linux boxes I have. I guess this is related to the partitioning (chunks and blocks) that has been computed for your box, which depends on the sizes of the CPU caches. Can you post the output of this?

```python
import pprint
import blosc2

print("Blosc2 version:", blosc2.__version__)
print("Blosc2 cpu_info:")
pprint.pprint(blosc2.cpu_info)
```
What was happening is that the cache size discovery machinery was not working correctly. This, combined with a glitch in the chunksize cap, was causing the error. I have fixed the chunksize cap in main; can you quickly check that it works on your machine? For a more accurate fix, can you tell us the output of `lscpu --json` and `cat /sys/devices/system/cpu/cpu0/cache/index3/size` on your machine?
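For reference, the sysfs file mentioned above can also be read from Python. This is a minimal sketch assuming the usual Linux sysfs layout, where each `cache/index*` directory holds a `level` file and a `size` file with values like `32768K`; rather than assuming `index3` is always the L3 cache, it looks for the entry whose `level` is 3. The helper names are hypothetical, and this is not how Blosc2 discovers cache sizes internally. Note that the sysfs value describes a single cache instance, not the total across the machine.

```python
from pathlib import Path
from typing import Optional

# Multipliers for the unit suffixes that appear in sysfs cache "size" files (e.g. "32768K").
_SUFFIXES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_cache_size(text: str) -> int:
    """Convert a sysfs cache size string such as '32768K' into bytes (hypothetical helper)."""
    s = text.strip().upper()
    suffix = s[-1] if s and s[-1] in "KMG" else ""
    number = s[: len(s) - len(suffix)]
    return int(number) * _SUFFIXES[suffix]  # raises ValueError on malformed input


def read_l3_size(cpu: int = 0) -> Optional[int]:
    """Return the per-instance L3 size seen by `cpu`, or None if sysfs does not expose it."""
    cache_dir = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache")
    # glob() on a missing directory simply yields nothing, so non-Linux systems return None.
    for entry in sorted(cache_dir.glob("index*")):
        try:
            if (entry / "level").read_text().strip() == "3":
                return parse_cache_size((entry / "size").read_text())
        except (OSError, ValueError):
            continue
    return None


if __name__ == "__main__":
    print("L3 per instance (bytes):", read_l3_size())
```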
BTW, python-blosc2 3.3.2, which includes this fix, has been released.
3.3.2 confirmed working now, thank you! For what it's worth, here are the outputs you asked for.

`lscpu --json`:
{
"lscpu": [
{
"field": "Architecture:",
"data": "x86_64",
"children": [
{
"field": "CPU op-mode(s):",
"data": "32-bit, 64-bit"
},{
"field": "Address sizes:",
"data": "48 bits physical, 48 bits virtual"
},{
"field": "Byte Order:",
"data": "Little Endian"
}
]
},{
"field": "CPU(s):",
"data": "192",
"children": [
{
"field": "On-line CPU(s) list:",
"data": "0-191"
}
]
},{
"field": "Vendor ID:",
"data": "AuthenticAMD",
"children": [
{
"field": "Model name:",
"data": "AMD EPYC 7R32",
"children": [
{
"field": "CPU family:",
"data": "23"
},{
"field": "Model:",
"data": "49"
},{
"field": "Thread(s) per core:",
"data": "2"
},{
"field": "Core(s) per socket:",
"data": "48"
},{
"field": "Socket(s):",
"data": "2"
},{
"field": "Stepping:",
"data": "0"
},{
"field": "BogoMIPS:",
"data": "5600.00"
},{
"field": "Flags:",
"data": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid"
}
]
}
]
},{
"field": "Virtualization features:",
"data": null,
"children": [
{
"field": "Hypervisor vendor:",
"data": "KVM"
},{
"field": "Virtualization type:",
"data": "full"
}
]
},{
"field": "Caches (sum of all):",
"data": null,
"children": [
{
"field": "L1d:",
"data": "3 MiB (96 instances)"
},{
"field": "L1i:",
"data": "3 MiB (96 instances)"
},{
"field": "L2:",
"data": "48 MiB (96 instances)"
},{
"field": "L3:",
"data": "384 MiB (24 instances)"
}
]
},{
"field": "NUMA:",
"data": null,
"children": [
{
"field": "NUMA node(s):",
"data": "2"
},{
"field": "NUMA node0 CPU(s):",
"data": "0-47,96-143"
},{
"field": "NUMA node1 CPU(s):",
"data": "48-95,144-191"
}
]
},{
"field": "Vulnerabilities:",
"data": null,
"children": [
{
"field": "Gather data sampling:",
"data": "Not affected"
},{
"field": "Itlb multihit:",
"data": "Not affected"
},{
"field": "L1tf:",
"data": "Not affected"
},{
"field": "Mds:",
"data": "Not affected"
},{
"field": "Meltdown:",
"data": "Not affected"
},{
"field": "Mmio stale data:",
"data": "Not affected"
},{
"field": "Reg file data sampling:",
"data": "Not affected"
},{
"field": "Retbleed:",
"data": "Mitigation; untrained return thunk; SMT enabled with STIBP protection"
},{
"field": "Spec rstack overflow:",
"data": "Vulnerable: Safe RET, no microcode"
},{
"field": "Spec store bypass:",
"data": "Mitigation; Speculative Store Bypass disabled via prctl"
},{
"field": "Spectre v1:",
"data": "Mitigation; usercopy/swapgs barriers and __user pointer sanitization"
},{
"field": "Spectre v2:",
"data": "Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected"
},{
"field": "Srbds:",
"data": "Not affected"
},{
"field": "Tsx async abort:",
"data": "Not affected"
}
]
}
]
} and
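The `Caches (sum of all)` section above reports totals: on this two-socket EPYC, the 384 MiB L3 figure is spread over 24 instances, i.e. 16 MiB per instance (likewise, L2 is 512 KiB and L1d is 32 KiB per instance). A cache-size guess that took the total at face value would be far larger than what any one group of cores actually shares. The sketch below extracts the per-instance L3 size from `lscpu --json`, assuming lscpu's `<size> <unit> (<n> instances)` text format as shown in this output (older lscpu versions may format it differently). The function names are hypothetical, and this is only an illustration, not the heuristic used in 11584f1.

```python
import json
import re
from typing import Optional

_UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def per_instance_bytes(data: str) -> Optional[int]:
    """Turn an lscpu entry like '384 MiB (24 instances)' into bytes per instance."""
    m = re.fullmatch(r"\s*([\d.]+)\s*(B|KiB|MiB|GiB)\s*\((\d+) instances?\)\s*", data)
    if m is None:
        return None  # unrecognized format
    total = float(m.group(1)) * _UNIT_BYTES[m.group(2)]
    return int(total) // int(m.group(3))


def find_field(nodes, name):
    """Depth-first search of the lscpu --json tree for the entry whose 'field' equals name."""
    for node in nodes:
        if node.get("field") == name:
            return node
        found = find_field(node.get("children", []), name)
        if found is not None:
            return found
    return None


def l3_per_instance(lscpu_json: str) -> Optional[int]:
    """Per-instance L3 size in bytes from `lscpu --json` output, or None if not found."""
    tree = json.loads(lscpu_json)["lscpu"]
    node = find_field(tree, "L3:")
    if node is None or not node.get("data"):
        return None
    return per_instance_bytes(node["data"])
```

Feeding it the output above (e.g. via `subprocess.run(["lscpu", "--json"], capture_output=True, text=True).stdout`) should give 16 MiB, which matches what a single Zen2 core complex shares.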
Thanks for the output. With it, I have come up with a more refined way of guessing cache sizes in 11584f1. Can you try the code in main (just install it with
Well, I don't have any code to test right now (I hit this error while working on a problem and wanted to try blosc2 for it, but I've shifted to something else at the moment). If you have any code snippet you'd like me to test, I'd be happy to run it.
Yes, that would be great. Can you please run the script at https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/compute_dists2.py with the `large` parameter, like this:

`/usr/bin/time -v python bench/ndarray/compute_dists2.py large`

and send the output back? It should take less than 5 minutes on your machine. The script also creates a plot file (blosc2_vs_numexpr_subplots.png) in the current working directory; please attach it to this ticket too. Thanks!
Hmm, interesting. Blosc2 seems to run quite inefficiently on your CPU. I don't know why, as we generally don't have access to multi-socket CPUs. One possibility is that Zen2 is not very good at compression/decompression, but since Zen2 is not that old, I tend to think there is quite a lot of room for improvement in multi-socket scenarios. For what it's worth, here are the benchmarks for our AMD box (9800X3D, Zen5) with 64 GB of RAM:
I tried running it under `numactl` with `--cpunodebind` and `--membind`, which doesn't seem to have made a difference for blosc2 but drastically improved numexpr:
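A rough Python-side analog of `numactl --cpunodebind` is to restrict the process's CPU affinity to one NUMA node's CPUs, using the node lists lscpu printed earlier (for example `0-47,96-143` for node 0). This is a sketch assuming Linux (`os.sched_setaffinity` is not available on every platform), and the helper names are hypothetical. Unlike `--membind`, it sets no memory policy; under Linux's default local-allocation policy, pages generally land on the node of the CPU that first touches them, so pinning early only goes part of the way.

```python
import os


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a Linux CPU list such as '0-47,96-143' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def pin_to_cpus(spec: str) -> None:
    """Restrict the calling thread to the given CPUs (Linux only, hypothetical helper).

    Threads created afterwards inherit this affinity, so call it before any
    worker threads (e.g. compression or evaluation thread pools) are started.
    No memory policy is set, unlike `numactl --membind`.
    """
    os.sched_setaffinity(0, parse_cpu_list(spec))
```

For instance, calling `pin_to_cpus("0-47,96-143")` at the very top of the benchmark script would keep its threads on node 0 of this machine.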
Ok. Is this with the main branch or stock 3.3.2? It would be nice to see a comparison between the two, as they make quite different L3 guesses.
Those were with main; here are the results for 3.3.2:
Describe the bug
I'm encountering a runtime error when trying to create an array of shape (17707749, 768):
To Reproduce
Expected behavior
The array to be created.
Desktop (please complete the following information):