RuntimeError: Could not build empty array #392

Closed
sirfz opened this issue Apr 30, 2025 · 16 comments

@sirfz

sirfz commented Apr 30, 2025

Describe the bug
I'm encountering a runtime error when trying to create an array of shape (17707749, 768):

RuntimeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 blosc2.empty(shape=(17707749, 768), dtype=np.float32)

File .venv/lib/python3.11/site-packages/blosc2/ndarray.py:2978, in empty(shape, dtype, **kwargs)
2976 blocks = kwargs.pop("blocks", None)
2977 chunks, blocks = compute_chunks_blocks(shape, chunks, blocks, dtype, **kwargs)
-> 2978 return blosc2_ext.empty(shape, chunks, blocks, dtype, **kwargs)

File blosc2_ext.pyx:2706, in blosc2.blosc2_ext.empty()

File blosc2_ext.pyx:2233, in blosc2.blosc2_ext._check_rc()

RuntimeError: Could not build empty array

To Reproduce

import numpy as np
import blosc2

blosc2.empty(shape=(17707749, 768), dtype=np.float32)

Expected behavior
The array should be created without error.

Desktop (please complete the following information):

  • OS: Ubuntu 24.04 (x86_64)
  • Version 3.3.1
@FrancescAlted
Member

FrancescAlted commented Apr 30, 2025

Interesting. Your code works fine for me on Linux, even for an array of more than 4 petabytes (created in about 10 s):

Blosc2 version: 3.3.1
a.info:
 type    : NDArray
shape   : (17707749000, 76800)
chunks  : (75, 76800)
blocks  : (1, 38400)
dtype   : float32
cratio  : 720000.00
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
        : nthreads=16, blocksize=153600, splitmode=<SplitMode.AUTO_SPLIT: 3>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=16)

	Command being timed: "python python-blosc2/prova.py"
	User time (seconds): 6.83
	System time (seconds): 3.41
	Percent of CPU this job got: 109%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.38
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 12970412
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 3242421
	Voluntary context switches: 192
	Involuntary context switches: 325
	Swaps: 0
	File system inputs: 8
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

As can be seen, only 12 GB of system memory is used.
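For reference, the raw (uncompressed) size of that array can be checked with a quick back-of-the-envelope calculation:

```python
# Raw size of a (17707749000, 76800) float32 array, before compression.
shape = (17_707_749_000, 76_800)
itemsize = 4  # bytes per float32 element
nbytes = shape[0] * shape[1] * itemsize
print(f"{nbytes / 2**50:.1f} PiB")  # ≈ 4.8 PiB of logical data
```

Only the chunk/block metadata and a small working set actually touch RAM, which is why the resident set stays around 12 GB for the sparse case.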

My code:

import numpy as np
import blosc2

print("Blosc2 version:", blosc2.__version__)
a = blosc2.empty(shape=(17707749 * 1_000, 768 * 100), dtype=np.float32)
print("a.info:\n", a.info)

@FrancescAlted
Member

FrancescAlted commented Apr 30, 2025

FWIW, the main bottleneck here is allocating memory for sparse storage. If you use contiguous storage instead, you will be able to create an array of up to 38 petabytes in less than a second (consuming just 58 MB of RAM):

import numpy as np
import blosc2

print("Blosc2 version:", blosc2.__version__)
a = blosc2.empty(shape=(17707749 * 1_000, 768 * 800), dtype=np.float32, contiguous=True)
# Storing to disk is contiguous by default (this should take around 240 bytes on-disk)
# a = blosc2.empty(shape=(17707749 * 1_000, 768 * 800), dtype=np.float32, urlpath="a.b2nd", mode="w")
print("a.info:\n", a.info)
Blosc2 version: 3.3.1
a.info:
 type    : NDArray
shape   : (17707749000, 614400)
chunks  : (10, 614400)
blocks  : (1, 61440)
dtype   : float32
cratio  : 0.00
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
        : nthreads=16, blocksize=245760, splitmode=<SplitMode.AUTO_SPLIT: 3>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=16)

	Command being timed: "python python-blosc2/prova.py"
	User time (seconds): 0.98
	System time (seconds): 0.01
	Percent of CPU this job got: 757%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.13
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 59020
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 14457
	Voluntary context switches: 136
	Involuntary context switches: 59
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@sirfz
Author

sirfz commented Apr 30, 2025

The shapes you tried work for me too, but shape (17707749, 768) fails for some reason:

In [1]: import numpy as np
   ...: import blosc2
   ...:
   ...: print("Blosc2 version:", blosc2.__version__)
   ...: a = blosc2.empty(shape=(17707749 * 1_000, 768 * 100), dtype=np.float32)
   ...: print("a.info:\n", a.info)
Blosc2 version: 3.3.1
a.info:
 type    : NDArray
shape   : (17707749000, 76800)
chunks  : (5975, 76800)
blocks  : (1, 25600)
dtype   : float32
cratio  : 57360000.00
cparams : CParams(codec=<Codec.ZSTD: 5>, codec_meta=0, clevel=1, use_dict=False, typesize=4,
        : nthreads=56, blocksize=102400, splitmode=<SplitMode.AUTO_SPLIT: 3>,
        : filters=[<Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>,
        : <Filter.NOFILTER: 0>, <Filter.NOFILTER: 0>, <Filter.SHUFFLE: 1>], filters_meta=[0, 0,
        : 0, 0, 0, 0], tuner=<Tuner.STUNE: 0>)
dparams : DParams(nthreads=56)


In [2]: a = blosc2.empty(shape=(17707749 * 1_000, 768), dtype=np.float32)

In [3]: a = blosc2.empty(shape=(17707749, 768), dtype=np.float32)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 1
----> 1 a = blosc2.empty(shape=(17707749, 768), dtype=np.float32)

File .venv/lib/python3.12/site-packages/blosc2/ndarray.py:2978, in empty(shape, dtype, **kwargs)
   2976 blocks = kwargs.pop("blocks", None)
   2977 chunks, blocks = compute_chunks_blocks(shape, chunks, blocks, dtype, **kwargs)
-> 2978 return blosc2_ext.empty(shape, chunks, blocks, dtype, **kwargs)

File blosc2_ext.pyx:2706, in blosc2.blosc2_ext.empty()

File blosc2_ext.pyx:2233, in blosc2.blosc2_ext._check_rc()

RuntimeError: Could not build empty array

In [4]: a = blosc2.empty(shape=(17707749 * 1_000, 768 * 800), dtype=np.float32, contiguous=True)

In [5]:

@FrancescAlted
Member

OK, I cannot reproduce this on a couple of Linux boxes I have. I guess it is related to the chunk/block partition that was computed for your box, which depends on the CPU cache sizes. Can you post the output of this?

import blosc2
import pprint

print("Blosc2 version:", blosc2.__version__)
print("Blosc2 cpu_info:")
pprint.pprint(blosc2.cpu_info)

@sirfz
Author

sirfz commented Apr 30, 2025

Blosc2 version: 3.3.1
Blosc2 cpu_info:
{'arch': 'X86_64',
 'arch_string_raw': 'x86_64',
 'bits': 64,
 'brand_raw': 'AMD EPYC 7R32',
 'count': 192,
 'cpuinfo_version': [9, 0, 0],
 'cpuinfo_version_string': '9.0.0',
 'family': 23,
 'flags': ['3dnowprefetch',
           'abm',
           'adx',
           'aes',
           'aperfmperf',
           'apic',
           'arat',
           'avx',
           'avx2',
           'bmi1',
           'bmi2',
           'clflush',
           'clflushopt',
           'clwb',
           'clzero',
           'cmov',
           'cmp_legacy',
           'constant_tsc',
           'cpuid',
           'cr8_legacy',
           'cx16',
           'cx8',
           'de',
           'extd_apicid',
           'f16c',
           'fma',
           'fpu',
           'fsgsbase',
           'fxsr',
           'fxsr_opt',
           'ht',
           'hypervisor',
           'ibpb',
           'ibrs',
           'lahf_lm',
           'lm',
           'mca',
           'mce',
           'misalignsse',
           'mmx',
           'mmxext',
           'monitor',
           'movbe',
           'msr',
           'mtrr',
           'nonstop_tsc',
           'nopl',
           'npt',
           'nrip_save',
           'nx',
           'pae',
           'pat',
           'pclmulqdq',
           'pdpe1gb',
           'perfctr_core',
           'pge',
           'pni',
           'popcnt',
           'pse',
           'pse36',
           'rdpid',
           'rdpru',
           'rdrand',
           'rdseed',
           'rdtscp',
           'rep_good',
           'sep',
           'sha_ni',
           'smap',
           'smep',
           'ssbd',
           'sse',
           'sse2',
           'sse4_1',
           'sse4_2',
           'sse4a',
           'ssse3',
           'stibp',
           'syscall',
           'topoext',
           'tsc',
           'tsc_known_freq',
           'vme',
           'vmmcall',
           'wbnoinvd',
           'xgetbv1',
           'xsave',
           'xsavec',
           'xsaveerptr',
           'xsaveopt'],
 'hz_actual': [2800000000, 0],
 'hz_actual_friendly': '2.8000 GHz',
 'hz_advertised': [2800000000, 0],
 'hz_advertised_friendly': '2.8000 GHz',
 'l1_data_cache_size': 32768,
 'l1_instruction_cache_size': 3145728,
 'l2_cache_size': 524288,
 'l3_cache_size': 115964116992,
 'model': 49,
 'python_version': '3.12.9.final.0 (64 bit)',
 'vendor_id_raw': 'AuthenticAMD'}

@FrancescAlted
Member

What was happening is that the cache-size discovery machinery was not working correctly. That, combined with a glitch in the cap for the chunksize, was causing the error.

I have fixed the chunksize cap in main; can you do a quick check that it works on your machine?

For a more accurate fix, can you tell us the output of:

lscpu --json

and

cat /sys/devices/system/cpu/cpu0/cache/index3/size

on your machine?

@FrancescAlted
Member

BTW, a new python-blosc2 version, 3.3.2, with this fix included, has been released.

@sirfz
Author

sirfz commented May 1, 2025

3.3.2 confirmed working now, thank you!

For what it's worth, here are the outputs you asked for:

lscpu --json:

{
   "lscpu": [
      {
         "field": "Architecture:",
         "data": "x86_64",
         "children": [
            {
               "field": "CPU op-mode(s):",
               "data": "32-bit, 64-bit"
            },{
               "field": "Address sizes:",
               "data": "48 bits physical, 48 bits virtual"
            },{
               "field": "Byte Order:",
               "data": "Little Endian"
            }
         ]
      },{
         "field": "CPU(s):",
         "data": "192",
         "children": [
            {
               "field": "On-line CPU(s) list:",
               "data": "0-191"
            }
         ]
      },{
         "field": "Vendor ID:",
         "data": "AuthenticAMD",
         "children": [
            {
               "field": "Model name:",
               "data": "AMD EPYC 7R32",
               "children": [
                  {
                     "field": "CPU family:",
                     "data": "23"
                  },{
                     "field": "Model:",
                     "data": "49"
                  },{
                     "field": "Thread(s) per core:",
                     "data": "2"
                  },{
                     "field": "Core(s) per socket:",
                     "data": "48"
                  },{
                     "field": "Socket(s):",
                     "data": "2"
                  },{
                     "field": "Stepping:",
                     "data": "0"
                  },{
                     "field": "BogoMIPS:",
                     "data": "5600.00"
                  },{
                     "field": "Flags:",
                     "data": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid"
                  }
               ]
            }
         ]
      },{
         "field": "Virtualization features:",
         "data": null,
         "children": [
            {
               "field": "Hypervisor vendor:",
               "data": "KVM"
            },{
               "field": "Virtualization type:",
               "data": "full"
            }
         ]
      },{
         "field": "Caches (sum of all):",
         "data": null,
         "children": [
            {
               "field": "L1d:",
               "data": "3 MiB (96 instances)"
            },{
               "field": "L1i:",
               "data": "3 MiB (96 instances)"
            },{
               "field": "L2:",
               "data": "48 MiB (96 instances)"
            },{
               "field": "L3:",
               "data": "384 MiB (24 instances)"
            }
         ]
      },{
         "field": "NUMA:",
         "data": null,
         "children": [
            {
               "field": "NUMA node(s):",
               "data": "2"
            },{
               "field": "NUMA node0 CPU(s):",
               "data": "0-47,96-143"
            },{
               "field": "NUMA node1 CPU(s):",
               "data": "48-95,144-191"
            }
         ]
      },{
         "field": "Vulnerabilities:",
         "data": null,
         "children": [
            {
               "field": "Gather data sampling:",
               "data": "Not affected"
            },{
               "field": "Itlb multihit:",
               "data": "Not affected"
            },{
               "field": "L1tf:",
               "data": "Not affected"
            },{
               "field": "Mds:",
               "data": "Not affected"
            },{
               "field": "Meltdown:",
               "data": "Not affected"
            },{
               "field": "Mmio stale data:",
               "data": "Not affected"
            },{
               "field": "Reg file data sampling:",
               "data": "Not affected"
            },{
               "field": "Retbleed:",
               "data": "Mitigation; untrained return thunk; SMT enabled with STIBP protection"
            },{
               "field": "Spec rstack overflow:",
               "data": "Vulnerable: Safe RET, no microcode"
            },{
               "field": "Spec store bypass:",
               "data": "Mitigation; Speculative Store Bypass disabled via prctl"
            },{
               "field": "Spectre v1:",
               "data": "Mitigation; usercopy/swapgs barriers and __user pointer sanitization"
            },{
               "field": "Spectre v2:",
               "data": "Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected"
            },{
               "field": "Srbds:",
               "data": "Not affected"
            },{
               "field": "Tsx async abort:",
               "data": "Not affected"
            }
         ]
      }
   ]
}

and

$ cat /sys/devices/system/cpu/cpu0/cache/index3/size
16384K

@FrancescAlted
Member

Thanks for the output. With this, I have come up with a more refined way of guessing cache sizes in 11584f1. Can you try the code in main (just install it with pip install git+https://github.com/Blosc/python-blosc2.git@main) and tell me how performance is affected?
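For context, the sysfs value above ("16384K") is the size of a single L3 instance (16 MiB per CCX on Zen2), whereas cpu_info reported an l3_cache_size of 115964116992 bytes (108 GiB), which is implausible and is likely what threw off the discovery machinery. A minimal sketch of parsing the sysfs string (parse_cache_size is a hypothetical helper for illustration, not blosc2's actual code):

```python
def parse_cache_size(text: str) -> int:
    """Parse a sysfs cache size string such as '16384K' into bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1].upper() if text else ""
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)  # plain byte count, no unit suffix

print(parse_cache_size("16384K"))  # 16777216 bytes, i.e. 16 MiB per L3 slice
```

A per-instance value like this is a much safer basis for capping chunk sizes than a (possibly bogus) aggregate figure.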

@sirfz
Author

sirfz commented May 2, 2025

Well, I don't have any code to test right now (I hit this error while working on a problem and wanted to test with blosc2, but I've shifted to something else at the moment). If you have any code snippet you'd like me to test, I'd be happy to run it.

@FrancescAlted
Member

Yes, that would be great. Can you please run the script in https://github.com/Blosc/python-blosc2/blob/main/bench/ndarray/compute_dists2.py with the 'large' param like this?

/usr/bin/time -v python bench/ndarray/compute_dists2.py large

and send the output back? It should take less than 5 min on your machine. Also, the script creates a plot file (blosc2_vs_numexpr_subplots.png) in the current working directory; please attach it to this ticket too. Thanks!

@sirfz
Copy link
Author

sirfz commented May 2, 2025

Benchmarking constant distribution...
Blosc2 - constant - Size 3000x3000: 3.55 GB/s - cratio: 6280.5x
Blosc2 - constant - Size 6000x6000: 5.71 GB/s - cratio: 6306.9x
Blosc2 - constant - Size 9000x9000: 6.11 GB/s - cratio: 5681.0x
Blosc2 - constant - Size 12000x12000: 6.79 GB/s - cratio: 5051.2x
Blosc2 - constant - Size 15000x15000: 7.27 GB/s - cratio: 6314.4x
Blosc2 - constant - Size 18000x18000: 11.44 GB/s - cratio: 3788.8x
Blosc2 - constant - Size 21000x21000: 12.23 GB/s - cratio: 4420.3x
Blosc2 - constant - Size 24000x24000: 14.78 GB/s - cratio: 5051.6x
Blosc2 - constant - Size 27000x27000: 15.31 GB/s - cratio: 5683.1x
Blosc2 - constant - Size 30000x30000: 18.58 GB/s - cratio: 6314.4x
Numexpr - constant - Size 3000x3000: 9.17 GB/s
Numexpr - constant - Size 6000x6000: 23.48 GB/s
Numexpr - constant - Size 9000x9000: 64.67 GB/s
Numexpr - constant - Size 12000x12000: 55.28 GB/s
Numexpr - constant - Size 15000x15000: 59.04 GB/s
Numexpr - constant - Size 18000x18000: 58.97 GB/s
Numexpr - constant - Size 21000x21000: 8.45 GB/s
Numexpr - constant - Size 24000x24000: 7.43 GB/s
Numexpr - constant - Size 27000x27000: 14.82 GB/s
Numexpr - constant - Size 30000x30000: 9.13 GB/s

Benchmarking arange distribution...
Blosc2 - arange - Size 3000x3000: 1.26 GB/s - cratio: 5968.2x
Blosc2 - arange - Size 6000x6000: 5.39 GB/s - cratio: 5992.0x
Blosc2 - arange - Size 9000x9000: 6.61 GB/s - cratio: 5397.1x
Blosc2 - arange - Size 12000x12000: 6.92 GB/s - cratio: 4798.7x
Blosc2 - arange - Size 15000x15000: 6.60 GB/s - cratio: 5852.4x
Blosc2 - arange - Size 18000x18000: 9.29 GB/s - cratio: 3599.4x
Blosc2 - arange - Size 21000x21000: 11.07 GB/s - cratio: 4097.0x
Blosc2 - arange - Size 24000x24000: 14.21 GB/s - cratio: 4799.0x
Blosc2 - arange - Size 27000x27000: 14.77 GB/s - cratio: 5267.4x
Blosc2 - arange - Size 30000x30000: 16.80 GB/s - cratio: 5852.4x
Numexpr - arange - Size 3000x3000: 16.73 GB/s
Numexpr - arange - Size 6000x6000: 60.64 GB/s
Numexpr - arange - Size 9000x9000: 75.35 GB/s
Numexpr - arange - Size 12000x12000: 67.92 GB/s
Numexpr - arange - Size 15000x15000: 65.64 GB/s
Numexpr - arange - Size 18000x18000: 65.53 GB/s
Numexpr - arange - Size 21000x21000: 67.80 GB/s
Numexpr - arange - Size 24000x24000: 67.48 GB/s
Numexpr - arange - Size 27000x27000: 12.18 GB/s
Numexpr - arange - Size 30000x30000: 13.03 GB/s

Benchmarking linspace distribution...
Blosc2 - linspace - Size 3000x3000: 1.47 GB/s - cratio: 241.0x
Blosc2 - linspace - Size 6000x6000: 5.67 GB/s - cratio: 320.6x
Blosc2 - linspace - Size 9000x9000: 6.35 GB/s - cratio: 417.6x
Blosc2 - linspace - Size 12000x12000: 7.00 GB/s - cratio: 426.0x
Blosc2 - linspace - Size 15000x15000: 6.39 GB/s - cratio: 479.3x
Blosc2 - linspace - Size 18000x18000: 9.64 GB/s - cratio: 453.5x
Blosc2 - linspace - Size 21000x21000: 11.19 GB/s - cratio: 527.6x
Blosc2 - linspace - Size 24000x24000: 14.62 GB/s - cratio: 492.0x
Blosc2 - linspace - Size 27000x27000: 14.62 GB/s - cratio: 561.9x
Blosc2 - linspace - Size 30000x30000: 16.06 GB/s - cratio: 503.2x
Numexpr - linspace - Size 3000x3000: 19.98 GB/s
Numexpr - linspace - Size 6000x6000: 69.65 GB/s
Numexpr - linspace - Size 9000x9000: 71.35 GB/s
Numexpr - linspace - Size 12000x12000: 61.82 GB/s
Numexpr - linspace - Size 15000x15000: 67.47 GB/s
Numexpr - linspace - Size 18000x18000: 65.62 GB/s
Numexpr - linspace - Size 21000x21000: 69.22 GB/s
Numexpr - linspace - Size 24000x24000: 67.93 GB/s
Numexpr - linspace - Size 27000x27000: 67.95 GB/s
Numexpr - linspace - Size 30000x30000: 66.34 GB/s
        Command being timed: "uv run python bench/ndarray/compute_dists2.py large"
        User time (seconds): 492.40
        System time (seconds): 1785.30
        Percent of CPU this job got: 172%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 22:00.29
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 32688108
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 30026709
        Voluntary context switches: 239220
        Involuntary context switches: 10941
        Swaps: 0
        File system inputs: 0
        File system outputs: 18808
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

[Image attachment: benchmark plot (blosc2_vs_numexpr_subplots.png)]

@FrancescAlted
Member

FrancescAlted commented May 2, 2025

Hmm, interesting. Your CPU seems to run Blosc2 quite inefficiently. The reasons are unknown to me, as we generally don't have access to multi-socket CPUs. One possibility is that Zen2 is not very good at compressing/decompressing, but as Zen2 is not that old, I tend to think there is quite a lot of room for improvement in multi-socket scenarios.

For what it's worth, here are the benchmarks for our AMD box (9800X3D, Zen5), with 64 GB of RAM:

Benchmarking constant distribution...
Blosc2 - constant - Size 3000x3000: 18.09 GB/s - cratio: 6177.1x
Blosc2 - constant - Size 6000x6000: 46.99 GB/s - cratio: 6185.6x
Blosc2 - constant - Size 9000x9000: 31.74 GB/s - cratio: 5184.0x
Blosc2 - constant - Size 12000x12000: 133.59 GB/s - cratio: 4968.9x
Blosc2 - constant - Size 15000x15000: 153.75 GB/s - cratio: 6177.1x
Blosc2 - constant - Size 18000x18000: 160.01 GB/s - cratio: 3739.1x
Blosc2 - constant - Size 21000x21000: 159.11 GB/s - cratio: 4351.3x
Blosc2 - constant - Size 24000x24000: 155.22 GB/s - cratio: 4968.9x
Blosc2 - constant - Size 27000x27000: 130.23 GB/s - cratio: 5579.8x
Blosc2 - constant - Size 30000x30000: 155.67 GB/s - cratio: 6185.6x
Numexpr - constant - Size 3000x3000: 17.78 GB/s
Numexpr - constant - Size 6000x6000: 39.03 GB/s
Numexpr - constant - Size 9000x9000: 41.77 GB/s
Numexpr - constant - Size 12000x12000: 39.58 GB/s
Numexpr - constant - Size 15000x15000: 39.72 GB/s
Numexpr - constant - Size 18000x18000: 40.46 GB/s
Numexpr - constant - Size 21000x21000: 40.80 GB/s
Numexpr - constant - Size 24000x24000: 41.12 GB/s
Numexpr - constant - Size 27000x27000: 41.45 GB/s
Numexpr - constant - Size 30000x30000: 41.58 GB/s

Benchmarking arange distribution...
Blosc2 - arange - Size 3000x3000: 11.71 GB/s - cratio: 5874.7x
Blosc2 - arange - Size 6000x6000: 51.74 GB/s - cratio: 5882.4x
Blosc2 - arange - Size 9000x9000: 34.21 GB/s - cratio: 4638.0x
Blosc2 - arange - Size 12000x12000: 97.05 GB/s - cratio: 4724.4x
Blosc2 - arange - Size 15000x15000: 110.62 GB/s - cratio: 5734.3x
Blosc2 - arange - Size 18000x18000: 115.77 GB/s - cratio: 3554.5x
Blosc2 - arange - Size 21000x21000: 126.23 GB/s - cratio: 4037.5x
Blosc2 - arange - Size 24000x24000: 121.86 GB/s - cratio: 4724.4x
Blosc2 - arange - Size 27000x27000: 114.91 GB/s - cratio: 5178.5x
Blosc2 - arange - Size 30000x30000: 115.04 GB/s - cratio: 5741.6x
Numexpr - arange - Size 3000x3000: 22.45 GB/s
Numexpr - arange - Size 6000x6000: 39.28 GB/s
Numexpr - arange - Size 9000x9000: 41.44 GB/s
Numexpr - arange - Size 12000x12000: 39.68 GB/s
Numexpr - arange - Size 15000x15000: 39.63 GB/s
Numexpr - arange - Size 18000x18000: 40.32 GB/s
Numexpr - arange - Size 21000x21000: 40.78 GB/s
Numexpr - arange - Size 24000x24000: 41.11 GB/s
Numexpr - arange - Size 27000x27000: 41.55 GB/s
Numexpr - arange - Size 30000x30000: 41.66 GB/s

Benchmarking linspace distribution...
Blosc2 - linspace - Size 3000x3000: 11.43 GB/s - cratio: 240.8x
Blosc2 - linspace - Size 6000x6000: 62.70 GB/s - cratio: 320.3x
Blosc2 - linspace - Size 9000x9000: 26.31 GB/s - cratio: 408.6x
Blosc2 - linspace - Size 12000x12000: 87.68 GB/s - cratio: 425.4x
Blosc2 - linspace - Size 15000x15000: 94.90 GB/s - cratio: 478.5x
Blosc2 - linspace - Size 18000x18000: 93.14 GB/s - cratio: 449.6x
Blosc2 - linspace - Size 21000x21000: 101.17 GB/s - cratio: 529.0x
Blosc2 - linspace - Size 24000x24000: 104.50 GB/s - cratio: 491.2x
Blosc2 - linspace - Size 27000x27000: 103.89 GB/s - cratio: 560.8x
Blosc2 - linspace - Size 30000x30000: 92.90 GB/s - cratio: 503.0x
Numexpr - linspace - Size 3000x3000: 26.99 GB/s
Numexpr - linspace - Size 6000x6000: 39.38 GB/s
Numexpr - linspace - Size 9000x9000: 41.30 GB/s
Numexpr - linspace - Size 12000x12000: 39.71 GB/s
Numexpr - linspace - Size 15000x15000: 39.56 GB/s
Numexpr - linspace - Size 18000x18000: 40.53 GB/s
Numexpr - linspace - Size 21000x21000: 40.98 GB/s
Numexpr - linspace - Size 24000x24000: 41.16 GB/s
Numexpr - linspace - Size 27000x27000: 41.51 GB/s
Numexpr - linspace - Size 30000x30000: 41.65 GB/s
	Command being timed: "python bench/ndarray/compute_dists2.py large"
	User time (seconds): 156.13
	System time (seconds): 42.27
	Percent of CPU this job got: 244%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:21.05
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 27955872
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 510382
	Voluntary context switches: 879587
	Involuntary context switches: 21990
	Swaps: 0
	File system inputs: 0
	File system outputs: 824
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

[Image attachment: benchmark plot (blosc2_vs_numexpr_subplots.png)]

@sirfz
Author

sirfz commented May 2, 2025

I tried running it with numactl --cpunode and --membind, which doesn't seem to have made a difference for Blosc2 but drastically improved numexpr:

Benchmarking constant distribution...
Blosc2 - constant - Size 3000x3000: 3.76 GB/s - cratio: 6280.5x
Blosc2 - constant - Size 6000x6000: 5.38 GB/s - cratio: 6306.9x
Blosc2 - constant - Size 9000x9000: 6.11 GB/s - cratio: 5681.0x
Blosc2 - constant - Size 12000x12000: 6.59 GB/s - cratio: 5051.2x
Blosc2 - constant - Size 15000x15000: 6.96 GB/s - cratio: 6314.4x
Blosc2 - constant - Size 18000x18000: 10.65 GB/s - cratio: 3788.8x
Blosc2 - constant - Size 21000x21000: 11.20 GB/s - cratio: 4420.3x
Blosc2 - constant - Size 24000x24000: 13.82 GB/s - cratio: 5051.6x
Blosc2 - constant - Size 27000x27000: 14.31 GB/s - cratio: 5683.1x
Blosc2 - constant - Size 30000x30000: 16.42 GB/s - cratio: 6314.4x
Numexpr - constant - Size 3000x3000: 33.52 GB/s
Numexpr - constant - Size 6000x6000: 29.12 GB/s
Numexpr - constant - Size 9000x9000: 43.61 GB/s
Numexpr - constant - Size 12000x12000: 51.13 GB/s
Numexpr - constant - Size 15000x15000: 59.47 GB/s
Numexpr - constant - Size 18000x18000: 63.32 GB/s
Numexpr - constant - Size 21000x21000: 63.91 GB/s
Numexpr - constant - Size 24000x24000: 64.73 GB/s
Numexpr - constant - Size 27000x27000: 65.23 GB/s
Numexpr - constant - Size 30000x30000: 65.71 GB/s

Benchmarking arange distribution...
Blosc2 - arange - Size 3000x3000: 2.27 GB/s - cratio: 5968.2x
Blosc2 - arange - Size 6000x6000: 5.50 GB/s - cratio: 5992.0x
Blosc2 - arange - Size 9000x9000: 6.22 GB/s - cratio: 5397.1x
Blosc2 - arange - Size 12000x12000: 6.62 GB/s - cratio: 4798.7x
Blosc2 - arange - Size 15000x15000: 6.92 GB/s - cratio: 5852.4x
Blosc2 - arange - Size 18000x18000: 10.72 GB/s - cratio: 3599.4x
Blosc2 - arange - Size 21000x21000: 11.23 GB/s - cratio: 4097.0x
Blosc2 - arange - Size 24000x24000: 13.90 GB/s - cratio: 4799.0x
Blosc2 - arange - Size 27000x27000: 14.32 GB/s - cratio: 5267.4x
Blosc2 - arange - Size 30000x30000: 15.79 GB/s - cratio: 5852.4x
Numexpr - arange - Size 3000x3000: 21.42 GB/s
Numexpr - arange - Size 6000x6000: 42.73 GB/s
Numexpr - arange - Size 9000x9000: 45.86 GB/s
Numexpr - arange - Size 12000x12000: 52.16 GB/s
Numexpr - arange - Size 15000x15000: 59.74 GB/s
Numexpr - arange - Size 18000x18000: 62.47 GB/s
Numexpr - arange - Size 21000x21000: 64.25 GB/s
Numexpr - arange - Size 24000x24000: 65.40 GB/s
Numexpr - arange - Size 27000x27000: 66.09 GB/s
Numexpr - arange - Size 30000x30000: 38.46 GB/s

Benchmarking linspace distribution...
Blosc2 - linspace - Size 3000x3000: 2.16 GB/s - cratio: 241.0x
Blosc2 - linspace - Size 6000x6000: 5.46 GB/s - cratio: 320.6x
Blosc2 - linspace - Size 9000x9000: 6.07 GB/s - cratio: 417.6x
Blosc2 - linspace - Size 12000x12000: 6.55 GB/s - cratio: 426.0x
Blosc2 - linspace - Size 15000x15000: 7.02 GB/s - cratio: 479.3x
Blosc2 - linspace - Size 18000x18000: 10.78 GB/s - cratio: 453.5x
Blosc2 - linspace - Size 21000x21000: 11.42 GB/s - cratio: 527.6x
Blosc2 - linspace - Size 24000x24000: 14.01 GB/s - cratio: 492.0x
Blosc2 - linspace - Size 27000x27000: 14.77 GB/s - cratio: 561.9x
Blosc2 - linspace - Size 30000x30000: 16.22 GB/s - cratio: 503.2x
Numexpr - linspace - Size 3000x3000: 18.15 GB/s
Numexpr - linspace - Size 6000x6000: 47.70 GB/s
Numexpr - linspace - Size 9000x9000: 48.58 GB/s
Numexpr - linspace - Size 12000x12000: 54.00 GB/s
Numexpr - linspace - Size 15000x15000: 62.39 GB/s
Numexpr - linspace - Size 18000x18000: 64.92 GB/s
Numexpr - linspace - Size 21000x21000: 65.72 GB/s
Numexpr - linspace - Size 24000x24000: 65.26 GB/s
Numexpr - linspace - Size 27000x27000: 67.60 GB/s
Numexpr - linspace - Size 30000x30000: 67.76 GB/s
        Command being timed: "numactl --cpunode=0 --membind=0 ./.venv/bin/python bench/ndarray/compute_dists2.py large"
        User time (seconds): 591.42
        System time (seconds): 348.34
        Percent of CPU this job got: 339%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:36.42
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 30987264
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 26513895
        Voluntary context switches: 179645
        Involuntary context switches: 9696
        Swaps: 0
        File system inputs: 0
        File system outputs: 1152
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

[Image attachment: benchmark plot (blosc2_vs_numexpr_subplots.png)]

@FrancescAlted
Member

OK. Is this with the main branch, or stock 3.3.2? It would be nice to see a comparison between the two, as they make quite different L3 guesses.

@sirfz
Author

sirfz commented May 5, 2025

Those were with main; here are the results for 3.3.2:

Benchmarking constant distribution...
Blosc2 - constant - Size 3000x3000: 3.73 GB/s - cratio: 6280.5x
Blosc2 - constant - Size 6000x6000: 5.45 GB/s - cratio: 6306.9x
Blosc2 - constant - Size 9000x9000: 6.24 GB/s - cratio: 5681.0x
Blosc2 - constant - Size 12000x12000: 6.74 GB/s - cratio: 5051.2x
Blosc2 - constant - Size 15000x15000: 6.99 GB/s - cratio: 6314.4x
Blosc2 - constant - Size 18000x18000: 10.95 GB/s - cratio: 3788.8x
Blosc2 - constant - Size 21000x21000: 11.39 GB/s - cratio: 4420.3x
Blosc2 - constant - Size 24000x24000: 14.26 GB/s - cratio: 5051.6x
Blosc2 - constant - Size 27000x27000: 14.40 GB/s - cratio: 5683.1x
Blosc2 - constant - Size 30000x30000: 16.52 GB/s - cratio: 6314.4x
Numexpr - constant - Size 3000x3000: 28.70 GB/s
Numexpr - constant - Size 6000x6000: 46.40 GB/s
Numexpr - constant - Size 9000x9000: 48.29 GB/s
Numexpr - constant - Size 12000x12000: 51.97 GB/s
Numexpr - constant - Size 15000x15000: 60.65 GB/s
Numexpr - constant - Size 18000x18000: 64.06 GB/s
Numexpr - constant - Size 21000x21000: 64.17 GB/s
Numexpr - constant - Size 24000x24000: 64.99 GB/s
Numexpr - constant - Size 27000x27000: 65.89 GB/s
Numexpr - constant - Size 30000x30000: 66.33 GB/s

Benchmarking arange distribution...
Blosc2 - arange - Size 3000x3000: 2.08 GB/s - cratio: 5968.2x
Blosc2 - arange - Size 6000x6000: 4.96 GB/s - cratio: 5992.0x
Blosc2 - arange - Size 9000x9000: 6.28 GB/s - cratio: 5397.1x
Blosc2 - arange - Size 12000x12000: 6.59 GB/s - cratio: 4798.7x
Blosc2 - arange - Size 15000x15000: 6.75 GB/s - cratio: 5852.4x
Blosc2 - arange - Size 18000x18000: 10.96 GB/s - cratio: 3599.4x
Blosc2 - arange - Size 21000x21000: 11.10 GB/s - cratio: 4097.0x
Blosc2 - arange - Size 24000x24000: 14.03 GB/s - cratio: 4799.0x
Blosc2 - arange - Size 27000x27000: 14.32 GB/s - cratio: 5267.4x
Blosc2 - arange - Size 30000x30000: 16.52 GB/s - cratio: 5852.4x
Numexpr - arange - Size 3000x3000: 22.03 GB/s
Numexpr - arange - Size 6000x6000: 52.31 GB/s
Numexpr - arange - Size 9000x9000: 52.69 GB/s
Numexpr - arange - Size 12000x12000: 55.18 GB/s
Numexpr - arange - Size 15000x15000: 63.60 GB/s
Numexpr - arange - Size 18000x18000: 64.22 GB/s
Numexpr - arange - Size 21000x21000: 65.45 GB/s
Numexpr - arange - Size 24000x24000: 66.18 GB/s
Numexpr - arange - Size 27000x27000: 67.07 GB/s
Numexpr - arange - Size 30000x30000: 67.31 GB/s

Benchmarking linspace distribution...
Blosc2 - linspace - Size 3000x3000: 2.15 GB/s - cratio: 241.0x
Blosc2 - linspace - Size 6000x6000: 5.43 GB/s - cratio: 320.6x
Blosc2 - linspace - Size 9000x9000: 6.14 GB/s - cratio: 417.6x
Blosc2 - linspace - Size 12000x12000: 6.69 GB/s - cratio: 426.0x
Blosc2 - linspace - Size 15000x15000: 6.92 GB/s - cratio: 479.3x
Blosc2 - linspace - Size 18000x18000: 10.82 GB/s - cratio: 453.5x
Blosc2 - linspace - Size 21000x21000: 10.85 GB/s - cratio: 527.6x
Blosc2 - linspace - Size 24000x24000: 13.81 GB/s - cratio: 492.0x
Blosc2 - linspace - Size 27000x27000: 14.34 GB/s - cratio: 561.9x
Blosc2 - linspace - Size 30000x30000: 16.33 GB/s - cratio: 503.2x
Numexpr - linspace - Size 3000x3000: 18.74 GB/s
Numexpr - linspace - Size 6000x6000: 45.80 GB/s
Numexpr - linspace - Size 9000x9000: 47.38 GB/s
Numexpr - linspace - Size 12000x12000: 52.24 GB/s
Numexpr - linspace - Size 15000x15000: 60.96 GB/s
Numexpr - linspace - Size 18000x18000: 63.57 GB/s
Numexpr - linspace - Size 21000x21000: 64.86 GB/s
Numexpr - linspace - Size 24000x24000: 65.57 GB/s
Numexpr - linspace - Size 27000x27000: 65.34 GB/s
Numexpr - linspace - Size 30000x30000: 66.30 GB/s
        Command being timed: "numactl --cpunode=0 --membind=0 ./.venv/bin/python bench/ndarray/compute_dists2.py large"
        User time (seconds): 585.55
        System time (seconds): 327.29
        Percent of CPU this job got: 347%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:22.49
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 30937596
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 9
        Minor (reclaiming a frame) page faults: 26464104
        Voluntary context switches: 167039
        Involuntary context switches: 9807
        Swaps: 0
        File system inputs: 2248
        File system outputs: 1744
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

[attached: benchmark plot for the 3.3.2 run]
