New version for reducing memory consumption #387

Merged
merged 4 commits into from
Apr 18, 2025

Conversation

FrancescAlted
Member

The arange() and linspace() constructors should use less memory now, as they no longer require temporaries.

This is still work in progress.

Member Author

FrancescAlted commented Apr 18, 2025

With this optimization, I have been able to create a 5 TB dataset on disk (a linspace of 850_000 x 850_000, float64; see the new bench/ndarray/array-constructor.py benchmark) using less than 200 MB of RAM. Prior to this, RAM usage exceeded 170 GB (so almost 1000x better!). The creation time is also very good (< 1 hour for the 5 TB array); this is around 10% faster than before. Here is the output of the creation script:

*** Creating a blosc2 array with 722_500_000_000 elements (shape: (850000, 850000)) ***
Time: 3568.683 s - size: 5383.04 GB (1.51 GB/s) Storage required: 168356.28 MB (cratio: 32.7x)
        Command being timed: "python bench/ndarray/array-constructor.py"
        User time (seconds): 6226.12
        System time (seconds): 108.86
        Percent of CPU this job got: 177%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 59:28.81
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 196288
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 5
        Minor (reclaiming a frame) page faults: 55974941
        Voluntary context switches: 19291812
        Involuntary context switches: 906587
        Swaps: 0
        File system inputs: 920
        File system outputs: 344939944
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
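
As a sanity check, the headline figures can be re-derived from the raw numbers in the output above. This is only a sketch: it assumes that the "GB" and "MB" printed by the script are binary units (GiB and MiB), which is an inference from the arithmetic and is not stated in the output itself. The variable names are illustrative.

```python
# Raw figures copied from the benchmark output above.
n_elements = 850_000 * 850_000   # shape (850000, 850000)
itemsize = 8                     # float64
stored_mib = 168356.28           # "Storage required", assumed to be MiB
elapsed_s = 3568.683             # "Time"
peak_rss_kib = 196288            # "Maximum resident set size (kbytes)"

# Derived figures (binary units are an assumption, see above).
uncompressed_bytes = n_elements * itemsize
uncompressed_gib = uncompressed_bytes / 2**30
cratio = uncompressed_bytes / (stored_mib * 2**20)
throughput_gib_s = uncompressed_gib / elapsed_s
peak_rss_mib = peak_rss_kib / 1024

print(f"size: {uncompressed_gib:.2f} GiB")
print(f"cratio: {cratio:.1f}x")
print(f"throughput: {throughput_gib_s:.2f} GiB/s")
print(f"peak RSS: {peak_rss_mib:.1f} MiB")
```

Under that assumption the derived size, compression ratio and throughput agree with the values the script printed, and the peak resident set size comes out just under 200 MB.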

Another advantage of this PR is that constructors need no temporary space at all, except for blosc2.fromiter(), which still needs a temporary (on disk only, and deleted after completion).
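
To illustrate the general idea of building a large array without a full-size temporary, here is a minimal pure-Python sketch. It is not the code in this PR and does not use the blosc2 API; `linspace_chunks` and `chunk_len` are hypothetical names. Each chunk is computed directly from its global indices, so peak memory is bounded by the chunk size rather than by the array size.

```python
from array import array


def linspace_chunks(start, stop, num, chunk_len):
    """Yield consecutive chunks of a linspace(start, stop, num).

    Hypothetical helper: every value is computed from its global index,
    so only one chunk is held in memory at any time.
    """
    step = (stop - start) / (num - 1) if num > 1 else 0.0
    for first in range(0, num, chunk_len):
        last = min(first + chunk_len, num)
        yield array("d", (start + i * step for i in range(first, last)))


# Consume chunk by chunk; a real writer would compress and store each
# chunk here instead of just counting elements.
n_seen = 0
chunk_lengths = []
last_value = None
for chunk in linspace_chunks(0.0, 1.0, 11, 4):
    n_seen += len(chunk)
    chunk_lengths.append(len(chunk))
    last_value = chunk[-1]
```

In this sketch the consumer never sees more than `chunk_len` values at once, which is the property that keeps resident memory small regardless of the total number of elements.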

@FrancescAlted FrancescAlted merged commit 109cebf into main Apr 18, 2025
10 checks passed
@FrancescAlted FrancescAlted deleted the tempfile2 branch April 18, 2025 12:45