aristo: switch to vector memtable #3447

Conversation

arnetheduck
Member

Every time we persist, we collect all changes into a batch and write
that batch to a memtable, which rocksdb will lazily write to disk
using a background thread.

The default memtable implementation in rocksdb is a skip list, which
can handle concurrent writes while still allowing lookups. We're not
using concurrent inserts, and the skip list comes with significant
overhead both when writing and when reading.

Here, we switch to a vector memtable, which is faster to write but
terrible to read. To compensate, we then eagerly flush the memtable
to disk, which is a blocking operation.

One would think that this blocking of the main thread would be bad,
but it turns out that creating the skip list, also a blocking
operation, is even slower, resulting in a net win.
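
Purely as an illustration of the idea above, here is roughly how a
vector memtable plus an eager flush looks when set up directly through
the RocksDB C++ API; the path and key/value data are made up, and this
is a sketch, not the code in this PR:

```cpp
#include <rocksdb/db.h>
#include <rocksdb/memtablerep.h>
#include <rocksdb/options.h>
#include <rocksdb/write_batch.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Vector memtable: appends are cheap, but lookups are a linear scan,
  // so it only pays off if the memtable is flushed before being read.
  options.memtable_factory.reset(new rocksdb::VectorRepFactory());
  // Concurrent memtable writes are only supported by the skip list.
  options.allow_concurrent_memtable_write = false;

  rocksdb::DB* db = nullptr;
  if (!rocksdb::DB::Open(options, "/tmp/example-db", &db).ok()) return 1;

  // Collect all changes for one persist into a single batch.
  rocksdb::WriteBatch batch;
  batch.Put("key", "value");  // placeholder data
  db->Write(rocksdb::WriteOptions(), &batch);

  // Eagerly flush the memtable to an SST file: this blocks the caller,
  // but skips the cost of building a skip list on every insert.
  rocksdb::FlushOptions flush_opts;
  flush_opts.wait = true;
  db->Flush(flush_opts);

  delete db;
  return 0;
}
```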

Coupled with this change, we also make the "lower" levels bigger
effectively reducing the average number of levels that must be looked at
to find recently written data. This could lead to some write
amplicification which is offset by making each file smaller and
therefore making compactions more targeted.
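
And a sketch of the level-shaping side, again using RocksDB C++ option
names with placeholder values rather than whatever this PR actually
sets:

```cpp
#include <rocksdb/options.h>

// Sketch only: placeholder numbers, not the values chosen by this PR.
rocksdb::Options levelOptions() {
  rocksdb::Options options;
  // A larger level base keeps recently written data in fewer, bigger
  // "lower" levels, so reads inspect fewer levels on average.
  options.max_bytes_for_level_base = 1ull << 30;   // e.g. 1 GiB at L1
  options.max_bytes_for_level_multiplier = 10;
  // Smaller SST files make each compaction more targeted, offsetting
  // the extra write amplification from the bigger levels.
  options.target_file_size_base = 32ull << 20;     // e.g. 32 MiB per file
  return options;
}
```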

Taken together, this results in an overall import speed boost of about
3-4%, but above all, it reduces the main thread blocking time during
persist.

pre (for 8k blocks persisted around block 11M):
```
DBG 2025-07-03 15:58:14.053+02:00 Core DB persisted
kvtDur=8ms182us947ns mptDur=4s640ms879us492ns endDur=10s50ms862us669ns
stateRoot=none()
```

post:
```
DBG 2025-07-03 14:48:59.426+02:00 Core DB persisted
kvtDur=12ms476us833ns mptDur=4s273ms629us840ns endDur=3s331ms171us989ns
stateRoot=none()
```
arnetheduck merged commit 0eea2fa into master on Jul 4, 2025
23 checks passed
arnetheduck deleted the vector-memtable branch on July 4, 2025 at 08:39