Skip to content

Do not hash accounts when calculating capitalization #6867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 8, 2025

Conversation

brooksprumo
Copy link

@brooksprumo brooksprumo commented Jul 7, 2025

Problem

When creating a snapshot with ledger-tool, we (re)calculate the capitalization. Under the hood this uses the merkle-based accounts hash calculation. This has the following problems:

  1. It's slow. We don't need to hash each account, nor load account data from disk, nor build the merkle tree, nor compute the merkle root.

  2. Once all the lattice hash features are enabled, there will be no more merkle-based accounts hash calculations. At that time we'd like to remove the old code. We cannot remove the code yet, since calculating capitalization uses it.

Summary of Changes

Reimplement calculate_capitalization() to not use any of the merkle-based accounts hash calculation code.

Results

I ran ledger-tool capitalization against a mnb snapshot to time how long it took to calculate the capitalization with this pr vs master:

runtime (s) 🏁
master 540 🐢
this pr 110 🏎️

@brooksprumo brooksprumo self-assigned this Jul 7, 2025
@brooksprumo brooksprumo force-pushed the capitalization/calculate branch from d6d9a5b to bde1b4d Compare July 7, 2025 20:08
@brooksprumo brooksprumo changed the title Do not use hash accounts when calculating capitalization Do not hash accounts when calculating capitalization Jul 7, 2025
@brooksprumo brooksprumo marked this pull request as ready for review July 7, 2025 20:43
@brooksprumo brooksprumo requested review from HaoranYi and roryharr July 7, 2025 20:43
Comment on lines 6287 to 6328
pub fn calculate_capitalization_at_startup_from_index(
&self,
ancestors: &Ancestors,
startup_slot: Slot,
) -> u64 {
self.accounts_index
.account_maps
.par_iter()
.map(|accounts_index_bin| {
accounts_index_bin
.keys()
.into_iter()
.map(|pubkey| {
self.accounts_index
.get_with_and_then(
&pubkey,
Some(ancestors),
Some(startup_slot),
false,
|(slot, account_info)| {
(!account_info.is_zero_lamport()).then(|| {
self.get_account_accessor(
slot,
&pubkey,
&account_info.storage_location(),
)
.get_loaded_account(|loaded_account| {
loaded_account.lamports()
})
// SAFETY: The index said this pubkey exists, so
// there must be an account to load.
.unwrap()
})
},
)
.flatten()
.unwrap_or(0)
})
.sum::<u64>()
})
.sum()
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the main change. The impl mirrors calculate_accounts_lt_hash_at_startup_from_index(), which is two functions above.

@codecov-commenter
Copy link

codecov-commenter commented Jul 7, 2025

Codecov Report

Attention: Patch coverage is 99.02913% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.3%. Comparing base (4412582) to head (219e815).
Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6867   +/-   ##
=======================================
  Coverage    83.3%    83.3%           
=======================================
  Files         853      853           
  Lines      378386   378440   +54     
=======================================
+ Hits       315324   315377   +53     
- Misses      63062    63063    +1     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

/// account-by-account, summing each account's balance.
///
/// Only intended to be called at startup by ledger-tool or tests.
/// (cannot be made DCOU due to snapshot-minimizer)
pub fn set_capitalization(&self) -> u64 {
Copy link

@roryharr roryharr Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to do in this PR, but please rename this function! Made this change more confusing then it needed to be.
maybe recalculate_capitalization?
recalculate_and_update_capitalization?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100%. It took everything for me to not refactor it all immediately within this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #6883.

&pubkey,
&account_info.storage_location(),
)
.get_loaded_account(|loaded_account| {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't get_loaded_account get data as well?
What is the speedup if you create an accessor that skips loading data?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this impl does currently load account data. There's not a way with LoadedAccountAccessor to not load the account data, so it'll be a bigger change.

I propose that we keep this PR as-is, and then have a follow-up for loading accounts without data and using it here. This will keep both PRs smaller/focused.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@brooksprumo brooksprumo requested a review from roryharr July 8, 2025 12:29
@@ -6322,9 +6324,10 @@ impl AccountsDb {
.flatten()
.unwrap_or(0)
})
.sum::<u64>()
.try_fold(0, u64::checked_add)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, we no longer call update_accounts_hash_with_verify in ledger-tool.

Is that expected that we don't need to check accounts hash in ledger-tool? or the accounts hash is verified in other place than calculating the capital?

@brooksprumo
Copy link
Author

With this PR, we no longer call update_accounts_hash_with_verify in ledger-tool.

Is that expected that we don't need to check accounts hash in ledger-tool? or the accounts hash is verified in other place than calculating the capital?

Good eye! When creating a snapshot with ledger-tool, the accounts hash is also calculated as part of bank_to_xxx_snapshot_archive(), so we don't also need to call it when calculating capitalization.

@brooksprumo brooksprumo requested a review from HaoranYi July 8, 2025 15:42
Copy link

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@brooksprumo brooksprumo merged commit f12b99a into anza-xyz:master Jul 8, 2025
41 checks passed
@brooksprumo brooksprumo deleted the capitalization/calculate branch July 8, 2025 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants