Do not use pretty print for checkpoints as it might cause stack overflow #17167

Open: wants to merge 4 commits into base: compatible

dkijania (Member) commented May 8, 2025

When running the replayer on a mainnet archive, we often hit a stack overflow panic like the one below:

2025-05-08 13:32:28 UTC [Info] Checkpoint target was 718546, setting to 718596
  
(monitor.ml.Error ("Stack overflow")
 ("Raised by primitive operation at Stdlib.max in file \"stdlib.ml\", line 75, characters 17-23"
  "Called from Bi_outbuf.really_extend in file \"src/bi_outbuf.ml\", line 16, characters 12-34"
  "Called from Bi_outbuf.add_sub in file \"src/bi_outbuf.ml\", line 76, characters 2-14"
  "Called from Bi_outbuf.add_substring in file \"src/bi_outbuf.ml\" (inlined), line 80, characters 20-39"
  "Called from Yojson.finish_string in file \"write.ml\", line 25, characters 4-70"
  "Called from Stdlib.output_string in file \"stdlib.ml\", line 369, characters 2-47"
  "Called from CamlinternalFormat.output_acc in file \"camlinternalFormat.ml\", line 1906, characters 32-46"
  "Called from CamlinternalFormat.output_acc in file \"camlinternalFormat.ml\", line 1906, characters 32-46"
  "Called from CamlinternalFormat.output_acc in file \"camlinternalFormat.ml\", line 1906, characters 32-46"
  "Called from CamlinternalFormat.output_acc in file \"camlinternalFormat.ml\", line 1906, characters 32-46"
  "Called from CamlinternalFormat.output_acc in file \"camlinternalFormat.ml\", line 1906, characters 32-46"
  "Called from CamlinternalFormat.output_acc in file \"camlinternalFormat.ml\", line 1908, characters 32-46"
  "Called from CamlinternalFormat.output_acc in file \"camlinternalFormat.ml\", line 1910, characters 32-46"
  "Called from Stdlib__Printf.kfprintf.(fun) in file \"printf.ml\", line 20, characters 26-42"
  "Called from Yojson.finish_string in file \"write.ml\", line 27, characters 4-93"
  "Called from Yojson.write_string in file \"write.ml\", line 50, characters 2-24"
  "Called from Yojson.json_string_of_string in file \"write.ml\", line 55, characters 2-19"
  "Called from Yojson.Pretty.format_field in file \"pretty.ml\", line 54, characters 24-52"
  "Called from Stdlib__List.map in file \"list.ml\", line 92, characters 20-23"
  "Called from Stdlib__List.map in file \"list.ml\", line 92, characters 32-39"
  "Called from Stdlib__List.map in file \"list.ml\", line 92, characters 32-39"
  "Called from Stdlib__List.map in file \"list.ml\", line 92, characters 32-39"
  "Called from Stdlib__List.map in file \"list.ml\", line 92, characters 32-39"

The least invasive fix I found is to skip pretty-printing for replayer checkpoints and output files.
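
As a rough illustration of why this should be safe for anything that reads the file back: the compact and pretty printers serialize the same JSON value and differ only in layout. The sketch below assumes the yojson library is available. The `checkpoint` value and its field names are made up for illustration and are not the replayer's real schema.

```ocaml
(* Hypothetical stand-in for a checkpoint; the real replayer schema differs. *)
let checkpoint : Yojson.Safe.t =
  `Assoc
    [ ("start_slot_since_genesis", `Int 718596)
    ; ( "accounts"
      , `List
          [ `Assoc
              [ ("pk", `String "B62q-example"); ("balance", `String "1000") ]
          ] )
    ]

(* Compact output: no indentation or line breaks are added. *)
let compact = Yojson.Safe.to_string checkpoint

(* Pretty output: same data, but the printer may add line breaks and indentation. *)
let pretty = Yojson.Safe.pretty_to_string checkpoint

let () =
  (* Both strings parse back to the same JSON value. *)
  assert (Yojson.Safe.from_string compact = Yojson.Safe.from_string pretty) ;
  print_endline compact
```

The trade-off is readability: the compact file is harder to inspect by eye, but a reader that parses it gets the same value either way.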

dkijania (Member Author) commented May 8, 2025

!ci-build-me

@dkijania dkijania marked this pull request as ready for review May 8, 2025 15:27
@dkijania dkijania self-assigned this May 8, 2025
dkijania (Member Author) commented May 8, 2025

!ci-build-me

@@ -604,7 +604,7 @@ let write_replayer_checkpoint ~logger ~ledger ~last_global_slot_since_genesis
   let%map input =
     create_replayer_checkpoint ~ledger ~start_slot_since_genesis
   in
-  input_to_yojson input |> Yojson.Safe.pretty_to_string
+  input_to_yojson input |> Yojson.Safe.to_string

Member (review comment):

How large is the JSON output? Do we need it at all? It is non-pretty now, but maybe it's worth dropping altogether?

Member Author (reply):

Yes, we need it: this dumps the checkpoint, which is then used as input for the next replayer run. The same goes for the replayer output file. It is quite large for mainnet, as it contains the state of all accounts at a given slot. We have not seen this issue on other networks.
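
To make the dependency between runs concrete, here is a minimal sketch of a checkpoint being written in compact form and read back by a later run. It assumes the yojson library. The helper names `write_checkpoint` and `read_checkpoint`, the temporary file name, and the JSON fields are hypothetical and not taken from the replayer source.

```ocaml
(* Hypothetical helpers; the real replayer builds its JSON via input_to_yojson. *)
let write_checkpoint ~path (json : Yojson.Safe.t) : unit =
  (* Serialize compactly, then write the whole string to the file. *)
  let contents = Yojson.Safe.to_string json in
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents)

let read_checkpoint ~path : Yojson.Safe.t =
  (* The parser accepts compact and pretty input alike. *)
  Yojson.Safe.from_file path

let () =
  let path = Filename.temp_file "replayer-checkpoint" ".json" in
  let checkpoint =
    `Assoc [ ("start_slot_since_genesis", `Int 718596); ("accounts", `List []) ]
  in
  write_checkpoint ~path checkpoint ;
  (* A later run picks the file up again and sees the same value. *)
  assert (read_checkpoint ~path = checkpoint) ;
  Sys.remove path
```

Since the file is consumed by a parser rather than read by a person, dropping the indentation does not change what the next run sees.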

glyh (Member) commented May 9, 2025

We have logproc now. I assume we could remove pretty-printing everywhere and leave log readability to logproc when necessary?

There's also jq, but it only accepts a single JSON tree.

dkijania (Member Author) commented May 13, 2025

Unfortunately, this is not about logging at all, but about dumping JSON for later use. We are either dumping a checkpoint file for the next replayer run or saving an output file with the final result. Both are treated as artifacts, not log output.

dkijania (Member Author) commented:

!ci-build-me

dkijania (Member Author) commented:

!ci-nightly-me

dkijania (Member Author) commented:

!ci-build-me
