GH-135379: Top of stack caching for the JIT. #135465

markshannon · 2025-06-13T13:11:56Z

The stats need fixing and the generated tables could be more compact, but it works.

Issue: Top-of-stack caching in the JIT #135379

Fidget-Spinner

This is really cool. I'll do a full review soon enough.

Python/optimizer.c

markshannon · 2025-06-20T13:15:39Z

Performance is in the noise, but we would need a really big speedup of jitted code for it to be more than noise overall.

The nbody benchmark, which spends a lot of time in the JIT, shows a 13-18% speedup, except on Mac, where it shows no speedup.
I don't know why that would be, as I think we are using stock LLVM for Mac, not the Apple compiler.

Fidget-Spinner · 2025-06-20T13:21:10Z

> The nbody benchmark, which spends a lot of time in the JIT, shows a 13-18% speedup, except on Mac, where it shows no speedup. I don't know why that would be, as I think we are using stock LLVM for Mac, not the Apple compiler.

Nice. We use Apple's compiler for the interpreter, though the JIT uses stock LLVM. Thomas previously showed that the version of the Apple compiler we use is subject to huge fluctuations in performance due to a PGO bug.

Fidget-Spinner · 2025-06-20T15:07:58Z

Misc/NEWS.d/next/Core_and_Builtins/2025-06-20-16-03-59.gh-issue-135379.eDg89T.rst

+Implement a limited form of register allocation know as "top of stack
+caching" in the JIT. It works by keeping 0-3 of the top items in the stack
+in registers. The code generator generates multiple versions of thos uops
+that do not escape and are relatively small. During JIT compilation, the
+copy that produces the least memory traffic is selected, spilling or
+reloading values when needed.


Suggested change

-Implement a limited form of register allocation know as "top of stack
-caching" in the JIT. It works by keeping 0-3 of the top items in the stack
-in registers. The code generator generates multiple versions of thos uops
-that do not escape and are relatively small. During JIT compilation, the
-copy that produces the least memory traffic is selected, spilling or
-reloading values when needed.
+Implement a limited form of register allocation known as "top of stack
+caching" in the JIT. It works by keeping 0-3 of the top items in the stack
+in registers. The code generator generates multiple versions of those uops
+that do not escape and are relatively small. During JIT compilation, the
+copy that produces the least memory traffic is selected, spilling or
+reloading values when needed.
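
As a rough illustration of the selection step the news entry describes, the sketch below models each generated copy of a uop by how many cached items it expects on entry and leaves on exit, then picks the copy that needs the fewest spills or reloads. The names `Variant`, `transition_cost`, and `pick_variant` are hypothetical, and the cost model (one unit per item spilled or reloaded) is an assumption made for the example, not the heuristic used in this PR.

```python
from dataclasses import dataclass

MAX_CACHED = 3  # the news entry says 0-3 top-of-stack items live in registers


@dataclass(frozen=True)
class Variant:
    """One generated copy of a uop (hypothetical model)."""
    inputs: int   # cached items the copy expects on entry
    outputs: int  # cached items the copy leaves on exit


def transition_cost(cached: int, variant: Variant) -> int:
    """Memory operations needed to go from `cached` items to variant.inputs.

    Assumed cost model: each item spilled to memory or reloaded from
    memory counts as one unit.
    """
    return abs(cached - variant.inputs)


def pick_variant(cached: int, variants: list[Variant]) -> Variant:
    """Pick the copy needing the fewest spills/reloads; ties go to the first."""
    return min(variants, key=lambda v: transition_cost(cached, v))


# Three made-up copies of a uop that pops two items and pushes one.
binary_op = [Variant(2, 1), Variant(3, 2), Variant(0, 0)]
chosen = pick_variant(cached=3, variants=binary_op)
print(chosen)
```

With three items already cached, the copy that expects three cached inputs costs nothing to enter, so it is preferred over copies that would force a spill first.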

Fidget-Spinner

I need to review the cases generator later.

Fidget-Spinner · 2025-06-20T16:08:02Z

Misc/NEWS.d/next/Core_and_Builtins/2025-06-13-13-32-16.gh-issue-135379.pAxZgy.rst

@@ -0,0 +1,3 @@
+Implement top-of-stack caching for the JIT (and tier 2 interpreter). Reduces


Why is there a second news file?

Fidget-Spinner · 2025-06-20T16:13:56Z

Python/optimizer.c

+static int
+get_exit_depth(_PyUOpInstruction *inst)


Can you write a short snippet on what this does? It's rather confusing otherwise. IIUC, it finds the number of "used" registers on exit, right?

Fidget-Spinner · 2025-06-20T16:14:25Z

Python/optimizer.c

+    if (_PyUop_Caching[base_opcode].exit_depth_is_output) {
+        return input + _PyUop_Caching[base_opcode].delta;


What does this do?
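
One possible reading of the two quoted lines, written out as a small Python model: if a uop's exit happens after its stack effect has been applied, the cached depth at the exit is the input depth plus the uop's delta. The `CachingInfo` class is a hypothetical stand-in for an entry of `_PyUop_Caching`, and the fallback branch is an assumption, since the rest of the function is not shown in the diff excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachingInfo:
    """Hypothetical stand-in for one entry of the `_PyUop_Caching` table."""
    exit_depth_is_output: bool
    delta: int  # change in the number of cached items across the uop


def exit_depth(input_depth: int, info: CachingInfo) -> int:
    """Number of cached stack items at the point where the uop exits."""
    if info.exit_depth_is_output:
        # Mirrors the quoted C lines: the exit sees the uop's output depth.
        return input_depth + info.delta
    # Assumption (not shown in the diff): otherwise the exit happens
    # before the stack effect, so the depth is unchanged.
    return input_depth


pops_one = CachingInfo(exit_depth_is_output=True, delta=-1)
print(exit_depth(2, pops_one))
```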

Fidget-Spinner · 2025-06-20T16:16:21Z

Python/optimizer.c

+static int
+stack_allocate(_PyUOpInstruction *buffer, int length)
+{
+    for (int i = length-1; i >= 0; i--) {


To my understanding, this is because a spill may need to be inserted between every pair of instructions, so you need to reserve 2N instructions, right?
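
The reading in the comment above can be sketched as follows, assuming the buffer already has room for 2N entries: walking from the last instruction to the first lets each instruction move to its final slot without overwriting one that has not been moved yet, leaving a free slot in front of each for a possible spill or reload. The `NOP` filler and the `spread_in_place` helper are hypothetical names for illustration; this is not the body of `stack_allocate`.

```python
NOP = "_NOP"  # hypothetical filler for slots that end up unused


def spread_in_place(buffer: list[str], length: int) -> None:
    """Move instruction i to slot 2*i + 1, leaving slot 2*i free.

    `buffer` must already have room for 2 * length entries. Iterating
    backwards guarantees that every write lands at an index whose
    original contents have already been moved.
    """
    for i in range(length - 1, -1, -1):
        buffer[2 * i + 1] = buffer[i]
        buffer[2 * i] = NOP


trace = ["uop_a", "uop_b", "uop_c"]
buffer = trace + [NOP] * len(trace)  # reserve 2N slots up front
spread_in_place(buffer, len(trace))
print(buffer)
```

A forward loop over the same buffer would overwrite `uop_b` when moving `uop_a`, which is one plausible reason for iterating in reverse.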

Fidget-Spinner · 2025-06-20T16:19:44Z

Tools/cases_generator/analyzer.py

+    if ideal_inputs > 3:
+        ideal_inputs = 3
+    if ideal_outputs > 3:
+        ideal_outputs = 3


Can you move the value 3 to a global magic number so that we can play around with increasing/decreasing register counts in the future?
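
A minimal sketch of what that could look like, assuming a module-level constant; the name `MAX_CACHED_REGISTER` and the helper `clamp_ideal` are hypothetical and not taken from the PR.

```python
# Hypothetical module-level constant: the maximum number of top-of-stack
# items kept in registers. Changing it here would change every clamp below.
MAX_CACHED_REGISTER = 3


def clamp_ideal(ideal_inputs: int, ideal_outputs: int,
                limit: int = MAX_CACHED_REGISTER) -> tuple[int, int]:
    """Clamp the ideal cached input/output counts to the register limit."""
    return min(ideal_inputs, limit), min(ideal_outputs, limit)


print(clamp_ideal(5, 2))
```

Passing `limit` explicitly makes it easy to experiment with other register counts without editing the constant.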

Fidget-Spinner · 2025-06-20T16:20:03Z

Tools/cases_generator/analyzer.py

+    #if has_exit and ideal_inputs != ideal_outputs:
+    #    n = min(ideal_inputs, ideal_outputs)
+    #    yield n, n
+    #    return


Let's remove this.

markshannon added 6 commits June 12, 2025 14:19

Tier 2 TOS caching. Work in progress

579b758

Tier 2 TOS caching, working for interpreter.

489e510

Get JIT working

f603929

Fix tool to support 3.11

cf1d7ab

Add news

efd7a0a

int arithmetic doesn't escape

bb4e6b9

bedevere-app bot mentioned this pull request Jun 13, 2025

Top-of-stack caching in the JIT #135379

Open

Fidget-Spinner reviewed Jun 13, 2025

Python/optimizer.c

markshannon added 10 commits June 13, 2025 14:48

Repair stats

e976b9b

Add missing type annotation

11de93e

Pacify mypy

4698695

Add type annotation

33837a7

Avoid overflow gathering stats

8bb12ef

Reduce spilling

920e6de

Merge branch 'main' into tier-2-tos-caching

45e1abd

Merge branch 'main' into tier-2-tos-caching

3d72871

Merge branch 'main' into tier-2-tos-caching

0240115

Improve heuristics for stack caching

2850d72

markshannon force-pushed the tier-2-tos-caching branch from 78489ea to 2850d72 June 19, 2025 14:49

Merge branch 'main' into tier-2-tos-caching

1c291f1

Add news

ba2331a

markshannon marked this pull request as ready for review June 20, 2025 15:04

markshannon requested review from brandtbucher and savannahostrowski as code owners June 20, 2025 15:04

bedevere-app bot added the awaiting core review label Jun 20, 2025

Fidget-Spinner reviewed Jun 20, 2025
