
feat(gemini): Add support for Gemini 2.5 Thinking Budget #344


Merged

merged 6 commits into prism-php:main on May 12, 2025

Conversation

btumbleson
Contributor

@btumbleson btumbleson commented Apr 29, 2025

Description

In the preview Gemini 2.5 models, Google introduced "thinking", which is enabled by default on these models. Without this feature, using the new models will always include thinking, which is billed at a higher rate and is not always desirable. This PR lets you configure a thinkingBudget, which Gemini treats as the maximum number of tokens that may be used for thinking; it can also be set to 0 to disable thinking altogether.

This PR:

  • Introduces a new thoughtTokens property on the Usage class
  • Leverages the withProviderOptions() method to inject a thinkingBudget when passed as a key-value pair
  • Includes two tests that check for thoughtTokens in the Usage output
  • Updates the Gemini documentation to cover this functionality

Usage

$response = Prism::text()
    ->using(Provider::Gemini, 'gemini-2.5-flash-preview')
    ->withPrompt('Explain the concept of Occam\'s Razor and provide a simple, everyday example.')
    ->withProviderOptions(['thinkingBudget' => 500])
    ->asText();

You can set thinkingBudget to 0 to disable thinking altogether. Note this also works with structured output.
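Since thinking is on by default, disabling it is just a matter of passing a zero budget. A minimal sketch using the same fluent API as the example above (the prompt is illustrative):

```php
// Passing 0 disables thinking entirely, avoiding the higher thinking-token billing.
// The same withProviderOptions() call applies to structured requests.
$response = Prism::text()
    ->using(Provider::Gemini, 'gemini-2.5-flash-preview')
    ->withPrompt('Explain the concept of Occam\'s Razor in one sentence.')
    ->withProviderOptions(['thinkingBudget' => 0])
    ->asText();
```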

Potential Discussion

Gemini has made the choice that thinking is on by default, so this PR follows that convention: unless you explicitly set a thinkingBudget of 0, thinking is enabled. An alternative approach would be for Prism to default thinkingBudget to 0 whenever it is not provided. For that to work effectively you would need to (re)introduce a ThinkingModeResolver, at least in the near term, as I expect all models will eventually gravitate toward this thinking convention.

@ChrisB-TL
Contributor

Thank you!

Re: default, I think we should be consistent across providers if we can. For Anthropic we have it disabled by default, so I'd be tempted to do the same here. (IMO it's probably the more sensible default anyway?)

@@ -10,6 +10,7 @@ public function __construct(
         public int $promptTokens,
         public int $completionTokens,
         public ?int $cacheWriteInputTokens = null,
-        public ?int $cacheReadInputTokens = null
+        public ?int $cacheReadInputTokens = null,
+        public ?int $thoughtTokens = null,
Contributor

I'd be tempted to add the thoughtTokens to the Response additionalContent property, as it is unique to Gemini.

What do you think @sixlive?

Contributor Author

I could go both ways, but I think it's likely that all major models will converge on having "thinking" be a capability of their model, so my thought with this here was to future proof a bit.

Contributor

@ChrisB-TL ChrisB-TL commented Apr 30, 2025

I've had a look at the other providers that have reasoning models. As always with providers, it's a pretty 50:50 split on whether they report thinking tokens:

  • OpenAI, Gemini, and xAI do;
  • Anthropic and DeepSeek don't; and
  • Groq and Ollama: not sure.

I reckon there's enough that do to warrant adding.

@@ -440,3 +441,30 @@
expect($response->usage->cacheReadInputTokens)->toBe(88759);
});
});

describe('Thinking Mode for Gemini', function (): void {
Contributor

Please can you also add Http assertions that the correct payload is sent? I'm conscious we don't have enough of these in the codebase (it's on the todo list to add more), and we've been bitten by it a couple of times in refactors.

Contributor Author

Good call out. Actually found a bug while adding these tests!


kinsta bot commented Apr 29, 2025

Preview deployments for prism ⚡️

Status: ❌ Failed to deploy
Branch preview: N/A
Commit preview: N/A

Commit: e057c7097571c04b536aba8a7d9c609b63998926

Deployment ID: 8e2bd336-a4ed-4c3a-a917-6777636082e6

Static site name: prism-97nz9

Comment on lines 85 to 88
'thinkingBudget' => array_key_exists('thinkingBudget', $providerOptions)
? $providerOptions['thinkingBudget']
: null,
], fn($v) => $v !== null),
Contributor Author

This is messier than I wanted it to be, but a plain array_filter with no callback would throw away a 0 value, which is actually a valid thinking budget.
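The behavior can be demonstrated in isolation; this is a self-contained sketch (variable names are illustrative, not from the PR):

```php
<?php
// With no callback, array_filter() drops every falsy value, including int 0,
// so a 'thinkingBudget' of 0 would silently disappear from the payload.
$options = ['thinkingBudget' => 0, 'unsetOption' => null];

$naive  = array_filter($options);                          // drops 0 AND null
$strict = array_filter($options, fn ($v) => $v !== null);  // drops only null

var_dump(array_key_exists('thinkingBudget', $naive));  // bool(false)
var_dump(array_key_exists('thinkingBudget', $strict)); // bool(true)
```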

Http::assertSent(function (Request $request) {
$data = $request->data();

expect($data['generationConfig'])->not->toHaveKey('thinkingConfig');
Contributor Author

Note this is purposeful, in line with the default convention: we would expect to see thoughtTokens in the response even without specifying a thinkingConfig.

@btumbleson btumbleson requested a review from ChrisB-TL April 29, 2025 20:15
@ChrisB-TL
Contributor

Thank you!

Re: default, I think we should be consistent across providers if we can. For Anthropic we have it disabled by default, so I'd be tempted to do the same here. (IMO it's probably the more sensible default anyway?)

Sorry, you may have missed this comment as I didn't include it in the body of my review!

@btumbleson
Contributor Author

> Thank you!
>
> Re: default, I think we should be consistent across providers if we can. For Anthropic we have it disabled by default, so I'd be tempted to do the same here. (IMO it's probably the more sensible default anyway?)
>
> Sorry, you may have missed this comment as I didn't include it in the body of my review!

I can make this change, but have two hesitations:

  1. This functionality is enabled via withProviderOptions, which somewhat implies it's provider-specific. Deviating from how Google handles this (on by default) feels counter-intuitive. If the method were withThinkingBudget, then I'd agree with the need to standardize across all providers.
  2. Setting the thinking budget to 0 by default would require a ThinkingModeResolver, which as of today would only support a single model. That introduces incremental maintenance; not major, but I'm conscious of it. It's similar to earlier iterations of Prism, where a StructuredModeResolver existed in more providers but has since been removed.

Just let me know your thoughts on how we want to proceed.

@btumbleson
Contributor Author

@sixlive - anything you're looking to see here to merge?

@sixlive
Contributor

sixlive commented May 4, 2025

Just need to sit down and give it a thorough review. Probably later today or tomorrow.

@pushpak1300
Contributor

So @btumbleson, if I don't specifically set the thinking budget, the request will be sent without a thinking budget, right?

@btumbleson
Contributor Author

btumbleson commented May 4, 2025

> So @btumbleson, if I don't specifically set the thinking budget, the request will be sent without a thinking budget, right?

Other way around. If you don't specify a thinking budget on a model that supports it (e.g. Gemini 2.5 Flash Preview), then it will include thinking, just with an unspecified budget (you're effectively letting the model decide how much it needs to "think"). This matches Gemini's default behavior: https://ai.google.dev/gemini-api/docs/thinking#use-thinking-models

The "thoughts" themselves are not returned via the API, only a count of the tokens used. As thinking tokens are billed at a higher rate, you may not want thinking enabled at all, or you may wish to cap the number of tokens that can be used to "think"; you can accomplish both by setting thinkingBudget.

Contributor

@sixlive sixlive left a comment


SICK!!! Thank you so much!

@sixlive sixlive merged commit b028f84 into prism-php:main May 12, 2025
13 of 14 checks passed
4 participants