
Conversation

@bolinfest (Collaborator) commented Aug 22, 2025:

This introduces a complementary set of tools, exec_command and write_stdin, which are designed to facilitate working with long-running processes in a token-efficient manner.

To test:

codex-rs$ just codex --cd .. --config experimental_use_exec_command_tool=true

The above alone is unlikely to convince the model to use the new tools, though, because of this bit in the base instructions:

## Shell commands
When using the shell, you must adhere to the following guidelines:
- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)
- Read files in chunks with a max chunk size of 250 lines. Do not use python scripts to attempt to output larger chunks of a file. Command line output will be truncated after 10 kilobytes or 256 lines of output, regardless of the command used.

So you probably need to add:

--config experimental_instructions_file=/some/other_instructions.md

where /some/other_instructions.md suggests using exec_command and write_stdin instead of shell.
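As an illustration only (this exact wording is an assumption, not shipped guidance), such an instructions file might add something like:

```markdown
## Shell commands
- Prefer the `exec_command` tool over `shell` for running commands.
- For long-running or interactive processes, keep the session open and use
  `write_stdin` to send further input instead of re-running the command.
```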

let mut properties = BTreeMap::<String, JsonSchema>::new();
properties.insert(
"session_id".to_string(),
JsonSchema::String {
A Contributor commented:

Looks like it was defined as an int elsewhere (also, we don't output a session ID from the exec outputs, so the model hallucinates one):

{"type":"function_call_output","call_id":"call_ZH0rYAapH2v1P5XNHOTGO7Fk","output":"failed to parse function arguments: invalid type: string \"e6d45cfc-81f2-44d6-b52b-6df1f8c3b9d0\", expected u32 at line 1 column 68"}
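A minimal self-contained sketch of that mismatch (standalone illustration, not the PR's code): a UUID-style string can never deserialize as the `u32` the parser expects, which is exactly the `invalid type` failure above.

```rust
fn main() {
    // The tool schema advertised `session_id` as a string, but the Rust
    // side parses it as u32 — a hallucinated UUID can never parse.
    let hallucinated = "e6d45cfc-81f2-44d6-b52b-6df1f8c3b9d0";
    assert!(hallucinated.parse::<u32>().is_err());

    // A numeric session ID (what the schema should advertise) round-trips:
    let id: u32 = "7".parse().expect("numeric IDs parse fine");
    assert_eq!(id, 7);
    println!("ok");
}
```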

ResponseInputItem::FunctionCallOutput {
call_id,
output: FunctionCallOutputPayload {
content: text,
@hanson-openai (Contributor) commented Aug 22, 2025:

will probably work best if the output is either in the form:

Wall time: {:.3g} seconds
Process exited with code {code}
# or, if running: Process running with session ID {session_id}
Warning: truncated output (original token count: {tokens})  # (if truncated)
Output:
{actual text output}

OR if it's more convenient to use json:

{
  "wall_time": round(wall_time, 3),
  "exit_code": code,
  "session_id": session_id, # should not be set if exit_code is set
  "original_token_count": count, # only if truncated
  # NOTE: you'll want to dump the JSON w/o whitespace to reduce token usage
  "output": {"1":line1,"2":line2,...}
}

Both exec_command and write_stdin can return the exact same shape of output
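The text form above could be sketched roughly as follows (the function name and field choices are illustrative, not the PR's implementation; note Rust's `{:.3}` fixes decimal places rather than significant digits like Python's `{:.3g}`):

```rust
// Hypothetical formatter for the suggested output shape.
fn format_exec_output(
    wall_time_secs: f64,
    exit_code: Option<i32>,
    session_id: Option<u32>,
    truncated_from_tokens: Option<u64>,
    output: &str,
) -> String {
    let mut s = format!("Wall time: {wall_time_secs:.3} seconds\n");
    match (exit_code, session_id) {
        // session_id should not be set once the process has exited.
        (Some(code), _) => s.push_str(&format!("Process exited with code {code}\n")),
        (None, Some(id)) => s.push_str(&format!("Process running with session ID {id}\n")),
        (None, None) => {}
    }
    if let Some(tokens) = truncated_from_tokens {
        s.push_str(&format!(
            "Warning: truncated output (original token count: {tokens})\n"
        ));
    }
    s.push_str("Output:\n");
    s.push_str(output);
    s
}

fn main() {
    let out = format_exec_output(0.042, Some(0), None, None, "hello\n");
    assert!(out.starts_with("Wall time: 0.042 seconds\n"));
    assert!(out.contains("Process exited with code 0\n"));
    println!("{out}");
}
```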

Comment on lines 56 to 57
// Cap by assuming 4 bytes per token (TODO: use a real tokenizer).
let cap_bytes_u64 = params.max_output_tokens.saturating_mul(4);
@hanson-openai (Contributor) commented Aug 22, 2025:

FYI: the semantics of the internal implementation are actually to collect all output within yield_time_ms (uncapped) and then truncate the middle tokens/characters after receiving all the output.

Otherwise the agent has no real way of "flushing out" outputs if a command has a ton of logspew
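Those semantics — drain everything that arrives within the yield window, truncate afterwards — can be sketched with a plain std channel (names are illustrative; the PR uses a tokio broadcast channel instead):

```rust
use std::sync::mpsc;
use std::time::{Duration, Instant};

// Collect *all* chunks that arrive before the deadline (uncapped);
// middle-truncation happens later, after collection finishes.
fn collect_until_deadline(rx: &mpsc::Receiver<Vec<u8>>, yield_time: Duration) -> Vec<u8> {
    let deadline = Instant::now() + yield_time;
    let mut out = Vec::new();
    loop {
        let now = Instant::now();
        if now >= deadline {
            break;
        }
        match rx.recv_timeout(deadline - now) {
            Ok(chunk) => out.extend_from_slice(&chunk),
            Err(_) => break, // timed out or sender dropped
        }
    }
    out
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(b"spew ".to_vec()).unwrap();
    tx.send(b"spew".to_vec()).unwrap();
    drop(tx); // collector stops early once the sender is gone
    let out = collect_until_deadline(&rx, Duration::from_millis(200));
    assert_eq!(out, b"spew spew".to_vec());
}
```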


ResponsesApiTool {
name: WRITE_STDIN_TOOL_NAME.to_owned(),
description: r#"Write characters to the stdin of an existing exec_command session."#
@hanson-openai (Contributor) commented Aug 22, 2025:

reference description:

Write characters to an exec session's stdin. Returns all stdout+stderr received within yield_time_ms.
Can write control characters (\u0003 for Ctrl-C), or an empty string to just poll stdout+stderr.

@bolinfest bolinfest force-pushed the pr2574 branch 4 times, most recently from e502e00 to a906dca Compare August 22, 2025 07:43
@bolinfest bolinfest marked this pull request as ready for review August 22, 2025 07:58
@bolinfest bolinfest force-pushed the pr2574 branch 3 times, most recently from fcfcf13 to 3b7acc3 Compare August 22, 2025 21:26
// Skip missed messages; continue collecting fresh output.
}
Ok(Err(tokio::sync::broadcast::error::RecvError::Closed)) => break,
Err(_) => break, // timeout
A Contributor commented:

In the Python version I found that read() may throw a BlockingIOError sometimes, in which case I would wait a bit and try again: https://docs.python.org/3/library/exceptions.html#BlockingIOError

Not sure if that's an idiosyncrasy of the Python path, or whether it's handled internally by the Rust standard library somewhere?
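For what it's worth, Rust surfaces the same condition as `io::ErrorKind::WouldBlock` on a non-blocking fd, and `Read::read` in std does not retry it for you. A retry wrapper might look like this (a sketch, not the PR's code; the 5 ms backoff is an arbitrary assumption):

```rust
use std::io::{self, Read};
use std::thread;
use std::time::Duration;

// Retry reads that fail with WouldBlock (Rust's analogue of Python's
// BlockingIOError) instead of treating them as fatal.
fn read_retrying<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        match reader.read(buf) {
            Ok(n) => return Ok(n),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                thread::sleep(Duration::from_millis(5)); // wait a bit, retry
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // A Cursor never blocks, but exercises the happy path.
    let mut src = io::Cursor::new(b"partial output".to_vec());
    let mut buf = [0u8; 64];
    let n = read_retrying(&mut src, &mut buf).unwrap();
    assert_eq!(&buf[..n], b"partial output");
}
```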

Comment on lines 359 to 360
// Split budget between head and tail. We prefer to keep whole lines to
// avoid producing partial numeric tokens like "73".
A Contributor commented:

nice

Comment on lines 399 to 455
if prefix_end > 0 {
out.push_str(&s[..prefix_end]);
}
if suffix_start > prefix_end && suffix_start < s.len() {
out.push_str(&s[suffix_start..]);
}
A Contributor commented:

would be good to put some message in the middle so it's more visible where the truncation happened, e.g.

…{truncated} tokens truncated…
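Combining that marker with the head/tail split above might look like this sketch (bytes stand in for tokens, matching the 4-bytes-per-token heuristic; assumes ASCII-ish output, since it slices at byte offsets):

```rust
// Keep the head and tail within a byte budget, with a visible marker
// where the middle was cut. Snap to line boundaries so we don't emit
// partial tokens like "73".
fn truncate_middle(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let half = max_bytes / 2;
    let head_end = s[..half].rfind('\n').map(|i| i + 1).unwrap_or(half);
    let tail_start = s[s.len() - half..]
        .find('\n')
        .map(|i| s.len() - half + i + 1)
        .unwrap_or(s.len() - half);
    let omitted = tail_start - head_end;
    format!(
        "{}…{omitted} bytes truncated…{}",
        &s[..head_end],
        &s[tail_start..]
    )
}

fn main() {
    let log = "line1\nline2\nline3\nline4\nline5\n";
    let t = truncate_middle(log, 14);
    assert!(t.starts_with("line1\n"));
    assert!(t.ends_with("line5\n"));
    assert!(t.contains("truncated"));
    println!("{t}");
}
```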

// Forward to broadcast; best-effort if there are subscribers.
let _ = output_tx_clone.send(buf[..n].to_vec());
}
Err(_) => break,
@bolinfest (Collaborator, Author) commented:

@hanson-openai I think this is the read that you were worried about re: BlockingIOError. I'll add some logic.

@bolinfest (Collaborator, Author) commented:

@hanson-openai addressed your feedback and updated tests, but I'll also ask someone from Codex to take a look

@dylan-hurd-oai (Collaborator) left a comment:

Approving to unblock, will review more carefully tomorrow.

My main current thought is that an integration test would be really helpful here!

@bolinfest (Collaborator, Author) commented:

Did you miss the integration test in session_manager.rs?

@bolinfest bolinfest merged commit e3b03ea into main Aug 23, 2025
30 checks passed
@bolinfest bolinfest deleted the pr2574 branch August 23, 2025 01:10
@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2025