-
Notifications
You must be signed in to change notification settings - Fork 5.6k
feat: StreamableShell with exec_command and write_stdin tools #2574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
let mut properties = BTreeMap::<String, JsonSchema>::new(); | ||
properties.insert( | ||
"session_id".to_string(), | ||
JsonSchema::String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like it was defined as an int elsewhere (also we don't output a session ID from the exec outputs so the model hallcuinates one)
{"type":"function_call_output","call_id":"call_ZH0rYAapH2v1P5XNHOTGO7Fk","output":"failed to parse function arguments: invalid type: string \"e6d45cfc-81f2-44d6-b52b-6df1f8c3b9d0\", expected u32 at line 1 column 68"}
ResponseInputItem::FunctionCallOutput { | ||
call_id, | ||
output: FunctionCallOutputPayload { | ||
content: text, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will probably work best if the output is either in the form:
Wall time: {:.3g} seconds
Process exited with code {code}
# or, if running: Process running with session ID {session_id}
Warning: truncated output (original token count: {tokens}) # (if truncated)
Output:
{actual text output}
OR if it's more convenient to use json:
{
"wall_time": round(wall_time, 3),
"exit_code": code,
"session_id": session_id, # should not be set if exit_code is set
"original_token_count": count, # only if truncated
# NOTE: you'll want to dump the JSON w/o whitespace to reduce token usage
"output": {"1":line1,"2":line2,...}
}
Both exec_command
and write_stdin
can return the exact same shape of output
// Cap by assuming 4 bytes per token (TODO: use a real tokenizer). | ||
let cap_bytes_u64 = params.max_output_tokens.saturating_mul(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: the semantics of the internal implementation are actually to collect all output within yield_time_ms
(uncapped) and then truncate the middle tokens/characters after receiving all the output.
Otherwise the agent has no real way of "flushing out" outputs if a command has a ton of logspew
|
||
ResponsesApiTool { | ||
name: WRITE_STDIN_TOOL_NAME.to_owned(), | ||
description: r#"Write characters to the stdin of an existing exec_command session."# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference description:
Write characters to an exec session's stdin. Returns all stdout+stderr received within yield_time_ms.
Can write control characters (\u0003 for Ctrl-C), or an empty string to just poll stdout+stderr.
e502e00
to
a906dca
Compare
fcfcf13
to
3b7acc3
Compare
// Skip missed messages; continue collecting fresh output. | ||
} | ||
Ok(Err(tokio::sync::broadcast::error::RecvError::Closed)) => break, | ||
Err(_) => break, // timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Python version I found that read()
may throw a BlockingIOError sometimes, in which case i would wait a bit and try again https://docs.python.org/3/library/exceptions.html#BlockingIOError
not sure if that's a idiosyncracy from the python path, or is it handled internally by the rust standard libs somewhere?
// Split budget between head and tail. We prefer to keep whole lines to | ||
// avoid producing partial numeric tokens like "73". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
if prefix_end > 0 { | ||
out.push_str(&s[..prefix_end]); | ||
} | ||
if suffix_start > prefix_end && suffix_start < s.len() { | ||
out.push_str(&s[suffix_start..]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to put some message in the middle so it's more visible where the truncation happened, e.g.
…{truncated} tokens truncated…
// Forward to broadcast; best-effort if there are subscribers. | ||
let _ = output_tx_clone.send(buf[..n].to_vec()); | ||
} | ||
Err(_) => break, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanson-openai I think this is the read that you are worried about BlockingIO. I'll add some logic.
@hanson-openai addressed your feedback and updated tests, but I'll also ask someone from Codex to take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock, will review more carefully tomorrow.
My main current thought is that an integration test would be really helpful here!
Did you miss the integration test in session_manager.rs? |
This introduces a complementary set of tools,
exec_command
andwrite_stdin
, which are designed to facilitate working with long-running processes in a token-efficient manner.To test:
Though the above alone is unlikely to convince the model to use the new tools because of this bit in the base instructions:
codex/codex-rs/core/prompt.md
Lines 266 to 271 in e4c275d
So you probably need to add:
where
/some/other_instructions.md
suggests usingexec_command
andwrite_stdin
instead ofshell
.