Skip to content

Conversation

@hanishkvc
Copy link
Contributor

@hanishkvc hanishkvc commented Nov 10, 2025

This adds the initial basic skeleton for supporting vision models going forward to the tools/server/public_simplechat alternate web client ui. Basic tested with Gemma3 and Qwen3VL

This builds on the previous PR in this series ie #17038 which completed the current initial go at tool calling which also includes a set of client side based builtin and bundled tool calls for code execution, data storage, web access, web search, pdf and xml use, ... as well as support for looking into reasoning/chainofthought of the models which share the same.

Instead of automatically calling any requested tool by the GenAi
/ llm, that is from the tail end of the handle user submit btn
click,

Now if the GenAi/LLM has requested any tool to be called, then
enable the Tool Run related UI elements and fill them with the
tool name and tool args.

In turn the user can verify if they are ok with the tool being
called and the arguments being passed to it. Rather they can
even fix any errors in the tool usage like the arithmatic expr
to calculate that is being passed to simple_calculator or the
javascript code being passed to run_javascript_function_code

If user is ok with the tool call being requested, then trigger
the same.

The results if any will be automatically placed into the user
query text area.

User can cross verify if they are ok with the result and or
modify it suitabley if required and inturn submit the same to
the GenAi/LLM.
Also avoid showing Tool calling UI elements, when not needed to
be shown.
So that it can be used from different modules, if required.
Try ensure as well as verify that original console.log is saved
and not overwritten. Throw an exception if things seem off wrt
same.

Also ensure to add a newline at end of console.log messages
The request for code to run as well as the resultant response data
both need to follow a structured object convention, so that it is
easy to map a request and the corresponding response to some extent.
These no longer need to worry about

* setting up the console.log related redirection to capture
  the generated outputs, nor about
* setting up a dynamic function for executing the needed
  tool call related code

The web worker setup to help run tool calls in a relatively
isolated environment independent of the main browser env,
takes care of these.

One needs to only worry about getting the handle to the
web worker to use and inturn pass the need code wrt the
tool call to it.
tools manager/module

* setup the web worker that will help execute the tool call related
  codes in a js environment that is isolated from the browsers main
  js environment

* pass the web worker to the tool call providers, for them to use

* dont wait for the result from the tool call, as it will be got
  later asynchronously through a message

* allow users of the tools manager to register a call back, which
  will be called when ever a message is got from the web worker
  containing response wrt previously requested tool call execution.

simplechat

* decouple toolcall response handling and toolcall requesting logic

* setup a timeout to take back control if tool call takes up too
  much time. Inturn help alert the ai model, that the tool call
  took up too much time and so was aborted, by placing a approriate
  tagged tool response into user query area.

* register a call back that will be called when response is got
  asynchronously wrt anye requested tool calls.
  In turn take care of updating the user query area with response
  got wrt the tool call, along with tool response tag around it.
Had forgotten to specify type as module wrt web worker, in order
to allow it to import the toolsconsole module.

Had forgotten to maintain the id of the timeout handler, which is
needed to clear/stop the timeout handler from triggering, if tool
call response is got well in time.

As I am currently reverting the console redirection at end of
handling a tool call code in the web worker message handler, I
need to setup the redirection each time. Also I had forgotten
to clear the console.log capture data space, before a new tool
call code is executed, this is also fixed by this change.

TODO: Need to abort the tool call code execution in the web worker
if possible in future, if the client / browser side times out
waiting for tool call response, ie if the tool call code is taking
up too much time.
As the tool calling, if enabled, will need access to last few
user query and ai assistant responses (which will also include
in them the tool call requests and the corresponding results),
so that the model can build answers based on its tool call reqs
and got responses, and also given that most of the models these
days have sufficiently large context windows, so the sliding
window context implemented by SimpleChat logic has been increased
by default to include last 4 query and their responses roughlty.
Modify the constructor, newFrom and clear towards this goal.
Rename ChatMessage to ChatMessageEx.

Add typedefs for NSToolCall and NSChatMessage, they represent the
way the corresponding data is structured in network hs.

Add logic to build the ChatMessageEx from data got over network in
streaming mode.
Update HasToolCalls and ContentEquiv to work with new structure
Use the equivalent update_stream directly added to ChatMessageEx.

update_stream is also more generic to some extent and also directly
implemented by the ChatMessageEx class.
response_extract logic moved directly into ChatMessageEx as update
oneshot, with suitable adjustments. Inturn use the same directly.
these have been updated to work with ChatMessageEx to an extent
GetSystemLatest and its users updated wrt ChatMessageEx.

RecentChat updated wrt ChatMessageEx. Also now irrespective of
whether full history is being retrieved or only a subset, both
cases refer to the ChatMessageEx instances in SimpleChat.xchat
without creating new instances of anything.
Simplify Add semantic by expecting any validation of stuff before
adding to be done by the callers of Add and not by add itself.

Also update it to expect ChatMessageEx object

Update all users of add to follow the new syntax and semantic.

Remove the old and ununsed AddSysPromptOnlyAtBegin helper
Users of recent_chat updated to work with ChatMessageEx

As part of same recent_chat_ns also added, for the case where the
array of chat messages can be passed as is ie in the chat mode,
provided it has only the network handshake representation of the
messages.
wrt ChatMessageEx related required flow as well as avoid warnings
Use HTMLElement's dataset to maintain tool call id along with
the element which maintains the toolname.

Pass it along to the tools manager and inturn the actual tool
calls and through them to the web worker handling the tool call
related code and inturn returning it back as part of the obj
which is used to return the tool call result.

Embed the tool call id, function name and function result into
the content field of chat message in terms of a xml structure

Also make use of tool role to send back the tool call result.
Do note that currently the id, name and content are all embedded
into the content field of the tool role message sent to the
ai engine on the server.

NOTE: Use the user query entry area for showing tool call result
in the above mentioned xml form, as well as for user to enter
their own queries. Based on presence of the xml format data at
beginning the logic will treat it has a tool result and if not
then as a normal user query.

The css has been updated to help show tool results/msgs in a
lightyellow background
Expand the xml format id, name and content in content field of
tool result into apropriate fields in the tool result message sent
to the genai/llm engine on the server.
these common helpers avoid needing ignore tagging to ts-check, in
places where valid constructs have been used which go beyond strict
structured js handling that is tried to be achieved using it, but
are still valid and legal.
Also update the sliding window context size to last 9 chat messages
so that there is a sufficiently large context for multi turn tool
calls based adjusting by ai and user, without needing to go full
hog, which has the issue of overflowing the currently set context
window wrt the loaded ai model.
This increaments before itself, but we need to increment after
Pass a list to keep track of the numbering at different depths
as well as to delay incrementing the numbering to the last min

Dont let recursion go beyond a predefined limit
Switch from empty strings or empty list and so to undefined.

undefined will be treated by Javascript and JSON to mean, not even
instantiated and also dont instantiate the same
NSChatMessage implements the undefined based flow and provides
helpers to check if any of the field like content or reasoning
or tool_calls is available in the chat message or not.

It also provides helpers to get the corresponding fields.

ChatMessageEx updated to make use of NSChatMessage.
Remove the load from disk support that was previously retained
wrt the old on-disk-storage format for the chat session messages.

Make a note to allow non ai server handshake roles to be maintained
wrt NSChatMessage. Add helper to cross check if a message belongs
to such temp roles or not.

Update SimpleChat class to make use of the new NSChatMessage based
needed flow.
TODO: individual tool/function calls from tool_calls field, accessed
using different methods in different places for now. Need to think
on which is the best method to retain and use everywhere and or
retain things as is.
Add a new static helper to create a ChatMessageEx for a given
tool response data.

Use the same when storing tool response in the chat session msgs
list.

This inturn avoids the need for creating a xml string with all
the fields corresponding to tool response. So also no need to
extract the individiual tool response fields from the all-in-one
xml string and populate the tool response fields in the network
structure equivalent ns data structure, when recent_chat_ns is
called.
Given that the user query box no longer includes the special xml
string wrt tool response data (TOOL.TEMP ROLE), so now instead set
a special attribute to indicate that user query/input box is
maintaining a tool response.

For regular Tool responses in the chat session, now show the tool
call id and tool name before the tool response data (ie content
field).
Had forgotten to update these two functions wrt the tool response
related new fields. This is fixed now.

Also show tool-call-id and tool-name to end user as part of chat
message showing.

ALERT on disk structure change old saves wont work esp wrt tool
responses
Allow for empty tool call results

Block no content response from user role only.

Also change for console.debug to console.log so people can
see the blocking of empty response from user, in the browser
console.
Use css conditional attribute styling to change background color
of the user input textarea to match the tool role message block
color, when the user input textarea is in the TOOL.TEMP mode

With this user can know that the user input area is currently
showing and accepting tool result data for submission.
Currently the logic doesnt allow user to send a empty message to
ai, during their term. Previously this path wasnt directly alerting
the end user. Now it informs the end user using placeholder property
so they can see the alert, while also ensuring that once user enters
something, the alert wont interfere.

The logic takes care of saving any original placeholder, so that
the same is restored, when user switches sessions.
Avoid directly accessing content field, from any place other than
where it is absolutely requried.

Add a bOverwrite field to the content_adj helper, so that one can
overwrite instead of appending passed content to whats already in.

* this is currently used only wrt
  * promote_tooltemp helper
  * trim garbage helper
* the oneshot could ideally use overwriting, but currently
  not doing as this flow will occur only once per message

Add a image_url field for the image url with image data in dataurl
format with base64 encoded image data.
If I cant control the look of the file type input, I may have to
hide it and use a normal button, which chains into file selection
or so
Also rename the id/label of InFile+Btn to Image.

Extra fields while Adding.
@hanishkvc hanishkvc changed the title server/public_simplechat vision (wip), toolcall (done, with 0 cost builtin tools+), reasoing(done) server/public_simplechat vision (wip), toolcall (done, with 0 setup clientside builtin tools+), reasoing(done) Nov 10, 2025
@github-actions github-actions bot added examples python python script changes server labels Nov 10, 2025
Add a new helper to create a file type input which includes a btn
with image. Use same wrt the user image selection button.

Update button creation helper to show innerText only if the newly
added innerHTML arg is undefined.

When ever user makes a image selection, the image will be shown
in the input-filetype-image-button. In turn when the same is
submitted to ai engine server, the image will be cleared.
There can be issue with chat.add->chat.save, in that trying to
store into localStorage or so can raise exception, like quota
exceeded and so.

So now trap chat.add also and inturn for now take care of clearing
image state while also trapping and rethrowing a new error which
identifies the above location, as well as tracks the original err
Move all dataUrl handling into helper functions.

So that its manipulation is done in a controlled manner, as well as
in future, changes to the semantic can be easily carried out by
updating the helper functions suitably and inturn updating the caller
as needed.

For now avoid push and pop and work with 0th index directly, given
that currently the logic is setup for handling only a single image
with the ai model. This keeps things simple. It can be changed if
required in future easily.
If a caught error had chained in details about what triggered it
in the 1st place, then show it also to user.
@hanishkvc
Copy link
Contributor Author

hanishkvc commented Nov 10, 2025

a basic / simple vision flow should work now.

  • user can select a image to be loaded/passed to ai (the same can be viewed directly in the image load button itself, once loaded from filesystem)
    • they can change the image to pass if reqd by clicking the image button to load a new image
  • the same is handshaked with the ai server using openai http handshake format of array of items containing text and image_url items in the array as needed.
  • any reasoning generated by ai model when analysising the image will be shown to user (rather implicit in the flow from before) as well as the response generated.
  • show the user query and associated image in the chat session view

Given the limit of around 5MB or so enforced by browsers wrt localStorage, the auto save and option to restore a previous chat, can fail given that it will get filled fast and or even with a single large image of few MBs or more. May change to indexedDB later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

examples python python script changes server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant