Summary

  • add arguments_json to FunctionCall so we persist the exact tool-call payload string returned by the model
  • have the OpenAI client reuse the stored JSON when constructing follow-up requests, falling back to a canonical dump only if the string is missing
  • capture the raw JSON when parsing responses, keeping both the parsed dict (for tool execution) and the untouched string (for replay); a sketch of the shape follows this list
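
A minimal sketch of the intended shape, assuming a dataclass-style FunctionCall; the arguments field and the serialize_arguments helper are illustrative names, not the actual Mini-Agent API:

```python
from dataclasses import dataclass
import json

@dataclass
class FunctionCall:
    name: str
    arguments: dict                     # parsed form, used to execute the tool
    arguments_json: str | None = None   # raw string exactly as the model emitted it

def serialize_arguments(call: FunctionCall) -> str:
    """Prefer the byte-exact payload; fall back to a canonical dump."""
    if call.arguments_json is not None:
        return call.arguments_json
    # Fallback for tool calls recorded before arguments_json existed
    return json.dumps(call.arguments, sort_keys=True)
```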

Problem

When Mini-Agent is configured to send requests to a local LM Studio endpoint (or any local serving stack with a KV cache), each follow-up request must be byte-identical over the cached portion of the context. Today the request builder re-serializes every tool call in the transcript with json.dumps(..., sort_keys=True), which can change key ordering, whitespace, and float formatting relative to what the model actually emitted. The tool call prepended to Request #2 therefore differs byte-for-byte from the one the model saw during Request #1, so LM Studio treats the assistant history as a cache miss and reprocesses all prior tokens (~12k tokens per turn in our setup), wasting latency and compute.
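
To make the mismatch concrete, here is a minimal repro; the payload is hypothetical, and only the json.dumps(..., sort_keys=True) call comes from the actual request builder:

```python
import json

# What the model actually emitted in Request #1:
raw = '{"path": "src/main.py", "line": 42}'

# What the request builder produces when rebuilding the transcript for Request #2:
rebuilt = json.dumps(json.loads(raw), sort_keys=True)

print(rebuilt)         # {"line": 42, "path": "src/main.py"}
print(raw == rebuilt)  # False -> the assistant turn is no longer byte-identical,
                       # so the KV cache prefix match fails at this point
```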

Testing

latent-variable force-pushed the fix/deterministic-local-tools branch from 2eba424 to 91d7951 on November 18, 2025 at 03:55.