Bypassing Text Serialization: Building a Deterministic Bytecode VM for LLM Tool-Calling

In contemporary multi-agent designs, the standard approach to tool orchestration relies on text-serialized messages: Anthropic’s Model Context Protocol (MCP), for example, is itself built on JSON-RPC 2.0. In high-frequency runtime loops or compute-constrained edge environments, this approach has three costs: unpredictable token overhead, high decoding latency, and exposure to malformed output such as dropped brackets or invalid escapes.

To evaluate whether we can make tool-call output syntactically deterministic and lower per-call decoding latency, we built Agent-IR: a low-level alternative that removes free-form text parsing by constraining a local language model to emit instructions directly, so that it acts as an instruction compiler.

1. Grammatical Compilation via GBNF Constrained Decoding

Traditional tool-calling systems let an LLM generate free-form text and hope the output matches a valid schema before it is handed to a regular-expression parser or JSON validator. If a single bracket is dropped, parsing fails and the caller has to retry the request with the full context, which is expensive.

Agent-IR eliminates formatting errors at the inference tier. We inject a GBNF grammar (GGML BNF, the grammar format introduced by llama.cpp) directly into the sampler loop of our execution engine and mask the model's logits at each token step. Any token that would violate the grammar has its logit set to negative infinity, so the model cannot emit output that breaks our language definition.
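The masking step can be illustrated with a minimal sketch. This is not the Agent-IR sampler: the function names `mask_logits` and `argmax` and the flat list of allowed token ids are assumptions made for illustration, and a real grammar sampler derives the allowed set from the current grammar state at every step.

```rust
/// Set every logit whose token id is not in `allowed` to negative infinity,
/// so that token gets zero probability after softmax.
/// `allowed` is a hypothetical stand-in for "tokens the grammar accepts next".
fn mask_logits(logits: &mut [f32], allowed: &[usize]) {
    for (id, logit) in logits.iter_mut().enumerate() {
        if !allowed.contains(&id) {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy selection: index of the largest logit, or None for an empty slice.
/// On ties the earliest index wins.
fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &value) in logits.iter().enumerate() {
        match best {
            Some(b) if logits[b] >= value => {}
            _ => best = Some(i),
        }
    }
    best
}
```

Given logits `[0.1, 2.5, 0.3, 1.7]` and an allowed set of token ids 0 and 3, greedy selection picks token 3 even though token 1 had the highest raw score.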

Instead of generating a verbose JSON payload, the model emits native register commands one at a time.

Because decoding can only follow paths inside this syntax tree, a local 7B model cannot emit a syntactically invalid instruction. The grammar guarantees well-formed output; it does not guarantee that the model picks the right instruction. Output sequences collapse from a 50-token JSON envelope down to a tight, 4-token instruction stream.
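To make the idea of a grammar-constrained instruction concrete, here is a hypothetical sketch. The grammar text, the opcode names (`LOAD`, `ADD`, `CALL`, `HALT`), the eight registers, and the four-field layout are all assumptions chosen for illustration; they are not the actual Agent-IR instruction set. Whether each field maps to exactly one token also depends on the tokenizer.

```rust
/// Hypothetical GBNF grammar: one instruction is `op dst src imm` on its own line.
const AGENT_IR_GRAMMAR: &str = r#"
root  ::= instr+
instr ::= op " " reg " " reg " " imm "\n"
op    ::= "LOAD" | "ADD" | "CALL" | "HALT"
reg   ::= "r" [0-7]
imm   ::= [0-9] [0-9]? [0-9]?
"#;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op { Load, Add, Call, Halt }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Instr { op: Op, dst: u8, src: u8, imm: u16 }

/// Parse a register name of the form `r0` through `r7`.
fn parse_reg(s: &str) -> Option<u8> {
    let digit = s.strip_prefix('r')?;
    if digit.len() != 1 {
        return None;
    }
    let n: u8 = digit.parse().ok()?;
    if n < 8 { Some(n) } else { None }
}

/// Parse one instruction line (without the trailing newline).
/// This is a defensive check on the receiving side; the grammar is what
/// prevents the model from producing anything else in the first place.
fn parse_instr(line: &str) -> Option<Instr> {
    let mut fields = line.split(' ');
    let op = match fields.next()? {
        "LOAD" => Op::Load,
        "ADD" => Op::Add,
        "CALL" => Op::Call,
        "HALT" => Op::Halt,
        _ => return None,
    };
    let dst = parse_reg(fields.next()?)?;
    let src = parse_reg(fields.next()?)?;
    let imm_text = fields.next()?;
    // Accept one to three ASCII digits, matching the `imm` rule above.
    if imm_text.is_empty() || imm_text.len() > 3 || !imm_text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let imm: u16 = imm_text.parse().ok()?;
    if fields.next().is_some() {
        return None; // extra fields are not part of the format
    }
    Some(Instr { op, dst, src, imm })
}
```

Under these assumptions, `parse_instr("LOAD r1 r0 42")` yields a `Load` instruction with `dst` 1, `src` 0 and `imm` 42, while a line with an unknown opcode or an out-of-range register is rejected.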

2. Low-Overhead Runtime: The Memory-Safe Rust Virtual Machine

Once generated, the token sequences are streamed over an authenticated WebSocket connection to a custom runtime written in Rust. The runtime does not invoke a general-purpose text deserializer; it decodes the incoming opcode stream directly.

The backend reads the incoming bytes sequentially and passes them straight into a sandboxed register file through a modular decoder stage.

This decoupled architecture moves the operational load out of the probabilistic model and into a deterministic system loop. Because Rust has no garbage collector, execution is not interrupted by GC pauses, and string labels are resolved to indices in a multi-pass step before any instruction block reaches the execution engine.
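The decode-then-execute split can be sketched as follows. The fixed four-byte instruction layout (`opcode, dst, src, imm`), the opcode values, and the names `decode`, `run`, and `VmError` are assumptions for illustration only; the sketch leaves out label resolution and the WebSocket transport.

```rust
const NUM_REGS: usize = 8;

#[derive(Debug, PartialEq, Eq)]
enum VmError {
    TruncatedStream,   // byte count is not a multiple of the instruction width
    UnknownOpcode(u8), // first byte is not a known opcode
    BadRegister(u8),   // register index outside 0..NUM_REGS
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    Halt,
    LoadImm { dst: usize, imm: u8 },
    Add { dst: usize, src: usize },
}

/// Validate a register index before it is ever used to index the register file.
fn reg(index: u8) -> Result<usize, VmError> {
    if (index as usize) < NUM_REGS {
        Ok(index as usize)
    } else {
        Err(VmError::BadRegister(index))
    }
}

/// Decode a raw byte stream into validated instructions.
/// Hypothetical layout: [opcode, dst, src, imm], four bytes per instruction.
fn decode(bytes: &[u8]) -> Result<Vec<Instr>, VmError> {
    if bytes.len() % 4 != 0 {
        return Err(VmError::TruncatedStream);
    }
    bytes
        .chunks_exact(4)
        .map(|w| -> Result<Instr, VmError> {
            match w[0] {
                0x00 => Ok(Instr::Halt),
                0x01 => Ok(Instr::LoadImm { dst: reg(w[1])?, imm: w[3] }),
                0x02 => Ok(Instr::Add { dst: reg(w[1])?, src: reg(w[2])? }),
                other => Err(VmError::UnknownOpcode(other)),
            }
        })
        .collect()
}

/// Execute a straight-line program and return the final register file.
fn run(program: &[Instr]) -> [i64; NUM_REGS] {
    let mut regs = [0i64; NUM_REGS];
    for instr in program {
        match *instr {
            Instr::Halt => break,
            Instr::LoadImm { dst, imm } => regs[dst] = i64::from(imm),
            Instr::Add { dst, src } => regs[dst] = regs[dst].wrapping_add(regs[src]),
        }
    }
    regs
}
```

Validating every opcode and register index during decoding means the execution loop only ever sees well-formed instructions, so indexing into the register file cannot go out of bounds.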

3. Bounding Execution Safety: The Academic Horizon

A low-overhead compilation loop is a useful engineering result, but turning it into an empirical research framework requires well-defined system boundaries. Current development targets two areas:

Instruction Budget Boundaries: Adding an explicit instruction-step budget to the VM loop so that looping or recursive model-generated programs cannot run forever.
Quantitative Latency Telemetry: Measuring token-count reduction and compute savings against standard JSON-RPC baselines across a range of model parameter scales.
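An instruction-step budget of the kind described in the first item could look like the sketch below. The three-instruction set, the single accumulator, and the names `run_with_budget` and `Outcome` are hypothetical and exist only to show the bounded loop; they are not taken from the Agent-IR codebase.

```rust
#[derive(Debug, Clone, Copy)]
enum Instr {
    Inc,         // add one to the accumulator
    Jump(usize), // jump to an absolute instruction index
    Halt,        // stop normally
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Halted { acc: u64, steps: u64 },
    BudgetExhausted { acc: u64 },
    PcOutOfBounds(usize),
}

/// Run `program`, executing at most `max_steps` instructions.
/// A program that loops forever ends in `BudgetExhausted` instead of hanging.
fn run_with_budget(program: &[Instr], max_steps: u64) -> Outcome {
    let mut pc = 0usize;
    let mut acc = 0u64;
    let mut steps = 0u64;
    loop {
        // Check the budget before fetching, so a budget of 0 executes nothing.
        if steps == max_steps {
            return Outcome::BudgetExhausted { acc };
        }
        let instr = match program.get(pc) {
            Some(i) => *i,
            None => return Outcome::PcOutOfBounds(pc),
        };
        steps += 1;
        match instr {
            Instr::Inc => {
                acc = acc.wrapping_add(1);
                pc += 1;
            }
            Instr::Jump(target) => pc = target,
            Instr::Halt => return Outcome::Halted { acc, steps },
        }
    }
}
```

With a budget of 10, the two-instruction loop `[Inc, Jump(0)]` stops with `BudgetExhausted { acc: 5 }`, since half of the executed steps are jumps, while `[Inc, Inc, Halt]` finishes normally after 3 steps.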

By treating the language model as a token-to-register compiler rather than a conversational text generator, Agent-IR offers an alternative approach for secure, high-speed, local agent networks.
