NovelAI GLM 4.6 Tool Calling
Overview
NovelAI’s GLM 4.6 model uses prompt-based tool calling: tools are defined in the system prompt, and the model generates structured XML blocks when it decides to use a tool. This is fundamentally different from providers like Google Gemini or OpenRouter that have native function calling APIs.
The implementation lives primarily in src/providers/novelai/novelaiStreamAdapter.ts.
Image Generation State
- generate_image_nai now requires an explicit server_novelai_imagegen_configs.nai_diffusion_model_id. When that dedicated slot is NULL, the tool stays hidden and NovelAI image generation remains disabled until /model image sets a NovelAI model again.
- /model image now also handles the dedicated NovelAI image slot when the selected provider is NovelAI.
- generate_image_nai now resolves its sampler, steps, scale, noise schedule, and cfg_rescale from server_novelai_imagegen_configs first, falling back to the NAI_IMAGE_* / NAI_CFG_RESCALE env values when the server override is NULL.
- /novelai image params is the admin-facing command for those parameter overrides.
- Image tag profile commands are provider-neutral: /persona image-tags, /personal image-tags, /config image-tags default-positive, and /config image-tags default-negative.
- /novelai image generate is the slash-command image generation entrypoint for direct tag-based NAI image creation, and now opens a modal for prompt, extra negative tags, optional character reference, and orientation selection.
- /novelai character-reference now persists persona/user reference images through src/utils/storage/charrefStorage.ts.
- generate_image_nai now supports a structured characters[] array for V4 models.
- generate_image_nai now uses a simpler active character schema: each characters[] item is one visible character instance, and characters[].tags must contain that character’s full appearance plus their role in the scene. Profile-driven autofill by id and remove_tags suppression are currently disabled in the active schema/runtime. If known persona/user Physical Appearance tags are available in conversation context, the model is expected to copy the relevant tags into characters[].tags directly. For erotic scenes, clothing tags can be omitted and the intended nude state can be stated directly in tags.
- Saved character references are still persisted by /novelai character-reference, but the current active generate_image_nai character prompting flow does not inject profile-driven refs or profile-driven Physical Appearance tags.
- Multi-character generations intentionally skip saved reference images and rely on per-character tags only, because NovelAI still treats Director/Precise Reference as whole-image guidance rather than strict per-character binding.
- Character placement now populates both top-level characterPrompts[] and v4_prompt.caption.char_captions[] from the inline characters[].tags only. Coordinate mode is enabled when two or more characters are present.
- Context building now surfaces saved Physical Appearance tags inline on the relevant conversation entries instead of a separate # Image Profiles block, so identity and image appearance guidance stay together in one place.
- Nested tool schemas are now preserved recursively by the provider adapters, so structured array/object params such as characters[] survive tool conversion instead of being flattened to items.type.
Architecture
Pipeline Flow
    1. Tool definitions registered at stream start
       └─ normalizeToolDefinitions() → NormalizedToolDefinition[]
    2. Tool guide injected into system prompt
       └─ buildToolCallingGuide() → <tools> XML block + format instructions
    3. Tool history from previous calls injected into conversation
       └─ buildToolHistoryGlm() → <|assistant|>/<|observation|> turns
    4. Model generates response (may include tool calls)
       └─ Streamed via NovelAI's OpenAI-compatible completions API
    5. Stream tokens processed through tool-aware pipeline
       └─ processTokenWithToolParsing() → decides: text vs tool_call
    6. Tool call parsed and returned to orchestrator
       └─ parseToolCallBlock() → FunctionCall object
    7. On stream end without closing tag, recovery attempted
       └─ Synthesize </tool_call> and parse accumulated buffer
Token Processing Modes
The adapter uses a state machine (toolCallMode) with four states:
| State | Description | Transitions |
|---|---|---|
| disabled | Tools not available; tokens pass directly to processVisibleText() | none |
| undecided | Accumulating initial tokens to decide if the model is generating text or a tool call | → text or tool_call |
| text | Model is generating visible text; scan for <tool_call> mid-stream | → tool_call (if tag found) |
| tool_call | Accumulating tool call XML until </tool_call> is found | → parsed FunctionCall |
Decision Logic (decideToolCallMode)
When in undecided mode, each token is appended to toolPreludeBuffer and analyzed:
- <think>...</think> blocks: consumed silently (thinking content stripped)
- <tool_call> tag: switch to tool_call mode (properly wrapped call)
- Known tool name: if the first line matches a registered tool name (with underscore/hyphen normalization), wait for <arg_key> to confirm, then wrap in <tool_call> and switch to tool_call mode
- Anything else: switch to text mode
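The four rules above can be expressed as a small pure function over the prelude buffer. The sketch below is an assumption about the shape of that logic, not the body of decideToolCallMode(): decideModeSketch, ModeDecision, and the isKnownTool callback are hypothetical names, and the sketch assumes the first line has fully arrived (it does not handle a tool name split across tokens).

```typescript
// Sketch only: names and signatures are assumptions, not the adapter's API.
type ToolCallMode = "disabled" | "undecided" | "text" | "tool_call";

interface ModeDecision {
  mode: ToolCallMode;
  buffer: string; // content carried into the next mode
}

const OPEN_TAG = "<tool_call>";

function decideModeSketch(
  prelude: string,
  isKnownTool: (name: string) => boolean
): ModeDecision {
  // Rule 1: drop completed <think>...</think> blocks, then leading whitespace.
  const visible = prelude
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .replace(/^\s+/, "");
  // Nothing visible yet, or a think block is still open: keep accumulating.
  if (visible.length === 0 || visible.indexOf("<think>") === 0) {
    return { mode: "undecided", buffer: prelude };
  }
  // Rule 2: a properly wrapped call.
  if (visible.indexOf(OPEN_TAG) === 0) {
    return { mode: "tool_call", buffer: visible };
  }
  // Rule 3: a bare tool name on the first line, confirmed by <arg_key>.
  const cut = visible.search(/[\n<]/);
  const firstLine = (cut === -1 ? visible : visible.slice(0, cut)).trim();
  if (isKnownTool(firstLine)) {
    if (visible.indexOf("<arg_key>") !== -1) {
      return { mode: "tool_call", buffer: OPEN_TAG + visible };
    }
    return { mode: "undecided", buffer: prelude }; // wait for confirmation
  }
  // Rule 4: anything else is visible text.
  return { mode: "text", buffer: visible };
}
```

Keeping the decision free of side effects makes each rule easy to test in isolation; the caller owns the buffer and applies the returned mode.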
Tool Call Format
Section titled “Tool Call Format”What the Model Should Generate (per system prompt instructions)
Section titled “What the Model Should Generate (per system prompt instructions)”<tool_call>web_search<arg_key>query</arg_key><arg_value>live performances Japan February 2026</arg_value><arg_key>category</arg_key><arg_value>text</arg_value></tool_call>What the Model Actually Generates (common GLM behavior)
Section titled “What the Model Actually Generates (common GLM behavior)”GLM 4.6 frequently omits the <tool_call> wrapper tag and outputs the function name directly:
web_search<arg_key>query</arg_key><arg_value>live performances Japan February 2026</arg_value><arg_key>category</arg_key><arg_value>text</arg_value>The adapter handles this via unwrapped tool call detection — checking if the first line of the prelude matches a known tool name (with underscore/hyphen normalization via normalizeToolName()).
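A parser for the wrapped format can be quite small. The sketch below is an assumption about how such parsing might look, not the code in parseToolCallBlock(): ParsedToolCallSketch and parseToolCallSketch are hypothetical names, and argument values are kept as raw strings rather than coerced to the types in the tool schema.

```typescript
// Sketch only: hypothetical types and names, not the adapter's implementation.
interface ParsedToolCallSketch {
  name: string;
  args: { [key: string]: string };
}

function parseToolCallSketch(block: string): ParsedToolCallSketch | null {
  const open = "<tool_call>";
  const close = "</tool_call>";
  const start = block.indexOf(open);
  const end = block.indexOf(close);
  if (start === -1 || end === -1 || end < start) {
    return null; // not a complete wrapped block
  }
  const body = block.slice(start + open.length, end);

  // The function name is everything before the first <arg_key>.
  const firstKey = body.indexOf("<arg_key>");
  const name = (firstKey === -1 ? body : body.slice(0, firstKey)).trim();
  if (name.length === 0) {
    return null;
  }

  // Collect complete <arg_key>/<arg_value> pairs; incomplete pairs are skipped.
  const args: { [key: string]: string } = {};
  const pair = /<arg_key>([\s\S]*?)<\/arg_key>\s*<arg_value>([\s\S]*?)<\/arg_value>/g;
  let m: RegExpExecArray | null;
  while ((m = pair.exec(body)) !== null) {
    args[m[1].trim()] = m[2];
  }
  return { name, args };
}
```

Because the pair pattern requires a closing </arg_value>, an argument cut off mid-value is dropped rather than returned half-formed.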
Tool Name Normalization
MCP tools are sometimes registered with hyphens (e.g., web-search) but the model outputs underscores (e.g., web_search). The normalizeToolName() method tries:
- Exact match
- Underscores → hyphens
- Hyphens → underscores
This normalization is used in both:
- decideToolCallMode(): for detecting unwrapped tool calls
- parseToolCallBlock(): for resolving the final function name
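The three lookup attempts can be sketched as follows. This is an illustration under assumptions rather than the body of normalizeToolName(): resolveToolNameSketch is a hypothetical name, and the registered tools are assumed to be available as a plain array of names.

```typescript
// Sketch only: tries the exact name, then underscores → hyphens,
// then hyphens → underscores, and returns the registered spelling.
function resolveToolNameSketch(
  raw: string,
  registered: string[]
): string | null {
  const candidates = [
    raw,
    raw.replace(/_/g, "-"),
    raw.replace(/-/g, "_"),
  ];
  for (const candidate of candidates) {
    if (registered.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  return null; // not a known tool under any spelling
}

// The model wrote web_search, but the MCP tool is registered as web-search.
const resolved = resolveToolNameSketch("web_search", ["web-search", "create_task"]);
```

Returning the registered spelling, rather than the spelling the model produced, means the orchestrator can dispatch the call without a second lookup.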
Tool History Format (GLM Chat Template)
Previous tool calls and their results are formatted using GLM’s role tag structure:
    <|assistant|>
    <think></think>
    <tool_call>web_search
    <arg_key>query</arg_key>
    <arg_value>...</arg_value>
    <arg_key>category</arg_key>
    <arg_value>text</arg_value>
    </tool_call>
    <|observation|>
    <tool_response>{"results": [...]}</tool_response>
Built by buildToolHistoryGlm() and inserted into the prompt between dialogue turns and the generation prompt.
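A formatter that produces turns of this shape might look like the sketch below. It is not the code in buildToolHistoryGlm(): ToolHistoryEntrySketch and formatToolHistorySketch are hypothetical names, results are assumed to be JSON-serializable, and the exact whitespace between tags is an assumption.

```typescript
// Sketch only: hypothetical entry shape and formatter.
interface ToolHistoryEntrySketch {
  name: string;
  args: { [key: string]: string };
  result: unknown; // serialized with JSON.stringify
}

function formatToolHistorySketch(entries: ToolHistoryEntrySketch[]): string {
  const turns = entries.map((entry) => {
    // Assistant turn: empty think block followed by the wrapped call.
    const lines: string[] = [
      "<|assistant|>",
      "<think></think>",
      "<tool_call>" + entry.name,
    ];
    Object.keys(entry.args).forEach((key) => {
      lines.push("<arg_key>" + key + "</arg_key>");
      lines.push("<arg_value>" + entry.args[key] + "</arg_value>");
    });
    lines.push("</tool_call>");
    // Observation turn: the tool result as JSON.
    lines.push("<|observation|>");
    lines.push(
      "<tool_response>" + JSON.stringify(entry.result) + "</tool_response>"
    );
    return lines.join("\n");
  });
  return turns.join("\n");
}
```

Large tool results go into the prompt verbatim in this scheme, so the size of each serialized result counts directly against the prompt budget.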
System Prompt Tool Guide
Built by buildToolCallingGuide(), injected into the <|system|> block:
    # Tools

    You may call one or more functions to assist with the user query.

    You are provided with function signatures within <tools></tools> XML tags:
    <tools>
    {"name":"web_search","description":"...","parameters":{...}}
    {"name":"create_task","description":"...","parameters":{...}}
    </tools>

    For each function call, output the function name and arguments within the following XML format:
    <tool_call>{function-name}
    <arg_key>{arg-key-1}</arg_key>
    <arg_value>{arg-value-1}</arg_value>
    ...
    </tool_call>
Truncation Recovery
NAI’s ~150-token hard cap (or 600 max_length budget) often cuts the model off mid-tool-call before it generates </tool_call>. Two recovery mechanisms handle this on stream end:
1. tool_call mode recovery
If the stream ends while in tool_call mode with an accumulated buffer:
- Synthesize </tool_call> closing tag
- Attempt to parse the patched block
- If successful, return the FunctionCall to the orchestrator
2. undecided mode recovery
If the stream ends while still in undecided mode with a prelude buffer:
- Check if the first line matches a known tool name
- If <arg_key> is present, wrap in <tool_call>...</tool_call> and parse
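Both recovery paths reduce to producing a well-formed block that the normal parser can consume. The sketch below shows that patching step only; it is an assumption about the approach, not the adapter's code. patchTruncatedCallSketch and the isKnownTool callback are hypothetical names, and the sketch leaves out the visible-text suppression described under Debris Detection and Suppression.

```typescript
// Sketch only: returns a patched <tool_call> block, or null if nothing
// recoverable was accumulated. Parsing the block is left to the caller.
type StreamEndMode = "undecided" | "text" | "tool_call";

function patchTruncatedCallSketch(
  mode: StreamEndMode,
  buffer: string,
  isKnownTool: (name: string) => boolean
): string | null {
  const content = buffer.trim();
  if (content.length === 0) {
    return null;
  }
  if (mode === "tool_call") {
    // The opening tag is already in the buffer; synthesize the closing tag.
    return content.indexOf("</tool_call>") === -1
      ? content + "</tool_call>"
      : content;
  }
  if (mode === "undecided") {
    // Unwrapped call: first line must be a known tool, confirmed by <arg_key>.
    const cut = content.search(/[\n<]/);
    const firstLine = (cut === -1 ? content : content.slice(0, cut)).trim();
    if (isKnownTool(firstLine) && content.indexOf("<arg_key>") !== -1) {
      return "<tool_call>" + content + "</tool_call>";
    }
  }
  return null; // text mode, or nothing that looks like a tool call
}
```

The patched block can still fail to parse or lose its last argument if the cut landed inside an <arg_value>, so the caller should treat a null or partial parse as a normal outcome.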
Debris Detection and Suppression
The adapter includes three layers of debris detection to handle GLM’s tendency to generate garbage after valid output:
1. </think> Debris Detection (RESOLVED)
The model sometimes generates stray </think> tags mid-response followed by garbage text (e.g., "oggers:</think>\nTomori I'll kill you").
Solution: processVisibleText() checks for </think> during the visible text phase. When found, the stream stops immediately — only clean text before the tag is emitted, everything after is discarded.
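In a token stream the </think> tag may arrive split across chunks, so a filter needs to hold back any trailing text that could be the start of the tag. The class below is a sketch of that idea and not the logic inside processVisibleText(): ThinkDebrisFilterSketch is a hypothetical name, and the hold-back behavior is an assumption about how a streaming check could be made robust.

```typescript
// Sketch only: emits visible text until </think> appears, then stops.
const THINK_CLOSE = "</think>";

class ThinkDebrisFilterSketch {
  stopped = false;
  private pending = ""; // trailing text that might be a partial tag

  // Returns the text that is safe to show for this token.
  push(token: string): string {
    if (this.stopped) {
      return "";
    }
    const text = this.pending + token;
    const idx = text.indexOf(THINK_CLOSE);
    if (idx !== -1) {
      this.stopped = true; // discard the tag and everything after it
      this.pending = "";
      return text.slice(0, idx);
    }
    // Hold back the longest suffix that is a prefix of the tag.
    let hold = 0;
    const maxHold = Math.min(THINK_CLOSE.length - 1, text.length);
    for (let n = maxHold; n > 0; n--) {
      if (text.slice(text.length - n) === THINK_CLOSE.slice(0, n)) {
        hold = n;
        break;
      }
    }
    this.pending = text.slice(text.length - hold);
    return text.slice(0, text.length - hold);
  }

  // Call at stream end to release text held back as a possible partial tag.
  flush(): string {
    const rest = this.stopped ? "" : this.pending;
    this.pending = "";
    return rest;
  }
}
```

A caller would forward the return value of push() to the user for every token and call flush() once when the stream closes.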
2. Stray Tool Calls After Text (RESOLVED)
The model may generate a complete text response, then attempt a tool call (e.g., select_sticker_for_response) at the very end without arguments.
Solution: A hasEmittedVisibleText flag tracks whether any visible text has been sent to the user. When set, all subsequent tool call detections are suppressed:
- processTokenWithToolParsing(): ignores undecided → tool_call transitions
- processTextWithToolScan(): ignores both <tool_call> tags and unwrapped function names
- processChunk() final-chunk recovery: skips truncation recovery for both tool_call and undecided modes
3. Mid-Text Unwrapped Tool Call Detection (RESOLVED)
When the model starts with text then switches to an unwrapped tool call (bare function name without <tool_call> wrapper), the previous code only scanned for <tool_call> XML tags.
Solution: detectUnwrappedToolCallInText() scans the text buffer for bare function names (matching registered tools via normalizeToolName()) followed by <arg_key> tags. If found after visible text, they’re suppressed as debris. If found before any visible text, they’re wrapped in <tool_call> tags for standard parsing.
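The debris-versus-call decision can be sketched as a scan that returns one of three outcomes. This is an illustration under assumptions, not the body of detectUnwrappedToolCallInText(): the function and type names are hypothetical, the sketch only matches a tool name immediately followed by <arg_key>, and it treats any non-whitespace text earlier in the same buffer as visible text.

```typescript
// Sketch only: hypothetical result type and scanner.
type UnwrappedScanSketch =
  | { kind: "none" }
  | { kind: "debris"; visible: string } // keep the text, drop the call
  | { kind: "tool_call"; block: string }; // wrapped for standard parsing

function scanUnwrappedSketch(
  buffer: string,
  toolNames: string[],
  hasEmittedVisibleText: boolean
): UnwrappedScanSketch {
  // Find the earliest "<name><arg_key>" occurrence under any spelling.
  let best = -1;
  for (const name of toolNames) {
    const variants = [name, name.replace(/_/g, "-"), name.replace(/-/g, "_")];
    for (const variant of variants) {
      const idx = buffer.indexOf(variant + "<arg_key>");
      if (idx !== -1 && (best === -1 || idx < best)) {
        best = idx;
      }
    }
  }
  if (best === -1) {
    return { kind: "none" };
  }
  const before = buffer.slice(0, best);
  if (hasEmittedVisibleText || before.trim().length > 0) {
    return { kind: "debris", visible: before };
  }
  return { kind: "tool_call", block: "<tool_call>" + buffer.slice(best) };
}
```

The tool_call result is left without a closing tag on purpose, so the same accumulation and truncation recovery used for wrapped calls applies to it.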
Known Limitations
1. Token Budget vs Thinking
With /nothink removed (to enable reasoning for tool use), the model may use tokens on internal reasoning. Combined with NAI’s token budget, this can result in:
- Truncated tool calls (handled by recovery)
- Thinking consuming entire budget (empty response)
- Model choosing to respond with text instead of tool calls
2. Tool Call Arguments Truncation
If the token cap hits mid-<arg_value>, the last argument is incomplete. The truncation recovery synthesizes </tool_call> but the incomplete argument may be lost.
3. URL Fetch Tool Disabled
fetch_url is not exposed to NovelAI initially. It is a built-in tool for other providers, but fetched-page payloads can be large enough to destabilize GLM’s prompt/tool budget. Re-enable only after validating realistic fetched-page results against NovelAI’s system-prompt and tool-history limits.
File References
| File | Purpose |
|---|---|
| src/providers/novelai/novelaiStreamAdapter.ts | Stream adapter with all tool parsing logic |
| src/providers/novelai/novelaiService.ts | API communication, parameter conversion |
| src/providers/novelai/novelaiProvider.ts | Provider interface, stream config setup |
| references/glm_46_chat_template.jinja.txt | Official GLM 4.6 Jinja template (source of truth) |