
Thinking Level

This page describes how TomoriBot’s provider-scoped thinking_level preference works today.

Use this page to verify:

  • the default
  • what each level means
  • how Tomori maps the general levels to provider-specific request fields
  • which providers currently ignore the setting

thinking_level is a provider-scoped saved preference controlled by:

  • /config samplers thinking_level:<value>
  • /config samplers provider:<saved-provider> thinking_level:<value>

Current values:

  • auto
  • none
  • low
  • medium
  • high

Default:

  • auto

Storage:

  • saved_provider_configs.thinking_level
  • server_model_configs.thinking_level (deprecated Phase 1.5 mirror; removal scheduled for step #14.5)

Because the value is persisted per provider, the active value is:

  • visible in /tool status
  • reflected in /tool prompt snapshot
  • preserved in provider snapshots and restored by /config provider switch

Although it is stored per provider, thinking_level expresses a provider-agnostic preference, not a guaranteed vendor feature.

Tomori only applies it when the active provider/model exposes a verified request-side reasoning or thinking control.

If a provider/model does not support a stable request-side control in Tomori, the setting is ignored for that request.

These are the meanings Tomori uses before mapping to vendor-specific fields:

Level | Meaning
--- | ---
auto | Let the provider/model use its default or automatic behavior.
none | Disable thinking if possible; otherwise use the provider’s lowest safe setting.
low | Ask for light reasoning effort.
medium | Ask for balanced reasoning effort.
high | Ask for the strongest available reasoning effort.

Tomori already has a per-turn forceReason flag used by some flows.

Current implementation rule:

  • if forceReason = true and stored thinking_level is auto or none, Tomori upgrades the effective level for that request to high
  • this does not rewrite the stored config
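The override rule fits in a small pure function. This is a sketch, not the code in thinkingControl.ts, and the name resolveEffectiveLevel is hypothetical. Because it returns a per-request value instead of writing to the config, the stored preference stays untouched.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";

// Hypothetical helper: computes the level used for one request only.
// The stored thinking_level is read, never modified.
function resolveEffectiveLevel(stored: ThinkingLevel, forceReason: boolean): ThinkingLevel {
  // forceReason upgrades "auto" and "none" to "high" for this request.
  if (forceReason && (stored === "auto" || stored === "none")) return "high";
  // Explicit low / medium / high choices are respected as-is.
  return stored;
}
```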

When a provider accepts a numeric reasoning budget, Tomori maps low / medium / high using these env vars:

  • THINKING_LEVEL_BUDGET_LOW_TOKENS=1024
  • THINKING_LEVEL_BUDGET_MEDIUM_TOKENS=4096
  • THINKING_LEVEL_BUDGET_HIGH_TOKENS=8192

These are Tomori defaults, not vendor defaults.
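A minimal sketch of how these env vars might be read. The function name loadThinkingBudgets is hypothetical, and the rule that missing, non-integer, zero, or negative values fall back to the Tomori defaults is an assumption, not something this page specifies. The environment is passed in as a plain record so the function stays testable.

```typescript
type BudgetLevel = "low" | "medium" | "high";

// Tomori defaults listed on this page.
const DEFAULT_THINKING_BUDGETS: Record<BudgetLevel, number> = { low: 1024, medium: 4096, high: 8192 };

const BUDGET_ENV_KEYS: Record<BudgetLevel, string> = {
  low: "THINKING_LEVEL_BUDGET_LOW_TOKENS",
  medium: "THINKING_LEVEL_BUDGET_MEDIUM_TOKENS",
  high: "THINKING_LEVEL_BUDGET_HIGH_TOKENS",
};

// Assumption: invalid or missing values fall back to the defaults.
function loadThinkingBudgets(env: Record<string, string | undefined>): Record<BudgetLevel, number> {
  const budgets = { ...DEFAULT_THINKING_BUDGETS }; // copy so defaults are never mutated
  for (const level of Object.keys(BUDGET_ENV_KEYS) as BudgetLevel[]) {
    const parsed = Number(env[BUDGET_ENV_KEYS[level]]); // undefined -> NaN, "" -> 0
    if (Number.isInteger(parsed) && parsed > 0) budgets[level] = parsed;
  }
  return budgets;
}
```

In Node this would be called as loadThinkingBudgets(process.env).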

This section describes the mapping implemented in src/utils/provider/thinkingControl.ts.

Tomori splits Gemini behavior by model family:

  • Gemini 2.5 family: uses numeric thinking_budget
  • Gemini 3 / 3.1 family: uses enum-like thinking_level

Tomori behavior:

Model family | auto | none | low / medium / high
--- | --- | --- | ---
Gemini 2.5 | thinkingBudget: -1 | Flash/Flash-Lite: 0; Pro: 128 | uses env budget defaults, clamped to vendor minimums
Gemini 3 / 3.1 | omit thinking config | Flash: MINIMAL; Pro: LOW | LOW / MEDIUM / HIGH

Important implementation notes:

  • Gemini 2.5 Pro cannot be fully disabled, so Tomori maps none to the minimum safe budget instead of pretending it can turn thinking off.
  • Gemini 2.5 Flash-Lite has a higher positive minimum than Flash, so Tomori clamps upward when needed.
  • Gemini 3 Pro does not get a true disable path in Tomori; none becomes the lowest supported level.
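The Gemini table and notes can be condensed into one mapping function. This is a sketch under assumptions, not the code in thinkingControl.ts: function and type names are hypothetical, model families are detected with simple prefix/substring checks on the model ID, and the Gemini 2.5 minimum positive budgets (128 for Pro, 512 for Flash-Lite) come from Google's thinking documentation as we understand it and should be re-checked. The returned object only mirrors the field names in the table; it is not an SDK type.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";
type BudgetLevel = "low" | "medium" | "high";

// Hypothetical request fragment; field names follow the table, not an SDK type.
interface GeminiThinkingConfig {
  thinkingBudget?: number;
  thinkingLevel?: "MINIMAL" | "LOW" | "MEDIUM" | "HIGH";
}

// Tomori's default budgets (THINKING_LEVEL_BUDGET_*_TOKENS).
const DEFAULT_BUDGETS: Record<BudgetLevel, number> = { low: 1024, medium: 4096, high: 8192 };

// Assumed Gemini 2.5 minimum positive budgets; re-check against Google's docs.
function minPositiveBudget(model: string): number {
  if (model.includes("pro")) return 128;
  if (model.includes("flash-lite")) return 512;
  return 1; // Flash: assumed to accept any positive budget
}

// Returns undefined when the thinking config should be omitted entirely.
function geminiThinkingConfig(
  model: string,
  level: ThinkingLevel,
  budgets: Record<BudgetLevel, number> = DEFAULT_BUDGETS,
): GeminiThinkingConfig | undefined {
  const m = model.toLowerCase();
  const isPro = m.includes("pro");
  if (m.startsWith("gemini-2.5")) {
    if (level === "auto") return { thinkingBudget: -1 }; // dynamic thinking
    // Pro cannot disable thinking, so "none" becomes its minimum budget.
    if (level === "none") return { thinkingBudget: isPro ? minPositiveBudget(m) : 0 };
    // Clamp upward to the family's minimum positive budget.
    return { thinkingBudget: Math.max(budgets[level], minPositiveBudget(m)) };
  }
  if (m.startsWith("gemini-3")) {
    if (level === "auto") return undefined; // omit thinking config
    if (level === "none") return { thinkingLevel: isPro ? "LOW" : "MINIMAL" }; // no true disable on Pro
    return { thinkingLevel: level.toUpperCase() as "LOW" | "MEDIUM" | "HIGH" };
  }
  return undefined; // other models: setting ignored
}
```

Passing custom budgets shows the upward clamp: a low budget of 256 for Flash-Lite yields 512, not 256.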

Tomori uses adaptive thinking for supported Claude 4.6 / 4.7 models.

Currently mapped:

  • claude-sonnet-4-6
  • claude-opus-4-6
  • claude-opus-4-7

Tomori behavior:

Level | Anthropic request
--- | ---
auto | thinking: { type: "adaptive" }
none | thinking: { type: "disabled" }
low | thinking: { type: "adaptive" } + output_config: { effort: "low" }
medium | thinking: { type: "adaptive" } + output_config: { effort: "medium" }
high | thinking: { type: "adaptive" } + output_config: { effort: "high" }

Additional behavior:

  • when adaptive thinking is active, Tomori omits sampling params that Anthropic rejects in that mode
  • unsupported Anthropic models currently ignore thinking_level
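A minimal sketch of the Anthropic mapping, assuming a plain request-body object. The helper name, the exact-match model check, and the choice of temperature and top_k as the sampling params to drop are assumptions; this page only says Tomori omits the params Anthropic rejects in adaptive mode, so Anthropic's docs remain the authority on that list.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";

interface AnthropicBody {
  model: string;
  thinking?: { type: "adaptive" | "disabled" };
  output_config?: { effort: "low" | "medium" | "high" };
  temperature?: number;
  top_k?: number;
  [key: string]: unknown;
}

// Model IDs listed on this page; exact matching is a simplification.
const ADAPTIVE_THINKING_MODELS = new Set(["claude-sonnet-4-6", "claude-opus-4-6", "claude-opus-4-7"]);

function applyAnthropicThinking(body: AnthropicBody, level: ThinkingLevel): AnthropicBody {
  if (!ADAPTIVE_THINKING_MODELS.has(body.model)) return body; // unsupported: ignore thinking_level
  const out: AnthropicBody = { ...body };
  if (level === "none") {
    out.thinking = { type: "disabled" }; // sampling params are kept when thinking is off
    return out;
  }
  out.thinking = { type: "adaptive" };
  if (level !== "auto") out.output_config = { effort: level }; // auto: no effort hint
  // Assumption: these are the params rejected while adaptive thinking is active.
  delete out.temperature;
  delete out.top_k;
  return out;
}
```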

Tomori maps thinking_level to OpenRouter’s reasoning-effort control.

Tomori behavior:

Level | OpenRouter request
--- | ---
auto | omit reasoning
none | reasoning: { effort: "none" }
low | reasoning: { effort: "low" }
medium | reasoning: { effort: "medium" }
high | reasoning: { effort: "high" }

Tomori does not currently send numeric reasoning budgets through OpenRouter.
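Since the mapping is one-to-one apart from auto, the sketch is short. The body type and function name are hypothetical; only the reasoning: { effort } shape comes from the table above.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";

interface OpenRouterBody {
  model: string;
  reasoning?: { effort: Exclude<ThinkingLevel, "auto"> };
  [key: string]: unknown;
}

function applyOpenRouterThinking(body: OpenRouterBody, level: ThinkingLevel): OpenRouterBody {
  // auto: leave `reasoning` out so the routed model uses its default.
  if (level === "auto") return body;
  // Every other level maps directly to an effort string; no numeric budget is sent.
  return { ...body, reasoning: { effort: level } };
}
```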

Tomori treats the two DeepSeek model IDs differently:

  • deepseek-chat: thinking can optionally be enabled per request
  • deepseek-reasoner: reasoning model by identity

Tomori behavior:

Model | auto / none | low / medium / high
--- | --- | ---
deepseek-chat | omit thinking flag | thinking: { type: "enabled" }
deepseek-reasoner | no extra toggle; model stays reasoning-oriented | no extra toggle; model stays reasoning-oriented

Additional behavior:

  • when DeepSeek thinking is active, Tomori removes incompatible sampling fields
  • Tomori does not currently expose a numeric DeepSeek reasoning budget because no verified stable budget field is wired here
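The DeepSeek split can be sketched as a decision function that separates "which flag to send" from "is thinking active". The names are hypothetical, and treating deepseek-reasoner as always thinking-active (so the caller would also strip incompatible sampling fields for it) is our reading of this page, not a confirmed detail.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";

interface DeepSeekThinkingDecision {
  flag?: { type: "enabled" }; // request field to add, if any
  thinkingActive: boolean; // when true, the caller removes incompatible sampling fields
}

function deepSeekThinking(model: string, level: ThinkingLevel): DeepSeekThinkingDecision {
  if (model === "deepseek-reasoner") {
    // Reasoning by identity: no toggle is sent for any level.
    return { thinkingActive: true };
  }
  if (model === "deepseek-chat" && level !== "auto" && level !== "none") {
    return { flag: { type: "enabled" }, thinkingActive: true };
  }
  return { thinkingActive: false }; // auto / none: omit the thinking flag
}
```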

Tomori maps thinking_level to Z.ai’s documented thinking enable/disable flag.

Tomori behavior:

Level | Z.ai request
--- | ---
auto | omit thinking
none | thinking: { type: "disabled" }
low / medium / high | thinking: { type: "enabled" }

Additional behavior:

  • when Z.ai thinking is active, Tomori removes temperature, top_p, frequency_penalty, and presence_penalty
  • Tomori does not currently send a numeric Z.ai thinking budget
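A sketch of the Z.ai mapping, including the documented sampling-field cleanup. The field list comes from this page; the body type and function name are hypothetical, and the sketch assumes the cleanup runs only when Tomori itself sends thinking: { type: "enabled" }, not when auto leaves the provider default in place.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";

interface ZaiBody {
  model: string;
  thinking?: { type: "enabled" | "disabled" };
  temperature?: number;
  top_p?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  [key: string]: unknown;
}

// Sampling fields this page says Tomori removes while Z.ai thinking is active.
const ZAI_THINKING_INCOMPATIBLE = ["temperature", "top_p", "frequency_penalty", "presence_penalty"] as const;

function applyZaiThinking(body: ZaiBody, level: ThinkingLevel): ZaiBody {
  if (level === "auto") return body; // omit `thinking`; provider default applies
  const out: ZaiBody = { ...body };
  if (level === "none") {
    out.thinking = { type: "disabled" };
    return out;
  }
  out.thinking = { type: "enabled" }; // low / medium / high all collapse to "enabled"
  for (const key of ZAI_THINKING_INCOMPATIBLE) delete out[key];
  return out;
}
```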

Tomori only auto-maps thinking_level for Ollama-style OpenAI-compatible endpoints in the custom provider path.

Detection heuristic:

  • endpoint hostname contains ollama, or
  • endpoint port is 11434

Tomori behavior for detected Ollama endpoints:

Level | Custom request
--- | ---
auto | omit reasoning_effort
none | reasoning_effort: "none"
low | reasoning_effort: "low"
medium | reasoning_effort: "medium"
high | reasoning_effort: "high"
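The detection heuristic and mapping can be sketched with the standard URL class. Function names are hypothetical. Note that URL.port is an empty string when a URL uses its scheme's default port, so only an explicit :11434 satisfies the port rule.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";

// Documented heuristic: hostname contains "ollama", or port is 11434.
function isOllamaEndpoint(endpoint: string): boolean {
  let url: URL;
  try {
    url = new URL(endpoint);
  } catch {
    return false; // not a parseable absolute URL
  }
  return url.hostname.includes("ollama") || url.port === "11434";
}

// Returns the reasoning_effort value to send, or undefined to omit the field.
function ollamaReasoningEffort(endpoint: string, level: ThinkingLevel): Exclude<ThinkingLevel, "auto"> | undefined {
  if (!isOllamaEndpoint(endpoint) || level === "auto") return undefined;
  return level;
}
```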

Tomori’s thinking_level has no effect on Gemma 4 thinking served by KoboldCPP over a custom endpoint. Thinking activation is controlled entirely when KoboldCPP is launched, not per request through the OpenAI-compatible API.

To enable Gemma 4 thinking in KoboldCPP:

  1. Use a Jinja chat template for Gemma 4 (enable “Use Jinja” and “Jinja for Tools” in the KoboldCPP UI).
  2. Launch KoboldCPP with --jinja_kwargs='{"enable_thinking":true}' to pass enable_thinking=true into the template engine. Without this flag the template defaults enable_thinking to false and no thinking tokens are emitted regardless of the template file.
  3. Alternatively, for 26B/31B hybrid models, hardcode {%- set enable_thinking = true -%} at the top of the Jinja template file.

Response-side parsing:

KoboldCPP v1.111.2+ automatically converts Gemma 4’s <|channel>thought…<channel|> thinking tokens into the standard reasoning_content field for pure-text responses. Tomori’s base adapter reads reasoning_content and routes it to the thought log channel automatically.

When a tool call immediately follows the thinking block, KoboldCPP does not split the chunk, so the raw tokens appear in delta.content instead. Tomori’s GemmaThinkingParser (src/providers/custom/customGemmaThinkingParser.ts) handles this case: it strips the thinking block and routes it to thoughts before GemmaToolCallParser processes the tool call. If a non-Gemma model unexpectedly produces similar token strings, set CUSTOM_GEMMA_THINKING_PARSER_ENABLED=false to disable the parser.
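To illustrate what stripping the thinking block means, here is a non-streaming sketch that splits a complete content string on the token markers quoted above. It is not GemmaThinkingParser itself, which works on streamed deltas; the function name and the choice to leave unterminated blocks untouched are assumptions.

```typescript
// Token strings as quoted on this page.
const THOUGHT_OPEN = "<|channel>thought";
const THOUGHT_CLOSE = "<channel|>";

interface SplitResult {
  thought: string; // text to route to the thought log channel
  content: string; // remaining text, handed on to tool-call parsing
}

// Hypothetical helper operating on a complete string, not on stream deltas.
function splitGemmaThinking(raw: string): SplitResult {
  const start = raw.indexOf(THOUGHT_OPEN);
  if (start === -1) return { thought: "", content: raw };
  const end = raw.indexOf(THOUGHT_CLOSE, start + THOUGHT_OPEN.length);
  // Unterminated block: leave the text untouched (a streaming parser would keep buffering).
  if (end === -1) return { thought: "", content: raw };
  const thought = raw.slice(start + THOUGHT_OPEN.length, end).trim();
  const content = raw.slice(0, start) + raw.slice(end + THOUGHT_CLOSE.length);
  return { thought, content: content.trimStart() };
}
```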

Thought log suppression:

Thought logs are suppressed for private channels (channels listed under /server private-channels) regardless of model or provider. Test thought log routing in a non-private channel.

Tomori maps thinking_level to the GLM prompt directive:

Level | Prompt directive
--- | ---
auto | follow NAI_GLM_THINKING_ENABLED env behavior
none | /nothink
low / medium / high | <think></think>

This is a prompt-format control, not a numeric reasoning budget.

Tomori intentionally does not auto-send a generic request-side thinking control for:

  • KoboldCPP (see Gemma 4 section above for response-side parsing)
  • llama.cpp
  • generic vLLM custom endpoints

Reason:

  • These backends expose thinking via startup flags, Jinja template variables, or GUI settings, not via a stable, universally supported OpenAI-compatible request field.
  • Injecting unrecognised fields into the request body can cause 400/422 errors on servers that validate strictly.

So the current implementation is conservative: configure thinking at the server level, not from Tomori’s thinking_level preference.

When adding a new provider, explicitly choose one of these:

  1. map thinking_level to the vendor’s verified request-side reasoning control
  2. intentionally no-op and document why the provider does not use it

Do not silently ignore the feature without documenting the decision.
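One way to make the decision impossible to skip, sketched under assumptions: a hypothetical THINKING_SUPPORT table typed as Record<ProviderId, ThinkingSupport>, so the TypeScript compiler rejects a provider ID that has no entry. The provider IDs, reasons, and the applyThinking helper are illustrative, not Tomori's actual registry.

```typescript
type ThinkingLevel = "auto" | "none" | "low" | "medium" | "high";
type ProviderId = "zai" | "koboldcpp" | "llamacpp" | "vllm";
type RequestBody = Record<string, unknown>;

// Each provider either maps the level or records why it does not.
type ThinkingSupport =
  | { kind: "mapped"; apply: (body: RequestBody, level: ThinkingLevel) => RequestBody }
  | { kind: "noop"; reason: string };

// Record<ProviderId, ...> forces an entry for every provider ID.
const THINKING_SUPPORT: Record<ProviderId, ThinkingSupport> = {
  zai: {
    kind: "mapped",
    // Flag only; sampling-field cleanup omitted for brevity.
    apply: (body: RequestBody, level: ThinkingLevel) =>
      level === "auto" ? body : { ...body, thinking: { type: level === "none" ? "disabled" : "enabled" } },
  },
  koboldcpp: { kind: "noop", reason: "Thinking is enabled at launch via Jinja kwargs or the template, not per request." },
  llamacpp: { kind: "noop", reason: "Thinking is set via startup flags or template variables, not a stable request field." },
  vllm: { kind: "noop", reason: "Unrecognised request fields can cause 400/422 errors on strict servers." },
};

function applyThinking(provider: ProviderId, body: RequestBody, level: ThinkingLevel): RequestBody {
  const support = THINKING_SUPPORT[provider];
  return support.kind === "mapped" ? support.apply(body, level) : body;
}
```

Adding a new ID to ProviderId without a table entry then fails type-checking, which forces either a mapping or a written-down no-op reason.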

See also:

These are the vendor docs used for the current mapping:

Some vendor docs describe capabilities and constraints, but not Tomori’s exact five-level mapping.

Where the docs leave that gap, Tomori makes a conservative implementation choice:

  • prefer vendor-documented request fields
  • clamp to documented minimums instead of inventing unsupported disables
  • avoid sending undocumented generic fields to local/custom backends