06.3: Generation Turn
Drive the provider call with model fallback and API-key rotation.
File: src/utils/chat/generationTurn.ts:46-152
Mission
Run the LLM call for this turn, with two layers of resilience: a model fallback chain (primary model + any configured fallback entries) and, per attempt, an API-key rotation loop (cycles through saved rotation keys before giving up). Each attempt delegates the actual streaming + tool-call dispatch to the tool-loop pipeline. Emits stream results to the sink and finalizes with the first non-error result (or the last attempt’s result if all fail).
Input
- ChatTurnContext (from per-turn stage 01, with responseTarget populated by stage 02).
- ChatResponseSink (from per-turn stage 02).
Output
GenerationTurnResult (see src/utils/chat/types.ts:244-250):

    {
      status: StreamResult["status"] | "skipped";
      streamResults: StreamResult[];
      personaResponses: ChatPersonaResponse[];
      thoughtLog?: ThoughtLogPayload;
      thoughtLogOwner?: ThoughtLogOwner;
    }

status === "skipped" is emitted when the attempts list is exhausted without
a non-error result and the loop falls through (rare; defensive).
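A caller might branch on the status roughly as sketched here. The types are simplified stand-ins for the ones in src/utils/chat/types.ts, the "completed" status value is an assumption (only "error" and "skipped" are named in this doc), and describeOutcome is a hypothetical helper, not part of the codebase.

```typescript
// Simplified stand-ins; the real types live in src/utils/chat/types.ts.
// ASSUMPTION: "completed" is an illustrative non-error status value.
type StreamStatus = "completed" | "error";

interface StreamResult {
  status: StreamStatus;
}

interface GenerationTurnResult {
  status: StreamStatus | "skipped";
  streamResults: StreamResult[];
  personaResponses: string[]; // stand-in for ChatPersonaResponse[]
}

// Hypothetical helper: summarize a turn result for logging.
function describeOutcome(result: GenerationTurnResult): string {
  switch (result.status) {
    case "skipped":
      // Defensive path: the loop fell through without a usable result.
      return "skipped: no attempt produced a usable result";
    case "error":
      return `error after ${result.streamResults.length} stream result(s)`;
    default:
      return `ok with ${result.personaResponses.length} persona response(s)`;
  }
}
```

Handling "skipped" as its own case keeps the defensive path from being mistaken for either a success or a provider error.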
Side effects
Per-attempt setup (buildGenerationAttempts, createAttempt):
- Resolves the primary TomoriState: applies personal-provider selection (if BYOK), channel LLM override, and any llmOverrideCodename from the incoming.
- Selects an API key from the rotation pool, falling back to the server’s
  own encrypted key via decryptApiKey.
- Builds a ProviderConfig via the resolved LLMProvider.createConfig.
- Assembles a unified pool with the primary model at index 0 followed by
  every configured fallback entry, then builds one attempt per pool member
  (custom-endpoint or saved-provider-config flavor). The lead attempt is always
  labelled "primary" in logs even when the randomizer (below) promoted a
  fallback into that slot; the true model is still visible via successModel.
Per-turn model randomizer (buildGenerationAttempts):
- When config.model_randomizer_enabled is true and the pool has ≥2 members, a random pool member is spliced to the front of the attempt list per generation turn; the remaining members keep their relative order as the failover tail. This is a pure reordering: the original primary stays in the chain and serves as failover if the random lead errors. No model is dropped and no model is attempted twice.
- Because the fallback-used notice keys on index > 0, a randomized lead that succeeds stays silent (no spurious “Fallback Used” embed); a genuine failover after the lead fails still notifies correctly.
- When the toggle is false, the pool order is unchanged ([primary, ...fallbacks]), preserving the deterministic primary-first behavior.
- The toggle is server-level (server_chat_configs.model_randomizer_enabled) and is enabled via /config model-randomizer, which refuses to enable unless ≥1 fallback model is configured, guaranteeing the pool always has ≥2 members.
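The reordering described above can be sketched as a small pure function. This is a minimal illustration, not the code in buildGenerationAttempts: PoolMember, orderAttempts, and shouldNotifyFallback are hypothetical names, and the injectable random source is an assumption added to make the sketch testable.

```typescript
// Hypothetical stand-in for one entry of the model pool.
interface PoolMember {
  model: string;
}

// Returns the attempt order for one turn. `random` is injectable so the
// behavior can be checked deterministically; it must return a value in [0, 1).
function orderAttempts(
  pool: PoolMember[],
  randomizerEnabled: boolean,
  random: () => number = Math.random,
): PoolMember[] {
  // Toggle off, or nothing to reorder: keep [primary, ...fallbacks].
  if (!randomizerEnabled || pool.length < 2) {
    return [...pool];
  }
  const leadIndex = Math.floor(random() * pool.length);
  // The chosen member moves to the front...
  const lead = pool.filter((_, i) => i === leadIndex);
  // ...and everything else keeps its relative order as the failover tail.
  const tail = pool.filter((_, i) => i !== leadIndex);
  return [...lead, ...tail];
}

// The notice keys on the attempt index, not on which model ran, so a
// randomized lead that succeeds (index 0) stays silent.
function shouldNotifyFallback(successIndex: number): boolean {
  return successIndex > 0;
}
```

Because the function only permutes the pool, every member appears exactly once in the result, which matches the "no model dropped, no model attempted twice" property.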
Per-attempt context prep (prepareProviderContextItems):
- Resolves dialogue mediaDescriptors into final image/video parts or model-appropriate system notices using the attempt’s TomoriState. This is where personal-provider routing, fallback model capability differences, and OpenRouter live media capability corrections affect media visibility.
- Applies provider-specific token-limit truncation (truncateDialogueHistory) for Gemini, OpenRouter, NovelAI.
- If the previous attempt ended with emptyResponseFinishReason === "length" and we’re on a retry, additionally drops the oldest history exchange pairs.
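The length-retry trimming in the last bullet might look like the sketch below. It assumes an exchange pair is two consecutive messages and that the newest message is always kept; both are assumptions for illustration, and DialogueMessage and dropOldestPairsOnLengthRetry are hypothetical names rather than the project's own.

```typescript
// Hypothetical message shape; the real dialogue items carry more fields.
interface DialogueMessage {
  role: "user" | "assistant";
  text: string;
}

// Drops the oldest `pairsToDrop` exchange pairs (2 messages per pair), but
// only when the previous attempt ended empty with finish reason "length".
function dropOldestPairsOnLengthRetry(
  history: DialogueMessage[],
  previousFinishReason: string | undefined,
  pairsToDrop: number,
): DialogueMessage[] {
  if (previousFinishReason !== "length" || pairsToDrop <= 0) {
    return history;
  }
  // ASSUMPTION: keep at least the newest message so the prompt is never empty.
  const maxDroppable = Math.max(0, history.length - 1);
  const dropCount = Math.min(pairsToDrop * 2, maxDroppable);
  return history.slice(dropCount);
}
```

Dropping from the front preserves the most recent context, which is usually what the retried request needs most.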
Per-attempt execution (key rotation inner loop):
- Calls runToolLoop(...); see tool-loop pipeline.
- On success: recordKeySuccess(rotationKeyId), break out of the rotation loop.
- On error: classifies the error (rate-limit vs api-error), recordKeyError(...), rotates to the next rotation key (up to …).
- Suppresses user-facing stream errors while another rotation key or model fallback can still be tried.
- Holds non-final failed model attempts out of responseSink.emitStreamResult so their details can be summarized by the fallback notice instead of posted as public errors.
- On completed model fallback: sends the compact Fallback Used button notice with the earlier failure chain available on demand, unless a stop/follow-up interrupt is pending for the channel.
- On non-error or last attempt: emits only final error results, calls responseSink.finalize(result), and returns.
- On thrown error: calls responseSink.emitError(error) and finalizes with an error result.
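The control flow in the list above can be sketched as two nested loops. This is a simplified illustration under stated assumptions: the real provider call is asynchronous and streaming, whereas the sketch uses a synchronous callback for brevity; key bookkeeping, error classification, and the fallback notice are omitted; and runWithFallback, RunAttempt, and the Sink shape are hypothetical names.

```typescript
type AttemptResult = { status: "completed" | "error"; model: string; key: string };

// Minimal stand-in for the response sink.
interface Sink {
  emitStreamResult(result: AttemptResult): void;
  finalize(result: AttemptResult): void;
}

// Hypothetical, synchronous stand-in for the per-attempt provider call.
type RunAttempt = (model: string, key: string) => AttemptResult;

function runWithFallback(
  models: string[], // [lead, ...failover tail]
  keys: string[], // rotation keys tried for each model
  maxKeyAttempts: number,
  run: RunAttempt,
  sink: Sink,
): AttemptResult | undefined {
  let last: AttemptResult | undefined;
  for (const model of models) {
    // Inner loop: rotate through keys for this model.
    for (const key of keys.slice(0, maxKeyAttempts)) {
      last = run(model, key);
      if (last.status !== "error") {
        sink.emitStreamResult(last);
        sink.finalize(last); // first non-error result wins
        return last;
      }
      // Error: held back from the sink while another key or model remains.
    }
  }
  if (last !== undefined) {
    sink.emitStreamResult(last); // only the final failure is surfaced
    sink.finalize(last);
  }
  return last;
}
```

Every path that ran at least one attempt calls finalize exactly once, and intermediate failures never reach emitStreamResult.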
NovelAI subscription refresh:
- For NovelAI providers without a cached context-token count, refreshes the
  subscription via refreshNovelAISubscription (one-shot, cached for subsequent turns).
Invariants
After this stage runs:
- responseSink.finalize(result) has been called exactly once. Generation guarantees this in both the success and the catch paths.
- If the result is non-error, result.personaResponses.length > 0 (or the status is "skipped", which post-turn effects will distinguish).
- Rotation-key bookkeeping (recordKeySuccess / recordKeyError) reflects the outcome of the key that was actually used for each attempt.
Extension points
The stage is a coordinator over several plugin-relevant subsystems:
| Subsystem | Helper | Plugin-relevance |
|---|---|---|
| Provider dispatch | ProviderFactory.getProviderByName, getProviderForTomori | The provider plugin contract is the seam — see provider pipeline |
| Tool execution | runToolLoop | See tool-loop pipeline |
| Key rotation | selectApiKey, recordKeySuccess, recordKeyError, hasAvailableRotationKey | Internal — rotation-key schema is core, not plugin-relevant |
| Fallback chain | createFallbackAttempt, applySavedProviderConfig | The fallback-entry schema (FallbackEntry union: model or custom_endpoint) is the data-model seam |
| Context truncation | truncateDialogueHistory | Per-provider token-limit table is the registration surface |
| Personal-provider routing | applyPersonalProviderSelectionsToTomoriState | BYOK substitution; see provider pipeline |
The stage itself is internal — its job is to orchestrate the “attempt with fallback + key rotation” pattern. Plugins wanting to:
- Add a new provider: register it via the provider plugin contract.
- Change attempt-list construction (e.g. add a probe attempt before the
  primary): would extend buildGenerationAttempts. → plugin plan candidate.
- Intercept stream results: wrap the sink (per-turn stage 02), not this stage.
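Wrapping the sink to intercept stream results could follow a plain decorator pattern, sketched here. The ResponseSink interface is a reduced stand-in with only the two methods named in this doc (the real ChatResponseSink has a richer surface), and withStreamObserver is a hypothetical helper.

```typescript
// Minimal stand-ins; not the real ChatResponseSink or StreamResult.
interface StreamResult {
  status: string;
}

interface ResponseSink {
  emitStreamResult(result: StreamResult): void;
  finalize(result: StreamResult): void;
}

// Hypothetical decorator: observes each stream result, then delegates
// to the wrapped sink so the generation stage itself is left untouched.
function withStreamObserver(
  inner: ResponseSink,
  onResult: (result: StreamResult) => void,
): ResponseSink {
  return {
    emitStreamResult(result: StreamResult): void {
      onResult(result);
      inner.emitStreamResult(result);
    },
    finalize(result: StreamResult): void {
      inner.finalize(result);
    },
  };
}
```

Since the wrapper forwards finalize unchanged, the exactly-once finalize invariant of the stage is preserved.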
Configuration
| Env var | Default | Purpose |
|---|---|---|
| OPENROUTER_APP_ATTRIBUTION_ENABLED | true | Sends TomoriBot app attribution headers to OpenRouter for app rankings and aggregated usage analytics. Set to false to omit them. |
| OPENROUTER_LENGTH_EMPTY_RETRY_DROP_PAIRS | 2 | Per-retry history-pair drop count when OpenRouter returns empty/length |
| OPENROUTER_MAX_OUTPUT_TOKENS | 8192 | Cap on OpenRouter truncation output-token budget |
Plus MAX_KEY_ATTEMPTS from keyRotation.ts.
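One way to read these variables with their documented defaults is sketched below. How the project actually parses them is not shown in this doc, so the parsing rules (anything other than "false" counts as true, non-positive or non-numeric integers fall back to the default) are assumptions, and readBool, readPositiveInt, and loadGenerationConfig are hypothetical helpers. The environment is passed in as a plain object so the sketch has no runtime dependency.

```typescript
type Env = Record<string, string | undefined>;

// ASSUMPTION: only the literal "false" (case-insensitive) disables a flag.
function readBool(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  return raw.trim().toLowerCase() !== "false";
}

// ASSUMPTION: invalid or non-positive values fall back to the default.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined) return fallback;
  const parsed = parseInt(raw, 10);
  return parsed > 0 ? parsed : fallback; // NaN > 0 is false, so NaN falls back
}

function loadGenerationConfig(env: Env) {
  return {
    attributionEnabled: readBool(env, "OPENROUTER_APP_ATTRIBUTION_ENABLED", true),
    lengthRetryDropPairs: readPositiveInt(env, "OPENROUTER_LENGTH_EMPTY_RETRY_DROP_PAIRS", 2),
    maxOutputTokens: readPositiveInt(env, "OPENROUTER_MAX_OUTPUT_TOKENS", 8192),
  };
}
```

In a Node-style runtime this would typically be called once at startup with the process environment.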
Related docs
- Tool execution loop: → tool-loop pipeline
- Provider streaming + adapter pattern: → provider pipeline
- Key rotation: → no dedicated doc yet; keyRotation.ts helper only
- Fallback chain schema: → docs/subsystems/database-schema.md (fallback_chain column)
- Personal-provider runtime substitution: → provider pipeline