06.3: Generation Turn

Drive the provider call with model fallback and API-key rotation.

File: src/utils/chat/generationTurn.ts:46-152

Runs the LLM call for this turn with two layers of resilience: a model fallback chain (the primary model plus any configured fallback entries) and, per attempt, an API-key rotation loop that cycles through saved rotation keys before giving up. Each attempt delegates the actual streaming and tool-call dispatch to the tool-loop pipeline. The stage emits stream results to the sink and finalizes with the first non-error result, or with the last attempt’s result if every attempt fails.

  • ChatTurnContext (from per-turn stage 01, with responseTarget populated by stage 02).
  • ChatResponseSink (from per-turn stage 02).

GenerationTurnResult — see src/utils/chat/types.ts:244-250:

{
  status: StreamResult["status"] | "skipped";
  streamResults: StreamResult[];
  personaResponses: ChatPersonaResponse[];
  thoughtLog?: ThoughtLogPayload;
  thoughtLogOwner?: ThoughtLogOwner;
}

status === "skipped" is emitted when the attempts list is exhausted without a non-error result and the loop falls through (rare; defensive).

Per-attempt setup (buildGenerationAttempts, createAttempt):

  • Resolves the primary TomoriState — applies personal-provider selection (if BYOK), channel LLM override, and any llmOverrideCodename from the incoming.
  • Selects an API key from the rotation pool, falling back to the server’s own encrypted key via decryptApiKey.
  • Builds a ProviderConfig via the resolved LLMProvider.createConfig.
  • Assembles a unified pool with the primary model at index 0 followed by every configured fallback entry, then builds one attempt per pool member (custom-endpoint or saved-provider-config flavor). The lead attempt is always labelled "primary" in logs even when the randomizer (below) promoted a fallback into that slot; the true model is still visible via successModel.

Per-turn model randomizer (buildGenerationAttempts):

  • When config.model_randomizer_enabled is true and the pool has ≥2 members, a random pool member is spliced to the front of the attempt list per generation turn; the remaining members keep their relative order as the failover tail. This is a pure reordering — the original primary stays in the chain and serves as failover if the random lead errors. No model is dropped and no model is attempted twice.
  • Because the fallback-used notice keys on index > 0, a randomized lead that succeeds stays silent (no spurious “Fallback Used” embed); a genuine failover after the lead fails still notifies correctly.
  • When the toggle is false, the pool order is unchanged ([primary, ...fallbacks]), preserving the deterministic primary-first behavior.
  • The toggle is server-level (server_chat_configs.model_randomizer_enabled) and is enabled via /config model-randomizer, which refuses to enable unless ≥1 fallback model is configured — guaranteeing the pool always has ≥2 members.
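The ordering rules above can be sketched as follows. This is a minimal illustration, not the code in buildGenerationAttempts: the AttemptSpec shape, the function names, and the injectable random parameter are all assumptions made for the example.

```typescript
// Hypothetical attempt shape; the real attempt type is not shown in this page.
interface AttemptSpec {
  model: string;
}

/**
 * Orders the attempt pool for one generation turn.
 * pool[0] is the primary model, the rest are configured fallbacks.
 * `random` is injectable so the reordering can be tested deterministically.
 */
function orderAttempts(
  pool: readonly AttemptSpec[],
  randomizerEnabled: boolean,
  random: () => number = Math.random,
): AttemptSpec[] {
  const ordered = pool.slice();
  if (!randomizerEnabled || ordered.length < 2) {
    return ordered; // deterministic [primary, ...fallbacks]
  }
  const leadIndex = Math.floor(random() * ordered.length);
  const lead = ordered.splice(leadIndex, 1)[0]; // remove the chosen member
  if (lead !== undefined) {
    ordered.unshift(lead); // promote it; the rest keep their relative order
  }
  return ordered;
}

/** The fallback notice keys on the attempt index, not on the model identity. */
function shouldNotifyFallback(successIndex: number): boolean {
  return successIndex > 0;
}
```

Because the lead is moved rather than copied, every pool member appears exactly once in the result, which matches the "no model dropped, no model attempted twice" property. A randomized lead that succeeds has index 0, so shouldNotifyFallback stays false for it.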

Per-attempt context prep (prepareProviderContextItems):

  • Resolves dialogue mediaDescriptors into final image/video parts or model-appropriate system notices using the attempt’s TomoriState. This is where personal-provider routing, fallback model capability differences, and OpenRouter live media capability corrections affect media visibility.
  • Applies provider-specific token-limit truncation (truncateDialogueHistory) for Gemini, OpenRouter, NovelAI.
  • If the previous attempt ended with emptyResponseFinishReason === "length" and we’re on a retry, additionally drops the oldest history exchange pairs.
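The length-retry trimming in the last bullet might look roughly like this. It is a sketch under stated assumptions: the HistoryItem shape, both function names, the idea that a "pair" is two consecutive history items, and the choice to always keep the newest item are illustrative and not taken from prepareProviderContextItems.

```typescript
// Hypothetical history shape, used only for this sketch.
interface HistoryItem {
  role: "user" | "assistant";
  text: string;
}

/**
 * Drops up to `pairs` of the oldest exchange pairs (two items per pair).
 * Assumption: the newest item is always kept so the model still has a prompt.
 */
function dropOldestPairs(
  history: readonly HistoryItem[],
  pairs: number,
): HistoryItem[] {
  const maxDroppable = Math.max(history.length - 1, 0);
  let dropCount = Math.min(Math.max(Math.floor(pairs), 0) * 2, maxDroppable);
  dropCount -= dropCount % 2; // only ever drop whole pairs
  return history.slice(dropCount);
}

/** Applies the extra trimming only on a retry that follows a "length" finish. */
function historyForAttempt(
  history: readonly HistoryItem[],
  isRetry: boolean,
  previousFinishReason: string | undefined,
  dropPairs: number,
): HistoryItem[] {
  if (isRetry && previousFinishReason === "length") {
    return dropOldestPairs(history, dropPairs);
  }
  return history.slice();
}
```

With a six-item history and a drop count of 2, a retry after a "length" finish keeps only the newest exchange; a first attempt, or a retry after any other finish reason, keeps the history unchanged.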

Per-attempt execution (key rotation inner loop):

  • Calls runToolLoop(...) — see tool-loop pipeline.
  • On success: recordKeySuccess(rotationKeyId), break out of the rotation loop.
  • On error: classifies the error (rate-limit vs api-error), recordKeyError(...), rotates to the next rotation key (up to
  • Suppresses user-facing stream errors while another rotation key or model fallback can still be tried.
  • Holds non-final failed model attempts out of responseSink.emitStreamResult so their details can be summarized by the fallback notice instead of posted as public errors.
  • On completed model fallback: sends the compact Fallback Used button notice with the earlier failure chain available on demand, unless a stop/follow-up interrupt is pending for the channel.
  • On non-error or last attempt: emits only final error results, calls responseSink.finalize(result), and returns.
  • On thrown error: calls responseSink.emitError(error) and finalizes with an error result.
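The control flow described in this list, an outer model-fallback loop around an inner key-rotation loop, can be sketched as below. Every name here (Result, Sink, KeyBook, runTurnSketch) is a hypothetical stand-in. The sketch is synchronous for brevity, whereas the real calls are presumably awaited, and it leaves out error classification, the fallback notice, and interrupt handling.

```typescript
// Hypothetical stand-ins; none of these are the real generationTurn.ts types.
type Status = "completed" | "error" | "skipped";

interface Result {
  status: Status;
  model: string;
}

interface Sink {
  emit(result: Result): void;
  finalize(result: Result): void;
}

interface KeyBook {
  success(keyId: string): void;
  error(keyId: string): void;
}

function runTurnSketch(
  models: readonly string[],
  keyIds: readonly string[],
  maxKeyAttempts: number,
  call: (model: string, keyId: string) => Result,
  sink: Sink,
  keys: KeyBook,
): { result: Result; heldFailures: Result[]; fallbackUsed: boolean } {
  const heldFailures: Result[] = [];
  const keysToTry = keyIds.slice(0, Math.max(maxKeyAttempts, 0));
  let index = -1;

  for (const model of models) {
    index += 1;
    let result: Result = { status: "error", model };

    // Inner loop: rotate keys for this model until one yields a non-error result.
    for (const keyId of keysToTry) {
      result = call(model, keyId);
      if (result.status !== "error") {
        keys.success(keyId);
        break;
      }
      keys.error(keyId);
    }

    const isLastModel = index === models.length - 1;
    if (result.status !== "error" || isLastModel) {
      // Only the final result reaches the sink; finalize runs exactly once.
      sink.emit(result);
      sink.finalize(result);
      const fallbackUsed = index > 0 && result.status !== "error";
      return { result, heldFailures, fallbackUsed };
    }
    heldFailures.push(result); // kept back for a later fallback summary
  }

  // Defensive fall-through when there were no attempts at all.
  const skipped: Result = { status: "skipped", model: "" };
  sink.finalize(skipped);
  return { result: skipped, heldFailures, fallbackUsed: false };
}
```

Failures from non-final models are collected in heldFailures instead of being emitted, which mirrors how the stage keeps them out of the public channel until a fallback summary is needed.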

NovelAI subscription refresh:

  • For NovelAI providers without a cached context-token count, refreshes the subscription via refreshNovelAISubscription (one-shot, cached for subsequent turns).
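The "one-shot, cached for subsequent turns" behaviour is a plain read-through cache. The sketch below shows the idea only: the cache shape, the providerKey parameter, and the synchronous refresh callback are assumptions, and it does not reflect the actual signature of refreshNovelAISubscription.

```typescript
// Hypothetical cache: provider key -> context-token count.
interface ContextTokenCache {
  [providerKey: string]: number | undefined;
}

/**
 * Returns the cached context-token count, refreshing at most once per key.
 * `refresh` stands in for the subscription lookup.
 */
function getContextTokens(
  cache: ContextTokenCache,
  providerKey: string,
  refresh: () => number,
): number {
  const cached = cache[providerKey];
  if (cached !== undefined) {
    return cached; // later turns skip the refresh
  }
  const fresh = refresh();
  cache[providerKey] = fresh;
  return fresh;
}
```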

After this stage runs:

  • responseSink.finalize(result) has been called exactly once. Generation guarantees this in both the success and the catch paths.
  • If the result is non-error, result.personaResponses.length > 0 (or the status is "skipped", which post-turn effects will distinguish).
  • Rotation-key bookkeeping (recordKeySuccess/recordKeyError) reflects the outcome of the key that was actually used for each attempt.

The stage is a coordinator over several plugin-relevant subsystems:

| Subsystem | Helper | Plugin relevance |
| --- | --- | --- |
| Provider dispatch | ProviderFactory.getProviderByName, getProviderForTomori | The provider plugin contract is the seam; see provider pipeline |
| Tool execution | runToolLoop | See tool-loop pipeline |
| Key rotation | selectApiKey, recordKeySuccess, recordKeyError, hasAvailableRotationKey | Internal; the rotation-key schema is core, not plugin-relevant |
| Fallback chain | createFallbackAttempt, applySavedProviderConfig | The fallback-entry schema (FallbackEntry union: model or custom_endpoint) is the data-model seam |
| Context truncation | truncateDialogueHistory | Per-provider token-limit table is the registration surface |
| Personal-provider routing | applyPersonalProviderSelectionsToTomoriState | BYOK substitution; see provider pipeline |

The stage itself is internal; its job is to orchestrate the “attempt with fallback + key rotation” pattern. What a plugin should do instead depends on its goal:

  • To add a new provider, register it via the provider plugin contract.
  • To change attempt-list construction (e.g. add a probe attempt before the primary), buildGenerationAttempts would have to be extended. This is a plugin plan candidate.
  • To intercept stream results, wrap the sink (per-turn stage 02), not this stage.
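Wrapping the sink could look like the decorator below. The method names emitStreamResult, emitError, and finalize come from this page, but their signatures here are guesses, the SinkLike interface and wrapSink helper are hypothetical, and the real sink methods may well be asynchronous.

```typescript
// Hypothetical subset of the sink surface; real signatures may differ.
interface SinkLike<R> {
  emitStreamResult(result: R): void;
  emitError(error: unknown): void;
  finalize(result: R): void;
}

/**
 * Returns a sink that reports every stream result to `onResult`
 * before forwarding all calls unchanged to the inner sink.
 */
function wrapSink<R>(
  inner: SinkLike<R>,
  onResult: (result: R) => void,
): SinkLike<R> {
  return {
    emitStreamResult(result: R): void {
      onResult(result); // observe first, then pass through
      inner.emitStreamResult(result);
    },
    emitError(error: unknown): void {
      inner.emitError(error);
    },
    finalize(result: R): void {
      inner.finalize(result);
    },
  };
}
```

Since the wrapper forwards finalize untouched, the "finalized exactly once" guarantee of this stage is preserved for the inner sink.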

| Env var | Default | Purpose |
| --- | --- | --- |
| OPENROUTER_APP_ATTRIBUTION_ENABLED | true | Sends TomoriBot app attribution headers to OpenRouter for app rankings and aggregated usage analytics. Set to false to omit them. |
| OPENROUTER_LENGTH_EMPTY_RETRY_DROP_PAIRS | 2 | Per-retry history-pair drop count when OpenRouter returns empty/length |
| OPENROUTER_MAX_OUTPUT_TOKENS | 8192 | Cap on OpenRouter truncation output-token budget |

Plus MAX_KEY_ATTEMPTS from keyRotation.ts.
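For illustration, the three environment variables could be read with their documented defaults as shown below. The variable names and defaults come from the table. The GenerationEnv shape, the helper names, and the parsing rules are assumptions: anything other than "false" counts as enabled, and invalid or too-small numbers fall back to the default. The environment is passed in as a plain record so the sketch does not depend on a particular runtime.

```typescript
// Hypothetical settings object; field names are illustrative.
interface GenerationEnv {
  openRouterAttributionEnabled: boolean;
  lengthEmptyRetryDropPairs: number;
  openRouterMaxOutputTokens: number;
}

type EnvRecord = { [name: string]: string | undefined };

/** Parses a base-10 integer, falling back when it is missing, invalid, or below `min`. */
function parseIntWithDefault(
  raw: string | undefined,
  fallback: number,
  min: number,
): number {
  if (raw === undefined) {
    return fallback;
  }
  const parsed = parseInt(raw, 10);
  return isFinite(parsed) && parsed >= min ? parsed : fallback;
}

function readGenerationEnv(env: EnvRecord): GenerationEnv {
  const attribution = env["OPENROUTER_APP_ATTRIBUTION_ENABLED"];
  return {
    // Assumption: only an explicit "false" disables attribution.
    openRouterAttributionEnabled:
      attribution === undefined || attribution.trim().toLowerCase() !== "false",
    lengthEmptyRetryDropPairs: parseIntWithDefault(
      env["OPENROUTER_LENGTH_EMPTY_RETRY_DROP_PAIRS"],
      2,
      0,
    ),
    openRouterMaxOutputTokens: parseIntWithDefault(
      env["OPENROUTER_MAX_OUTPUT_TOKENS"],
      8192,
      1,
    ),
  };
}
```

In a Node-style runtime the record would typically be process.env; passing it explicitly also makes the defaults easy to check in isolation.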