01: Context Assembly
Translates the provider-agnostic StructuredContextItem[] into a provider-native API request and opens the HTTP streaming connection.
Contract: BaseStreamAdapter.startStream — src/types/stream/interfaces.ts:245
Canonical implementation: GoogleStreamAdapter.startStream — src/providers/google/googleStreamAdapter.ts:151-291
Mission
Each provider’s StreamAdapter subclass implements startStream(config, context) as an async
generator. The first half of that method — covered in this stage — handles all request
construction before any HTTP bytes arrive. It converts the provider-agnostic
StructuredContextItem[] from StreamContext.contextItems into the format the provider’s SDK
expects (e.g., Gemini Content[]), attaches tools and function interaction history, applies
per-provider config options (stop strings, thinking mode, generation parameters), and opens the
HTTP streaming connection to the provider’s API. Stage 02 covers what happens after the
connection is open (the generator loop).
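As a rough illustration of the two-phase shape described above, the sketch below splits one async generator into a setup half and a loop half. Every name in it (SketchChunk, SketchRequest, openFakeStream, startStreamSketch) is hypothetical; the real contract uses StreamConfig, StreamContext, and RawStreamChunk, and a real adapter would call the provider SDK where this sketch uses a fake stream.

```typescript
// Hypothetical, simplified shapes; the real types live in src/types/stream/interfaces.ts.
interface SketchChunk {
  text: string;
}

interface SketchRequest {
  contents: string[];
  stopSequences: string[];
}

// Stand-in for a provider SDK call that opens a streaming connection.
async function* openFakeStream(request: SketchRequest): AsyncGenerator<string> {
  for (const content of request.contents) {
    yield `echo:${content}`;
  }
}

async function* startStreamSketch(
  contextItems: string[],
  stopStrings: string[],
): AsyncGenerator<SketchChunk> {
  // Stage 01 (this page): build the native request and open the stream.
  const request: SketchRequest = {
    contents: contextItems.map((item) => item.trim()),
    stopSequences: stopStrings.slice(),
  };
  const stream = openFakeStream(request);

  // Stage 02 (next page): consume the stream and yield chunks.
  for await (const piece of stream) {
    yield { text: piece };
  }
}
```

Generator bodies run lazily, so in this sketch the setup half executes when the consumer first pulls from the generator, not when startStreamSketch is called.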
The LLMProvider.streamToDiscord() facade (on each
provider class) constructs the adapter and StreamConfig, then calls
StreamOrchestrator.streamToDiscord(adapter, config, context), which calls adapter.startStream
to begin this stage.
Tool declarations are prepared before this stage by getAvailableToolsWithMCP(). That path first
filters tools by provider/config/model constraints, then runs the dynamic tool assembler
(src/tools/assembly.ts) so provider adapters receive normal Tool objects whose descriptions and
parameter schemas already match the active backend for the turn.
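The filter-then-assemble order described above might look roughly like the following. This is a sketch under assumptions, not the code in src/tools/assembly.ts: SketchTool, SketchTurnContext, the providers field, and prepareTools are hypothetical, and only the assembleForContext hook name is taken from this page.

```typescript
// Hypothetical per-turn context handed to each tool.
interface SketchTurnContext {
  provider: string;
  backendSupportsImages: boolean;
}

interface SketchTool {
  name: string;
  description: string;
  // Hypothetical constraint consulted by the filter step.
  providers?: string[];
  // Returns a per-turn variant, or null to omit the tool for this turn.
  assembleForContext?: (ctx: SketchTurnContext) => SketchTool | null;
}

function prepareTools(tools: SketchTool[], ctx: SketchTurnContext): SketchTool[] {
  // Step 1: drop tools the active provider cannot use.
  const eligible = tools.filter(
    (tool) => tool.providers === undefined || tool.providers.indexOf(ctx.provider) !== -1,
  );

  // Step 2: let each remaining tool tailor itself to the active backend.
  const assembled: SketchTool[] = [];
  for (const tool of eligible) {
    const variant = tool.assembleForContext ? tool.assembleForContext(ctx) : tool;
    if (variant !== null) {
      assembled.push(variant);
    }
  }
  return assembled;
}
```

Tools without the hook pass through unchanged, so the adapter only ever sees plain tool objects.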
Inputs
- config: StreamConfig. Extends ProviderConfig with Discord-specific settings (buffer sizes, timing, humanizer degree). Defined at src/types/stream/interfaces.ts:62.
- context: StreamContext. Full Discord + application state. Defined at src/types/stream/interfaces.ts:88. Key fields consumed in this stage:
  - context.contextItems: StructuredContextItem[]. The assembled conversation context from the context-build pipeline, each item tagged with a ContextItemTag that determines its placement (system instruction vs. dialogue turn).
  - context.functionInteractionHistory. Paired (functionCall, functionResponse) records from prior iterations of the tool-loop pipeline, replayed as model + user turns.
  - context.currentTurnModelParts. Provider-native model parts accumulated within the current tool-loop iteration (used to replay partial model output before a tool call).
  - context.tomoriState. Server config, including stop strings, speaker-pattern flag, and thinking mode toggles.
  - context.messageIdMap. Opaque map for resolving media_N / ref_N keys back to Discord message snowflake IDs (used by Google’s GIF and video routing).
Output
This stage produces no separate return value; it transitions into the generator loop (stage 02). As a side effect of setup, the HTTP streaming connection to the provider API is opened by the end of this stage.
Side effects
- Google: Calls GoogleStreamAdapter.buildTokenCountPayload(contextItems, model, messageIdMap), which materialises all StructuredContextItem parts into Gemini Content[] objects. Images are fetched and base64-encoded via fetchAndOptimizeImage(); GIFs are processed via extractGifKeyframes() in dev or replaced with a text placeholder in production. Videos are fetched and inlined if under VIDEO_CONTEXT_MAX_INLINE_MB.
- All adapters: Calls buildProviderStopStrings() to merge configured llm_stop_strings with persona-specific speaker-pattern stop strings and provider-native stop sequences.
- All adapters: Initialises speaker-guard state (per-adapter rolling tail buffers used in stage 02).
Invariants
After context assembly completes (before the generator loop begins):
- The provider SDK client has been initialised with the correct API key from config.apiKey.
- The assembled native request payload includes all dialogue turns derived from context.contextItems, in the correct role ordering required by the provider.
- Function interaction history (if present) has been replayed in alternating model/user turns with image metadata attached to the corresponding user turn.
- config.tools (if non-empty) has been attached to the request config after dynamic tool assembly and provider-specific serialization.
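The alternating replay named in the third invariant can be sketched as follows. The shapes here (SketchInteraction, SketchTurn, replayHistory) are hypothetical simplifications, and the image metadata that the real adapters attach to the user turn is left out.

```typescript
// Hypothetical pair of records produced by one tool-loop iteration.
interface SketchInteraction {
  functionCall: { name: string; args: Record<string, unknown> };
  functionResponse: { name: string; response: Record<string, unknown> };
}

type SketchTurn =
  | { role: "model"; functionCall: SketchInteraction["functionCall"] }
  | { role: "user"; functionResponse: SketchInteraction["functionResponse"] };

function replayHistory(history: SketchInteraction[]): SketchTurn[] {
  const turns: SketchTurn[] = [];
  for (const { functionCall, functionResponse } of history) {
    // The model asked for the call, so the call is replayed as a model turn...
    turns.push({ role: "model", functionCall });
    // ...and the result is replayed as the user turn that follows it.
    turns.push({ role: "user", functionResponse });
  }
  return turns;
}
```

Each record always yields exactly two turns, so the output alternates model/user regardless of how many iterations the tool loop ran.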
Extension points
| Surface | Plugin-relevance |
|---|---|
| BaseStreamAdapter.startStream() abstract method | A plugin adding a new provider implements this method. The contract is defined in src/types/stream/interfaces.ts:182. The full implementation must yield RawStreamChunk objects (stage 02) and conform to the generator signature. |
| Dynamic tool assembly | src/tools/assembly.ts is the standard seam for tools whose LLM-visible schema depends on active backend capability. A built-in tool implements assembleForContext(context) and returns a per-turn variant or null; provider adapters should keep consuming the assembled Tool[]. |
| StructuredContextItem routing (system vs. dialogue) | Each adapter decides which ContextItemTag values become system instructions vs. dialogue turns. Google’s SYSTEM_INSTRUCTION_TAGS set at src/providers/google/googleStreamAdapter.ts:94 is the canonical example. A plugin changing context routing would subclass the relevant adapter or provide its own. → plugin plan candidate |
| buildProviderStopStrings() | src/providers/utils/stopStrings.ts. Internal: stop-string merging is a provider-operational concern; the llm_stop_strings DB column is the configuration surface. |
| Image / video / GIF fetching (fetchAndOptimizeImage, safeDownload, extractGifKeyframes) | Internal: media fetching is tightly coupled to provider-specific inline-data limits and format requirements. |
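To make the routing row above concrete, here is a minimal sketch of partitioning context items by tag. The tag names, SketchItem, SKETCH_SYSTEM_TAGS, and routeItems are all hypothetical; the actual ContextItemTag values and the contents of Google’s SYSTEM_INSTRUCTION_TAGS are not reproduced here.

```typescript
// Hypothetical tag names, chosen only for this example.
type SketchTag = "persona" | "server_rules" | "dialogue_user" | "dialogue_model";

interface SketchItem {
  tag: SketchTag;
  text: string;
}

// Plays the role of an adapter-level "system instruction tags" set.
const SKETCH_SYSTEM_TAGS: ReadonlySet<SketchTag> = new Set<SketchTag>(["persona", "server_rules"]);

function routeItems(items: SketchItem[]): { system: SketchItem[]; dialogue: SketchItem[] } {
  const system: SketchItem[] = [];
  const dialogue: SketchItem[] = [];
  for (const item of items) {
    if (SKETCH_SYSTEM_TAGS.has(item.tag)) {
      system.push(item);
    } else {
      dialogue.push(item);
    }
  }
  return { system, dialogue };
}
```

An adapter with different placement rules would swap in its own set while leaving the partition logic unchanged.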
Configuration
| Source | Key / Env var | Default | Purpose |
|---|---|---|---|
| TomoriState.config | llm_stop_strings | null | Custom stop strings appended to provider stops |
| TomoriState.config | llm_stop_speaker_pattern_enabled | false | Adds a persona-name speaker-label stop string |
| TomoriState.config | llm_thinking_enabled / llm_thinking_budget_tokens | provider-specific | Enables chain-of-thought mode (Google / Anthropic thinking config) |
| Env var | VIDEO_CONTEXT_MAX_INLINE_MB | 20 | Max video size (MB) allowed for inline base64 in Google context |
| Env var | RUN_ENV | — | "production" replaces GIF frames with text placeholders to avoid memory pressure |
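The two environment variables in the table could be read along these lines. This is a sketch only: readMediaConfig and its return shape are hypothetical, treating 1 MB as 1024 × 1024 bytes is an assumption, and so is falling back to the default of 20 when the value is missing or invalid.

```typescript
interface SketchMediaConfig {
  maxInlineVideoBytes: number;
  useGifPlaceholders: boolean;
}

// Takes the environment as a plain object so the sketch stays runtime-agnostic;
// in Node this would typically be called with process.env.
function readMediaConfig(env: Record<string, string | undefined>): SketchMediaConfig {
  const parsed = Number(env.VIDEO_CONTEXT_MAX_INLINE_MB);
  // Missing, non-numeric, or non-positive values fall back to the documented default of 20.
  const maxInlineMb = Number.isFinite(parsed) && parsed > 0 ? parsed : 20;
  return {
    maxInlineVideoBytes: maxInlineMb * 1024 * 1024,
    useGifPlaceholders: env.RUN_ENV === "production",
  };
}
```

For example, readMediaConfig({ RUN_ENV: "production" }) keeps the 20 MB limit and switches GIF handling to text placeholders.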
Related docs
- Context items that arrive here: → context-build pipeline
- Function history that is replayed here: → tool-loop pipeline, Stage 02 executeToolCall
- Type definitions: StructuredContextItem → src/types/misc/context.ts; StreamConfig / StreamContext → src/types/stream/interfaces.ts
- Provider adapter registry: src/utils/provider/providerInfoRegistry.ts
- Adding a new provider end-to-end: → docs/guides/adding-new-provider.md
- Strict chat-completion normalizations applied during assembly (role alternation, prefix completion, always-on media relocation): → subsystems/strict-chat-completion.md