Testing Chat Changes

This guide covers the chat regression harness introduced in Phase 5 (#12a). The harness protects the highest-risk chat decision behavior across the chat coordinator and generation-stage refactors.

# Run only chat regression tests
bun test tests/regression/chat/
# Run all tests; chat regression tests are included
bun test tests/

The first harness is fixture-driven and targets chat decision functions:

  • shouldBotReply() in src/utils/chat/replyDecision.ts for reply/no-reply decisions
  • determineMatchingPersonas() in src/utils/chat/triggerProcessor.ts for persona routing and deterministic ordering

Fixtures live in tests/regression/chat/fixtures/conversations.json. Expected decisions live in tests/regression/chat/fixtures/expected-decisions.json.
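As a rough illustration of how a fixture-driven comparison like this can work, the sketch below pairs each conversation fixture with its expected decision by name and reports any mismatch. The field names (`id`, `content`, `shouldReply`, `personas`) and the `findMismatches` helper are assumptions made for this example; they are not taken from the real JSON schema or test file.

```typescript
// Hypothetical fixture shapes; the real JSON schema may differ.
interface ConversationFixture {
  id: string; // e.g. "google-direct-mention-main"
  content: string; // message text handed to the decision function
}

interface ExpectedDecision {
  shouldReply: boolean;
  personas: string[]; // expected persona order
}

// Stand-in for the real decision functions under test.
type Decide = (fixture: ConversationFixture) => ExpectedDecision;

// Compare the actual decision for every fixture against the recorded one.
function findMismatches(
  fixtures: ConversationFixture[],
  expected: Record<string, ExpectedDecision>,
  decide: Decide,
): string[] {
  const mismatches: string[] = [];
  for (const fixture of fixtures) {
    const want = expected[fixture.id];
    if (want === undefined) {
      mismatches.push(`${fixture.id}: no expected decision recorded`);
      continue;
    }
    const got = decide(fixture);
    const sameReply = got.shouldReply === want.shouldReply;
    // Order matters, so compare the serialized arrays.
    const sameOrder =
      JSON.stringify(got.personas) === JSON.stringify(want.personas);
    if (!sameReply || !sameOrder) {
      mismatches.push(
        `${fixture.id}: expected ${JSON.stringify(want)}, got ${JSON.stringify(got)}`,
      );
    }
  }
  return mismatches;
}
```

A test built this way would assert that the returned list is empty, so a failure message names the exact fixture that drifted.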

The initial fixture set records representative golden paths for:

| Fixture | Provider label | Behavior protected |
| --- | --- | --- |
| google-direct-mention-main | Google | Direct bot mention routes to the main persona |
| openrouter-deliberate-alter-trigger | OpenRouter | Deliberate trigger mode accepts @trigger for an alter |
| novelai-autochat-channel-alter | NovelAI | Always-reply autochat channel routes to its assigned alter |
| google-multi-persona-trigger-order | Google | Multi-persona triggers keep deterministic first-match ordering |
| openrouter-deliberate-mode-ignores-plain-trigger | OpenRouter | Deliberate trigger mode rejects plain trigger words without @ |
| google-persona-webhook-self-trigger | Google | Persona webhook messages are treated as self-generated and cannot trigger themselves |

The provider labels document the three required provider families for Phase 5 (#12a). These fixtures do not call live providers; they freeze chat orchestration decisions that happen before provider streaming.
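To make the ordering and deliberate-trigger fixtures more concrete, here is one possible reading of "deterministic first-match ordering", sketched as a standalone function. It is not the implementation of determineMatchingPersonas(). The `Persona` shape, the `matchPersonas` name, the substring matching, and the tie-breaking rule are all assumptions for illustration.

```typescript
// Hypothetical persona shape; not the project's real type.
interface Persona {
  name: string;
  triggers: string[];
}

// Return the personas whose trigger occurs in the message, ordered by the
// position of their earliest match; ties keep configuration order.
function matchPersonas(
  content: string,
  personas: Persona[],
  deliberate: boolean,
): string[] {
  const text = content.toLowerCase();
  const hits: { name: string; position: number; index: number }[] = [];
  personas.forEach((persona, index) => {
    let earliest = -1;
    for (const trigger of persona.triggers) {
      // Assumed rule: deliberate mode only accepts "@trigger", while
      // plain mode accepts the bare trigger as a substring.
      const needle = (deliberate ? "@" : "") + trigger.toLowerCase();
      const position = text.indexOf(needle);
      if (position !== -1 && (earliest === -1 || position < earliest)) {
        earliest = position;
      }
    }
    if (earliest !== -1) {
      hits.push({ name: persona.name, position: earliest, index });
    }
  });
  // Sorting on (position, index) gives the same order on every run.
  hits.sort((a, b) => a.position - b.position || a.index - b.index);
  return hits.map((hit) => hit.name);
}
```

Sorting on an explicit key rather than relying on iteration order is what makes the result stable enough to freeze in a fixture.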

# Fast local loop
bun test tests/regression/chat/chat.regression.test.ts
# Full chat harness
bun test tests/regression/chat/
# Full project test command; this is what CI should run
bun test tests/

The harness does not require Discord credentials, provider API keys, or a database. It builds mocked Discord Message, Client, and TextChannel objects with only the fields needed by the decision functions.
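A minimal sketch of what such a mock might look like follows. The property names mirror discord.js (`author.bot`, `channelId`, `webhookId`, `mentions.has()`), but which fields the decision functions actually read is an assumption, and `buildMockMessage` is a hypothetical helper rather than the one used in the harness.

```typescript
// Hypothetical minimal shape; the real harness mocks discord.js objects.
interface MockMessage {
  id: string;
  content: string;
  author: { id: string; bot: boolean };
  channelId: string;
  webhookId: string | null;
  mentions: { has(userId: string): boolean };
}

interface MockMessageInput {
  content: string;
  authorId?: string;
  channelId?: string;
  webhookId?: string | null;
  mentionedUserIds?: string[];
}

// Build a message with stable fake IDs and only the fields listed above.
function buildMockMessage(input: MockMessageInput): MockMessage {
  const mentioned = new Set(input.mentionedUserIds ?? []);
  return {
    id: "message_001",
    content: input.content,
    author: { id: input.authorId ?? "user_001", bot: false },
    channelId: input.channelId ?? "channel_001",
    webhookId: input.webhookId ?? null,
    mentions: { has: (userId) => mentioned.has(userId) },
  };
}

// Example: a direct mention of a bot whose fake ID is "bot_001".
const directMention = buildMockMessage({
  content: "<@bot_001> hello",
  mentionedUserIds: ["bot_001"],
});
```

Because the mock has no network or gateway dependency, a fixture can be replayed in a plain unit test.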

  1. Add a new entry to tests/regression/chat/fixtures/conversations.json.
  2. Use stable fake Discord IDs (user_004, channel_001, etc.).
  3. Add the expected result to tests/regression/chat/fixtures/expected-decisions.json.
  4. Run bun test tests/regression/chat/.
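Steps 1 to 3 above can go wrong in two easy ways: a fixture is added to one JSON file but not the other, or a real Discord snowflake slips in where a fake ID belongs. The sketch below shows one way to catch both. The ID pattern is inferred from the examples (`user_004`, `channel_001`), and both helper names are hypothetical.

```typescript
// Assumed ID convention, inferred from examples such as "user_004".
const STABLE_ID = /^[a-z]+_\d{3}$/;

function isStableFakeId(id: string): boolean {
  return STABLE_ID.test(id);
}

// Report fixture names that appear in only one of the two JSON files.
function findUnpaired(
  conversationIds: string[],
  expectedIds: string[],
): string[] {
  const conversations = new Set(conversationIds);
  const expected = new Set(expectedIds);
  const missingExpected = conversationIds.filter((id) => !expected.has(id));
  const missingConversation = expectedIds.filter(
    (id) => !conversations.has(id),
  );
  return [...missingExpected, ...missingConversation];
}
```

A check like this could run as its own test case so that an unpaired fixture fails with a clear message instead of an undefined lookup.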

Keep fixtures small. Each fixture should protect one behavior: a reply trigger, a no-reply guard, persona ordering, a webhook/self-trigger edge case, an autochat path, or a reply-reference path.

For chat-stage refactors, prefer adding fixtures against the named owner module whenever extracted logic changes. The harness should keep proving behavior at the module boundary instead of depending only on tomoriChat.ts.

chat.regression.test.ts includes a skipped probe test:

it.skip("[REGRESSION PROBE] fails when a fixture expectation is deliberately inverted", () => {
  // ...
});

To confirm that the harness can actually fail when an expectation is wrong:

  1. Temporarily remove .skip from the probe.
  2. Run bun test tests/regression/chat/.
  3. Confirm the probe fails.
  4. Restore .skip.
  5. Re-run the harness and confirm it passes.

When changing chat stages, add fixtures for every behavior the extraction touches:

  • Golden-path message handling
  • Function-call tool trigger and response routing
  • REST and MCP tool routing decisions
  • Multi-persona triggering and ordering
  • Webhook self-trigger suppression
  • Empty mentions and malformed reply references
  • Rate-limit and cooldown rejection paths
  • Provider fallback or credential-unavailable branches

When a post-refactor chat bug is fixed, add a fixture for it in the same PR. The harness is a living regression suite, not a one-time checkpoint.