Defer sort seed checks until adjust-order startup

2026-05-22 22:44:56 +08:00
parent e438386e68
commit 4b2eff1c7e
146 changed files with 20067 additions and 65 deletions
--- a/.trellis/spec/guides/cross-layer-thinking-guide.md
+++ b/.trellis/spec/guides/cross-layer-thinking-guide.md
@@ -0,0 +1,162 @@
+# Cross-Layer Thinking Guide
+
+> **Purpose**: Think through data flow across layers before implementing.
+
+---
+
+## The Problem
+
+**Most bugs happen at layer boundaries**, not within layers.
+
+Common cross-layer bugs:
+- API returns format A, frontend expects format B
+- Database stores X, service transforms to Y, but loses data
+- Multiple layers implement the same logic differently
+
+---
+
+## Before Implementing Cross-Layer Features
+
+### Step 1: Map the Data Flow
+
+Draw out how data moves:
+
+```
+Source → Transform → Store → Retrieve → Transform → Display
+```
+
+For each arrow, ask:
+- What format is the data in?
+- What could go wrong?
+- Who is responsible for validation?
+
+### Step 2: Identify Boundaries
+
+| Boundary | Common Issues |
+|----------|---------------|
+| API ↔ Service | Type mismatches, missing fields |
+| Service ↔ Database | Format conversions, null handling |
+| Backend ↔ Frontend | Serialization, date formats |
+| Component ↔ Component | Props shape changes |
+
+### Step 3: Define Contracts
+
+For each boundary:
+- What is the exact input format?
+- What is the exact output format?
+- What errors can occur?
+
+---
+
+## Common Cross-Layer Mistakes
+
+### Mistake 1: Implicit Format Assumptions
+
+**Bad**: Assuming date format without checking
+
+**Good**: Explicit format conversion at boundaries
+
+### Mistake 2: Scattered Validation
+
+**Bad**: Validating the same thing in multiple layers
+
+**Good**: Validate once at the entry point
+
+### Mistake 3: Leaky Abstractions
+
+**Bad**: Component knows about database schema
+
+**Good**: Each layer only knows its neighbors
+
+---
+
+## Checklist for Cross-Layer Features
+
+Before implementation:
+- [ ] Mapped the complete data flow
+- [ ] Identified all layer boundaries
+- [ ] Defined format at each boundary
+- [ ] Decided where validation happens
+
+After implementation:
+- [ ] Tested with edge cases (null, empty, invalid)
+- [ ] Verified error handling at each boundary
+- [ ] Checked data survives round-trip
+
+---
+
+## Cross-Platform Template Consistency
+
+In Trellis, command templates (e.g., `record-session.md`) exist in **multiple platforms** with identical or near-identical content. This is a cross-layer boundary.
+
+### Checklist: After Modifying Any Command Template
+
+- [ ] Find all platforms with the same command: `find src/templates/*/commands/trellis/ -name "<command>.*"`
+- [ ] Update all platform copies (Markdown `.md` and TOML `.toml`)
+- [ ] For Gemini TOML: adapt line continuations (`\\` vs `\`) and triple-quoted strings
+- [ ] Run `/trellis:check-cross-layer` to verify nothing was missed
+
+**Real-world example**: Updated `record-session.md` in Claude to use `--mode record`, but forgot iFlow, Kilo, OpenCode, and Gemini — caught by cross-layer check.
+
+---
+
+## Generated Runtime Template Upgrade Consistency
+
+Some generated files are both documentation and runtime input. In Trellis,
+`.trellis/workflow.md` is parsed by `get_context.py`, `workflow_phase.py`,
+SessionStart filters, and per-turn hooks. Template changes must be validated
+against both fresh init and upgrade paths.
+
+### Checklist: After Modifying A Runtime-Parsed Template
+
+- [ ] Identify every runtime parser that reads the template, not just the file
+  writer that installs it
+- [ ] Check whether relevant syntax lives outside obvious managed regions
+  such as tag blocks
+- [ ] Verify fresh `init` output and a versioned `update` scenario that writes
+  the older `.trellis/.version`
+- [ ] Add an upgrade regression using an older pristine template fixture, then
+  assert the installed file reaches the current packaged shape
+- [ ] Update the backend spec that owns the runtime contract
+
+**Real-world example**: Codex inline mode changed workflow platform markers from
+`[Codex]` / `[Kilo, Antigravity, Windsurf]` to `[codex-sub-agent]` /
+`[codex-inline, Kilo, Antigravity, Windsurf]`. Fresh init was correct, but
+`trellis update` only merged `[workflow-state:*]` blocks and preserved stale
+markers outside those blocks. Result: upgraded projects got new hook scripts
+but old workflow routing, so `get_context.py --mode phase --platform codex`
+could return empty Phase 2.1 detail.
+
+---
+
+## Mode-Detection Probe Checklist
+
+When a CLI auto-detects a mode by probing a remote resource (e.g., checking if `index.json` exists to decide marketplace vs direct download):
+
+### Before implementing:
+- [ ] Probe runs in **ALL** code paths that use the result (interactive, `-y`, `--flag` combos)
+- [ ] 404 vs transient error are distinguished — don't treat both as "not found"
+- [ ] Transient errors **abort or retry**, never silently switch modes
+- [ ] Shared state (caches, prefetched data) is **reset** when context changes (e.g., user switches source)
+- [ ] **Shortcut paths** (e.g., `--template` skipping picker) must have the same error-handling quality as the probed path — check that downstream functions don't call catch-all wrappers
+
+### After implementing:
+- [ ] Trace every path from probe result to the mode-decision branch — no fallthrough
+- [ ] External format contracts (giget URI, raw URLs) are tested or at least documented as comments
+- [ ] Metadata reads consume a complete response or use a streaming parser — never parse a fixed-size prefix as full JSON
+- [ ] When reconstructing a composite identifier from parsed parts, verify **all** fields are included and in the **correct position** (e.g., `provider:repo/path#ref` not `provider:repo#ref/path`)
+- [ ] Verify that **action functions** called after a shortcut don't internally use the old catch-all fetch — they must use the probe-quality variant when error distinction matters
+
+**Real-world example**: Custom registry flow had 8 bugs across 3 review rounds: (1) probe only ran in interactive mode, (2) transient errors fell through to wrong mode, (3) giget URI had `#ref` in wrong position, (4) prefetched templates leaked across source switches, (5) `--template` shortcut bypassed probe but `downloadTemplateById` internally used catch-all `fetchTemplateIndex`, turning timeouts into "Template not found".
+
+**Real-world example**: Agent-session update hints fetched npm `latest` metadata with `response.read(4096)` and then parsed it as complete JSON. The `@mindfoldhq/trellis` package metadata exceeded 4 KB, so the JSON was truncated, parse failed silently, and the first session injection showed no update hint. Fix: read the complete response before parsing, and add a regression where `version` is followed by an 8 KB metadata tail.
+
+---
+
+## When to Create Flow Documentation
+
+Create detailed flow docs when:
+- Feature spans 3+ layers
+- Multiple teams are involved
+- Data format is complex
+- Feature has caused bugs before