Cross-Layer Thinking Guide

Purpose: Think through data flow across layers before implementing.


The Problem

Most bugs happen at layer boundaries, not within layers.

Common cross-layer bugs:

  • API returns format A, frontend expects format B
  • Database stores X, service transforms to Y, but loses data
  • Multiple layers implement the same logic differently

Before Implementing Cross-Layer Features

Step 1: Map the Data Flow

Draw out how data moves:

Source → Transform → Store → Retrieve → Transform → Display

For each arrow, ask:

  • What format is the data in?
  • What could go wrong?
  • Who is responsible for validation?

Step 2: Identify Boundaries

| Boundary | Common Issues |
|---|---|
| API ↔ Service | Type mismatches, missing fields |
| Service ↔ Database | Format conversions, null handling |
| Backend ↔ Frontend | Serialization, date formats |
| Component ↔ Component | Props shape changes |

Step 3: Define Contracts

For each boundary:

  • What is the exact input format?
  • What is the exact output format?
  • What errors can occur?
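The three contract questions can be written down as code rather than kept in prose. The following is a minimal sketch, assuming a hypothetical API-to-service boundary; `UserRecord`, `parse_user`, `ContractError`, and the field names are illustrative and do not come from any real codebase.

```python
from dataclasses import dataclass
from typing import Optional


class ContractError(ValueError):
    """Raised when data crossing the boundary violates the contract."""


@dataclass(frozen=True)
class UserRecord:
    """Exact output format of the (hypothetical) boundary."""
    user_id: int
    email: str
    display_name: Optional[str] = None


def parse_user(payload: dict) -> UserRecord:
    """Exact input format: a dict with 'user_id' (int) and 'email' (str)."""
    try:
        user_id = payload["user_id"]
        email = payload["email"]
    except KeyError as exc:
        raise ContractError(f"missing field: {exc.args[0]}") from exc
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ContractError("user_id must be an int")
    if not isinstance(email, str) or "@" not in email:
        raise ContractError("email must be a string containing '@'")
    return UserRecord(user_id, email, payload.get("display_name"))
```

The input format lives in the parser, the output format in the dataclass, and the possible errors in a single exception type, so each question has one place to look.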

Common Cross-Layer Mistakes

Mistake 1: Implicit Format Assumptions

Bad: Assuming date format without checking

Good: Explicit format conversion at boundaries
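As one illustration of explicit conversion, the sketch below assumes the boundary format is an ISO 8601 string in UTC. The helper names `to_wire` and `from_wire` are hypothetical; the point is that both directions refuse timestamps whose timezone is left implicit.

```python
from datetime import datetime, timezone


def to_wire(moment: datetime) -> str:
    """Serialize an aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: timezone is ambiguous at the boundary")
    return moment.astimezone(timezone.utc).isoformat()


def from_wire(text: str) -> datetime:
    """Parse an ISO 8601 string, rejecting values that carry no UTC offset."""
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    return moment.astimezone(timezone.utc)
```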

Mistake 2: Scattered Validation

Bad: Validating the same thing in multiple layers

Good: Validate once at the entry point
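One way to validate once is to let the entry point turn raw input into a type that inner layers accept without re-checking. This is a sketch under that assumption; `Email`, `handle_request`, and `register` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """A value that has already passed validation."""
    value: str

    @classmethod
    def parse(cls, raw: str) -> "Email":
        cleaned = raw.strip().lower()
        local, sep, domain = cleaned.partition("@")
        if not (local and sep and domain):
            raise ValueError(f"invalid email: {raw!r}")
        return cls(cleaned)


def handle_request(raw_email: str) -> str:
    """Entry point: the only place where raw input is validated."""
    return register(Email.parse(raw_email))


def register(email: Email) -> str:
    """Inner layer: relies on the type, so it carries no validation of its own."""
    return f"registered {email.value}"
```

Because `register` takes an `Email` rather than a `str`, a second copy of the validation rule has nowhere to live and cannot drift from the first.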

Mistake 3: Leaky Abstractions

Bad: Component knows about database schema

Good: Each layer only knows its neighbors
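A small sketch of the "only knows its neighbors" rule, with invented column names (`ttl`, `author_nm`) and a stubbed database call: the service layer is the single place that maps storage names to a view type, and the display code sees only that type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleView:
    """What the display layer is allowed to know about an article."""
    title: str
    author: str


def fetch_row(article_id: int) -> dict:
    """Stand-in for the database layer; the column names are hypothetical."""
    return {"id": article_id, "ttl": "Cross-Layer Thinking", "author_nm": "sam"}


def get_article(article_id: int) -> ArticleView:
    """Service layer: the only code that knows the column names."""
    row = fetch_row(article_id)
    return ArticleView(title=row["ttl"], author=row["author_nm"])


def render(article: ArticleView) -> str:
    """Display layer: depends on ArticleView, never on the row shape."""
    return f"{article.title} by {article.author}"
```

Renaming a column then changes `get_article` only; `render` is untouched.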


Checklist for Cross-Layer Features

Before implementation:

  • Mapped the complete data flow
  • Identified all layer boundaries
  • Defined format at each boundary
  • Decided where validation happens

After implementation:

  • Tested with edge cases (null, empty, invalid)
  • Verified error handling at each boundary
  • Checked that data survives a round trip
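The round-trip check can be automated. The sketch below assumes JSON as the storage format, with hypothetical `save` and `load` helpers, and reports every edge case that comes back changed.

```python
import json


def save(record: dict) -> str:
    """Serialize a record for storage (JSON is an assumption of this sketch)."""
    return json.dumps(record, ensure_ascii=False)


def load(blob: str) -> dict:
    """Read a stored record back."""
    return json.loads(blob)


EDGE_CASES = [
    {"name": None},            # null
    {"name": ""},              # empty string
    {"tags": []},              # empty collection
    {"name": "naïve ☃"},       # non-ASCII text
    {"nested": {"count": 0}},  # falsy number
]


def failed_round_trips(cases: list) -> list:
    """Return every case that does not come back equal after save and load."""
    return [case for case in cases if load(save(case)) != case]
```

With JSON, `failed_round_trips([{1: "a"}])` returns the case itself, because integer keys come back as strings. That is the kind of silent loss this check exists to surface.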

Cross-Platform Template Consistency

In Trellis, command templates (e.g., record-session.md) exist for multiple platforms with identical or near-identical content. Keeping those copies in sync is a cross-layer boundary problem.

Checklist: After Modifying Any Command Template

  • Find all platforms with the same command: find src/templates/*/commands/trellis/ -name "<command>.*"
  • Update all platform copies (Markdown .md and TOML .toml)
  • For Gemini TOML: adapt line continuations (\\ vs \) and triple-quoted strings
  • Run /trellis:check-cross-layer to verify nothing was missed

Real-world example: Updated record-session.md in Claude to use --mode record, but forgot iFlow, Kilo, OpenCode, and Gemini — caught by cross-layer check.
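The first checklist item can also be scripted. The sketch below assumes the `<platform>/commands/trellis/<command>.<ext>` layout implied by the `find` command above; the function names are hypothetical and this is not part of Trellis itself.

```python
from pathlib import Path
from typing import Dict, List


def platforms_with_command(templates_root: Path, command: str) -> Dict[str, List[str]]:
    """Map each platform directory to its copies of `command` (any extension)."""
    found: Dict[str, List[str]] = {}
    for path in sorted(templates_root.glob(f"*/commands/trellis/{command}.*")):
        platform = path.relative_to(templates_root).parts[0]
        found.setdefault(platform, []).append(path.name)
    return found


def missing_platforms(templates_root: Path, command: str) -> List[str]:
    """List platforms that have a commands/trellis directory but no copy of `command`."""
    present = platforms_with_command(templates_root, command)
    candidates = sorted(
        path.parent.parent.name
        for path in templates_root.glob("*/commands/trellis")
        if path.is_dir()
    )
    return [name for name in candidates if name not in present]
```

Calling `platforms_with_command(Path("src/templates"), "record-session")` would list the copies to update. A platform reported by `missing_platforms` may legitimately lack the command, so treat that list as a prompt to check rather than as an error.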


Generated Runtime Template Upgrade Consistency

Some generated files are both documentation and runtime input. In Trellis, .trellis/workflow.md is parsed by get_context.py, workflow_phase.py, SessionStart filters, and per-turn hooks. Template changes must be validated against both fresh init and upgrade paths.

Checklist: After Modifying A Runtime-Parsed Template

  • Identify every runtime parser that reads the template, not just the file writer that installs it
  • Check whether relevant syntax lives outside obvious managed regions such as tag blocks
  • Verify fresh init output and a versioned update scenario that writes the older .trellis/.version
  • Add an upgrade regression using an older pristine template fixture, then assert the installed file reaches the current packaged shape
  • Update the backend spec that owns the runtime contract

Real-world example: Codex inline mode changed workflow platform markers from [Codex] / [Kilo, Antigravity, Windsurf] to [codex-sub-agent] / [codex-inline, Kilo, Antigravity, Windsurf]. Fresh init was correct, but trellis update only merged [workflow-state:*] blocks and preserved stale markers outside those blocks. Result: upgraded projects got new hook scripts but old workflow routing, so get_context.py --mode phase --platform codex could return empty Phase 2.1 detail.
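An upgrade regression for this kind of bug could scan the upgraded file for platform markers that are no longer in the current set. The sketch below makes several assumptions that the text above does not confirm: that markers sit at the start of a line as `[name, name]`, that bracketed tags containing a colon (such as `[workflow-state:*]`) are managed blocks to skip, and that the current marker set is exactly the one named in the example. `stale_markers` is a hypothetical helper, not the Trellis implementation.

```python
import re
from typing import Iterable, List

# Assumption: the marker names listed in the example above are the full current set.
CURRENT_MARKERS = {"codex-sub-agent", "codex-inline", "Kilo", "Antigravity", "Windsurf"}

# Assumption: a marker is a bracketed, comma-separated list at the start of a line.
# Excluding ':' skips tags such as [workflow-state:...].
MARKER_LINE = re.compile(r"^\[([^\]:]+)\]", re.MULTILINE)


def stale_markers(text: str, current: Iterable[str]) -> List[str]:
    """Return marker names found in `text` that are not in the current set."""
    allowed = set(current)
    stale = set()
    for match in MARKER_LINE.finditer(text):
        for name in match.group(1).split(","):
            if name.strip() not in allowed:
                stale.add(name.strip())
    return sorted(stale)
```

Run against the output of an update from an older fixture, a non-empty result means content outside the managed blocks was preserved from the old template.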


Mode-Detection Probe Checklist

When a CLI auto-detects a mode by probing a remote resource (e.g., checking if index.json exists to decide marketplace vs direct download):

Before implementing:

  • Probe runs in ALL code paths that use the result (interactive, -y, --flag combos)
  • 404 and transient errors are distinguished; don't treat both as "not found"
  • Transient errors abort or retry, never silently switch modes
  • Shared state (caches, prefetched data) is reset when context changes (e.g., user switches source)
  • Shortcut paths (e.g., --template skipping picker) must have the same error-handling quality as the probed path — check that downstream functions don't call catch-all wrappers
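The 404-versus-transient distinction can be sketched with Python's standard `urllib`. The assumptions here are that a reachable index means marketplace mode and a 404 means direct download; `Mode`, `ProbeError`, and `detect_mode` are hypothetical names, and the `opener` parameter exists only so the probe can be tested without a network.

```python
import urllib.error
import urllib.request
from enum import Enum


class Mode(Enum):
    MARKETPLACE = "marketplace"
    DIRECT = "direct"


class ProbeError(RuntimeError):
    """The probe reached no verdict; the caller should retry or abort."""


def detect_mode(index_url: str, opener=urllib.request.urlopen, timeout: float = 5.0) -> Mode:
    """Decide the mode from the probe, never from a failure to probe."""
    try:
        with opener(index_url, timeout=timeout):
            return Mode.MARKETPLACE
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return Mode.DIRECT  # a definite "not found"
        raise ProbeError(f"HTTP {exc.code} while probing {index_url}") from exc
    except OSError as exc:
        # URLError and socket timeouts are OSError subclasses: transient, so no verdict.
        raise ProbeError(f"could not probe {index_url}: {exc}") from exc
```

Every caller, interactive or not, goes through `detect_mode`, so a timeout can only surface as `ProbeError` and never as a silent switch to direct mode.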

After implementing:

  • Trace every path from probe result to the mode-decision branch — no fallthrough
  • External format contracts (giget URI, raw URLs) are tested or at least documented as comments
  • Metadata reads consume a complete response or use a streaming parser — never parse a fixed-size prefix as full JSON
  • When reconstructing a composite identifier from parsed parts, verify all fields are included and in the correct position (e.g., provider:repo/path#ref not provider:repo#ref/path)
  • Verify that action functions called after a shortcut don't internally use the old catch-all fetch — they must use the probe-quality variant when error distinction matters
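For the composite-identifier item, building the string in one function and pinning the field order with an assertion keeps `#ref` from drifting. This is a sketch based only on the `provider:repo/path#ref` shape quoted above; `TemplateSource` and `to_uri` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateSource:
    provider: str
    repo: str
    subpath: str = ""
    ref: str = ""


def to_uri(source: TemplateSource) -> str:
    """Assemble provider:repo/path#ref, with the ref always last."""
    uri = f"{source.provider}:{source.repo}"
    if source.subpath:
        uri += "/" + source.subpath.strip("/")
    if source.ref:
        uri += "#" + source.ref
    return uri
```

For example, `to_uri(TemplateSource("gh", "org/repo", "templates/app", "main"))` yields `gh:org/repo/templates/app#main`.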

Real-world example: Custom registry flow had 8 bugs across 3 review rounds, including: (1) probe only ran in interactive mode, (2) transient errors fell through to wrong mode, (3) giget URI had #ref in wrong position, (4) prefetched templates leaked across source switches, (5) --template shortcut bypassed probe but downloadTemplateById internally used catch-all fetchTemplateIndex, turning timeouts into "Template not found".

Real-world example: Agent-session update hints fetched npm latest metadata with response.read(4096) and then parsed it as complete JSON. The @mindfoldhq/trellis package metadata exceeded 4 KB, so the JSON was truncated, parse failed silently, and the first session injection showed no update hint. Fix: read the complete response before parsing, and add a regression where version is followed by an 8 KB metadata tail.
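The truncation failure and its regression can be reproduced in a few lines. In this sketch, `read_latest_version` and `make_metadata` are hypothetical helpers and the payload shape is invented; only the `read(4096)` behavior and the 8 KB tail come from the example above.

```python
import io
import json


def read_latest_version(response) -> str:
    """Read the whole body before parsing; `response` is any binary file-like object."""
    body = response.read()  # no size argument: read until EOF
    return json.loads(body)["version"]


def make_metadata(version: str, tail_bytes: int) -> bytes:
    """Build a payload in which `version` is followed by a large tail (test fixture)."""
    payload = {"version": version, "readme": "x" * tail_bytes}
    return json.dumps(payload).encode("utf-8")
```

With `make_metadata("1.2.3", 8192)`, a full read returns the version, while `json.loads(io.BytesIO(payload).read(4096))` raises `json.JSONDecodeError` because the prefix ends inside a string.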


When to Create Flow Documentation

Create detailed flow docs when:

  • Feature spans 3+ layers
  • Multiple teams are involved
  • Data format is complex
  • Feature has caused bugs before