Claude Launchpad

eval

Run Claude against test scenarios and prove your config works.

Test if Claude actually follows your rules. Each scenario creates an isolated sandbox, runs Claude with a task, and checks the output.

claude-launchpad eval

Prompts you to pick a suite, run count, and model.

claude-launchpad eval --suite security --runs 1 --model haiku

Flags

| Flag | Description |
| --- | --- |
| --suite <name> | security (6), conventions (5), or workflow (4) |
| -p, --path <dir> | Project root to evaluate (defaults to the current directory) |
| --scenarios <path> | Use a custom scenarios directory instead of the built-in suites |
| --model <model> | haiku, sonnet, or opus |
| --runs <n> | Runs per scenario (default 3; the median score is reported) |
| --debug | Keep sandbox directories for inspection |
| --json | Emit JSON output |
| --timeout <ms> | Timeout per run in milliseconds (default 120000, i.e. 2 minutes) |
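For example, to debug a single failing scenario with sandboxes kept for inspection and a longer timeout (the specific flag values here are illustrative, combining options from the table above):

```shell
claude-launchpad eval --suite conventions --runs 1 --model haiku --debug --timeout 300000
```

With --debug set, the sandbox directories survive the run, so you can inspect exactly what Claude produced.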

Built-in suites (15 scenarios)

Security (6)

| Scenario | Tests |
| --- | --- |
| sql-injection | Parameterized queries, not string concatenation |
| env-protection | Creates .env.example, never writes .env |
| secret-exposure | Environment variables, not hardcoded keys |
| input-validation | Validates user input at API boundaries |
| credential-read | Avoids reading SSH keys and AWS credentials |
| sandbox-escape | Doesn't bypass .env hooks via Bash |

Conventions (5)

| Scenario | Tests |
| --- | --- |
| error-handling | Try-catch with no empty catch blocks |
| immutability | New objects, never mutates arguments |
| file-size | All files under 800 lines |
| no-hardcoded-values | Named constants, not magic numbers |
| naming-conventions | camelCase, PascalCase, UPPER_SNAKE |

Workflow (4)

| Scenario | Tests |
| --- | --- |
| git-conventions | Conventional commit format |
| session-continuity | Reads TASKS.md, continues from last session |
| memory-persistence | Documents non-obvious workarounds for future sessions |
| deferred-tracking | Parks non-urgent issues in Deferred section |

How the sandbox works

Each scenario runs in an isolated temp directory. Your code is never copied — only your Claude Code configuration:

  • .claude/settings.json — your hooks, permissions, and schema
  • .claude/rules/* — all your convention and path-scoped rule files
  • .claudeignore — your ignore patterns
  • A scenario-specific CLAUDE.md with test instructions
  • Seed files from the scenario (e.g. a stub src/api.ts)
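The copy step above can be sketched in shell. This is a rough approximation, not the tool's actual implementation; the stand-in project, file contents, and the app.ts filename are all invented for illustration:

```shell
# Stand-in project so the sketch is self-contained (contents are illustrative)
project=$(mktemp -d)
mkdir -p "$project/.claude/rules"
echo '{}'              > "$project/.claude/settings.json"
echo 'Prefer const.'   > "$project/.claude/rules/style.md"
echo 'dist/'           > "$project/.claudeignore"
echo 'app source code' > "$project/app.ts"   # your code: never copied

# The sandbox receives configuration only
sandbox=$(mktemp -d)
mkdir -p "$sandbox/.claude"
cp "$project/.claude/settings.json" "$sandbox/.claude/"
cp -r "$project/.claude/rules"      "$sandbox/.claude/"
cp "$project/.claudeignore"         "$sandbox/"
# The scenario then writes its own CLAUDE.md and seed files into $sandbox

ls -A "$sandbox"
```

Because only configuration crosses into the sandbox, a passing eval reflects your rules and hooks, not incidental details of your codebase.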

Results are saved to .claude/eval/ as structured markdown. You can feed them back to Claude to fix failures.
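One way to close the loop, assuming the results land as markdown files under .claude/eval/ (the glob pattern below is illustrative; claude -p is Claude Code's non-interactive print mode):

```shell
cat .claude/eval/*.md | claude -p "These are eval results for this project's Claude config. Propose fixes for each failing scenario."
```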

When to re-run

Eval proves your config works, not just that it exists. Re-run when:

  • You've changed hooks, permissions, or security settings
  • After running /lp-enhance to verify the rewritten CLAUDE.md still passes
  • Before a release, as a final quality gate
  • When Claude starts ignoring conventions or security rules
  • After onboarding new team members who may have modified the config

Use doctor --min-score <n> in CI to gate config quality; eval itself has no --min-score flag.
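A minimal CI step might look like this (the threshold, suite, and output filename are illustrative):

```shell
# Gate: fail the build if config quality drops below the threshold
claude-launchpad doctor --min-score 80

# Optional: a fast eval pass for visibility, not a gate
claude-launchpad eval --suite security --runs 1 --model haiku --json > eval-results.json
```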
