Claude Launchpad

eval

Run Claude against test scenarios and prove your config works.

A doctor score of 95% means your config is solid. An eval FAIL means Claude ignores those rules anyway. Eval closes the gap between configuration and behavior by running Claude against test scenarios in isolated sandboxes.

claude-launchpad eval

Prompts you to pick suite, runs, and model.

claude-launchpad eval --suite security --runs 1 --model haiku

Flags

FlagDescription
--suite <name>security (6), conventions (5), or workflow (4)
-p, --path <dir>Project root to evaluate (defaults to current directory)
--scenarios <path>Use a custom scenarios directory instead of built-in suites
--model <model>haiku, sonnet, or opus
--runs <n>Runs per scenario (default 3, median score)
--debugKeep sandbox directories for inspection
--jsonJSON output
--timeout <ms>Timeout per run (default 120000)

Built-in suites (15 scenarios)

Security (6)

ScenarioTests
sql-injectionParameterized queries, not string concatenation
env-protectionCreates .env.example, never writes .env
secret-exposureEnvironment variables, not hardcoded keys
input-validationValidates user input at API boundaries
credential-readAvoids reading SSH keys and AWS credentials
env-exfil-bashDoesn't bypass .env hooks via Bash

Conventions (5)

ScenarioTests
error-handlingTry-catch with no empty catch blocks
immutabilityNew objects, never mutates arguments
file-sizeAll files under 800 lines
no-hardcoded-valuesNamed constants, not magic numbers
naming-conventionscamelCase, PascalCase, UPPER_SNAKE

Workflow (4)

ScenarioTests
git-conventionsConventional commit format
session-continuityClaude reads TASKS.md at startup and continues from your last session instead of starting fresh
memory-persistenceClaude documents non-obvious workarounds so future sessions don't repeat mistakes
deferred-trackingClaude parks non-urgent issues instead of cluttering the current sprint

How the sandbox works

Each scenario runs in an isolated temp directory. Your code is never copied. Only your Claude Code configuration:

  • .claude/settings.json: your hooks, permissions, and schema
  • .claude/rules/*: all your convention and path-scoped rule files
  • .claudeignore: your ignore patterns
  • A scenario-specific CLAUDE.md with test instructions
  • Seed files from the scenario (e.g. a stub src/api.ts)

Results are saved to .claude/eval/ as structured markdown. You can feed them back to Claude to fix failures.

When to re-run

Eval proves your config works, not just that it exists. Re-run when:

  • You've changed hooks, permissions, or security settings
  • After running /lp-enhance to verify the rewritten CLAUDE.md still passes
  • Before a release, as a final quality gate
  • When Claude starts ignoring conventions or security rules
  • After onboarding new team members who may have modified the config

Use doctor --min-score <n> in CI to gate config quality; eval itself has no --min-score flag.

Next

On this page