# eval
Run Claude against test scenarios and prove your config works.
Test if Claude actually follows your rules. Each scenario creates an isolated sandbox, runs Claude with a task, and checks the output.
```shell
claude-launchpad eval
```

Prompts you to pick suite, runs, and model.

```shell
claude-launchpad eval --suite security --runs 1 --model haiku
```

## Flags
| Flag | Description |
|---|---|
| `--suite <name>` | `security` (6), `conventions` (5), or `workflow` (4) |
| `-p, --path <dir>` | Project root to evaluate (defaults to current directory) |
| `--scenarios <path>` | Use a custom scenarios directory instead of built-in suites |
| `--model <model>` | `haiku`, `sonnet`, or `opus` |
| `--runs <n>` | Runs per scenario (default 3, median score) |
| `--debug` | Keep sandbox directories for inspection |
| `--json` | JSON output |
| `--timeout <ms>` | Timeout per run (default 120000) |
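For example, a scripted security check might combine several of these flags (the flag values here are illustrative, not recommendations):

```shell
# One run per security scenario on the fastest model, with a longer
# timeout and machine-readable output for downstream tooling.
claude-launchpad eval --suite security --runs 1 --model haiku \
  --timeout 180000 --json
```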
## Built-in suites (15 scenarios)
### Security (6)
| Scenario | Tests |
|---|---|
| `sql-injection` | Parameterized queries, not string concatenation |
| `env-protection` | Creates `.env.example`, never writes `.env` |
| `secret-exposure` | Environment variables, not hardcoded keys |
| `input-validation` | Validates user input at API boundaries |
| `credential-read` | Avoids reading SSH keys and AWS credentials |
| `sandbox-escape` | Doesn't bypass `.env` hooks via Bash |
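As a rough illustration of the distinction the `sql-injection` scenario scores (this code is hypothetical, not taken from the suite):

```typescript
// Would fail: user input concatenated straight into the SQL string.
function findUserUnsafe(id: string): string {
  return `SELECT * FROM users WHERE id = '${id}'`;
}

// Would pass: a placeholder in the SQL, with the value passed separately
// so the driver can bind it as data rather than executable SQL.
function findUserSafe(id: string): [string, string[]] {
  return ["SELECT * FROM users WHERE id = $1", [id]];
}
```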
### Conventions (5)
| Scenario | Tests |
|---|---|
| `error-handling` | Try-catch with no empty catch blocks |
| `immutability` | New objects, never mutates arguments |
| `file-size` | All files under 800 lines |
| `no-hardcoded-values` | Named constants, not magic numbers |
| `naming-conventions` | camelCase, PascalCase, UPPER_SNAKE |
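A hypothetical sketch of what the `immutability` scenario looks for (the function and shape here are made up for illustration):

```typescript
// Would fail: mutates the argument in place.
function applyDiscountInPlace(order: { total: number }): void {
  order.total = order.total * 0.9;
}

// Would pass: leaves the input untouched and returns a new object.
function applyDiscount(order: { total: number }): { total: number } {
  return { ...order, total: order.total * 0.9 };
}
```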
### Workflow (4)
| Scenario | Tests |
|---|---|
| `git-conventions` | Conventional commit format |
| `session-continuity` | Reads `TASKS.md`, continues from last session |
| `memory-persistence` | Documents non-obvious workarounds for future sessions |
| `deferred-tracking` | Parks non-urgent issues in Deferred section |
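For reference, a Conventional Commits-style message — the format the `git-conventions` scenario checks for — follows a `type(scope): subject` shape (this message is illustrative):

```text
fix(auth): reject empty tokens at the API boundary
```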
## How the sandbox works
Each scenario runs in an isolated temp directory. Your code is never copied — only your Claude Code configuration:
- `.claude/settings.json` — your hooks, permissions, and schema
- `.claude/rules/*` — all your convention and path-scoped rule files
- `.claudeignore` — your ignore patterns
- A scenario-specific `CLAUDE.md` with test instructions
- Seed files from the scenario (e.g. a stub `src/api.ts`)
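Putting that together, a sandbox for a scenario seeded with a stub `src/api.ts` might look like this (the exact layout is illustrative):

```text
/tmp/<sandbox>/
├── .claude/
│   ├── settings.json   # your hooks, permissions, and schema
│   └── rules/          # your convention and path-scoped rule files
├── .claudeignore       # your ignore patterns
├── CLAUDE.md           # scenario-specific test instructions
└── src/
    └── api.ts          # seed stub from the scenario
```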
Results are saved to `.claude/eval/` as structured markdown. You can feed them back to Claude to fix failures.
## When to re-run
Eval proves your config works, not just that it exists. Re-run when:
- You've changed hooks, permissions, or security settings
- After running `/lp-enhance`, to verify the rewritten `CLAUDE.md` still passes
- Before a release, as a final quality gate
- When Claude starts ignoring conventions or security rules
- After onboarding new team members who may have modified the config
Use `doctor --min-score <n>` in CI to gate config quality; `eval` itself has no `--min-score` flag.
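In a GitHub Actions job, that gate might look like the following step (the workflow shape and score threshold are illustrative; only the `doctor --min-score` flag comes from the CLI):

```yaml
# Illustrative CI step: fail the build if the config score drops below 80.
- name: Gate Claude config quality
  run: claude-launchpad doctor --min-score 80
```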