eval
Run Claude against test scenarios and prove your config works.
A doctor score of 95% means your config is solid. An eval FAIL means Claude ignores those rules anyway. Eval closes the gap between configuration and behavior by running Claude against test scenarios in isolated sandboxes.
claude-launchpad evalPrompts you to pick suite, runs, and model.
claude-launchpad eval --suite security --runs 1 --model haikuFlags
| Flag | Description |
|---|---|
--suite <name> | security (6), conventions (5), or workflow (4) |
-p, --path <dir> | Project root to evaluate (defaults to current directory) |
--scenarios <path> | Use a custom scenarios directory instead of built-in suites |
--model <model> | haiku, sonnet, or opus |
--runs <n> | Runs per scenario (default 3, median score) |
--debug | Keep sandbox directories for inspection |
--json | JSON output |
--timeout <ms> | Timeout per run (default 120000) |
Built-in suites (15 scenarios)
Security (6)
| Scenario | Tests |
|---|---|
| sql-injection | Parameterized queries, not string concatenation |
| env-protection | Creates .env.example, never writes .env |
| secret-exposure | Environment variables, not hardcoded keys |
| input-validation | Validates user input at API boundaries |
| credential-read | Avoids reading SSH keys and AWS credentials |
| env-exfil-bash | Doesn't bypass .env hooks via Bash |
Conventions (5)
| Scenario | Tests |
|---|---|
| error-handling | Try-catch with no empty catch blocks |
| immutability | New objects, never mutates arguments |
| file-size | All files under 800 lines |
| no-hardcoded-values | Named constants, not magic numbers |
| naming-conventions | camelCase, PascalCase, UPPER_SNAKE |
Workflow (4)
| Scenario | Tests |
|---|---|
| git-conventions | Conventional commit format |
| session-continuity | Claude reads TASKS.md at startup and continues from your last session instead of starting fresh |
| memory-persistence | Claude documents non-obvious workarounds so future sessions don't repeat mistakes |
| deferred-tracking | Claude parks non-urgent issues instead of cluttering the current sprint |
How the sandbox works
Each scenario runs in an isolated temp directory. Your code is never copied. Only your Claude Code configuration:
.claude/settings.json: your hooks, permissions, and schema.claude/rules/*: all your convention and path-scoped rule files.claudeignore: your ignore patterns- A scenario-specific
CLAUDE.mdwith test instructions - Seed files from the scenario (e.g. a stub
src/api.ts)
Results are saved to .claude/eval/ as structured markdown. You can feed them back to Claude to fix failures.
When to re-run
Eval proves your config works, not just that it exists. Re-run when:
- You've changed hooks, permissions, or security settings
- After running
/lp-enhanceto verify the rewritten CLAUDE.md still passes - Before a release, as a final quality gate
- When Claude starts ignoring conventions or security rules
- After onboarding new team members who may have modified the config
Use doctor --min-score <n> in CI to gate config quality; eval itself has no --min-score flag.