AI Evaluator
How the AI Evaluator audits draft prompts, scores them, and surfaces ranked suggestions.
The AI Evaluator audits draft prompts and surfaces issues that need attention. The sidebar badge shows the number of unresolved non-pass results across the workspace.
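As a rough illustration of the badge count described above, the sketch below counts results that are neither a pass nor resolved. The `AuditResult` class and its field names are hypothetical and are not part of PromptVault. Whether dismissed or refused results count toward the badge is not stated here, so the sketch leaves both out.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    # Hypothetical record shape, for illustration only.
    prompt_slug: str
    outcome: str           # "pass" or "report"
    resolved: bool = False


def badge_count(results):
    """Count results that are not a pass and have not been resolved."""
    return sum(1 for r in results if r.outcome != "pass" and not r.resolved)


results = [
    AuditResult("welcome-email", "pass"),
    AuditResult("support-triage", "report"),
    AuditResult("sql-helper", "report", resolved=True),
]
print(badge_count(results))  # prints 1
```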
What the Auditor Checks
The auditor classifies each prompt and scores it across dimensions such as:
| Dimension | What it measures |
|---|---|
| Role identity | Whether the agent's role and expertise are clear. |
| Task definition | Whether the primary objective is specific and singular. |
| Context sufficiency | Whether the prompt gives enough background to act. |
| Input specification | Whether inputs are explicitly described. |
| Output format | Whether output structure is precise. |
| Constraints and guardrails | Whether scope limits and refusal cases are clear. |
| Examples | Whether representative and edge cases are covered. |
| Edge cases | Whether ambiguity and out-of-scope inputs are handled. |
| Internal consistency | Whether instructions are free of contradictions. |
| Token efficiency | Whether unnecessary padding is avoided. |
| Variable hygiene | Whether placeholders are consistent and documented. |
Some dimensions may not apply to every prompt category.
Pass vs. Report
The evaluator returns:
- Pass when no material issues are found.
- Report when the prompt needs attention.
Reported issues are ranked by severity:
| Severity | Meaning |
|---|---|
| BLOCKER | The prompt cannot reliably do its job. |
| MAJOR | Degrades quality on common inputs. |
| MINOR | Degrades quality on edge cases. |
| NIT | Style or polish only. |
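The severity ranking above can be sketched as a simple sort. This is a minimal example that assumes issues arrive as dictionaries with a `severity` key. That shape, and the names `SEVERITY_ORDER` and `rank_issues`, are illustrative assumptions and not PromptVault's actual data model.

```python
# Severity labels from the table above, most severe first.
SEVERITY_ORDER = ["BLOCKER", "MAJOR", "MINOR", "NIT"]
RANK = {name: position for position, name in enumerate(SEVERITY_ORDER)}


def rank_issues(issues):
    """Return issues sorted most severe first.

    sorted() is stable, so issues with equal severity keep their input order.
    """
    return sorted(issues, key=lambda issue: RANK[issue["severity"]])


issues = [
    {"severity": "MINOR", "summary": "No handling for empty input."},
    {"severity": "BLOCKER", "summary": "Task is undefined."},
    {"severity": "NIT", "summary": "Inconsistent capitalization."},
    {"severity": "MAJOR", "summary": "Output format is ambiguous."},
]

for issue in rank_issues(issues):
    print(issue["severity"], "-", issue["summary"])
```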
Running an Audit
Click Run audit on the AI Evaluator screen. PromptVault reviews current draft prompts, reuses prior results when possible, and shows the latest suggestions in the dashboard.
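How prior results are reused is not documented here. One plausible approach, shown purely as an assumption, is to key cached results on a hash of the prompt source so that only changed drafts are evaluated again. The functions `content_key` and `run_audit`, and the `evaluate` callback, are hypothetical names for this sketch.

```python
import hashlib


def content_key(prompt_source: str) -> str:
    """Derive a cache key from the prompt text (assumed caching strategy)."""
    return hashlib.sha256(prompt_source.encode("utf-8")).hexdigest()


def run_audit(drafts, cache, evaluate):
    """Audit every draft, calling evaluate() only for sources not seen before.

    drafts:   dict mapping prompt slug -> prompt source
    cache:    dict mapping content key -> prior result (mutated in place)
    evaluate: callable that takes a prompt source and returns a result
    """
    results = {}
    for slug, source in drafts.items():
        key = content_key(source)
        if key not in cache:
            cache[key] = evaluate(source)
        results[slug] = cache[key]
    return results


# Stand-in evaluator that records how often it is called.
calls = []

def fake_evaluate(source):
    calls.append(source)
    return {"outcome": "report" if len(source) < 30 else "pass"}


cache = {}
drafts = {
    "summarizer": "Summarize the text.",
    "triage": "Classify the ticket. Output format: JSON.",
}
run_audit(drafts, cache, fake_evaluate)      # evaluates both drafts
drafts["summarizer"] = "Summarize the text in three bullet points."
run_audit(drafts, cache, fake_evaluate)      # re-evaluates only the edited draft
print(len(calls))  # prints 3
```

Keying on content rather than on the slug means a renamed but unchanged prompt would still hit the cache. That is a design choice of this sketch, not a documented behavior.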
Suggestions List
Each non-pass suggestion shows:
| Element | Meaning |
|---|---|
| Prompt slug + folder | Which prompt needs attention. |
| Severity badge | BLOCKER, MAJOR, MINOR, NIT, or REFUSED. |
| Score | Overall quality score. |
| Issue summary | One-sentence description of the primary problem. |
| Proposed fix | Suggested edit. |
| Diff | Side-by-side comparison of the current and proposed prompt. |
| Strengths | What the prompt already does well. |
| Open questions | Things the evaluator could not infer. |
Resolving a Suggestion
| Action | What it does |
|---|---|
| Apply | Copies the proposed source into the editor for review. |
| Mark resolved | Removes it from the active list. |
| Dismiss | Hides it for the current session. |
If the underlying issue still exists, the next audit can recreate a suggestion that was resolved or dismissed.
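The difference between the three actions can be modeled roughly as follows. This is a sketch under stated assumptions: the `Suggestion` and `Session` classes are hypothetical, Apply is assumed to leave the suggestion open until it is resolved, and "current session" is modeled as an in-memory object that is discarded when the session ends.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    # Hypothetical record shape, for illustration only.
    prompt_slug: str
    proposed_source: str
    resolved: bool = False


@dataclass
class Session:
    editor_buffer: str = ""
    dismissed: set = field(default_factory=set)


def apply_suggestion(suggestion, session):
    """Apply: copy the proposed source into the editor for review."""
    session.editor_buffer = suggestion.proposed_source


def mark_resolved(suggestion):
    """Mark resolved: remove the suggestion from the active list."""
    suggestion.resolved = True


def dismiss(suggestion, session):
    """Dismiss: hide the suggestion for this session only."""
    session.dismissed.add(suggestion.prompt_slug)


def active_list(suggestions, session):
    """Suggestions that are neither resolved nor hidden in this session."""
    return [
        s for s in suggestions
        if not s.resolved and s.prompt_slug not in session.dismissed
    ]


suggestions = [
    Suggestion("support-triage", "You are a support triage agent..."),
    Suggestion("sql-helper", "You write read-only SQL queries..."),
]

session = Session()
dismiss(suggestions[0], session)
print(len(active_list(suggestions, session)))    # prints 1

# A new session no longer hides the dismissed suggestion.
print(len(active_list(suggestions, Session())))  # prints 2
```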
Refusals
The evaluator may refuse to improve prompts designed to enable catastrophic harm. A refusal appears as a REFUSED suggestion with the reason, and the prompt itself is not modified.