AI Evaluator

How the AI Evaluator audits draft prompts, scores them, and surfaces ranked suggestions.

The AI Evaluator reviews each draft prompt and flags issues that need attention. The sidebar badge shows the number of unresolved non-pass results across the workspace.
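As a rough illustration of the badge rule, the sketch below counts results that are neither a pass nor resolved. The `AuditResult` type, its field names, and the `"pass"` / `"report"` strings are hypothetical and are not PromptVault's actual data model.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    # Hypothetical record shape, assumed for illustration only.
    prompt_slug: str
    verdict: str          # assumed values: "pass" or "report"
    resolved: bool = False


def badge_count(results: list[AuditResult]) -> int:
    """Count results that are not a pass and have not been resolved."""
    return sum(1 for r in results if r.verdict != "pass" and not r.resolved)


results = [
    AuditResult("welcome-email", "pass"),
    AuditResult("support-triage", "report"),
    AuditResult("sql-helper", "report", resolved=True),
]
print(badge_count(results))  # prints 1
```

Only `support-triage` counts here: `welcome-email` passed and `sql-helper` has been resolved.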

What the Evaluator Checks

The evaluator classifies each prompt by category and scores it on dimensions such as:

Dimension	What it measures
Role identity	Whether the agent's role and expertise are clear.
Task definition	Whether the primary objective is specific and singular.
Context sufficiency	Whether the prompt gives enough background to act.
Input specification	Whether inputs are explicitly described.
Output format	Whether output structure is precise.
Constraints and guardrails	Whether scope limits and refusal cases are clear.
Examples	Whether the examples cover representative and edge cases.
Edge cases	Whether ambiguity and out-of-scope inputs are handled.
Internal consistency	Whether instructions are free of contradictions.
Token efficiency	Whether unnecessary padding is avoided.
Variable hygiene	Whether placeholders are consistent and documented.

Some dimensions may not apply to every prompt category.

Pass vs. Report

The evaluator returns one of two results for each prompt:

Pass when no material issues are found.
Report when the prompt needs attention.

Reported issues are ranked by severity:

Severity	Meaning
`BLOCKER`	The prompt cannot reliably do its job.
`MAJOR`	Degrades quality on common inputs.
`MINOR`	Degrades quality on edge cases.
`NIT`	Style or polish only.
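The severity ranking can be sketched as a simple ordered sort. The four severity names come from the table above; the `rank_issues` helper and the dictionary shape of an issue are assumptions made for this example.

```python
# Severity names from the table above, most severe first.
SEVERITY_ORDER = ["BLOCKER", "MAJOR", "MINOR", "NIT"]
SEVERITY_RANK = {name: position for position, name in enumerate(SEVERITY_ORDER)}


def rank_issues(issues: list[dict]) -> list[dict]:
    """Return issues sorted from most to least severe.

    sorted() is stable, so issues with equal severity keep their input order.
    """
    return sorted(issues, key=lambda issue: SEVERITY_RANK[issue["severity"]])


issues = [
    {"severity": "NIT", "summary": "Inconsistent capitalization."},
    {"severity": "BLOCKER", "summary": "No task is defined."},
    {"severity": "MINOR", "summary": "Empty input is not handled."},
    {"severity": "MAJOR", "summary": "Output format is unspecified."},
]

for issue in rank_issues(issues):
    print(issue["severity"], "-", issue["summary"])
```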

Running an Audit

Click Run audit on the AI Evaluator screen. PromptVault reviews current draft prompts, reuses prior results when possible, and shows the latest suggestions in the dashboard.
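This page does not say how prior results are reused. One plausible approach, shown here purely as an assumption, is to key each result by a hash of the prompt text so that unchanged drafts skip re-evaluation. The `run_audit` and `content_key` functions and the `evaluate` callback are hypothetical names.

```python
import hashlib
from typing import Callable


def content_key(prompt_text: str) -> str:
    """Derive a cache key from the prompt text (assumed reuse criterion)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def run_audit(
    drafts: dict[str, str],
    cache: dict[str, str],
    evaluate: Callable[[str], str],
) -> dict[str, str]:
    """Audit every draft, calling evaluate() only for text not seen before."""
    results = {}
    for slug, text in drafts.items():
        key = content_key(text)
        if key not in cache:
            cache[key] = evaluate(text)
        results[slug] = cache[key]
    return results


calls = []


def fake_evaluate(text: str) -> str:
    # Stand-in for the real evaluator; records each call so reuse is visible.
    calls.append(text)
    return "pass" if "You are" in text else "report"


cache: dict[str, str] = {}
drafts = {"greeter": "You are a friendly greeter.", "triage": "Sort tickets."}
first = run_audit(drafts, cache, fake_evaluate)
second = run_audit(drafts, cache, fake_evaluate)
print(first == second, len(calls))  # prints True 2
```

The second run makes no new evaluator calls because neither draft changed. Editing a draft would change its key and trigger a fresh evaluation.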

Suggestions List

Each non-pass suggestion shows:

Element	Meaning
Prompt slug + folder	Which prompt needs attention.
Severity badge	`BLOCKER`, `MAJOR`, `MINOR`, `NIT`, or `REFUSED`.
Score	Overall quality score.
Issue summary	One-sentence description of the primary problem.
Proposed fix	Suggested edit.
Diff	Side-by-side comparison between current and proposed prompt.
Strengths	What the prompt already does well.
Open questions	Things the evaluator could not infer.
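For readers who think in data structures, the elements above could be modeled roughly as follows. The `Suggestion` class, its field names, and the numeric score type are assumptions for illustration. The diff is left out because it can be derived from the current and proposed prompt text.

```python
from dataclasses import dataclass, field
from typing import Optional

# Badge values listed in the table above.
SEVERITIES = ("BLOCKER", "MAJOR", "MINOR", "NIT", "REFUSED")


@dataclass
class Suggestion:
    # Hypothetical shape; field names mirror the table, not a real API.
    prompt_slug: str
    folder: str
    severity: str
    score: float                        # score scale is an assumption
    issue_summary: str
    proposed_fix: Optional[str] = None
    strengths: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")


example = Suggestion(
    prompt_slug="support-triage",
    folder="support",
    severity="MAJOR",
    score=62.0,
    issue_summary="The output format is not specified.",
    proposed_fix="Add a section that defines the expected JSON fields.",
    strengths=["Clear role identity."],
    open_questions=["Should low-priority tickets be routed or dropped?"],
)
print(example.severity, example.prompt_slug)
```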

Resolving a Suggestion

Action	What it does
Apply	Copies the proposed source into the editor for review.
Mark resolved	Removes it from the active list.
Dismiss	Hides it for the current session.

If the underlying issue still exists, the next audit can recreate a suggestion that was resolved or dismissed.
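The behavior of Mark resolved and Dismiss can be sketched as a small in-memory model. The `SuggestionList` class and its methods are hypothetical. The sketch also assumes that a dismissal stays in effect across audits until the session ends, which this page does not confirm. Apply is omitted because it only copies text into the editor.

```python
class SuggestionList:
    """Toy model of the active suggestions list (illustrative only)."""

    def __init__(self) -> None:
        self.active: dict[str, str] = {}   # prompt slug -> issue summary
        self.dismissed: set[str] = set()   # hidden for this session only

    def load_audit(self, findings: dict[str, str]) -> None:
        # A new audit rebuilds the active list from whatever issues it finds.
        self.active = dict(findings)

    def mark_resolved(self, slug: str) -> None:
        self.active.pop(slug, None)

    def dismiss(self, slug: str) -> None:
        self.dismissed.add(slug)

    def start_new_session(self) -> None:
        self.dismissed.clear()

    def visible(self) -> list[str]:
        return sorted(s for s in self.active if s not in self.dismissed)


findings = {"triage": "Output format is unspecified.", "greeter": "Role is vague."}
suggestions = SuggestionList()
suggestions.load_audit(findings)

suggestions.mark_resolved("triage")
after_resolve = suggestions.visible()      # only "greeter" remains

suggestions.dismiss("greeter")
after_dismiss = suggestions.visible()      # nothing is visible

suggestions.load_audit(findings)           # the issues were never actually fixed
after_reaudit = suggestions.visible()      # "triage" is recreated

suggestions.start_new_session()
after_new_session = suggestions.visible()  # "greeter" is visible again
print(after_resolve, after_dismiss, after_reaudit, after_new_session)
```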

Refusals

The evaluator may refuse to improve prompts designed to enable catastrophic harm. A refusal appears as a `REFUSED` suggestion with the reason, and the prompt itself is not modified.