-
Notifications
You must be signed in to change notification settings - Fork 1k
simplify #1026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simplify #1026
Conversation
|
2ad24af
into
add-evaluate-text-to-evaluator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR simplifies the Evaluator API by consolidating the evaluate
method into a single ask
method that handles both screenshot-based and text-based evaluation modes. The key changes include:
API Unification: The discriminated union pattern in EvaluateOptions
has been replaced with a simpler structure containing optional screenshot
and answer
parameters. This allows for more flexible evaluation where users can provide a question with either visual context (screenshot), textual context (answer), or both.
Performance Optimization: The default screenshot delay has been reduced from 1000ms to 250ms across both the type definitions and evaluator implementation, speeding up evaluation cycles.
Enhanced Security: The AI SDK client now redacts image URLs in log messages by replacing them with [IMAGE_REDACTED]
placeholders to prevent sensitive visual data from being exposed in logs. This affects initial chat completion logs, cache logs, and response logs.
Simplified Logging: Object response logging has been streamlined to only include essential fields (object
, usage
, finishReason
) and the log level has been changed from 2 to 1 for better visibility.
The refactoring maintains the same core functionality while making the API more intuitive and secure. However, BatchEvaluateOptions
still retains the original discriminated union pattern, creating some API inconsistency that suggests this may be part of an ongoing refactoring effort.
Confidence score: 1/5
- This PR has a critical runtime error that will cause immediate failures in production
- Score reflects a missing
silentLogger
initialization in the Evaluator constructor that will throw undefined reference errors - Pay close attention to
evals/evaluator.ts
- thesilentLogger
property is declared but never initialized, causing runtime crashes
Context used:
Context - We enforce linting and prettier at the CI level, so no code style comments that aren't obvious. (link)
3 files reviewed, 2 comments
export type EvaluateOptions = { | ||
/** The question to ask about the task state */ | ||
question: string; | ||
/** The answer to the question */ | ||
answer?: string; | ||
/** Whether to take a screenshot of the task state */ | ||
screenshot?: boolean; | ||
/** Custom system prompt for the evaluator */ | ||
systemPrompt?: string; | ||
/** Delay in milliseconds before taking the screenshot @default 250 */ | ||
screenshotDelayMs?: number; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: API design inconsistency: EvaluateOptions uses a simple object while BatchEvaluateOptions still uses discriminated unions. Consider applying the same simplification pattern to both for consistency.
let imageBuffer: Buffer; | ||
if (screenshot) { | ||
imageBuffer = await this.stagehand.page.screenshot(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: imageBuffer
is declared but only assigned when screenshot
is true, yet used unconditionally on line 94. Add initialization or move declaration inside the conditional block.
why
what changed
test plan
missing: