Skip to content

Conversation

miguelg719
Copy link
Collaborator

why

what changed

test plan

missing:

  • batchAsk(?)
  • update evals

Copy link

changeset-bot bot commented Aug 26, 2025

⚠️ No Changeset found

Latest commit: 9889d47

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@tkattkat tkattkat marked this pull request as ready for review August 26, 2025 21:53
@tkattkat tkattkat merged commit 2ad24af into add-evaluate-text-to-evaluator Aug 26, 2025
12 of 14 checks passed
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR simplifies the Evaluator API by consolidating the evaluate method into a single ask method that handles both screenshot-based and text-based evaluation modes. The key changes include:

API Unification: The discriminated union pattern in EvaluateOptions has been replaced with a simpler structure containing optional screenshot and answer parameters. This allows for more flexible evaluation where users can provide a question with either visual context (screenshot), textual context (answer), or both.

Performance Optimization: The default screenshot delay has been reduced from 1000ms to 250ms across both the type definitions and evaluator implementation, speeding up evaluation cycles.

Enhanced Security: The AI SDK client now redacts image URLs in log messages by replacing them with [IMAGE_REDACTED] placeholders to prevent sensitive visual data from being exposed in logs. This affects initial chat completion logs, cache logs, and response logs.

Simplified Logging: Object response logging has been streamlined to only include essential fields (object, usage, finishReason) and the log level has been changed from 2 to 1 for better visibility.

The refactoring maintains the same core functionality while making the API more intuitive and secure. However, BatchEvaluateOptions still retains the original discriminated union pattern, creating some API inconsistency that suggests this may be part of an ongoing refactoring effort.

Confidence score: 1/5

  • This PR has a critical runtime error that will cause immediate failures in production
  • Score reflects a missing silentLogger initialization in the Evaluator constructor that will throw undefined reference errors
  • Pay close attention to evals/evaluator.ts - the silentLogger property is declared but never initialized, causing runtime crashes

Context used:

Context - We enforce linting and prettier at the CI level, so no code style comments that aren't obvious. (link)

3 files reviewed, 2 comments

Edit Code Review Bot Settings | Greptile

Comment on lines +1 to +12
export type EvaluateOptions = {
/** The question to ask about the task state */
question: string;
/** The answer to the question */
answer?: string;
/** Whether to take a screenshot of the task state */
screenshot?: boolean;
/** Custom system prompt for the evaluator */
systemPrompt?: string;
/** Delay in milliseconds before taking the screenshot @default 250 */
screenshotDelayMs?: number;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: API design inconsistency: EvaluateOptions uses a simple object while BatchEvaluateOptions still uses discriminated unions. Consider applying the same simplification pattern to both for consistency.

Comment on lines +69 to +72
let imageBuffer: Buffer;
if (screenshot) {
imageBuffer = await this.stagehand.page.screenshot();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: imageBuffer is declared but only assigned when screenshot is true, yet used unconditionally on line 94. Add initialization or move declaration inside the conditional block.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants