Kylejeong2
Contributor

why

The viewport changes to 1288x711 when using advanced stealth, which causes computer use agents (CUAs) to mis-click, since they're trained on 1024x768 screenshots.

what changed

Added functions to ensure the viewport is set to the right size for CUA models.

test plan

Tested on evals and watched live sessions to make sure the viewport was resizing correctly (if needed) and that advanced stealth didn't mess it up.

Contributor

greptile-apps bot left a comment

Greptile Summary

This PR addresses a viewport sizing issue that affects computer use agent (CUA) models when advanced stealth mode is enabled. Advanced stealth automatically changes the browser viewport from the standard 1024x768 to 1288x711 to appear more natural, but CUA models from providers like Anthropic are trained on 1024x768 screenshots and rely on that exact viewport size for accurate coordinate-based interactions.

The changes introduce three key modifications to lib/handlers/agentHandler.ts:

  1. New isCUAModel() method (lines 416-426): A utility function that identifies CUA models by checking if the model name contains "claude-3-5-sonnet" or starts with "anthropic.claude-3-5-sonnet". This provides a centralized way to determine when viewport forcing is needed.

  2. Enhanced setupAgentClient() method (lines 123-133): When a CUA model is detected, the method now forces the browser viewport to exactly 1024x768 using this.stagehandPage.page.setViewportSize(). This ensures the model sees the expected viewport size from the start, with proper error handling via try-catch.

  3. Modified updateClientViewport() method (lines 401-414): This method previously synchronized the agent client viewport with the current page viewport. Now it includes logic to preserve the 1024x768 viewport for CUA models regardless of what the actual page viewport might be, while maintaining the original synchronization behavior for other agent types.
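The detection and viewport-pinning behavior described in the three points above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `isCUAModel` takes its name and matching rule from the summary, while `Viewport`, `CUA_VIEWPORT`, and `resolveClientViewport` are hypothetical names introduced here.

```typescript
// Hypothetical sketch; not the code from lib/handlers/agentHandler.ts.
interface Viewport {
  width: number;
  height: number;
}

// Viewport size the summary says CUA models expect (assumed constant name).
const CUA_VIEWPORT: Viewport = { width: 1024, height: 768 };

// Detection rule as described in the summary. Note that the prefix check
// is already covered by the substring check; both are kept to mirror the text.
function isCUAModel(modelName: string): boolean {
  return (
    modelName.includes("claude-3-5-sonnet") ||
    modelName.startsWith("anthropic.claude-3-5-sonnet")
  );
}

// Chooses the viewport to report to the agent client: pinned for CUA models,
// otherwise whatever the page currently reports (which may be null).
function resolveClientViewport(
  modelName: string,
  pageViewport: Viewport | null,
): Viewport | null {
  if (isCUAModel(modelName)) {
    return { ...CUA_VIEWPORT };
  }
  return pageViewport;
}
```

In a handler, the pinned size would presumably also be applied to the browser itself, for example through Playwright's `page.setViewportSize({ width, height })` wrapped in a try-catch as the summary mentions.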

The implementation maintains backward compatibility by only affecting CUA models while preserving dynamic viewport behavior for other agents that can adapt to different screen sizes. This targeted approach ensures that CUA models get their required fixed viewport while not disrupting the existing functionality for other use cases.

Confidence score: 4/5

  • This PR addresses a specific, well-understood problem with a targeted solution that should resolve CUA model interaction issues
  • Score reflects solid implementation with proper error handling and backward compatibility, though the model detection logic is fairly simple
  • Pay close attention to the model detection logic in isCUAModel() to ensure it covers all relevant CUA models

1 file reviewed, no comments


@miguelg719
Collaborator

okay what we need to do here is:

  • in the agentHandler, get the viewport width, height, and devicePixelRatio using page.evaluate()
  • set those before sending to the LLM (these won't change, so we can just override currentViewport)
  • update all coordinate-based actions to multiply by the devicePixelRatio
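A rough sketch of the coordinate scaling proposed in the last bullet, assuming the metrics have already been read once in the page (for example from `window.innerWidth`, `window.innerHeight`, and `window.devicePixelRatio` via `page.evaluate()`). `ScreenMetrics`, `Point`, and `scalePoint` are hypothetical names, and rounding to whole pixels is an assumption, not something stated in this thread.

```typescript
// Hypothetical sketch of the proposal; names and rounding are assumptions.
interface ScreenMetrics {
  width: number;
  height: number;
  devicePixelRatio: number;
}

interface Point {
  x: number;
  y: number;
}

// Multiplies a model-provided coordinate by devicePixelRatio, as proposed
// above, and rounds to the nearest whole pixel.
function scalePoint(point: Point, metrics: ScreenMetrics): Point {
  return {
    x: Math.round(point.x * metrics.devicePixelRatio),
    y: Math.round(point.y * metrics.devicePixelRatio),
  };
}

// Example: a click the model places at (100, 50) on a display with a
// devicePixelRatio of 2.
const exampleMetrics: ScreenMetrics = { width: 1024, height: 768, devicePixelRatio: 2 };
const scaledClick = scalePoint({ x: 100, y: 50 }, exampleMetrics);
```

Keeping the scaling in one helper would let every coordinate-based action (click, drag, scroll) share the same conversion.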

cc: @Kylejeong2

@miguelg719
Collaborator

the question is, do we want this only when advanced stealth is enabled?

Kylejeong2 marked this pull request as draft on August 26, 2025, 23:23