ComputerAgent — OpenAI, Anthropic, UI-TARS, and others — to control a real iPhone through TapKit.
The integration exposes a single class:
- TapKitComputerHandler: implements CUA’s AsyncComputerHandler interface. It handles screenshot capture, resolution scaling for vision models, and the mapping of desktop actions to phone equivalents.
Installation
Quick Start
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
| phone | Phone | required | TapKit phone instance |
| max_long_edge | int | 1344 | Max pixels for the longest edge of screenshots sent to the model. Lower values reduce token usage; higher values improve accuracy on small targets. |
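
To make the max_long_edge trade-off concrete, here is a minimal sketch of how a screenshot's dimensions could be capped. The function name scaled_size is hypothetical, and the behavior shown (aspect ratio preserved, no upscaling) is an assumption, not a description of TapKit's internals.

```python
def scaled_size(width: int, height: int, max_long_edge: int = 1344) -> tuple[int, int]:
    """Return (width, height) shrunk so the longest edge is at most max_long_edge.

    Assumption: aspect ratio is preserved and small images are never upscaled.
    This is an illustrative helper, not part of the TapKit API.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    # Multiply before dividing so the long edge lands exactly on max_long_edge.
    return (
        round(width * max_long_edge / long_edge),
        round(height * max_long_edge / long_edge),
    )


# Example: a 1179x2556 portrait screenshot (illustrative dimensions).
print(scaled_size(1179, 2556))       # default cap of 1344
print(scaled_size(1179, 2556, 768))  # smaller cap: fewer pixels sent to the model
```

A lower cap shrinks both edges proportionally, which is why small on-screen targets become harder for the model to locate.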
Supported Actions
| CUA Action | Phone Behavior |
|---|---|
| click(left) | Tap at coordinates |
| click(right) | Long press (1000ms) |
| double_click() | Double tap |
| scroll() | Flick in direction |
| drag(path) | Drag from first to last point in path |
| type() | Type text via shortcut method |
| keypress("escape") | Escape / go back |
| keypress("home") | Go to home screen |
| keypress("alt+tab") | Open app switcher |
| screenshot() | Capture scaled screenshot |
| wait(ms) | Pause for given duration |
| move() | No-op (no cursor on phone) |
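
The table above can be read as a set of lookups from desktop-style actions to phone gestures. The sketch below restates part of it in code. All names here (CLICK_GESTURES, KEYPRESS_GESTURES, and the helper functions) are hypothetical and the gesture labels are plain strings chosen for illustration; the actual handler calls TapKit phone methods that are not shown in this document.

```python
# Hypothetical lookup tables mirroring the mapping table; not TapKit's source.
CLICK_GESTURES = {
    "left": "tap",
    "right": "long_press",  # the table lists a 1000 ms hold
}

KEYPRESS_GESTURES = {
    "escape": "back",
    "home": "home_screen",
    "alt+tab": "app_switcher",
}


def gesture_for_click(button: str) -> str:
    """Translate a mouse button into the phone gesture named in the table."""
    try:
        return CLICK_GESTURES[button]
    except KeyError:
        raise ValueError(f"unsupported mouse button: {button!r}") from None


def gesture_for_keypress(keys: str) -> str:
    """Translate a key combination such as 'alt+tab' into a phone gesture."""
    normalized = keys.strip().lower()
    try:
        return KEYPRESS_GESTURES[normalized]
    except KeyError:
        raise ValueError(f"unsupported key combination: {keys!r}") from None


def drag_endpoints(path: list[tuple[int, int]]) -> tuple[tuple[int, int], tuple[int, int]]:
    """Keep only the first and last points, as the drag(path) row describes."""
    if len(path) < 2:
        raise ValueError("a drag needs at least two points")
    return path[0], path[-1]


print(gesture_for_click("right"))
print(gesture_for_keypress("Alt+Tab"))
print(drag_endpoints([(10, 10), (50, 80), (200, 400)]))
```

Rejecting unknown buttons and key combinations explicitly, rather than ignoring them, is a design choice of this sketch; how the real handler treats unmapped input is not stated here.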
How It Differs from Lux
Both are adapters that translate an agent framework’s actions into TapKit phone calls:

| | CUA | Lux (OAGI) |
|---|---|---|
| Interface | Method-based (click(), scroll(), etc.) | Action-list (list[Action] objects) |
| Coordinate space | Pixel coordinates (scaled) | 0–1000 normalized space |
| Screenshot handling | Built into the handler | Separate TapKitAsyncImageProvider |
| Classes | TapKitComputerHandler | TapKitAsyncActionHandler + TapKitAsyncImageProvider |
| Model support | Any CUA-compatible model | Lux models via OAGI SDK |
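
The coordinate-space row is the difference most likely to matter in practice, so here is a small sketch of both conversions. The function names are hypothetical, and the target coordinate system (device pixels) is an assumption made for illustration; this document does not state what units TapKit expects.

```python
def normalized_to_pixels(x: float, y: float, width: int, height: int,
                         scale: int = 1000) -> tuple[int, int]:
    """Map a point from a 0-to-scale normalized space onto a width x height screen.

    Assumption: 0 maps to the left/top edge and `scale` to the right/bottom edge.
    """
    return round(x * width / scale), round(y * height / scale)


def scaled_to_device(x: float, y: float,
                     scaled: tuple[int, int],
                     device: tuple[int, int]) -> tuple[int, int]:
    """Map a point on a downscaled screenshot back to full-resolution pixels."""
    scaled_w, scaled_h = scaled
    device_w, device_h = device
    return round(x * device_w / scaled_w), round(y * device_h / scaled_h)


# Lux-style input: a point in the 0-1000 normalized space.
print(normalized_to_pixels(250, 750, width=1000, height=2000))

# CUA-style input: a pixel on a 620x1344 screenshot of a 1179x2556 screen
# (illustrative dimensions).
print(scaled_to_device(100, 672, scaled=(620, 1344), device=(1179, 2556)))
```

In the normalized scheme the model never needs to know the screenshot size, whereas in the scaled-pixel scheme the adapter has to remember the scale factor it applied to each screenshot.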