TapKit integrates with CUA, enabling any model supported by CUA’s ComputerAgent — OpenAI, Anthropic, UI-TARS, and others — to control a real iPhone through TapKit.
The integration exposes a single class:
- TapKitComputerHandler: implements CUA’s AsyncComputerHandler interface, handling screenshot capture, resolution scaling for vision models, and the mapping of desktop actions to phone equivalents.
Installation
Quick Start
import asyncio

from agent import ComputerAgent
from tapkit import TapKitClient
from tapkit.cua import TapKitComputerHandler
from dotenv import load_dotenv

# Load environment variables (such as API keys) from a local .env file
load_dotenv()


async def main():
    phone = TapKitClient().phone("your-phone-id")
    handler = TapKitComputerHandler(phone)
    agent = ComputerAgent(
        model="openai/computer-use-preview",
        tools=[handler],
        only_n_most_recent_images=3,
        verbosity=2,
    )
    task = "Open Safari and search for the weather"
    # Consume the async stream until the agent finishes the task
    async for result in agent.run([{"role": "user", "content": task}]):
        pass


asyncio.run(main())
Constructor
TapKitComputerHandler(phone: Phone, max_long_edge: int = 1344)
| Parameter | Type | Default | Description |
|---|---|---|---|
| phone | Phone | required | TapKit phone instance |
| max_long_edge | int | 1344 | Max pixels for the longest edge of screenshots sent to the model. Lower values reduce token usage; higher values improve accuracy on small targets. |
Screenshots are automatically scaled down to this resolution before being sent to the model. Coordinates returned by the model are scaled back up to actual phone pixels before executing touch actions.
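The round trip described above can be sketched in a few lines of arithmetic. This is an illustration under stated assumptions, not TapKit's actual implementation: the helper names `scaled_size` and `to_phone_pixels`, the rounding behavior, the choice not to upscale small screenshots, and the example screen size of 1179×2556 are all hypothetical.

```python
def scaled_size(width, height, max_long_edge=1344):
    """Cap the longest edge at max_long_edge, preserving aspect ratio.

    Hypothetical helper; assumes screenshots are never scaled up.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    factor = max_long_edge / long_edge
    return round(width * factor), round(height * factor)


def to_phone_pixels(x, y, scaled, actual):
    """Map a point in the scaled screenshot back to actual phone pixels."""
    scaled_w, scaled_h = scaled
    actual_w, actual_h = actual
    return round(x * actual_w / scaled_w), round(y * actual_h / scaled_h)


actual = (1179, 2556)          # assumed physical screen size in pixels
scaled = scaled_size(*actual)  # size of the image the model would see

# A model click at (100, 200) in the scaled image, mapped back to the phone
tap_x, tap_y = to_phone_pixels(100, 200, scaled, actual)
```

With these assumed numbers, lowering `max_long_edge` shrinks the image the model sees, so each model pixel covers more phone pixels. That is the trade-off between token usage and accuracy on small targets noted in the table.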
Supported Actions
| CUA Action | Phone Behavior |
|---|---|
| click(left) | Tap at coordinates |
| click(right) | Long press (1000ms) |
| double_click() | Double tap |
| scroll() | Flick in direction |
| drag(path) | Drag from first to last point in path |
| type() | Type text via shortcut method |
| keypress("escape") | Escape / go back |
| keypress("home") | Go to home screen |
| keypress("alt+tab") | Open app switcher |
| screenshot() | Capture scaled screenshot |
| wait(ms) | Pause for given duration |
| move() | No-op (no cursor on phone) |
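One way to picture the keypress rows in this table is as a lookup from a normalized key combination to a phone behavior. The sketch below is hypothetical: `PHONE_KEY_ACTIONS`, `resolve_keypress`, and the returned labels are illustrative names rather than part of TapKit or CUA, and returning `None` for unmapped keys is an assumption about behavior the table does not specify.

```python
from typing import Optional, Sequence, Union

# Hypothetical labels for the phone behaviors listed in the table.
PHONE_KEY_ACTIONS = {
    "escape": "go_back",
    "home": "home_screen",
    "alt+tab": "app_switcher",
}


def resolve_keypress(keys: Union[str, Sequence[str]]) -> Optional[str]:
    """Return the phone behavior for a key combo, or None if it is unmapped."""
    # Accept either a single string ("alt+tab") or a list of keys (["alt", "tab"])
    combo = keys if isinstance(keys, str) else "+".join(keys)
    return PHONE_KEY_ACTIONS.get(combo.strip().lower())
```

For example, `resolve_keypress(["alt", "tab"])` and `resolve_keypress("alt+tab")` both resolve to the same app-switcher label in this sketch.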
How It Differs from Lux
Both are adapters that translate an agent framework’s actions into TapKit phone calls:
| | CUA | Lux (OAGI) |
|---|---|---|
| Interface | Method-based (click(), scroll(), etc.) | Action-list (list[Action] objects) |
| Coordinate space | Pixel coordinates (scaled) | 0–1000 normalized space |
| Screenshot handling | Built into the handler | Separate TapKitAsyncImageProvider |
| Classes | TapKitComputerHandler | TapKitAsyncActionHandler + TapKitAsyncImageProvider |
| Model support | Any CUA-compatible model | Lux models via OAGI SDK |
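The coordinate-space row is the difference most likely to matter when moving logic between the two adapters. As a rough sketch, assuming the 0–1000 space maps linearly onto the full screen (the rounding and endpoint behavior are assumptions), a normalized point converts to device pixels as shown below. The function name and the example screen size are illustrative and not part of either SDK.

```python
def normalized_to_pixels(nx, ny, screen_w, screen_h, scale=1000):
    """Convert a point in an assumed 0..scale normalized space to device pixels."""
    return round(nx * screen_w / scale), round(ny * screen_h / scale)


# Assumed screen size of 1179x2556 pixels, for illustration only
point = normalized_to_pixels(250, 750, 1179, 2556)
```

Pixel coordinates depend on the size of the screenshot the model was shown, while normalized coordinates do not, which is why the CUA handler has to track its own scale factor.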