TapKit integrates with CUA, enabling any model supported by CUA’s ComputerAgent — OpenAI, Anthropic, UI-TARS, and others — to control a real iPhone through TapKit. The integration exposes a single class:
  • TapKitComputerHandler — Implements CUA’s AsyncComputerHandler interface, handling screenshot capture, resolution scaling for vision models, and mapping desktop actions to phone equivalents

Installation

pip install tapkit cua

Quick Start

import asyncio

from agent import ComputerAgent  # provided by CUA
from dotenv import load_dotenv
from tapkit import TapKitClient
from tapkit.cua import TapKitComputerHandler

# Load API keys and other settings from a local .env file.
load_dotenv()


async def main():
    # Replace "your-phone-id" with the ID of the phone to control.
    phone = TapKitClient().phone("your-phone-id")
    handler = TapKitComputerHandler(phone)

    agent = ComputerAgent(
        model="openai/computer-use-preview",
        tools=[handler],
        only_n_most_recent_images=3,  # keep only the latest screenshots in context
        verbosity=2,
    )

    task = "Open Safari and search for the weather"
    # agent.run() is an async generator; iterating it drives the task forward.
    async for result in agent.run([{"role": "user", "content": task}]):
        pass


asyncio.run(main())

Constructor

TapKitComputerHandler(phone: Phone, max_long_edge: int = 1344)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| phone | Phone | required | TapKit phone instance |
| max_long_edge | int | 1344 | Max pixels for the longest edge of screenshots sent to the model. Lower values reduce token usage; higher values improve accuracy on small targets. |

Screenshots are automatically scaled down to this resolution before being sent to the model. Coordinates returned by the model are scaled back up to actual phone pixels before executing touch actions.
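
The scaling round trip described above can be sketched in a few lines. This is an illustration only, not TapKit's implementation: the function names `scaled_size` and `to_phone_pixels` are hypothetical, and the rounding behaviour and the choice never to upscale small screens are assumptions.

```python
def scaled_size(width: int, height: int, max_long_edge: int = 1344) -> tuple[int, int, float]:
    """Return (scaled_width, scaled_height, scale) so the longest edge fits max_long_edge.

    Hypothetical helper; assumes aspect ratio is preserved and images are never upscaled.
    """
    long_edge = max(width, height)
    scale = min(1.0, max_long_edge / long_edge)
    return round(width * scale), round(height * scale), scale


def to_phone_pixels(x: float, y: float, scale: float) -> tuple[int, int]:
    """Map a coordinate on the scaled screenshot back to actual phone pixels."""
    return round(x / scale), round(y / scale)


# Example: a 1179 x 2556 screen with the default max_long_edge of 1344.
w, h, scale = scaled_size(1179, 2556)
print(w, h)                               # size of the image the model would see
print(to_phone_pixels(310, 672, scale))   # where a model click at (310, 672) would land
```

Under these assumptions, a lower `max_long_edge` shrinks the image the model sees, which makes each model pixel cover more phone pixels; that is why small targets become harder to hit precisely.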

Supported Actions

| CUA Action | Phone Behavior |
| --- | --- |
| click(left) | Tap at coordinates |
| click(right) | Long press (1000ms) |
| double_click() | Double tap |
| scroll() | Flick in direction |
| drag(path) | Drag from first to last point in path |
| type() | Type text via shortcut method |
| keypress("escape") | Escape / go back |
| keypress("home") | Go to home screen |
| keypress("alt+tab") | Open app switcher |
| screenshot() | Capture scaled screenshot |
| wait(ms) | Pause for given duration |
| move() | No-op (no cursor on phone) |
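
To make a few rows of the mapping concrete, here is a minimal sketch of how desktop-style actions could be translated into phone gestures. Everything in it is hypothetical: the names `click_to_gesture`, `drag_endpoints`, and `KEYPRESS_TO_PHONE`, and the returned dictionaries, are invented for illustration and do not reflect how TapKitComputerHandler is implemented.

```python
# Hypothetical lookup mirroring the keypress rows of the table; not TapKit's code.
KEYPRESS_TO_PHONE = {
    "escape": "back",
    "home": "home_screen",
    "alt+tab": "app_switcher",
}


def click_to_gesture(button: str) -> dict:
    """Translate a mouse button into a touch gesture description."""
    if button == "left":
        return {"gesture": "tap"}
    if button == "right":
        # A right click has no touch equivalent, so it becomes a 1000 ms long press.
        return {"gesture": "long_press", "duration_ms": 1000}
    raise ValueError(f"unsupported button: {button!r}")


def drag_endpoints(path: list[tuple[int, int]]) -> tuple[tuple[int, int], tuple[int, int]]:
    """Reduce a multi-point drag path to its first and last points."""
    if len(path) < 2:
        raise ValueError("a drag path needs at least two points")
    return path[0], path[-1]


print(click_to_gesture("right"))
print(drag_endpoints([(100, 800), (100, 600), (100, 200)]))
print(KEYPRESS_TO_PHONE["alt+tab"])
```

Note that intermediate points in a drag path are dropped in this sketch, matching the table's "first to last point" behavior, so a curved desktop drag would become a straight-line drag on the phone.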

How It Differs from Lux

Both are adapters that translate an agent framework’s actions into TapKit phone calls:
| | CUA | Lux (OAGI) |
| --- | --- | --- |
| Interface | Method-based (click(), scroll(), etc.) | Action-list (list[Action] objects) |
| Coordinate space | Pixel coordinates (scaled) | 0–1000 normalized space |
| Screenshot handling | Built into the handler | Separate TapKitAsyncImageProvider |
| Classes | TapKitComputerHandler | TapKitAsyncActionHandler + TapKitAsyncImageProvider |
| Model support | Any CUA-compatible model | Lux models via OAGI SDK |
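
The coordinate-space difference is the one most likely to matter when comparing the two adapters. As a rough illustration, a point in a 0–1000 normalized space could be converted to pixels as below. The helper `normalized_to_pixels` is hypothetical, and the assumption that 1000 maps to the full width or height (rather than to the last pixel index) is mine, not something stated by TapKit or OAGI.

```python
def normalized_to_pixels(nx: float, ny: float, width: int, height: int,
                         space: int = 1000) -> tuple[int, int]:
    """Convert a point in a 0..space normalized grid to pixel coordinates.

    Hypothetical helper; assumes `space` corresponds to the full width/height.
    """
    return round(nx / space * width), round(ny / space * height)


# The same normalized point lands on different pixels depending on screen size.
print(normalized_to_pixels(250, 750, 1000, 2000))
print(normalized_to_pixels(250, 750, 620, 1344))
```

A normalized space keeps model outputs independent of screenshot resolution, whereas the scaled pixel coordinates used by the CUA adapter depend on the size of the image sent to the model.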