TapKit supports three tap gestures for interacting with your iPhone’s touchscreen: single tap, double tap, and tap and hold (long press).

Single Tap

The most basic gesture is a tap at specific coordinates:
# Tap at absolute coordinates
phone.tap((100, 200))

# Tap center of screen
phone.tap(phone.screen.center)

# Tap using a Point object
from tapkit.geometry import Point
phone.tap(Point(100, 200))

Using with Bounding Boxes

When working with UI element detection, tap the center of a bounding box:
from tapkit.geometry import BBox

# Bounding box from a vision model
button = BBox(x1=100, y1=200, x2=300, y2=250)

# Tap the center
phone.tap(button.center)
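
For reference, the center of a box is presumably the midpoint of its two corners. The following is a minimal sketch of that arithmetic in plain Python. `bbox_center` is a hypothetical helper written for illustration, not part of TapKit, and whether `BBox.center` rounds the same way is an assumption.

```python
from __future__ import annotations


def bbox_center(x1: int, y1: int, x2: int, y2: int) -> tuple[int, int]:
    """Return the integer midpoint of a box given by two opposite corners.

    Hypothetical helper for illustration; TapKit's BBox.center may round
    differently or return floats.
    """
    # Sort each axis so the result does not depend on corner order.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return ((left + right) // 2, (top + bottom) // 2)


# Same box as the example above: corners (100, 200) and (300, 250).
print(bbox_center(100, 200, 300, 250))  # (200, 225)
```

The resulting tuple can be passed to `phone.tap(...)` in the same way as any other absolute coordinate pair.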

Double Tap

Double tap for actions like zooming in:
phone.double_tap((200, 400))

# Double tap center to zoom
phone.double_tap(phone.screen.center)

Tap and Hold (Long Press)

Press and hold at a location, which is useful for opening context menus or preparing a drag:
# Hold for 1 second (default)
phone.tap_and_hold((200, 400))

# Hold for 2 seconds
phone.tap_and_hold((200, 400), duration_ms=2000)

# Hold for 500ms
phone.tap_and_hold((200, 400), duration_ms=500)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| point | tuple/Point | required | Coordinates to tap |
| duration_ms | int | 1000 | Hold duration in milliseconds |

Coordinate Systems

TapKit uses absolute pixel coordinates matching the device’s screen resolution:
phone = client.get_phone()

# Screen dimensions
print(f"Width: {phone.width}")   # e.g., 1170
print(f"Height: {phone.height}") # e.g., 2532

# Top-left corner
phone.tap((0, 0))

# Bottom-right corner
phone.tap((phone.width - 1, phone.height - 1))
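
As the corner taps above show, valid coordinates run from `(0, 0)` to `(width - 1, height - 1)`. Coordinates produced by a detector can land slightly outside that range, and this page does not say how TapKit treats them. One option is to clamp a point before tapping. The sketch below uses a hypothetical `clamp_to_screen` helper, which is not part of TapKit.

```python
from __future__ import annotations


def clamp_to_screen(point: tuple[float, float], width: int, height: int) -> tuple[int, int]:
    """Clamp a point into the valid pixel range of a width x height screen.

    Hypothetical helper for illustration; not a TapKit function.
    """
    x, y = point
    # Valid pixels are 0 .. width - 1 horizontally and 0 .. height - 1 vertically.
    clamped_x = max(0, min(int(x), width - 1))
    clamped_y = max(0, min(int(y), height - 1))
    return (clamped_x, clamped_y)


# A point just past the bottom-right corner of a 1170 x 2532 screen.
print(clamp_to_screen((1170, 2532), 1170, 2532))  # (1169, 2531)
```

With a connected device, the call would look like `phone.tap(clamp_to_screen(point, phone.width, phone.height))`.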

Working with Normalized Coordinates

If your vision model returns normalized (0-1) coordinates, convert them to absolute pixels before tapping:
from tapkit.geometry import NormalizedPoint

# From a model that returns 0-1 coordinates
norm_point = NormalizedPoint(x=0.5, y=0.3)

# Convert to absolute
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)
Some models use a 0-1000 scale instead:
# From a model using 0-1000 scale
norm_point = NormalizedPoint.from_1000_scale(500, 300)
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)
See Coordinates for more on coordinate conversion.
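
To make the conversion concrete, here is a rough sketch of the arithmetic involved, written in plain Python. The functions `to_absolute` and `from_1000_scale` below are stand-ins that only mirror the names used above. The truncation and clamping choices are assumptions, and the real `NormalizedPoint` methods may behave differently.

```python
from __future__ import annotations


def from_1000_scale(x: float, y: float) -> tuple[float, float]:
    """Map 0-1000 model coordinates onto the 0-1 range."""
    return (x / 1000, y / 1000)


def to_absolute(nx: float, ny: float, width: int, height: int) -> tuple[int, int]:
    """Scale 0-1 coordinates to pixels on a width x height screen.

    Assumption: values are truncated to integers and clamped so that
    1.0 maps to the last valid pixel rather than one past the edge.
    """
    x = max(0, min(int(nx * width), width - 1))
    y = max(0, min(int(ny * height), height - 1))
    return (x, y)


# Center of a 1170 x 2532 screen, starting from 0-1000 scale input.
nx, ny = from_1000_scale(500, 500)
print(to_absolute(nx, ny, 1170, 2532))  # (585, 1266)
```

Clamping matters mainly at the edges: without it, a normalized value of exactly 1.0 would map to `width` or `height`, one pixel beyond the bottom-right corner.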

Return Value

All tap methods return a Job object:
job = phone.tap((100, 200))

print(f"Job ID: {job.id}")
print(f"Status: {job.status}")

Examples

Tap a Button

# Assume button_x and button_y come from UI detection
button_x, button_y = 200, 500
phone.tap((button_x, button_y))

Dismiss a Popup

# Tap near the bottom edge, outside the modal, to dismiss it
phone.tap((phone.width // 2, phone.height - 100))
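
The fixed offset above only works when the modal does not reach the bottom of the screen. If the modal's bounding box is known (for example from UI detection), a tap point can be chosen from whichever side has the most free space. The sketch below is one way to do that; `point_outside` and its `min_gap` parameter are hypothetical and not part of TapKit.

```python
from __future__ import annotations


def point_outside(
    modal: tuple[int, int, int, int],
    width: int,
    height: int,
    min_gap: int = 20,
) -> tuple[int, int] | None:
    """Pick a point outside a modal given as (x1, y1, x2, y2).

    Returns the middle of the widest free strip around the modal, or
    None when no strip is at least min_gap pixels wide.
    Hypothetical helper for illustration.
    """
    x1, y1, x2, y2 = modal
    # Each entry: (free space in pixels, point in the middle of that strip).
    strips = [
        (height - y2, (width // 2, (y2 + height) // 2)),  # below the modal
        (y1, (width // 2, y1 // 2)),                      # above the modal
        (x1, (x1 // 2, height // 2)),                     # left of the modal
        (width - x2, ((x2 + width) // 2, height // 2)),   # right of the modal
    ]
    gap, point = max(strips, key=lambda strip: strip[0])
    return point if gap >= min_gap else None


# A modal in the middle of a 1170 x 2532 screen; the largest gap is below it.
print(point_outside((85, 800, 1085, 1700), 1170, 2532))  # (585, 2116)
```

A `None` result means the modal covers nearly the whole screen, in which case tapping outside it is not an option and a close button would have to be located instead.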

Long Press for Context Menu

# Long press on an item
phone.tap_and_hold((200, 400), duration_ms=800)

Zoom In with Double Tap

# Double tap to zoom into content
phone.double_tap(phone.screen.center)

Next Steps