Gesture methods accept either coordinates (x, y) or natural language descriptions. When you pass a string, TapKit uses vision AI to find and interact with the described element.
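The two target forms can be told apart with an ordinary type check. The sketch below is a hypothetical pre-check, not part of TapKit; the `Target` alias and the `target_kind` helper are assumed names for illustration, and how TapKit itself distinguishes the two forms is not documented here.

```python
from typing import Tuple, Union

# Assumed alias: a target is either an (x, y) pair or a description string.
Target = Union[Tuple[int, int], str]


def target_kind(target: Target) -> str:
    """Return "description" for a string or "coordinates" for an (x, y) pair."""
    if isinstance(target, str):
        if not target.strip():
            raise ValueError("description must not be empty")
        return "description"
    is_pair = isinstance(target, tuple) and len(target) == 2
    if is_pair and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target
    ):
        return "coordinates"
    raise TypeError(f"expected (x, y) or a string, got {target!r}")
```

Running a check like this before a gesture call can surface a malformed target early, before any vision lookup is attempted.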
Taps
# Coordinates
phone.tap((100, 200))
phone.double_tap((100, 200))
phone.hold((100, 200), duration_ms=1000)
# Natural language (vision AI)
phone.tap("the blue Submit button")
phone.double_tap("the photo to zoom in")
phone.hold("the app icon to delete")
| Method | Parameters | Description |
|---|---|---|
| tap(target) | coordinates or description | Single tap |
| double_tap(target) | coordinates or description | Double tap |
| hold(target, duration_ms=1000) | coordinates or description, hold time | Long press |
Swipes
# Coordinates
phone.flick((500, 1000), direction="up")
phone.pan((500, 1000), direction="up", duration_ms=500)
# Natural language (vision AI) - flick only
phone.flick("the message list", "down")
phone.flick("the photo gallery", "left")
| Method | Parameters | Selector Support | Description |
|---|---|---|---|
| flick(target, direction) | coordinates or description, direction | Yes | Quick swipe gesture |
| pan(point, direction, duration_ms=500) | coordinates only | No | Controlled scroll |
Directions: "up", "down", "left", "right"
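Because the direction is passed as a plain string, a typo such as "Up" or "upwards" may only show up at call time. A minimal sketch of a pre-check follows; `VALID_DIRECTIONS` and `normalize_direction` are hypothetical names, and the case and whitespace normalization is an assumption about what a caller might want, not documented TapKit behavior.

```python
# The four direction strings listed above.
VALID_DIRECTIONS = ("up", "down", "left", "right")


def normalize_direction(direction: str) -> str:
    """Lower-case and trim a direction, rejecting anything outside the four values."""
    cleaned = direction.strip().lower()
    if cleaned not in VALID_DIRECTIONS:
        raise ValueError(
            f"unknown direction {direction!r}; expected one of {VALID_DIRECTIONS}"
        )
    return cleaned
```

The returned value can then be passed as the direction argument of flick() or pan().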
Drag
# Coordinates
phone.drag((100, 200), (300, 400))
phone.hold_and_drag((100, 200), (300, 400), hold_duration_ms=500)
# Natural language (vision AI) - drag only
phone.drag("the slider handle", "the right end of the slider")
phone.drag("the file icon", "the trash folder")
| Method | Parameters | Selector Support | Description |
|---|---|---|---|
| drag(from_target, to_target) | both coordinates or both descriptions | Yes | Drag between points |
| hold_and_drag(from_point, to_point, hold_duration_ms=500) | coordinates only | No | Hold before dragging |
For drag() with selectors, both arguments must be strings. You cannot mix coordinates and descriptions.
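The no-mixing rule can be checked before calling drag(). The helper below is a hypothetical sketch (the name `check_drag_targets` is an assumption, not a TapKit function); it only compares whether each argument is a string.

```python
def check_drag_targets(from_target, to_target) -> str:
    """Return the shared target kind, or raise if one is a string and the other is not."""
    from_is_text = isinstance(from_target, str)
    to_is_text = isinstance(to_target, str)
    if from_is_text != to_is_text:
        raise TypeError(
            "drag() targets must both be coordinates or both be descriptions"
        )
    return "description" if from_is_text else "coordinates"
```

For example, `check_drag_targets("the file icon", (300, 400))` raises `TypeError`, while two tuples or two strings pass.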
Pinch
# Coordinates
phone.pinch((500, 500), action="pinch_out")
# Natural language (vision AI)
phone.pinch("the map", "pinch_out")
phone.pinch("the photo", "pinch_in")
| Method | Parameters | Description |
|---|---|---|
| pinch(target, action) | coordinates or description, action | Pinch/zoom/rotate gesture |
| Action | Effect |
|---|---|
| pinch_out | Zoom in (fingers apart) |
| pinch_in | Zoom out (fingers together) |
| rotate_cw | Rotate clockwise |
| rotate_ccw | Rotate counter-clockwise |
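Since pinch_out zooms in and pinch_in zooms out, the action names are easy to swap by mistake. One possible way to avoid that is a small lookup from intent to action string. The intent names and the `pinch_action` helper below are hypothetical; only the four action strings come from the table above.

```python
# Assumed intent names mapped to the action strings from the table.
INTENT_TO_ACTION = {
    "zoom_in": "pinch_out",    # fingers apart
    "zoom_out": "pinch_in",    # fingers together
    "rotate_clockwise": "rotate_cw",
    "rotate_counter_clockwise": "rotate_ccw",
}


def pinch_action(intent: str) -> str:
    """Translate an intent such as "zoom_in" into the matching pinch action."""
    try:
        return INTENT_TO_ACTION[intent]
    except KeyError:
        raise ValueError(
            f"unknown intent {intent!r}; expected one of {sorted(INTENT_TO_ACTION)}"
        ) from None
```

A call could then read `phone.pinch("the map", pinch_action("zoom_in"))`, assuming `phone` is an already connected TapKit device object.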
Tips for Selectors
- Be specific: "the blue Submit button" works better than "button"
- Include visual details: color, position, text content
- Works best with clearly visible, distinct UI elements
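The tips above can be applied consistently by assembling selectors from their visual details. The following is a minimal sketch; `build_selector` and its parameters are hypothetical and not part of TapKit, which only receives the final string.

```python
from typing import Optional


def build_selector(
    element: str,
    *,
    color: Optional[str] = None,
    text: Optional[str] = None,
    position: Optional[str] = None,
) -> str:
    """Compose a description like "the blue Submit button" from optional details."""
    words = ["the"]
    if color:
        words.append(color)
    if text:
        words.append(text)
    words.append(element)
    selector = " ".join(words)
    if position:
        # Position is appended as a trailing phrase, e.g. "at the bottom of the screen".
        selector += f" {position}"
    return selector
```

For example, `build_selector("button", color="blue", text="Submit")` produces the selector used in the tap example earlier.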