Tap by Description
Tap an element using a natural language description
POST
Tap an element on screen by describing it in natural language. Uses vision AI to find and tap the described element.
Request
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| `phone_id` | string | The phone identifier |
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `async` | boolean | false | Return immediately with a job ID |
Request Body
| Field | Type | Description |
|---|---|---|
| `selector` | string | Natural language description of the element to tap |
Response
Synchronous
Asynchronous
Examples
Tap a Button
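A request body for this case might look like the sketch below. Only the `selector` field comes from the Request Body table; the description string is an illustrative value, not one taken from the API reference.

```python
import json

# Illustrative description of a button; any phrase that identifies it visually can go here.
payload = {"selector": "the blue Submit button"}

# Serialize to the JSON string that would be sent as the request body.
body = json.dumps(payload)
print(body)
```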
Tap a Text Element
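For a text element, the same `selector` field can describe the visible text and where it sits. The wording below is an assumed example, not a value from the API reference.

```python
import json

# Illustrative description of a text element, identified by its content and position.
payload = {"selector": "the Forgot password text below the login form"}

# Serialize to the JSON string that would be sent as the request body.
body = json.dumps(payload)
print(body)
```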
Python Example
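The following is a minimal sketch using only the standard library, under stated assumptions. The endpoint URL is passed in by the caller because the path is not shown in this section, and any authentication headers are likewise left to the caller. Sending the `async` flag as the string `true` is an assumption about how the boolean is encoded. The helper name `build_tap_request` is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_tap_request(endpoint_url, selector, run_async=False, headers=None):
    """Build (but do not send) a POST request for the tap-by-description endpoint.

    endpoint_url: full URL of the endpoint for a given phone (supplied by the caller).
    selector:     natural language description of the element to tap.
    run_async:    if True, add the documented `async` query parameter.
    headers:      optional extra headers, e.g. whatever authentication the API requires.
    """
    if run_async:
        # Assumption: the boolean is encoded as the literal string "true".
        endpoint_url += "?" + urllib.parse.urlencode({"async": "true"})

    # The request body carries the single documented field, `selector`.
    body = json.dumps({"selector": selector}).encode("utf-8")

    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})

    return urllib.request.Request(
        endpoint_url, data=body, headers=all_headers, method="POST"
    )
```

The returned object can be passed to `urllib.request.urlopen()` to send it. Building the request separately from sending it makes the URL and body easy to inspect before any tap is performed.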
SDK Usage
The Python SDK provides this through the `tap()` method with a string argument.
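As a rough sketch, a thin wrapper around that call could look like this. The `phone` argument is assumed to be an SDK phone object obtained elsewhere; how it is created is not covered in this section, and the wrapper name `tap_described` is hypothetical. Only the `tap()` call with a string argument comes from the text above.

```python
def tap_described(phone, description: str):
    """Tap the element matching `description` via the phone object's tap() method.

    `phone` is assumed to expose tap(str), as described for the Python SDK.
    """
    # Reject empty descriptions early; there would be nothing for the vision model to find.
    if not description.strip():
        raise ValueError("description must not be empty")
    return phone.tap(description)
```

Calling `tap_described(phone, "the blue Submit button")` then forwards the description unchanged to `phone.tap()`.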
Tips
- Be specific in your descriptions (“the blue Submit button” vs just “button”)
- Include visual characteristics like color, position, or text content
- Works best with clearly visible, distinct UI elements
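One way to apply these tips consistently is to assemble descriptions from their visual parts. The helper below is a hypothetical convenience, not part of the API or SDK; it only builds the string that would be used as the `selector`.

```python
def describe_element(kind, text=None, color=None, position=None):
    """Compose a specific description such as 'the blue Submit button'.

    kind:     the type of element, e.g. "button" or "link".
    text:     visible text on the element, if any.
    color:    a distinguishing color, if any.
    position: a location phrase, e.g. "at the bottom of the screen".
    """
    parts = ["the"]
    if color:
        parts.append(color)
    if text:
        parts.append(text)
    parts.append(kind)

    description = " ".join(parts)
    if position:
        description += f" {position}"
    return description


print(describe_element("button", text="Submit", color="blue"))
# prints: the blue Submit button
```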
Related Endpoints
- Tap - Tap at specific coordinates
- Double Tap - Double tap gesture