POST /phones/{phone_id}/tap/select
Tap by Description
Tap an on-screen element by describing it in natural language. The API uses vision AI to locate the described element and tap it.

Request

curl -X POST https://api.tapkit.ai/phones/{phone_id}/tap/select \
  -H "X-API-Key: TK_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"selector": "the blue Submit button"}'

Path Parameters

Parameter   Type     Description
phone_id    string   The phone identifier

Query Parameters

Parameter   Type      Default   Description
async       boolean   false     Return immediately with job ID
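
Because async is a query parameter, it belongs in the URL rather than in the JSON body. The following is a minimal sketch using only the Python standard library. The helper name build_tap_request is hypothetical, and sending the flag as the literal string "true" is an assumption, since this page does not state which spellings the API accepts.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.tapkit.ai"


def build_tap_request(phone_id, selector, api_key, run_async=False):
    """Build (but do not send) a POST request for /phones/{phone_id}/tap/select."""
    url = f"{BASE_URL}/phones/{urllib.parse.quote(phone_id, safe='')}/tap/select"
    if run_async:
        # Assumption: the API accepts the literal string "true" for the flag.
        url += "?" + urllib.parse.urlencode({"async": "true"})
    body = json.dumps({"selector": selector}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


req = build_tap_request("abc123", "the blue Submit button", "TK_...", run_async=True)
print(req.full_url)
# prints: https://api.tapkit.ai/phones/abc123/tap/select?async=true
```

To actually send the request, pass it to urllib.request.urlopen(req). Leaving run_async at its default omits the query string entirely, which matches the documented default of false.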

Request Body

{
  "selector": "the blue Submit button"
}

Field      Type     Description
selector   string   Natural language description of the element to tap

Response

Synchronous

{
  "id": "job_abc123",
  "status": "completed",
  "result": {},
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:30:02Z"
}

Asynchronous

{
  "job_id": "job_abc123"
}
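
In the samples above, the synchronous response carries the job identifier under "id" while the asynchronous response uses "job_id", so client code that supports both modes has to look in two places. The following is a minimal sketch, assuming the response shapes are exactly as shown in those samples. The helper names extract_job_id and is_finished are hypothetical and not part of any SDK.

```python
def extract_job_id(payload):
    """Return the job identifier from either response shape."""
    # Synchronous responses use "id"; asynchronous ones use "job_id".
    for key in ("id", "job_id"):
        if key in payload:
            return payload[key]
    raise KeyError("response contains neither 'id' nor 'job_id'")


def is_finished(payload):
    """Return True only when the payload reports status 'completed'."""
    # An asynchronous acknowledgement has no "status" field, so it is not finished.
    return payload.get("status") == "completed"


sync_response = {"id": "job_abc123", "status": "completed", "result": {}}
async_response = {"job_id": "job_abc123"}

print(extract_job_id(sync_response), is_finished(sync_response))
# prints: job_abc123 True
print(extract_job_id(async_response), is_finished(async_response))
# prints: job_abc123 False
```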

Examples

Tap a Button

curl -X POST https://api.tapkit.ai/phones/abc123/tap/select \
  -H "X-API-Key: TK_..." \
  -H "Content-Type: application/json" \
  -d '{"selector": "the Settings icon"}'

Tap a Text Element

curl -X POST https://api.tapkit.ai/phones/abc123/tap/select \
  -H "X-API-Key: TK_..." \
  -H "Content-Type: application/json" \
  -d '{"selector": "the Sign In link"}'

Python Example

import requests

phone_id = "abc123"  # ID of the phone to control

response = requests.post(
    f"https://api.tapkit.ai/phones/{phone_id}/tap/select",
    headers={
        "X-API-Key": "TK_...",
        "Content-Type": "application/json"
    },
    json={"selector": "the blue Submit button"}
)
response.raise_for_status()  # Raise on HTTP 4xx/5xx before reading the body

job = response.json()
print(f"Tap job status: {job['status']}")

SDK Usage

The Python SDK provides this through the tap() method with a string argument:
phone.tap("the blue Submit button")
phone.tap("the Settings icon")
phone.tap("Sign In link at the bottom")

Tips

  • Be specific in your descriptions (“the blue Submit button” vs just “button”)
  • Include visual characteristics like color, position, or text content
  • Works best with clearly visible, distinct UI elements
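
One way to apply the tips above consistently is to assemble each description from its visual details instead of writing it by hand. The following is a minimal sketch; describe_element is a hypothetical helper, not part of the API or SDK, and the word order it produces is only one reasonable phrasing.

```python
def describe_element(kind, text=None, color=None, position=None):
    """Compose a natural-language selector from optional visual details."""
    words = ["the"]
    if color:
        words.append(color)
    if text:
        words.append(text)
    words.append(kind)
    description = " ".join(words)
    if position:
        # Position phrases such as "at the bottom" read best at the end.
        description += f" {position}"
    return description


print(describe_element("button", text="Submit", color="blue"))
# prints: the blue Submit button
print(describe_element("link", text="Sign In", position="at the bottom"))
# prints: the Sign In link at the bottom
```

The resulting string is what would be sent as the selector field in the request body.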

Related

  • Tap - Tap at specific coordinates
  • Double Tap - Double tap gesture