POST /v1/phones/{phone_id}/double-tap/select
Double Tap by Description
Double-tap an on-screen element by describing it in natural language. The endpoint uses vision AI to locate the described element and then double-taps it.

Request

curl -X POST https://api.tapkit.ai/v1/phones/{phone_id}/double-tap/select \
  -H "X-API-Key: TK_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"selector": "the photo to zoom in"}'

Path Parameters

Parameter  Type    Description
phone_id   string  The phone identifier

Query Parameters

Parameter  Type     Default  Description
async      boolean  false    Return immediately with a job ID

Request Body

{
  "selector": "the photo to zoom in"
}

Field     Type    Description
selector  string  Natural language description of the element to double-tap
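
For callers not using curl, the path parameter, query parameter, and body above can be assembled with Python's standard library. This is a minimal sketch, not an official client: the helper name `build_double_tap_request` is hypothetical, the host is taken from the curl example, and spelling the boolean as `async=true` is an assumption about how the query parameter is encoded.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.tapkit.ai/v1"  # host taken from the curl example


def build_double_tap_request(phone_id, selector, api_key, run_async=False):
    """Build (but do not send) a POST request for double-tap/select.

    Hypothetical helper for illustration; not part of any SDK.
    """
    # Percent-encode the phone ID so it is safe to place in the path.
    quoted_id = urllib.parse.quote(phone_id, safe="")
    url = f"{BASE_URL}/phones/{quoted_id}/double-tap/select"
    if run_async:
        # Assumption: the boolean query parameter is spelled "true".
        url += "?" + urllib.parse.urlencode({"async": "true"})
    # The request body has a single field, "selector".
    body = json.dumps({"selector": selector}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


req = build_double_tap_request(
    "abc123", "the photo to zoom in", "TK_your_api_key", run_async=True
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen(req)` would send it. Keeping construction separate from sending makes the URL and body easy to inspect before any network call is made.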

Response

Synchronous

{
  "id": "job_abc123",
  "status": "completed",
  "result": {},
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:30:02Z"
}

Asynchronous

{
  "job_id": "job_abc123"
}
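
The two response shapes carry the job identifier under different keys (`id` in the synchronous response, `job_id` in the asynchronous one), so client code may want to normalize them. The sketch below assumes only the fields shown in the samples above; the function name `summarize_response` is hypothetical, and treating any status other than `"completed"` as unfinished is an assumption.

```python
def summarize_response(payload):
    """Return (job_id, finished) for either response shape.

    Hypothetical helper for illustration; relies only on the fields
    shown in the sample responses.
    """
    if "job_id" in payload:
        # Asynchronous shape: only the job ID is returned.
        return payload["job_id"], False
    # Synchronous shape: the job ID is under "id" and a status is included.
    return payload["id"], payload.get("status") == "completed"


sync_sample = {
    "id": "job_abc123",
    "status": "completed",
    "result": {},
    "created_at": "2024-01-15T10:30:00Z",
    "completed_at": "2024-01-15T10:30:02Z",
}
async_sample = {"job_id": "job_abc123"}

print(summarize_response(sync_sample))
print(summarize_response(async_sample))
```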

Examples

Zoom In on an Image

curl -X POST https://api.tapkit.ai/v1/phones/abc123/double-tap/select \
  -H "X-API-Key: TK_..." \
  -H "Content-Type: application/json" \
  -d '{"selector": "the map to zoom in"}'

SDK Usage

The Python SDK provides this through the double_tap() method with a string argument:
phone.double_tap("the photo to zoom in")
phone.double_tap("the map")
phone.double_tap("the text I want to select")