TapKit can capture high-resolution screenshots from your connected iPhone. Screenshots are returned as PNG image data.
Basic Usage
from tapkit import TapKitClient
client = TapKitClient()
phone = client.get_phone()
# Capture screenshot
screenshot = phone.screenshot()
# Save to file
with open("screen.png", "wb") as f:
    f.write(screenshot)
Return Value
The screenshot() method returns raw PNG bytes:
screenshot = phone.screenshot()
print(type(screenshot))  # <class 'bytes'>
print(len(screenshot))  # Size in bytes
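Because the return value is raw PNG bytes, a quick sanity check is possible without any imaging library. The sketch below is not part of TapKit: `png_dimensions` is a hypothetical helper that relies only on the PNG file layout (an 8-byte signature followed by an IHDR chunk whose first two fields are width and height).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes.

    Hypothetical helper for illustration; not a TapKit API.
    """
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG image")
    if data[12:16] != b"IHDR":
        raise ValueError("PNG is missing its IHDR chunk")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16-23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Calling `png_dimensions(phone.screenshot())` would then give the captured image size and raise `ValueError` if the data is not a PNG.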
Working with Screenshots
Save to File
screenshot = phone.screenshot()
with open("screenshot.png", "wb") as f:
    f.write(screenshot)
Display with PIL/Pillow
from PIL import Image
import io
screenshot = phone.screenshot()
image = Image.open(io.BytesIO(screenshot))
# Display
image.show()
# Get dimensions
print(f"Size: {image.size}")  # (width, height)
Use with OpenCV
import cv2
import numpy as np
screenshot = phone.screenshot()
nparr = np.frombuffer(screenshot, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
# Process with OpenCV
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Send to Vision API
import base64
import anthropic
screenshot = phone.screenshot()
base64_image = base64.standard_b64encode(screenshot).decode("utf-8")
# Use with Claude (named anthropic_client so it does not shadow the TapKit client)
anthropic_client = anthropic.Anthropic()
message = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64_image,
                    },
                },
                {
                    "type": "text",
                    "text": "What's on this screen?",
                },
            ],
        }
    ],
)
Working with Vision Model Coordinates
Different vision models and APIs return coordinates in different formats. TapKit’s geometry utilities help you convert between them:
| Format | Range | Common in |
| --- | --- | --- |
| Absolute pixels | 0 to width/height | Direct screen coordinates |
| Normalized | 0.0 to 1.0 | Many vision APIs |
| 0-1000 scale | 0 to 1000 | Lux, some UI detection models |
| Percentage | 0 to 100 | Some bounding box APIs |
Converting Model Output
from tapkit.geometry import NormalizedPoint, NormalizedBBox
# Model returns 0-1 normalized coordinates
norm_point = NormalizedPoint(0.5, 0.3)
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)
# Model returns 0-1000 scale coordinates
norm_point = NormalizedPoint.from_1000_scale(500, 300)
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)
# Model returns bounding box in 0-1000 scale
norm_box = NormalizedBBox.from_1000_scale(100, 200, 300, 250)
abs_box = norm_box.to_absolute(phone.width, phone.height)
phone.tap(abs_box.center)
This makes it easy to integrate with any vision model regardless of its coordinate format.
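For reference, the arithmetic behind these conversions is a single rescale per axis. The following is a minimal plain-Python sketch, independent of `tapkit.geometry`; the `to_pixels` name, the rounding, and the clamping to the screen bounds are assumptions made for illustration and may differ from what TapKit does internally.

```python
def to_pixels(
    x: float, y: float, width: int, height: int, scale: float = 1.0
) -> tuple[int, int]:
    """Convert a point on a 0..scale grid to absolute pixel coordinates.

    Use scale=1.0 for normalized input, 100 for percentages,
    and 1000 for the 0-1000 scale. Hypothetical helper, not a TapKit API.
    """
    px = round(x / scale * width)
    py = round(y / scale * height)
    # Clamp so that a coordinate on the far edge still lands on the screen.
    return min(max(px, 0), width - 1), min(max(py, 0), height - 1)
```

On a 1000 x 2000 pixel screen, `to_pixels(0.5, 0.3, 1000, 2000)` and `to_pixels(500, 300, 1000, 2000, scale=1000)` describe the same point.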
Screenshot Loop
Capture screenshots at intervals:
import time
for i in range(10):
    screenshot = phone.screenshot()
    with open(f"frame_{i:03d}.png", "wb") as f:
        f.write(screenshot)
    time.sleep(1)
Client-Level Screenshots
You can also capture via the client directly:
# With explicit phone ID
screenshot = client.screenshot(phone_id="abc123")
# With default phone set
client.use_phone("iPhone 15 Pro")
screenshot = client.screenshot()
Screenshots require a round-trip to the device. Capture only when needed rather than in a tight loop.
If you are doing heavy image processing, consider processing each frame in a separate thread while the next screenshot is being captured.
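One way to overlap capture and processing is a single worker thread fed from the capture loop. This is a sketch using only the Python standard library; `capture_and_process` is a hypothetical helper, and it assumes that calling the capture function repeatedly from one thread is safe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def capture_and_process(
    capture: Callable[[], bytes],
    process: Callable[[bytes], Any],
    frames: int,
) -> List[Any]:
    """Capture `frames` screenshots, processing each one on a worker thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = []
        for _ in range(frames):
            frame = capture()  # runs on the calling thread
            # Hand the frame to the worker and return to capturing at once.
            futures.append(pool.submit(process, frame))
        # Results come back in capture order; a processing error re-raises here.
        return [future.result() for future in futures]
```

With TapKit this might be called as `capture_and_process(phone.screenshot, analyze, frames=10)`, where `analyze` stands for your own processing function. A single worker keeps results in order; for CPU-bound work in pure Python, a process pool may overlap better than a thread.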
Use streaming for real-time
For real-time viewing, consider using TapKit’s WebRTC streaming instead of repeated screenshots.
Error Handling
from tapkit import TapKitError
try:
    screenshot = phone.screenshot()
except TapKitError as e:
    print(f"Screenshot failed: {e}")
Common errors:
Phone not connected
Mac app not running
Request timeout
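Some of these failures, such as a request timeout, may be transient, so a short retry with backoff can be worthwhile. The sketch below is generic Python rather than a TapKit feature: `with_retries` is a hypothetical helper, and whether TapKit exposes more specific exception types for each error is not covered here, so the exception class to retry on is passed in by the caller.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.5,
    retry_on: Type[BaseException] = Exception,
) -> T:
    """Call `action`, retrying on `retry_on` with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait delay, then 2 * delay, then 4 * delay, and so on.
            time.sleep(delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A call such as `with_retries(phone.screenshot, retry_on=TapKitError)` would retry the capture up to three times. Retrying will not help if the phone is disconnected or the Mac app is not running, so keep the attempt count small.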
Next Steps
Gestures: Learn about tap and swipe gestures
Geometry: Coordinate and bounding box utilities