TapKit can capture high-resolution screenshots from your connected iPhone. Screenshots are returned as PNG image data.

Basic Usage

from tapkit import TapKitClient

client = TapKitClient()
phone = client.get_phone()

# Capture screenshot
screenshot = phone.screenshot()

# Save to file
with open("screen.png", "wb") as f:
    f.write(screenshot)

Return Value

The screenshot() method returns raw PNG bytes:
screenshot = phone.screenshot()

print(type(screenshot))  # <class 'bytes'>
print(len(screenshot))   # Size in bytes
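
Since the payload is raw PNG data, it can be sanity-checked against the standard 8-byte PNG file signature before saving or decoding. A minimal check, using placeholder bytes in place of a live phone.screenshot() call:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def is_png(data: bytes) -> bool:
    """True if data starts with the standard PNG file signature."""
    return data[:8] == PNG_SIGNATURE

# Placeholder bytes standing in for phone.screenshot()
fake_screenshot = PNG_SIGNATURE + b"rest-of-image"
print(is_png(fake_screenshot))  # True
```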

Working with Screenshots

Save to File

screenshot = phone.screenshot()

with open("screenshot.png", "wb") as f:
    f.write(screenshot)

Display with PIL/Pillow

from PIL import Image
import io

screenshot = phone.screenshot()
image = Image.open(io.BytesIO(screenshot))

# Display
image.show()

# Get dimensions
print(f"Size: {image.size}")  # (width, height)

Use with OpenCV

import cv2
import numpy as np

screenshot = phone.screenshot()
nparr = np.frombuffer(screenshot, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)

# Process with OpenCV
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Send to Vision API

import base64
import anthropic

screenshot = phone.screenshot()
base64_image = base64.standard_b64encode(screenshot).decode("utf-8")

# Use with Claude
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64_image,
                    },
                },
                {
                    "type": "text",
                    "text": "What's on this screen?"
                }
            ],
        }
    ],
)

Working with Vision Model Coordinates

Different vision models and APIs return coordinates in different formats. TapKit’s geometry utilities help you convert between them:
Format            Range               Common in
Absolute pixels   0 to width/height   Direct screen coordinates
Normalized        0.0 to 1.0          Many vision APIs
0-1000 scale      0 to 1000           Lux, some UI detection models
Percentage        0 to 100            Some bounding box APIs

Converting Model Output

from tapkit.geometry import NormalizedPoint, NormalizedBBox

# Model returns 0-1 normalized coordinates
norm_point = NormalizedPoint(0.5, 0.3)
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)

# Model returns 0-1000 scale coordinates
norm_point = NormalizedPoint.from_1000_scale(500, 300)
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)

# Model returns bounding box in 0-1000 scale
norm_box = NormalizedBBox.from_1000_scale(100, 200, 300, 250)
abs_box = norm_box.to_absolute(phone.width, phone.height)
phone.tap(abs_box.center)
This makes it easy to integrate with any vision model regardless of its coordinate format.
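
The percentage format from the table has no dedicated constructor in the snippet above. Assuming NormalizedPoint accepts 0-1 floats (as in the first example), dividing by 100 first is enough; the underlying arithmetic, shown standalone without TapKit:

```python
def percent_to_absolute(x_pct, y_pct, width, height):
    """Convert 0-100 percentage coordinates to absolute pixel coordinates."""
    return (round(x_pct * width / 100), round(y_pct * height / 100))

# e.g. a point at 50% across, 30% down on a 1179x2556 screen
print(percent_to_absolute(50, 30, 1179, 2556))
```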

Screenshot Loop

Capture screenshots at intervals:
import time

for i in range(10):
    screenshot = phone.screenshot()
    with open(f"frame_{i:03d}.png", "wb") as f:
        f.write(screenshot)
    time.sleep(1)

Client-Level Screenshots

You can also capture via the client directly:
# With explicit phone ID
screenshot = client.screenshot(phone_id="abc123")

# With default phone set
client.use_phone("iPhone 15 Pro")
screenshot = client.screenshot()

Performance Tips

  • Screenshots require a round-trip to the device; capture only when needed rather than in a tight loop.
  • If doing heavy image processing, consider processing in a separate thread while capturing the next screenshot.
  • For real-time viewing, consider using TapKit’s WebRTC streaming instead of repeated screenshots.
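
One way to overlap processing with capture is to hand each frame to a worker thread. A minimal sketch using concurrent.futures, with placeholder functions standing in for phone.screenshot() and the actual image processing:

```python
import concurrent.futures

def capture():
    # Stand-in for phone.screenshot(); returns fake PNG bytes here.
    return b"\x89PNG\r\n\x1a\n" + b"frame"

def process(frame):
    # Stand-in for heavy image processing (decode, OCR, etc.).
    return len(frame)

results = []
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    pending = None
    for _ in range(5):
        frame = capture()                      # capture the next frame
        if pending is not None:
            results.append(pending.result())   # collect the previous result
        pending = pool.submit(process, frame)  # process off the main thread
    results.append(pending.result())

print(results)  # one result per captured frame
```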

Error Handling

from tapkit import TapKitError

try:
    screenshot = phone.screenshot()
except TapKitError as e:
    print(f"Screenshot failed: {e}")
Common errors:
  • Phone not connected
  • Mac app not running
  • Request timeout
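
Timeouts in particular are often transient, so wrapping capture in a simple retry loop can help. A minimal sketch; the TapKitError class below is a local stand-in so the example runs without a device, and flaky_capture is a hypothetical stub in place of phone.screenshot():

```python
import time

class TapKitError(Exception):
    """Local stand-in for tapkit.TapKitError so this sketch runs standalone."""

def screenshot_with_retry(capture, retries=3, delay=0.5):
    """Call capture(), retrying on TapKitError with a fixed delay."""
    for attempt in range(retries):
        try:
            return capture()
        except TapKitError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# Hypothetical stub that fails twice, then succeeds
calls = {"n": 0}
def flaky_capture():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TapKitError("Request timeout")
    return b"\x89PNG..."

print(screenshot_with_retry(flaky_capture, retries=3, delay=0))  # succeeds on the third attempt
```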

Next Steps