Screenshots

TapKit can capture high-resolution screenshots from your connected iPhone. Screenshots are returned as PNG image data.

Basic Usage

from tapkit import TapKitClient

client = TapKitClient()
phone = client.get_phone()

# Capture screenshot
screenshot = phone.screenshot()

# Save to file
with open("screen.png", "wb") as f:
    f.write(screenshot)

Return Value

The screenshot() method returns raw PNG bytes:

screenshot = phone.screenshot()

print(type(screenshot))  # <class 'bytes'>
print(len(screenshot))   # Size in bytes
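
Because the return value is plain bytes, it can be sanity-checked without an imaging library. The sketch below is not part of TapKit: `png_dimensions` is a hypothetical helper that relies only on the standard PNG layout (an 8-byte signature followed by the IHDR chunk) to confirm the data is a PNG and read its pixel size.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG image")
    # Bytes 8-12 hold the chunk length, bytes 12-16 the chunk type.
    if data[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Calling `png_dimensions(screenshot)` on the bytes returned above would give the screenshot's width and height in pixels, or raise `ValueError` if the data is not a PNG.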

Working with Screenshots

Save to File

screenshot = phone.screenshot()

with open("screenshot.png", "wb") as f:
    f.write(screenshot)

Display with PIL/Pillow

from PIL import Image
import io

screenshot = phone.screenshot()
image = Image.open(io.BytesIO(screenshot))

# Display
image.show()

# Get dimensions
print(f"Size: {image.size}")  # (width, height)

Use with OpenCV

import cv2
import numpy as np

screenshot = phone.screenshot()
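nparr = np.frombuffer(screenshot, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)  # BGR array, or None if decoding fails
if image is None:
    raise ValueError("Could not decode screenshot as an image")

# Process with OpenCV (imdecode returns channels in BGR order)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)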

Send to Vision API

import base64
import anthropic

screenshot = phone.screenshot()
base64_image = base64.standard_b64encode(screenshot).decode("utf-8")

# Use with Claude (named claude_client so it does not shadow the TapKit client)
claude_client = anthropic.Anthropic()
message = claude_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64_image,
                    },
                },
                {
                    "type": "text",
                    "text": "What's on this screen?"
                }
            ],
        }
    ],
)

# The reply text is in the first content block
print(message.content[0].text)

Working with Vision Model Coordinates

Different vision models and APIs return coordinates in different formats. TapKit’s geometry utilities help you convert between them:

Format	Range	Common in
Absolute pixels	`0` to `width` (x) or `height` (y)	Direct screen coordinates
Normalized	`0.0` to `1.0`	Many vision APIs
0-1000 scale	`0` to `1000`	Lux, some UI detection models
Percentage	`0` to `100`	Some bounding box APIs
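
The arithmetic behind these formats is the same in every case: divide by the format's upper bound, then multiply by the screen dimension. The sketch below shows that calculation in plain Python. `to_absolute` is a hypothetical helper written for illustration, not TapKit's implementation, and the clamping to the last valid pixel is an assumption about what is desirable.

```python
def to_absolute(x: float, y: float, width: int, height: int,
                scale: float = 1.0) -> tuple[int, int]:
    """Convert model coordinates in the range [0, scale] to absolute pixels.

    Use scale=1.0 for normalized output, 100 for percentages,
    and 1000 for the 0-1000 format.
    """
    px = round(x / scale * width)
    py = round(y / scale * height)
    # Clamp so a coordinate on the far edge still lands on the screen.
    px = min(max(px, 0), width - 1)
    py = min(max(py, 0), height - 1)
    return px, py
```

For a 1000 x 2000 pixel screen, `to_absolute(0.5, 0.3, 1000, 2000)` and `to_absolute(500, 300, 1000, 2000, scale=1000)` refer to the same pixel.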

Converting Model Output

from tapkit.geometry import NormalizedPoint, NormalizedBBox

# Model returns 0-1 normalized coordinates
norm_point = NormalizedPoint(0.5, 0.3)
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)

# Model returns 0-1000 scale coordinates
norm_point = NormalizedPoint.from_1000_scale(500, 300)
abs_point = norm_point.to_absolute(phone.width, phone.height)
phone.tap(abs_point)

# Model returns bounding box in 0-1000 scale
norm_box = NormalizedBBox.from_1000_scale(100, 200, 300, 250)
abs_box = norm_box.to_absolute(phone.width, phone.height)
phone.tap(abs_box.center)

This makes it easy to integrate with any vision model regardless of its coordinate format.

Screenshot Loop

Capture screenshots at intervals:

import time

for i in range(10):
    screenshot = phone.screenshot()
    with open(f"frame_{i:03d}.png", "wb") as f:
        f.write(screenshot)
    time.sleep(1)

Client-Level Screenshots

You can also capture via the client directly:

# With explicit phone ID
screenshot = client.screenshot(phone_id="abc123")

# With default phone set
client.use_phone("iPhone 15 Pro")
screenshot = client.screenshot()

Performance Tips

Reduce capture frequency

Screenshots require a round-trip to the device. Capture only when needed rather than in a tight loop.
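
One way to limit capture frequency is to reuse the most recent screenshot when a new one is requested too soon. The following is a minimal sketch under that assumption. `ThrottledCapture` and its parameters are hypothetical names, not part of TapKit, and the capture function is passed in so the class works with any callable that returns bytes.

```python
import time
from typing import Callable, Optional


class ThrottledCapture:
    """Reuse the last screenshot when asked again within min_interval seconds."""

    def __init__(self, capture: Callable[[], bytes], min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capture = capture
        self._min_interval = min_interval
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_frame: Optional[bytes] = None

    def get(self) -> bytes:
        now = self._clock()
        # Capture on the first call, or once the interval has elapsed.
        if self._last_frame is None or now - self._last_time >= self._min_interval:
            self._last_frame = self._capture()
            self._last_time = now
        return self._last_frame
```

Constructed as `ThrottledCapture(phone.screenshot, min_interval=0.5)`, repeated calls to `get()` would trigger at most one device round-trip every half second. The trade-off is that callers may receive a frame that is up to `min_interval` seconds old.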

Process asynchronously

If your image processing is heavy, run it in a separate thread so the next screenshot can be captured while the previous one is being processed.
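
A simple way to overlap capture and processing is a single worker thread from the standard library's `concurrent.futures`. This is only a sketch: `capture_and_process` is a hypothetical helper, and both `capture` and `process` are callables you supply (for example `phone.screenshot` and your own analysis function).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def capture_and_process(capture: Callable[[], bytes],
                        process: Callable[[bytes], T],
                        frames: int) -> List[T]:
    """Capture `frames` screenshots, processing each on a worker thread
    while the next capture runs on the calling thread."""
    results: List[T] = []
    pending = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(frames):
            frame = capture()  # overlaps with processing of the previous frame
            if pending is not None:
                results.append(pending.result())  # collect the previous result
            pending = pool.submit(process, frame)
        if pending is not None:
            results.append(pending.result())  # collect the final frame
    return results
```

Results come back in capture order. Threads help here mainly because capturing is I/O-bound; for CPU-bound processing written in pure Python, a process pool may be a better fit.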

Use streaming for real-time

For real-time viewing, consider using TapKit’s WebRTC streaming instead of repeated screenshots.

Error Handling

from tapkit import TapKitError

try:
    screenshot = phone.screenshot()
except TapKitError as e:
    print(f"Screenshot failed: {e}")

Common errors:

Phone not connected
Mac app not running
Request timeout
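
Transient failures such as a request timeout are often worth retrying after a short pause. The sketch below shows one possible retry loop with exponential backoff. `capture_with_retry` and its parameters are hypothetical and not part of TapKit; the exception types to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type


def capture_with_retry(capture: Callable[[], bytes],
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                       attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call capture(), retrying with exponential backoff on the given errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return capture()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the names used on this page, a call might look like `capture_with_retry(phone.screenshot, retry_on=(TapKitError,))`. Retrying will not help when the phone is disconnected or the Mac app is not running, so keep the number of attempts small.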

Next Steps

Gestures

Learn about tap and swipe gestures

Geometry

Coordinate and bounding box utilities
