Why skills matter
Mobile apps are opaque screenshots, not DOMs. An agent looking at an iPhone screen sees only pixels: it doesn’t know where buttons are, what tabs exist, or how navigation works. Skills teach the agent what it’s looking at so it can act intelligently. This is TapKit’s competitive advantage over browser automation tools like Browserbase or Browser Use. Those tools can read the DOM. We can’t, so we need skills.
The two-layer model
TapKit skills come in two layers:
Layer 1: App Knowledge
“What does the app look like?” App skills are structural: they map out an app’s entire UI, including tab layout, navigation patterns, button locations, gestures, and common gotchas. They change when the app updates.
Layer 2: Task Strategy
“What are we trying to do?” Task skills are behavioral: they define a specific playbook for what the agent should do. They build on top of app skills and change with your strategy.
Layer 1 is structural and changes when the app updates. Layer 2 is behavioral and changes with your strategy.
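To make the split between the two layers concrete, here is a minimal sketch of how they might be modeled as data. The actual TapKit skill format is not described here, so the class names, fields, and example values (`AppSkill`, `TaskSkill`, `builds_on`, `ExampleApp`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AppSkill:
    """Layer 1: structural knowledge about one app (hypothetical shape)."""
    app: str
    tabs: list[str] = field(default_factory=list)
    gotchas: list[str] = field(default_factory=list)


@dataclass
class TaskSkill:
    """Layer 2: a behavioral playbook that builds on one app skill."""
    name: str
    app: str  # name of the app skill this playbook builds on
    steps: list[str] = field(default_factory=list)


def builds_on(task: TaskSkill, app_skill: AppSkill) -> bool:
    """Return True when the task skill targets the given app skill."""
    return task.app == app_skill.app


# Illustrative values only; not taken from any real skill file.
example_app = AppSkill(
    app="ExampleApp",
    tabs=["Home", "Search", "Profile"],
    gotchas=["The search field appears only after scrolling up"],
)
example_task = TaskSkill(
    name="follow-accounts",
    app="ExampleApp",
    steps=["Open the Search tab", "Search for the account", "Tap Follow"],
)
```

Keeping the two records separate mirrors the point above: an app update would touch only the `AppSkill`, while a change in strategy would touch only the `TaskSkill`.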
How skills compose
When an agent runs a task, it loads three skills: a core skill plus the two layers above.
- Core skill: how to use the TapKit CLI (coordinate system, screenshot loop, gestures)
- App skill (L1): how the target app’s UI works
- Task skill (L2): the specific strategy to execute
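The load order above can be sketched as a simple concatenation. This assumes each skill is plain text that ends up in the agent's context; how TapKit actually loads skills is not specified here, and the function name `compose_skills`, the `##` section headings, and the sample strings are hypothetical.

```python
def compose_skills(core: str, app: str, task: str) -> str:
    """Join three skill texts in load order: core, then app (L1), then task (L2)."""
    sections = [
        ("Core skill", core),
        ("App skill (L1)", app),
        ("Task skill (L2)", task),
    ]
    # General knowledge comes first, the most specific instructions last.
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


# Illustrative skill texts only.
context = compose_skills(
    core="Take a screenshot before every action.",
    app="The bottom tab bar has Home, Search, and Profile.",
    task="Open Search and look up the target account.",
)
```

Ordering the sections from general to specific means the task strategy is read with the CLI mechanics and the app's layout already established.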