Skills are Markdown files that teach AI agents how to use apps on your phone. They follow the open Agent Skills standard — the same files work on Claude Code, Codex, OpenClaw, Cursor, and 35+ other agents.

Why skills matter

Mobile apps are opaque screenshots, not DOMs. An agent looking at an iPhone screen sees pixels — it doesn’t know where buttons are, what tabs exist, or how navigation works. Skills teach the agent what it’s looking at so it can act intelligently. This is TapKit’s edge over browser automation tools like Browserbase or Browser Use: those tools can read the DOM directly, but there is no DOM on a phone — skills are how TapKit closes that gap.

The two-layer model

TapKit skills come in two layers:

Layer 1 — App Knowledge

“What does the app look like?” App skills are structural — they map out an app’s entire UI: tab layout, navigation patterns, button locations, gestures, and common gotchas. They change when the app updates.
```
tapkit:hinge
  Discover/Standouts/Likes/Matches tabs,
  swipe cards, rose system,
  messaging UI, preferences...
```
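Concretely, an app skill is a Markdown file with YAML frontmatter, per the Agent Skills standard. A minimal sketch of what `tapkit:hinge` might contain — the tab list, rose system, and messaging UI come from the summary above; the specific gotchas are illustrative assumptions:

```markdown
---
name: hinge
description: UI map for the Hinge iOS app — tabs, navigation, gestures, gotchas.
---

# Hinge

## Bottom tabs (left to right)
1. Discover — swipeable profile cards
2. Standouts — curated profiles (liking one spends a rose)
3. Likes — incoming likes
4. Matches — conversations and messaging UI

## Gotchas (illustrative)
- Liking a specific photo opens a comment sheet first; confirm to send.
- Preferences live behind the profile icon, not in a tab.
```

Because this layer only describes structure, it needs updating only when the app ships a UI change.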

Layer 2 — Task Strategy

“What are we trying to do?” Task skills are behavioral — they define a specific playbook for what the agent should do. They build on top of app skills and change with your strategy.
```
tapkit:twitter-warmup
  Follow 10 accounts, like 20+ posts,
  reply to 5, build activity history,
  engage for 30 min after posting...
```
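A task skill has the same file shape but encodes strategy rather than structure. A sketch of `tapkit:twitter-warmup`, assuming it builds on an app skill named `tapkit:twitter` (that name is an assumption; the playbook numbers come from the summary above):

```markdown
---
name: twitter-warmup
description: Daily warm-up playbook for building an account's activity history.
---

# Twitter warm-up

Builds on `tapkit:twitter` for tab layout and the compose flow.

## Daily playbook
1. Follow 10 accounts in the target niche.
2. Like 20+ posts from the timeline.
3. Reply to 5 posts with short, on-topic comments.
4. After posting, stay engaged for ~30 minutes to build activity history.
```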

How skills compose

When an agent runs a task, it loads three skills:
  1. Core skill — how to use the TapKit CLI (coordinate system, screenshot loop, gestures)
  2. App skill (L1) — how the target app’s UI works
  3. Task skill (L2) — the specific strategy to execute
All three compose into one coherent behavior. The agent knows its tools, understands the UI, and follows the playbook.
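Under the naming assumptions above (`tapkit:core`, `tapkit:twitter`, and `tapkit:twitter-warmup` are illustrative names), the composed context for one run might look like:

```markdown
<!-- 1. Core skill — the TapKit CLI -->
The coordinate system, the screenshot → decide → act loop, available gestures.

<!-- 2. App skill (L1): tapkit:twitter — the UI map -->
Tab layout, where the compose button lives, navigation patterns.

<!-- 3. Task skill (L2): tapkit:twitter-warmup — the playbook -->
Follow 10, like 20+, reply to 5, engage for 30 min after posting.
```

The agent reads all three before acting, so each layer stays small and can change independently.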