Hanzi Browse Documentation

Hanzi Browse gives AI agents a real browser. Use it locally with your own model, or embed it in your product via the API.

Use Hanzi Browse now

One command sets up everything: detects your browsers, installs the Chrome extension, finds AI agents on your machine, and configures MCP.

npx hanzi-browse setup

Supports Claude Code, Cursor, Windsurf, Claude Desktop, and Codex.

What setup does

  1. Checks for the Chrome extension — opens the install page if missing
  2. Scans for supported AI agents on your machine
  3. Adds Hanzi Browse as an MCP server to each agent's config
  4. Imports credentials (Claude Code OAuth, Codex, or API key)

Supported credentials

Source          | How
----------------|----------------------------------------------------
Claude Code     | Auto-detected from claude login
Codex           | Auto-detected from codex login
API key         | Set ANTHROPIC_API_KEY env var or enter during setup
Custom endpoint | Any OpenAI-compatible API (Ollama, LM Studio, etc.)

Manual setup

If you prefer to configure manually:

# Claude Code
claude mcp add browser -- npx -y hanzi-browse

# Cursor / Windsurf (mcp.json)
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["-y", "hanzi-browse"]
    }
  }
}

Test it

After setup, ask your agent something that needs a browser:

"Go to Hacker News and tell me the top 3 stories right now"

Build with Hanzi Browse

Embed browser automation in your product. Your app calls the Hanzi Browse API, a real browser executes the task, you get the result back.

How it works

[Architecture diagram] Your App (POST /v1/tasks, GET /v1/tasks/:id; shows results to user) → API call → Hanzi Browse (runs the AI agent, executes browser tools, returns answer; connected to Vertex AI (Gemini)) ↔ WebSocket ↔ User's Browser (Chrome + Hanzi Browse extension; real signed-in session; clicks, reads, navigates). Tool results flow from the browser back to Hanzi Browse, and the answer flows back to your app.

Quick start: let your AI agent build it

Copy this prompt into Claude Code, Cursor, or any AI coding agent. It has everything the agent needs to integrate Hanzi Browse into your project.

Add browser automation to this project using the Hanzi Browse API. Read the codebase first, then ask me:

1. What browser task should Hanzi Browse automate? (e.g. "read patient chart", "fill out a form", "extract data from a web portal")
2. Where in the UI should the browser pairing flow go? (e.g. settings page, onboarding, a dedicated page)
3. Where should task results appear? (e.g. inline in the app, a chat interface, a dashboard)

Then build the integration using this API reference:

## Hanzi Browse API (base URL: https://api.hanzilla.co)

Auth: `Authorization: Bearer hic_live_...` header on all requests.

### Core flow
1. Create pairing token → show user a link → they connect their browser
2. Run tasks against their connected browser → poll for results
3. Show the answer in your app

### Endpoints

POST /v1/browser-sessions/pair
  Body: {"label": "User Name", "external_user_id": "your_user_id"}
  Returns: {"pairing_token": "hic_pair_...", "expires_in_seconds": 300}
  → Build a link: https://api.hanzilla.co/pair/{pairing_token}
  → User clicks it, their Chrome auto-pairs. Token expires in 5 min.

GET /v1/browser-sessions
  Returns: {"sessions": [{"id": "...", "status": "connected", "label": "..."}]}

POST /v1/tasks
  Body: {"task": "description", "browser_session_id": "...", "url": "optional", "context": "optional"}
  Returns: {"id": "task_id", "status": "running"}
  → task: what to do (max 10K chars). Be specific.
  → url: starting page (optional). If set, agent navigates there first.
  → context: extra info like form data, preferences (max 50K chars).

GET /v1/tasks/:id
  Returns: {"status": "running|complete|error", "answer": "...", "steps": 4}
  → Poll every 2s until status != "running". Typical task takes 10-60s.

POST /v1/tasks/:id/cancel
  → Stops a running task.

GET /v1/tasks/:id/steps
  Returns: {"steps": [{"step": 1, "status": "tool_use", "toolName": "navigate", ...}]}
  → Full execution log for debugging.

GET /v1/tasks/:id/screenshots/:step
  Returns: {"screenshot": "/9j/4AAQSkZJRg..."}
  → Base64 JPEG screenshot at a specific step. Prefix with data:image/jpeg;base64, to display.

GET /v1/billing/credits
  Returns: {"free_remaining": 20, "credit_balance": 0, "free_tasks_per_month": 20}

### Key details
- 20 free tasks/month, then $0.05 per completed task. Errors are free.
- Tasks time out after 30 min. Use cancel to stop early.
- Browser sessions last 30 days and auto-reconnect.
- Two key types: secret (hic_live_) for server-side, publishable (hic_pub_) for client-side embed widget.
- POST /v1/tasks accepts optional webhook_url — Hanzi Browse POSTs the result to your URL on completion.
- Embed widget: <script src="https://browse.hanzilla.co/embed.js"></script> + HanziConnect.mount()
- The user needs the Hanzi Browse Chrome extension installed.
  Install link: https://chromewebstore.google.com/detail/iklpkemlmbhemkiojndpbhoakgikpmcd

### Example: Express + SDK (full integration)
  See: https://github.com/hanzili/hanzi-browse/tree/main/examples/partner-quickstart

Read the codebase to understand the stack and project structure, then ask me the 3 questions above. After I answer, build the full integration.

Or follow the steps manually

  1. Install the Chrome extension: get it from the Chrome Web Store. Your users will also need it; pairing fails silently without it.
  2. Sign in: open your developer console (Google or email).
  3. Create an API key: from the console, or via POST /v1/api-keys.
  4. Pair a browser: generate a pairing token and send your user a link (/pair/{token}). The token expires in 5 minutes.
  5. Run a task: POST /v1/tasks with a task and browser session ID.

Sample app: see examples/partner-quickstart for a full working integration (Express + SDK + embed widget).

TypeScript SDK

The SDK wraps the REST API with typed methods, automatic polling, and error handling.

npm install @hanzi-browse/sdk

import { HanziClient } from '@hanzi-browse/sdk';

const client = new HanziClient({ apiKey: 'hic_live_...' });

// 1. Create a pairing token (give the URL to your user)
const { pairingToken } = await client.createPairingToken({
  label: 'Dr. Smith',
  externalUserId: 'user_123',
});
// Send user to: https://api.hanzilla.co/pair/{pairingToken}

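// 2. Check for a connected session
const sessions = await client.listSessions();
const connected = sessions.find(s => s.status === 'connected');
// find() returns undefined when nothing is connected yet, so stop before using connected.id
if (!connected) throw new Error('No connected browser session yet; pair a browser first');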

// 3. Run a task (polls until complete, 5 min timeout)
const result = await client.runTask({
  browserSessionId: connected.id,
  task: 'Go to example.com and read the page title',
});
console.log(result.answer);
console.log(result.status); // 'complete' | 'error' | 'cancelled'

All methods: createPairingToken, listSessions, deleteSession, createTask, getTask, runTask, cancelTask, listTasks, getTaskSteps, getScreenshot, createApiKey, listApiKeys, deleteApiKey, getUsage, getCredits, health.

Errors throw HanziError with .status (HTTP code) and .data (response body). The SDK retries transient polling errors in runTask() automatically.
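
The error shape described above can be handled without relying on how the SDK exports the HanziError class. The sketch below checks only the documented .status and .data fields; the names HanziErrorLike, isHanziErrorLike, and describeHanziError are hypothetical helpers for illustration, not part of the SDK.

```typescript
// Shape documented for SDK errors: .status (HTTP code) and .data (response body).
// Checking structurally avoids assuming how the SDK exports HanziError.
interface HanziErrorLike {
  status: number;
  data: unknown;
}

function isHanziErrorLike(err: unknown): err is HanziErrorLike {
  return (
    typeof err === 'object' &&
    err !== null &&
    typeof (err as { status?: unknown }).status === 'number' &&
    'data' in err
  );
}

// Hypothetical helper: turn any thrown value into a log-friendly message.
function describeHanziError(err: unknown): string {
  if (isHanziErrorLike(err)) {
    return `Hanzi API error ${err.status}: ${JSON.stringify(err.data)}`;
  }
  return err instanceof Error ? err.message : String(err);
}
```

A typical place to call describeHanziError is the catch block around client.runTask(...), so that API failures and unrelated exceptions are logged in one consistent format.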

Authentication

All API endpoints (except /v1/health) require authentication. Two methods are supported:

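Method         | Use case                   | How
---------------|----------------------------|------------------------------------------------
API key        | Server-to-server, SDK      | Authorization: Bearer hic_live_...
Session cookie | Developer console, browser | Set automatically after sign-in via Better Auth
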
curl https://api.hanzilla.co/v1/api-keys \
  -H "Authorization: Bearer hic_live_your_key_here"

API keys are scoped to a workspace. Each key can access all sessions, tasks, and usage within its workspace. Keys are hashed at rest — the plaintext is shown once on creation.
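
For code that calls the REST API directly, the Bearer header can be assembled in one place. This is a minimal sketch, assuming the base URL and header format shown on this page; buildHanziRequest and HanziRequest are hypothetical names, and the returned object is meant to be passed to fetch(url, init).

```typescript
// Base URL and Bearer scheme come from this page; the helper itself is illustrative.
const HANZI_BASE_URL = 'https://api.hanzilla.co';

interface HanziRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

// Build the URL and fetch options for an authenticated API call.
function buildHanziRequest(apiKey: string, method: string, path: string, body?: unknown): HanziRequest {
  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` };
  const init: HanziRequest['init'] = { method, headers };
  if (body !== undefined) {
    // Only JSON bodies are shown in these docs, so JSON is assumed here.
    headers['Content-Type'] = 'application/json';
    init.body = JSON.stringify(body);
  }
  return { url: `${HANZI_BASE_URL}${path}`, init };
}
```

Keeping the key out of individual call sites makes it easier to load it from an environment variable and to rotate it later.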

API Keys

POST /v1/api-keys
Create a new API key for your workspace.
# Request
curl -X POST https://api.hanzilla.co/v1/api-keys \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "production"}'

# Response (201)
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "key": "hic_live_a1b2c3d4e5f6...",
  "name": "production",
  "workspace_id": "...",
  "_warning": "Save this key now. It will not be shown again."
}

GET /v1/api-keys
List all API keys. Returns prefixes only, not full keys.

DELETE /v1/api-keys/:id
Delete an API key. Integrations using this key will immediately stop working.

Key types

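Type        | Prefix    | Use case                  | Permissions
------------|-----------|---------------------------|----------------------------------
Secret      | hic_live_ | Server-side, SDK          | All endpoints
Publishable | hic_pub_  | Client-side, embed widget | Pair browsers, list sessions only

# Create a publishable key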
curl -X POST https://api.hanzilla.co/v1/api-keys \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "frontend-widget", "type": "publishable"}'
Security: Never expose secret keys (hic_live_) in client-side code. Use publishable keys (hic_pub_) for the embed widget. Publishable keys can only pair browsers and list sessions; they cannot create tasks or access billing.
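
One way to enforce the rule above is a prefix check that runs before a key is placed in client-side configuration. The sketch relies only on the documented prefixes; hanziKeyType and assertSafeForClient are hypothetical helper names.

```typescript
type HanziKeyType = 'secret' | 'publishable' | 'unknown';

// Classify a key by its documented prefix.
function hanziKeyType(key: string): HanziKeyType {
  if (key.startsWith('hic_live_')) return 'secret';
  if (key.startsWith('hic_pub_')) return 'publishable';
  return 'unknown'; // e.g. a pairing token (hic_pair_) or a malformed value
}

// Throw before anything other than a publishable key reaches client-side code.
function assertSafeForClient(key: string): string {
  if (hanziKeyType(key) !== 'publishable') {
    throw new Error('Only publishable keys (hic_pub_) may be used in client-side code');
  }
  return key;
}
```

Running assertSafeForClient at build or render time turns an accidental secret-key leak into an immediate failure rather than a silent exposure.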

Browser Sessions

POST /v1/browser-sessions/pair
Create a pairing token (5-minute expiry). The user connects their browser by opening the pairing link built from this token, or by entering the token in the Chrome extension.
# Request
curl -X POST https://api.hanzilla.co/v1/browser-sessions/pair \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"label": "Dr. Smith", "external_user_id": "user_123"}'

# Response (201)
{
  "pairing_token": "hic_pair_a1b2c3...",
  "expires_at": 1710000000000,
  "expires_in_seconds": 300
}

POST /v1/browser-sessions/register
Exchange a pairing token for a session credential. Called by the extension, not your app.

GET /v1/browser-sessions
List all browser sessions with status, label, and external_user_id.
# Response (200)
{
  "sessions": [
    {
      "id": "550e8400-...",
      "status": "connected",
      "label": "Dr. Smith",
      "external_user_id": "user_123",
      "connected_at": 1710000000000,
      "last_heartbeat": 1710000060000
    }
  ]
}

DELETE /v1/browser-sessions/:id
Delete a browser session. The user will need to re-pair.
curl -X DELETE https://api.hanzilla.co/v1/browser-sessions/550e8400-... \
  -H "Authorization: Bearer hic_live_..."

Tasks

POST /v1/tasks
Start a browser automation task. Requires a connected browser session.
# Request
curl -X POST https://api.hanzilla.co/v1/tasks \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Read the patient chart on the current page",
    "browser_session_id": "550e8400-...",
    "url": "https://example.com/chart",
    "context": "Extract: name, medications, allergies"
  }'

# Response (201)
{
  "id": "task_abc123",
  "status": "running",
  "task": "Read the patient chart on the current page",
  "browser_session_id": "550e8400-..."
}
Field              | Required | Description
-------------------|----------|-------------------------------------------------------
task               | Yes      | What to do (max 10,000 chars)
browser_session_id | Yes      | Connected session to run against
url                | No       | Starting URL (max 2,048 chars)
context            | No       | Extra context for the agent (max 50,000 chars)
webhook_url        | No       | URL to POST results to on completion (max 2,048 chars)

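
The limits in the field table can be checked locally before sending a request, which avoids a round trip that would end in a 400. This is a sketch under two assumptions: the interface mirrors the documented fields, and string length is measured in JavaScript string units, which may differ from how the server counts characters. validateTaskBody and TASK_LIMITS are hypothetical names.

```typescript
interface CreateTaskBody {
  task: string;
  browser_session_id: string;
  url?: string;
  context?: string;
  webhook_url?: string;
}

// Limits copied from the field table.
const TASK_LIMITS = { task: 10000, url: 2048, context: 50000, webhook_url: 2048 } as const;

// Return a list of problems; an empty list means the body passes these local checks.
function validateTaskBody(body: CreateTaskBody): string[] {
  const problems: string[] = [];
  if (body.task.length === 0) problems.push('task is required');
  if (body.browser_session_id.length === 0) problems.push('browser_session_id is required');
  for (const field of ['task', 'url', 'context', 'webhook_url'] as const) {
    const value = body[field];
    if (value !== undefined && value.length > TASK_LIMITS[field]) {
      problems.push(`${field} exceeds ${TASK_LIMITS[field]} chars`);
    }
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets a form show all issues to the user in one pass.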
GET /v1/tasks/:id
Get task status, answer, steps, and usage.
# Response (200) — completed task
{
  "id": "task_abc123",
  "status": "complete",
  "task": "Read the patient chart on the current page",
  "answer": "Patient: Jane Doe. Medications: Lisinopril 10mg...",
  "steps": 4,
  "usage": { "inputTokens": 12000, "outputTokens": 800, "apiCalls": 5 },
  "browser_session_id": "550e8400-...",
  "created_at": 1710000000000,
  "completed_at": 1710000120000
}
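
Without the SDK, waiting for a result means calling GET /v1/tasks/:id repeatedly until the status is no longer "running". The loop below is a minimal sketch: the 2-second interval and 30-minute ceiling follow the figures quoted elsewhere on this page, while waitForTask, TaskSnapshot, and the injected getTask function are hypothetical names chosen so the loop works with fetch, the SDK, or a test double.

```typescript
// Subset of the GET /v1/tasks/:id response used by the loop.
interface TaskSnapshot {
  id: string;
  status: string; // "running" until the task finishes
  answer?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the task leaves "running", or give up once the next wait would pass the deadline.
async function waitForTask(
  getTask: (id: string) => Promise<TaskSnapshot>,
  taskId: string,
  intervalMs = 2000,
  timeoutMs = 30 * 60 * 1000,
): Promise<TaskSnapshot> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const task = await getTask(taskId);
    if (task.status !== 'running') return task;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Task ${taskId} still running after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

The caller still has to inspect the returned status, since a task that stopped running may have finished with an error rather than an answer.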
POST /v1/tasks/:id/cancel
Cancel a running task.

GET /v1/tasks/:id/steps
Get the full execution log for a task, including each tool call and result.
# Response (200)
{
  "steps": [
    {"step": 1, "status": "tool_use", "toolName": "navigate", "url": "https://example.com"},
    {"step": 2, "status": "tool_use", "toolName": "read_page"}
  ]
}

GET /v1/tasks/:id/screenshots/:step
Get the screenshot captured at a specific step. Returns base64 JPEG.
# Response (200)
{
  "screenshot": "/9j/4AAQSkZJRg..."
}

The screenshot field contains raw base64-encoded image data (JPEG). To display it, prefix with data:image/jpeg;base64,.
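
The prefixing step can be wrapped in a small helper so the display code does not repeat the MIME string. This is a sketch; screenshotDataUrl is a hypothetical name, and the JPEG type is taken from the description above.

```typescript
// Turn the raw base64 string from the screenshot endpoint into a data URL.
function screenshotDataUrl(screenshot: string): string {
  // Leave the value alone if a caller has already added a data: prefix.
  if (screenshot.startsWith('data:')) return screenshot;
  return `data:image/jpeg;base64,${screenshot}`;
}
```

The result can be assigned directly to the src attribute of an img element.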

Webhooks

Instead of polling, pass a webhook_url when creating a task. Hanzi Browse will POST the result to your URL when the task finishes:

# Create a task with webhook
curl -X POST https://api.hanzilla.co/v1/tasks \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Read the current page title",
    "browser_session_id": "550e8400-...",
    "webhook_url": "https://yourapp.com/api/hanzi-callback"
  }'

# Hanzi Browse POSTs to your URL on completion:
{
  "event": "task.completed",
  "task": {
    "id": "task_abc123",
    "status": "complete",
    "answer": "The page title is...",
    "steps": 3,
    "usage": { "inputTokens": 8000, "outputTokens": 500, "apiCalls": 4 },
    "created_at": 1710000000000,
    "completed_at": 1710000030000
  }
}

Webhook delivery is fire-and-forget with a 10-second timeout. If your endpoint is down, the result is still available via GET /v1/tasks/:id.
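
On the receiving side, the payload above can be parsed defensively before your app acts on it. The sketch below validates only the fields shown in the sample payload; parseTaskWebhook and TaskWebhookEvent are hypothetical names, and the code does not authenticate the sender because this page does not describe a signing scheme.

```typescript
// Fields taken from the sample webhook payload.
interface TaskWebhookEvent {
  event: string;
  task: { id: string; status: string; answer?: string; steps?: number };
}

// Parse and minimally validate a webhook body; returns null for anything unexpected.
function parseTaskWebhook(rawBody: string): TaskWebhookEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return null; // not JSON
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const candidate = parsed as { event?: unknown; task?: { id?: unknown; status?: unknown } | null };
  if (typeof candidate.event !== 'string') return null;
  if (typeof candidate.task !== 'object' || candidate.task === null) return null;
  if (typeof candidate.task.id !== 'string' || typeof candidate.task.status !== 'string') return null;
  return parsed as TaskWebhookEvent;
}
```

Because delivery has a 10-second timeout and is not retried, a handler built on this would respond quickly, defer slow work, and fall back to GET /v1/tasks/:id for any task whose callback never arrived.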

GET /v1/tasks
List recent tasks for your workspace.

Usage

GET /v1/usage
Usage summary for your workspace.
# Response (200)
{
  "totalInputTokens": 150000,
  "totalOutputTokens": 12000,
  "totalApiCalls": 45,
  "totalCostUsd": 0.082,
  "taskCount": 8
}

Browser Pairing

Pairing connects a user's Chrome browser to your workspace. Users pair once — the session lasts 30 days and auto-reconnects on browser restart.

How it works

  1. Your backend calls POST /v1/browser-sessions/pair to get a pairing token
  2. Show your user a link: https://api.hanzilla.co/pair/{token}
  3. User clicks the link → their browser auto-pairs → done
# Your backend generates the link:
curl -X POST https://api.hanzilla.co/v1/browser-sessions/pair \
  -H "Authorization: Bearer hic_live_..." \
  -H "Content-Type: application/json" \
  -d '{"label": "Dr. Smith", "external_user_id": "user_123"}'

# Response:
# { "pairing_token": "hic_pair_abc123...", "expires_in_seconds": 300 }

# Give your user this link:
# https://api.hanzilla.co/pair/hic_pair_abc123...

The pairing page detects the Hanzi Browse extension and pairs automatically. If the extension isn't installed, the user sees an "Install" button.

Sessions auto-reconnect on browser restart — no re-pairing needed. Use label and external_user_id to track which session belongs to which user.
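
The link-building step above can be sketched in a few lines. The sketch assumes the response fields shown on this page and treats expires_at as epoch milliseconds, which is inferred from the sample value rather than stated; pairingLink, isPairingExpired, and PairingResponse are hypothetical names.

```typescript
// Response fields documented for POST /v1/browser-sessions/pair.
interface PairingResponse {
  pairing_token: string;
  expires_in_seconds: number;
  expires_at?: number; // assumed to be epoch milliseconds, as the sample value suggests
}

// Build the link the user clicks to pair their browser.
function pairingLink(pairing: PairingResponse): string {
  return `https://api.hanzilla.co/pair/${encodeURIComponent(pairing.pairing_token)}`;
}

// Decide whether a link is still usable. issuedAtMs is when your backend received the token.
function isPairingExpired(pairing: PairingResponse, issuedAtMs: number, nowMs: number): boolean {
  const expiresAt = pairing.expires_at ?? issuedAtMs + pairing.expires_in_seconds * 1000;
  return nowMs >= expiresAt;
}
```

Checking expiry before rendering the link lets your UI request a fresh token instead of showing one that is already past its 5-minute window.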

Embed widget (recommended)

Drop-in UI component that handles extension detection, pairing, and connection status — like Stripe's checkout widget.

<script src="https://browse.hanzilla.co/embed.js"></script>
<div id="hanzi-connect"></div>
<script>
  HanziConnect.mount('#hanzi-connect', {
    apiKey: 'hic_pub_...',  // publishable key — safe for client-side
    purpose: 'read your EHR on your behalf',
    onConnected: (sessionId) => {
      // Send sessionId to your backend to run tasks
      fetch('/api/set-session', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ sessionId }),
      });
    },
    onDisconnected: () => {
      // Browser disconnected — show re-pair UI
    },
  });
</script>
Security pattern: Use a publishable key (hic_pub_) in the widget. Send the sessionId to your backend. Your backend uses the secret key (hic_live_) to create tasks. Never put secret keys in client-side code.

Session Metadata

When creating a pairing token, attach a label and external_user_id to map Hanzi Browse sessions to your users:

POST /v1/browser-sessions/pair
{
  "label": "Dr. Smith's browser",
  "external_user_id": "user_abc123"
}

Both fields are inherited by the browser session and returned in GET /v1/browser-sessions. Use them to identify whose browser is whose in your system.
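
With external_user_id attached, finding the right browser for a given user is a filter over the GET /v1/browser-sessions response. This is a sketch using the fields from the sample response; connectedSessionFor is a hypothetical helper, and preferring the most recent last_heartbeat when a user has several connected sessions is an assumption, not documented behavior.

```typescript
// Fields from the sample GET /v1/browser-sessions response.
interface BrowserSession {
  id: string;
  status: string;
  label?: string;
  external_user_id?: string;
  last_heartbeat?: number;
}

// Pick the connected session for one of your users, preferring the freshest heartbeat.
function connectedSessionFor(sessions: BrowserSession[], externalUserId: string): BrowserSession | undefined {
  return sessions
    .filter((s) => s.external_user_id === externalUserId && s.status === 'connected')
    .sort((a, b) => (b.last_heartbeat ?? 0) - (a.last_heartbeat ?? 0))[0];
}
```

An undefined result is the signal to show the pairing flow again before creating a task for that user.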

Troubleshooting

Extension not detected

Make sure the Chrome extension is installed and enabled. Reload at chrome://extensions if needed.

Agent can't find Hanzi Browse

Restart your AI agent after running setup. MCP config is written to disk but agents need a restart.

Session disconnected

The browser was closed or lost network. Sessions auto-reconnect when the browser reopens. Check GET /v1/browser-sessions for status before creating tasks.

Task fails or times out

Check that the session is connected. Verify credentials are valid. Tasks have a 30-minute timeout. If the page requires login, make sure the user is signed in.

Pairing token expired

Tokens are valid for 5 minutes. Generate a new one via the developer console or POST /v1/browser-sessions/pair.

API key not working

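Secret keys start with hic_live_ and publishable keys start with hic_pub_. Check that you're using the full key (shown once on creation) and that it belongs to the correct workspace. Publishable keys cannot create tasks, so use a secret key for server-side calls.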

Error Codes

Status | Meaning             | Common cause
-------|---------------------|----------------------------------------------------------
400    | Bad Request         | Missing required field, input too long, invalid URL
401    | Unauthorized        | Missing or invalid API key / session cookie
402    | Payment Required    | Plan upgrade needed (when billing is active)
403    | Forbidden           | Session belongs to a different workspace
404    | Not Found           | Resource doesn't exist, or belongs to another workspace
409    | Conflict            | Browser session not connected or expired
429    | Too Many Requests   | Rate limit exceeded (10 tasks/min, 5 concurrent)
500    | Server Error        | Internal error; check request_id in response for support
503    | Service Unavailable | Billing not configured, or server degraded

Every error response carries a request_id for tracing, both in the JSON body and in the X-Request-Id response header.

# Error response format
{
  "error": "Browser session is not connected. The extension must be running and registered.",
  "request_id": "a1b2c3d4"
}
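
Client code usually needs to turn these status codes into a next step. The mapping below is a sketch: the status codes come from the table above, but the grouping into actions, the backoff schedule, and the names actionForStatus and backoffMs are assumptions made for illustration.

```typescript
type ErrorAction =
  | 'fix_request'
  | 'check_credentials'
  | 'upgrade_plan'
  | 'repair_session'
  | 'retry_later'
  | 'contact_support';

// Map a documented status code to a suggested next step (grouping is an assumption).
function actionForStatus(status: number): ErrorAction {
  switch (status) {
    case 400:
    case 404:
      return 'fix_request';
    case 401:
    case 403:
      return 'check_credentials';
    case 402:
      return 'upgrade_plan';
    case 409:
      return 'repair_session'; // session not connected or expired
    case 429:
    case 503:
      return 'retry_later';
    default:
      return 'contact_support'; // include request_id when reporting
  }
}

// Exponential backoff for 'retry_later', capped at 30 s (an arbitrary choice).
function backoffMs(attempt: number): number {
  return Math.min(30000, 1000 * 2 ** attempt);
}
```

Only the 'retry_later' group is worth retrying automatically; the others need a change to the request, the key, or the browser session first.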

Security

Mechanism           | Details
--------------------|---------------------------------------------------------------------------------------
API keys            | SHA-256 hashed at rest. Plaintext shown once on creation. Prefix stored for display.
Pairing tokens      | SHA-256 hashed. 5-minute expiry. Single use; cannot be replayed.
Session tokens      | 30-day expiry. Auto-rotated by the relay. Revocable.
Workspace isolation | All resources scoped to workspace. Cross-workspace access returns 404.
BYOM privacy        | No data leaves your machine. Screenshots sent only to your chosen provider.

Full privacy policy: PRIVACY.md