LLM Token Usage & Cost Tracking

Overview

Unclaw extracts token usage from LLM API responses at proxy time, estimates cost via OpenRouter pricing data, and attaches the result to each request log. This enables cost tracking, cache hit analysis, and model usage breakdowns without post-hoc log parsing.

How It Works

  1. Response bodies are already captured for logging
  2. extractTokenUsage() checks whether the upstream host belongs to a known LLM provider and, if so, parses the usage object from the response JSON
  3. Cost is computed using cached pricing from OpenRouter
  4. Token counts + cost are written to the configured analytics store alongside the request
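
Putting the steps together, the proxy-side flow looks roughly like this. This is a minimal sketch: the signature of extractTokenUsage() and the recordRequest() helper are assumptions for illustration, not the actual API.

// Sketch only: real signatures live in src/token_usage.ts and src/analytics.ts.
import { extractTokenUsage } from "./token_usage"; // assumed (host, body) signature
import { recordRequest } from "./analytics";       // hypothetical helper

async function onUpstreamResponse(host: string, resp: Response, body: string) {
  // Step 2: hostname short-circuit + usage parsing (zeros on parse failure).
  // Step 3: cost comes from the hourly-refreshed OpenRouter price cache.
  const usage = extractTokenUsage(host, body);
  // Step 4: persist token counts + cost alongside the rest of the request log.
  await recordRequest({ host, status: resp.status, ...usage });
}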

Currently supported providers:

  • OpenAI (*.openai.com) — usage.prompt_tokens, completion_tokens, prompt_tokens_details.cached_tokens
  • Anthropic (*.anthropic.com) — usage.input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens

Extraction adds effectively no overhead for non-LLM requests: the hostname check short-circuits before any JSON parsing. Parse failures are swallowed and produce zero token counts, as in the sketch below.
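
The per-provider mapping is small. Here is a hedged sketch of the parsing step using the field names listed above; the RawUsage type and parseUsage() name are illustrative:

type RawUsage = {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
};

const ZERO: RawUsage = { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheCreationTokens: 0 };

function parseUsage(provider: "openai" | "anthropic", body: string): RawUsage {
  try {
    const usage = JSON.parse(body)?.usage;
    if (!usage) return ZERO;
    if (provider === "openai") {
      return {
        inputTokens: usage.prompt_tokens ?? 0,
        outputTokens: usage.completion_tokens ?? 0,
        cacheReadTokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
        cacheCreationTokens: 0, // OpenAI does not report cache writes separately
      };
    }
    return {
      inputTokens: usage.input_tokens ?? 0,
      outputTokens: usage.output_tokens ?? 0,
      cacheReadTokens: usage.cache_read_input_tokens ?? 0,
      cacheCreationTokens: usage.cache_creation_input_tokens ?? 0,
    };
  } catch {
    return ZERO; // parse failures silently produce zeros
  }
}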

Pricing via OpenRouter

Pricing data is fetched from OpenRouter's public API:

GET https://openrouter.ai/api/v1/models

No authentication is required. The response lists per-model pricing as USD per token in the fields pricing.prompt, pricing.completion, pricing.input_cache_read, and pricing.input_cache_write.
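
An abridged example of the response shape; the model and prices below are illustrative, and pricing values arrive as strings:

{
  "data": [
    {
      "id": "openai/gpt-4o",
      "pricing": {
        "prompt": "0.0000025",
        "completion": "0.00001",
        "input_cache_read": "0.00000125",
        "input_cache_write": "0.000003125"
      }
    }
  ]
}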

The pricing cache refreshes every hour. If a refresh fails, the stale cache is kept; if no cache exists yet, cost falls back to $0. Model lookup tries {provider}/{model} first (e.g. openai/gpt-4o), then falls back to the bare model name.
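
The cost math itself reduces to a price lookup and a weighted sum. A sketch under assumed names (Pricing, priceTable, and estimateCostUsd are all illustrative):

type Usage = {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
};

type Pricing = {
  prompt: number;          // USD per input token
  completion: number;      // USD per output token
  inputCacheRead: number;  // USD per cached token read
  inputCacheWrite: number; // USD per cached token written
};

function estimateCostUsd(
  priceTable: Map<string, Pricing>, // refreshed hourly from OpenRouter
  provider: string,
  model: string,
  u: Usage,
): number {
  // Try "openai/gpt-4o" first, then the bare "gpt-4o"; unknown models cost $0.
  const p = priceTable.get(`${provider}/${model}`) ?? priceTable.get(model);
  if (!p) return 0;
  // NB: some providers count cached tokens inside input tokens; the real
  // implementation may adjust for that overlap.
  return (
    u.inputTokens * p.prompt +
    u.outputTokens * p.completion +
    u.cacheReadTokens * p.inputCacheRead +
    u.cacheCreationTokens * p.inputCacheWrite
  );
}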

Logged Fields

The following fields are attached to each request log and persisted to the SQLite analytics store.

Field                    Description
LlmProvider              openai, anthropic, or ""
LlmModel                 Model ID from the response
LlmInputTokens           Prompt / input tokens
LlmOutputTokens          Completion / output tokens
LlmCacheReadTokens       Tokens read from cache
LlmCacheCreationTokens   Tokens written to cache
LlmCostUsd               Estimated cost in USD

See the RequestRow type in src/analytics.ts for the canonical definition.
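
Because the fields live in SQLite, aggregations are one query away. For illustration only: the better-sqlite3 driver, the analytics.db path, and the requests table name are assumptions; the column names come from the table above.

import Database from "better-sqlite3";

const db = new Database("analytics.db");

// Estimated spend and token volume per model, most expensive first.
const rows = db
  .prepare(
    `SELECT LlmModel,
            SUM(LlmInputTokens)  AS inputTokens,
            SUM(LlmOutputTokens) AS outputTokens,
            SUM(LlmCostUsd)      AS costUsd
       FROM requests
      WHERE LlmProvider != ''
      GROUP BY LlmModel
      ORDER BY costUsd DESC`,
  )
  .all();

console.table(rows);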

Future: Plugin Response Hooks

The current implementation uses hostname-based detection in src/token_usage.ts. A natural evolution is to add a response hook to the plugin system:

// IntegrationEndpoint (future)
extractUsage?: (resp: Response, body: string) => TokenUsage | null;

This would move OpenAI/Anthropic extraction into their plugins and support custom/self-hosted LLMs via third-party plugins. The extractTokenUsage() function serves as a reference implementation.
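
As an illustration, a third-party plugin for a self-hosted, OpenAI-compatible server might implement the hook like this. Everything here is hypothetical: the TokenUsage shape, the host matcher field, and the endpoint itself.

type TokenUsage = {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
  costUsd: number;
};

type IntegrationEndpoint = {
  host: string; // hypothetical matcher; the real plugin shape may differ
  extractUsage?: (resp: Response, body: string) => TokenUsage | null;
};

const selfHostedLlm: IntegrationEndpoint = {
  host: "llm.internal.example.com",
  extractUsage(resp, body) {
    if (!resp.ok) return null;
    try {
      const parsed = JSON.parse(body);
      const usage = parsed?.usage;
      if (!usage) return null;
      return {
        provider: "self-hosted",
        model: parsed.model ?? "unknown",
        inputTokens: usage.prompt_tokens ?? 0,
        outputTokens: usage.completion_tokens ?? 0,
        cacheReadTokens: 0,
        cacheCreationTokens: 0,
        costUsd: 0, // private models have no OpenRouter price entry
      };
    } catch {
      return null;
    }
  },
};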

Key Files

File                     Purpose
src/token_usage.ts       Provider parsing + OpenRouter pricing
src/proxy.ts             WireGuard proxy — calls extractTokenUsage()
src/gateway.ts           Gateway proxy — calls extractTokenUsage()
src/analytics.ts         RequestRow type and SQLite-backed query API
src/token_usage_test.ts  Unit tests