ASTHER · LOUIE · CABARDO · 2026
FULL—STACK · ENGINEER · PH
Guide · AI Engineering · Jan 2026 · 11 min read

The Claude AI efficiency guide — token cost, context, and skills.

A working playbook from someone who lives in this thing — what to cache, when to start fresh, and how to pick a model.

Most people I show Claude to use it like a fancy chat box. They paste a problem, take the answer, close the tab, repeat. That is the most expensive way to use it — both in tokens and in your own time. After a year of working with Claude as my engineering partner, here is the playbook I actually use.

Token cost is the first thing to internalize. Every message you send carries the entire conversation so far, and the model re-processes all of it on each call. By message 30 of a long thread, you are paying to re-process every word of the previous 29. Two practical fixes: start fresh sessions for new tasks (the cost-equivalent of a clean room), and use prompt caching for the parts of context that do not change. If you have a 4,000-token system prompt that is the same on every call, cache it once and pay roughly a tenth of the normal input price for it on every subsequent call that hits the cache. The first call, which writes the cache, costs a little more than usual, and the cache expires after a short idle window, so the savings show up on calls made close together. It compounds fast.
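
To put rough numbers on that, here is a minimal arithmetic sketch. It assumes every turn adds the same amount of text, that every call after the first hits the cache, and that a cache write costs 1.25× and a cache read 0.10× the normal input price. The read figure matches the "roughly a tenth" above; the write multiplier and the function names are my assumptions for illustration, so check current pricing before relying on them.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens processed across a thread that resends its full history."""
    # Call n carries n turns' worth of text, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return sum(n * tokens_per_turn for n in range(1, turns + 1))


def system_prompt_cost(
    calls: int,
    prompt_tokens: int,
    cached: bool,
    write_multiplier: float = 1.25,  # assumed price of the first, cache-writing call
    read_multiplier: float = 0.10,   # "roughly a tenth" on later cache hits
) -> float:
    """Cost of a fixed system prompt, measured in uncached-input-token units."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(calls * prompt_tokens)
    first_call = prompt_tokens * write_multiplier
    later_calls = (calls - 1) * prompt_tokens * read_multiplier
    return first_call + later_calls


if __name__ == "__main__":
    print(cumulative_input_tokens(30, 500))            # 232500
    print(system_prompt_cost(30, 4000, cached=False))  # 120000.0
    print(round(system_prompt_cost(30, 4000, cached=True)))  # 16600
```

Under these assumptions, a 30-turn thread at 500 tokens per turn processes 232,500 input tokens even though only 15,000 of them are new text, and a cached 4,000-token system prompt costs about 16,600 token-units over 30 calls instead of 120,000.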

Manage context like a budget. Long context is a tool, not a default. A 200k-token conversation is not "more thorough" — it is slower, more expensive, and surprisingly more error-prone because the model has to rank what matters across more noise. My rule: if a thread runs past about 30 turns or 50k tokens, summarize and start fresh. I keep a one-paragraph carry-over in a notes file and paste it into the next session.
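
The 30-turn / 50k-token rule is easy to automate. The sketch below is one way to do it, not a prescribed tool: the `Thread` class and `next_session_opening` helper are hypothetical names, each stored message counts as one turn, and the token count is a crude estimate of about four characters per token rather than a real tokenizer.

```python
from dataclasses import dataclass, field

MAX_TURNS = 30        # thresholds taken from the rule of thumb above
MAX_TOKENS = 50_000


def estimate_tokens(text: str) -> int:
    """Rough estimate only: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Thread:
    """Hypothetical local record of one conversation."""
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    @property
    def tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def should_start_fresh(self) -> bool:
        # Either limit being exceeded is enough to summarize and restart.
        return len(self.messages) > MAX_TURNS or self.tokens > MAX_TOKENS


def next_session_opening(carry_over: str, new_task: str) -> str:
    """Build the first message of a fresh session from the one-paragraph notes."""
    return (
        "Context from the previous session:\n"
        f"{carry_over.strip()}\n\n"
        f"New task:\n{new_task.strip()}"
    )
```

When `should_start_fresh()` returns true, write the one-paragraph carry-over into your notes file and open the next session with `next_session_opening(...)`.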

Use skills. Skills are small, scoped specifications that teach Claude how to do a particular task: write a PR review, generate a migration, do a security audit. Write one once, and you do not have to re-explain the rules every time. I have skills for writing case studies, drafting commit messages, reviewing schemas, and triaging bug reports. Each one saves me 10 to 15 minutes of preamble per use, which adds up to hours per week.
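
For a sense of how small a skill can be, here is a sketch that scaffolds one on disk. It assumes the layout of a `SKILL.md` file with `name` and `description` front matter inside `.claude/skills/<name>/`; treat that path, the `write_skill` helper, and the example wording as assumptions, and check your tool's documentation for the layout it actually expects.

```python
import tempfile
from pathlib import Path

SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

{instructions}
"""


def write_skill(root: Path, name: str, description: str, instructions: str) -> Path:
    """Create <root>/.claude/skills/<name>/SKILL.md and return its path."""
    skill_dir = root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        SKILL_TEMPLATE.format(
            name=name,
            description=description,
            title=name.replace("-", " ").title(),
            instructions=instructions.strip(),
        ),
        encoding="utf-8",
    )
    return skill_file


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing in a real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_skill(
            Path(tmp),
            name="commit-message",
            description="Draft a commit message from a staged diff.",
            instructions="Use the imperative mood. Keep the subject under 72 characters.",
        )
        print(path.read_text(encoding="utf-8"))
```

The description line does most of the work: it is what tells the model when the skill applies, so keep it specific to one task.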

Pick the right model for the task. Opus is the heavyweight: use it for architecture, hard reasoning, and tricky debugging. Sonnet is the workhorse: use it for everyday coding, most tool-use loops, and code review. Haiku is the sprinter: use it for short transformations, simple categorization, and high-volume background tasks. The price gap between Haiku and Opus on the same task can be roughly 15×, depending on which model versions you compare, and Haiku is the right answer more often than people think.
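
If you call the API from scripts, the same rule of thumb can live in a small routing table. This is a sketch: the task labels and the Sonnet fallback are my own choices, and it returns a tier name rather than a concrete model ID, which you would map to whatever versions your account currently offers.

```python
from enum import Enum


class Tier(Enum):
    OPUS = "opus"
    SONNET = "sonnet"
    HAIKU = "haiku"


# Task labels are hypothetical; they mirror the categories in the paragraph above.
ROUTES: dict[str, Tier] = {
    "architecture": Tier.OPUS,
    "hard_reasoning": Tier.OPUS,
    "tricky_debugging": Tier.OPUS,
    "everyday_coding": Tier.SONNET,
    "tool_loop": Tier.SONNET,
    "code_review": Tier.SONNET,
    "short_transformation": Tier.HAIKU,
    "categorization": Tier.HAIKU,
    "background_batch": Tier.HAIKU,
}


def pick_tier(task_kind: str, default: Tier = Tier.SONNET) -> Tier:
    """Return the tier for a task label, falling back to the workhorse tier."""
    return ROUTES.get(task_kind, default)
```

Writing the table down has a side benefit: it forces you to justify every entry that routes to the most expensive tier.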

Tool use efficiency. When you ask Claude to call multiple tools, structure the prompt so independent tools run in parallel. If you need three files, ask for all of them in one message so the model can request all three in a single turn. This can be the difference between a 20-second turn and a 5-second turn. The inverse also holds: do not ask for tools that depend on each other's output in the same batch, because the second call needs the first call's result before it can run.
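
If you run your own tool loop, the client side of this looks roughly like the sketch below. `read_file_tool` is a hypothetical stand-in that only sleeps to simulate I/O; the point is the shape: independent calls are started together and awaited once, while dependent calls stay sequential.

```python
import asyncio


async def read_file_tool(path: str) -> str:
    """Hypothetical stand-in for a real tool; sleeps briefly to simulate I/O."""
    await asyncio.sleep(0.01)
    return f"<contents of {path}>"


async def run_independent(paths: list[str]) -> dict[str, str]:
    """Independent calls: start them all, then wait once for every result."""
    results = await asyncio.gather(*(read_file_tool(p) for p in paths))
    # gather() returns results in the order the calls were passed in.
    return dict(zip(paths, results))


async def run_dependent(index_path: str) -> str:
    """Dependent calls: the second needs the first's output, so run in sequence."""
    index = await read_file_tool(index_path)
    return await read_file_tool(f"target-from:{index}")


if __name__ == "__main__":
    files = asyncio.run(run_independent(["a.py", "b.py", "c.py"]))
    print(list(files))
```

With three independent reads, the total wait is close to the slowest single read rather than the sum of all three.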

The step-by-step starter setup. (1) Create a CLAUDE.md in your project root with the rules of engagement: your style, your stack, your standing decisions. Claude Code loads this file into context at the start of every session, so it works like a standing system prompt, and because it rarely changes it benefits from caching. (2) Write 2 or 3 skills for tasks you do every week. Start small. (3) Set up a budget alarm in your account dashboard so token cost never surprises you. (4) Adopt the fresh-session-per-task habit. (5) Once a month, review your usage and trim.
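
For step (5), if you call the API yourself you can log the usage counts from each response and total them per task. The sketch below assumes you save one record per call with a `task` label of your own (hypothetical) plus the token counts reported in the response's usage data: `input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, and `output_tokens`.

```python
from collections import defaultdict


def summarize_usage(records: list[dict]) -> dict[str, dict[str, float]]:
    """Total token counts per task and compute how much prompt text came from cache."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"input": 0, "cache_write": 0, "cache_read": 0, "output": 0}
    )
    for record in records:
        t = totals[record["task"]]
        t["input"] += record.get("input_tokens", 0)
        t["cache_write"] += record.get("cache_creation_input_tokens", 0)
        t["cache_read"] += record.get("cache_read_input_tokens", 0)
        t["output"] += record.get("output_tokens", 0)

    summary: dict[str, dict[str, float]] = {}
    for task, t in totals.items():
        prompt_total = t["input"] + t["cache_write"] + t["cache_read"]
        ratio = t["cache_read"] / prompt_total if prompt_total else 0.0
        summary[task] = {**t, "cache_hit_ratio": ratio}
    return summary
```

Tasks with a large prompt total and a cache hit ratio near zero are the first candidates to trim or to restructure around a stable, cacheable prefix.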

The shape of the workflow matters more than any single trick. The best Claude users I know have one thing in common: they treat the model as a colleague with a memory problem, competent and fast, but someone you have to bring the context to every time. Build your workflow around that, and the cost takes care of itself.
