05 / 05

Token Consumption

Tokens are the unit of work for AI models — everything the agent reads or generates costs tokens. In a single agentic session this adds up quickly: project context, skill metadata, MCP tool descriptions, your prompts, the agent's responses, and file contents it reads along the way.

When using agentic tools, it's important to manage token consumption effectively to stay within your limits and avoid unexpected costs. The exact controls vary by product, but the principles are the same. Here are some best practices:

Understand your limits — Familiarize yourself with the token limits of the tools you are using. This will help you plan your usage and avoid hitting limits unexpectedly.
Optimize your prompts — Craft your prompts carefully to get the most relevant and concise responses. This reduces the number of tokens used per interaction.
Monitor usage — Keep track of your token usage regularly. Many platforms provide dashboards or APIs to monitor consumption and identify patterns.
Manage your conversation history — Every turn appends to the context: your prompt, the agent's response, tool results, and any file contents it read. Each API call sends the full accumulated history, so a long session gets progressively more expensive — and slower, as the model has more to process. Use your client's summarize, clear, reset features, or start a new session depending on the situation (see below).

What affects token consumption?

Initial context loading

Project context

When you load a project context file (typically AGENTS.md, or a vendor-specific variant such as CLAUDE.md), the content of that file is tokenized and counts towards your token usage. The more detailed and extensive your project context is, the more tokens every session costs — before you've typed a single prompt.

Be mindful of the amount of information you include in your project context files. Only include relevant and necessary information with efficient wording. See Project Context for how to structure these files and how to avoid loading duplicate context when one canonical file will do.

Skills

When an agent starts, it loads the skill metadata (name, description, example) to understand what workflows are available. The skill's full instructions are only loaded when that skill is actually used.

Keep skill descriptions concise. The more focused they are, the easier it is for the agent to find the right skill — and the less startup cost per session.

MCP tools

When the agent starts, it connects to configured MCP servers and each server replies with its available tools along with their descriptions. Those descriptions are tokenized and count towards your initial token usage.

Some agentic tools support enabling or disabling certain MCP servers. Disable any servers that won't be used in the current project to reduce startup token cost. See MCP for details.

Conversation

Every message in a session — prompts, agent responses, tool results, file contents — accumulates in the conversation history. Each API call sends that full history, so the cost grows with every turn.

There are three common ways to manage it. Each tool may have different support for these capabilities and expose them with different commands or interfaces. Refer to your specific tool's documentation for the exact instructions.

Compaction / summarization

Some clients let you summarize the conversation history in place. The session stays open, MCP connections stay active, and you keep working — but the detailed history is replaced with a condensed summary. You lose some detail but retain the thread.

For example, Claude Code provides /compact for this.

Use it when: the conversation is getting long but you're still working on the same task and want to continue without starting over.

Clear / reset history

Some clients let you wipe the conversation history entirely while keeping the current session open. Depending on the tool, MCP connections may stay active, reconnect, or be reinitialized lazily. Note that your project context file, skill metadata, and MCP tool descriptions are usually part of the startup or system context, not the conversation history — so clearing chat history often does not reduce that base cost.

For example, Claude Code provides /clear for this.

Use it when: you're switching to a completely unrelated task and don't need any of the current conversation context.

Starting a new session

From a token perspective, a new session usually has a similar cost to clearing history — conversation history is gone, and the base context (project file, skills, MCP descriptions) is still present again at startup. The difference is that a new session re-reads your project context files and skills from disk and re-establishes MCP connections. If those files haven't changed, the startup context is effectively the same and the token cost is similar. But if you've edited AGENTS.md, changed another context file, or installed a new skill during the current session, only a fresh session is guaranteed to pick up those changes.

Use it when: the agent seems confused or stuck, you've made changes to your project context or your skills and want the agent to pick them up, or you want MCP connections refreshed.

Context window limits

Every model has a context window — a maximum amount of text it can hold in memory at once. When a session approaches this limit:

Older context is dropped and the agent may "forget" earlier parts of the conversation
In some clients the session will error out

If you notice the agent making mistakes it wasn't making earlier in a session, context saturation is a likely cause. Use your client's summarize or clear features, or start a new session if the agent seems stuck.

Summary

Source	When it's loaded	How to reduce
Project context file	Every session start	Keep it concise; prefer one canonical context file, usually `AGENTS.md`
Skill descriptions	Every session start	Write tight, focused descriptions
MCP tool descriptions	Every session start	Disable unused MCP servers
Conversation history	Grows during session	Use your client's compact/clear/reset features
File contents	When agent reads a file	Avoid reading large files unnecessarily

References

Contributors

@AgiMaulana-GDT

@ashlah-gdt View all contributors →