My Claude Code Was Wasting 63K Tokens Per Session. Here's How I Fixed It.

![Claude Code token optimization](/images/claude-code-token-optimization.svg)

Background

Here’s a fun fact I discovered last week: every time I opened Claude Code, 153KB of markdown files were silently injected into my system prompt — before I even typed a single word. Blogging plugins. Note-taking plugins. Fourteen workflow skills I rarely used outside my main projects.

That’s 63K tokens gone before the conversation even starts. Over 20 sessions a day? Over a million tokens burned on nothing.

The fix took six lines of JSON. Here’s the full breakdown.

1. The Problem

Before touching anything, I needed to understand where tokens were going.

```bash
# Measure every skill file injected into the prompt
find ~/.claude/plugins/cache -name "SKILL.md" -exec wc -c {} \;

# Inspect runtime behavior via the debug log
claude --debug --debug-file /tmp/debug.txt --print "hello"
```
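The per-file `wc -c` output still has to be summed by hand. A variant with `-exec … +` prints a grand total instead (same cache path as above):

```bash
# With `+`, wc receives many files at once and appends a "total" line;
# the awk step keeps only the byte count from that last line.
find ~/.claude/plugins/cache -name "SKILL.md" -exec wc -c {} + \
  | awk 'END { print $1 }'
```

One caveat: if the file list exceeds the system's argument limit, `wc` runs more than once and the last total covers only the final batch.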

The debug output revealed the full picture:

| Metric | Before |
| --- | --- |
| Enabled plugins | 4 |
| Plugin skills loaded | 18 |
| Skill file total size | 153 KB |
| SessionStart hook context | 5,430 chars |
| Effort level | max |
| Models used | All pro[1m] |

Breakdown of the 18 skills:

| Plugin | Skills | Size |
| --- | --- | --- |
| Superpowers | 14 | 108 KB |
| Obsidian | 3 | 41 KB |
| Frontend-Design | 1 | 4 KB |

That’s ~153KB of markdown instructions loaded into the system prompt before I even typed a message, plus 5,430 chars of hook context. At ~2.5 bytes per token for mixed English/code content, that’s roughly ~63K tokens of overhead per new session.
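The arithmetic is easy to sanity-check in the shell (2.5 bytes per token is my rough ratio for mixed English/code, written as ×2/5 to stay in integer math):

```bash
# ~153,000 bytes of skills + 5,430 chars of hook context, at ~2.5 bytes/token
echo $(( (153000 + 5430) * 2 / 5 ))   # roughly 63K tokens per session
```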

The principle was simple: load only what you need, only where you need it. Claude Code supports three mechanisms:

  1. Global disable — set the plugin to false in ~/.claude/settings.json
  2. Project-level settings — a .claude/settings.json in a project dir re-enables plugins locally
  3. Runtime flags — --plugin-dir loads a plugin for one session

The target layout:

| Project | Superpowers | Obsidian | Frontend |
| --- | --- | --- | --- |
| ~/Work/trade/exchange-rs | Yes | No | No |
| ~/Work/dex | Yes | No | No |
| ~/Documents/.../AI-Brain | No | Yes | No |
| Everything else | No | No | No |

2. The Fix

Global settings (~/.claude/settings.json) — disable 3 plugins, lower effort, stratify models:

```diff
- "superpowers@superpowers-marketplace": true,
- "obsidian@obsidian-skills": true,
- "frontend-design@claude-plugins-official": true,
+ "superpowers@superpowers-marketplace": false,
+ "obsidian@obsidian-skills": false,
+ "frontend-design@claude-plugins-official": false,

- "CLAUDE_CODE_EFFORT_LEVEL": "max",
+ "CLAUDE_CODE_EFFORT_LEVEL": "high",

- "ANTHROPIC_SMALL_FAST_MODEL": "deepseek-v4-pro[1m]",
+ "ANTHROPIC_SMALL_FAST_MODEL": "deepseek-v4-flash",
```

Project-level overrides — re-enable only where needed:

```
# ~/Work/trade/exchange-rs/.claude/settings.json
{ "enabledPlugins": { "superpowers@superpowers-marketplace": true } }

# ~/Work/dex/.claude/settings.json
{ "enabledPlugins": { "superpowers@superpowers-marketplace": true } }

# ~/Documents/LocalKnowledge/AI-Brain/AI-Brain/.claude/settings.json
{ "enabledPlugins": { "obsidian@obsidian-skills": true } }
```

That’s it. Three files, six edits. Rust-analyzer-LSP stays global since it’s a language server with no skill overhead.
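After hand-editing, it's worth confirming each file still parses as valid JSON — a stray comma makes the settings silently useless. A minimal check (python3 used purely as a JSON validator; adjust the paths to your own setup):

```bash
# Validate every touched settings file parses as JSON
for f in ~/.claude/settings.json \
         ~/Work/trade/exchange-rs/.claude/settings.json \
         ~/Work/dex/.claude/settings.json; do
  if python3 -m json.tool "$f" > /dev/null 2>&1; then
    echo "OK:  $f"
  else
    echo "BAD: $f"
  fi
done
```

Whether Claude Code itself warns about malformed settings files is not something I've verified, so the explicit check is cheap insurance.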

3. Results

To verify the optimization actually worked, I ran controlled tests using --debug:

```bash
# Test in a neutral directory (no project settings)
cd /tmp && claude --debug --debug-file /tmp/after-debug.txt --print "hello"
```

Comparing debug outputs before and after:

| Metric | Before (Mar 19) | After (May 5) |
| --- | --- | --- |
| Enabled plugins | 4 | 1 |
| Plugin skills loaded | 18 | 0 |
| Skill files in prompt | 153 KB | 0 KB |
| Hook context | 5,430 chars | 0 chars |
| Effort level | max | high |
| Small model | pro[1m] | flash |

The key debug log line:

```
# Before:  getSkills returning: ... 18 plugin skills ...
# After:   getSkills returning: ... 0 plugin skills ...
```

| Metric | Before | After | Savings |
| --- | --- | --- | --- |
| Plugin skills loaded | 18 | 0 | -18 |
| Skill files in prompt | 153 KB | 0 | -153 KB |
| Est. tokens / session | ~63K | 0 | ~63K |
| Hook context | 5,430 chars | 0 | -5,430 |
| Effort level | max | high | thinking ↓30-50% |
| Small model | pro[1m] | flash | faster + cheaper |

The optimization is transparent for active projects — enter the directory and Claude Code automatically picks up the local settings.json, loading exactly the plugins needed. For one-off tasks: claude --plugin-dir <path>.
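That `getSkills` line also makes a handy regression check: pulling it out of both debug logs gives a one-command before/after comparison (log paths from the runs above):

```bash
# Compare the skill counts reported in the two debug logs
grep "getSkills returning" /tmp/debug.txt /tmp/after-debug.txt
```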

4. Takeaways

  • Profile first: find ... -name "SKILL.md" revealed 153KB hidden overhead. Can’t optimize what you can’t measure.
  • Lazy-load plugins: Project-level settings.json scopes plugins to specific directories. Same pattern as direnv / asdf.
  • Tune effort level: max dramatically increases thinking tokens. high is enough for most tasks — save max for hard problems.
  • Stratify models: Not every operation needs the flagship model. Use flash variants for background tasks.
  • Numbers compound: ~63K tokens saved per session × 20 sessions = 1.26M tokens per day, more than an entire context window.
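The compounding math, for the skeptical:

```bash
# tokens saved per session × sessions per day
echo $(( 63000 * 20 ))   # the 1.26M/day figure above
```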
