My Claude Code Was Wasting 63K Tokens Per Session. Here's How I Fixed It.

![Claude Code token optimization](/images/claude-code-token-optimization.svg)

Background

Here’s a fun fact I discovered last week: every time I opened Claude Code, 153KB of markdown files were silently injected into my system prompt — before I even typed a single word. Blogging plugins. Note-taking plugins. Fourteen workflow skills I rarely used outside my main projects.

That’s 63K tokens gone before the conversation even starts. Over 20 sessions a day? Over a million tokens burned on nothing.

The fix took six lines of JSON. Here’s the full breakdown.

1. The Problem

Before touching anything, I needed to understand where tokens were going.

```bash
# Measure every skill file injected into the prompt
find ~/.claude/plugins/cache -name "SKILL.md" -exec wc -c {} \;

# Inspect runtime behavior via the debug log
claude --debug --debug-file /tmp/debug.txt --print "hello"
```
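The per-file `wc -c` output still has to be summed by hand. A variant with `-exec … +` prints a grand total instead (same cache path as above):

```bash
# With `+`, wc receives many files at once and appends a "total" line;
# the awk step keeps only the byte count from that last line.
find ~/.claude/plugins/cache -name "SKILL.md" -exec wc -c {} + \
  | awk 'END { print $1 }'
```

One caveat: if the file list exceeds the system's argument limit, `wc` runs more than once and the last total covers only the final batch.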

The debug output revealed the full picture:

| Metric | Before |
| --- | --- |
| Enabled plugins | 4 |
| Plugin skills loaded | 18 |
| Skill file total size | 153 KB |
| SessionStart hook context | 5,430 chars |
| Effort level | max |
| Models used | All pro[1m] |

Breakdown of the 18 skills:

| Plugin | Skills | Size |
| --- | --- | --- |
| Superpowers | 14 | 108 KB |
| Obsidian | 3 | 41 KB |
| Frontend-Design | 1 | 4 KB |

That’s ~153KB of markdown instructions loaded into the system prompt before I even typed a message, plus 5,430 chars of hook context. At ~2.5 bytes per token for mixed English/code content, that’s roughly ~63K tokens of overhead per new session.
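The arithmetic is easy to sanity-check in the shell (2.5 bytes per token is my rough ratio for mixed English/code, written as ×2/5 to stay in integer math):

```bash
# ~153,000 bytes of skills + 5,430 chars of hook context, at ~2.5 bytes/token
echo $(( (153000 + 5430) * 2 / 5 ))   # roughly 63K tokens per session
```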

The principle was simple: load only what you need, only where you need it. Claude Code supports three mechanisms:

  1. Global disable — set the plugin to false in ~/.claude/settings.json
  2. Project-level settings — a .claude/settings.json in a project dir re-enables plugins locally
  3. Runtime flags — --plugin-dir loads a plugin for one session

The target layout:

| Project | Superpowers | Obsidian | Frontend |
| --- | --- | --- | --- |
| ~/Work/trade/exchange-rs | Yes | No | No |
| ~/Work/dex | Yes | No | No |
| ~/Documents/.../AI-Brain | No | Yes | No |
| Everything else | No | No | No |

2. The Fix

Global settings (~/.claude/settings.json) — disable 3 plugins, lower effort, stratify models:

```diff
- "superpowers@superpowers-marketplace": true,
- "obsidian@obsidian-skills": true,
- "frontend-design@claude-plugins-official": true,
+ "superpowers@superpowers-marketplace": false,
+ "obsidian@obsidian-skills": false,
+ "frontend-design@claude-plugins-official": false,

- "CLAUDE_CODE_EFFORT_LEVEL": "max",
+ "CLAUDE_CODE_EFFORT_LEVEL": "high",

- "ANTHROPIC_SMALL_FAST_MODEL": "deepseek-v4-pro[1m]",
+ "ANTHROPIC_SMALL_FAST_MODEL": "deepseek-v4-flash",
```

Project-level overrides — re-enable only where needed:

```
# ~/Work/trade/exchange-rs/.claude/settings.json
{ "enabledPlugins": { "superpowers@superpowers-marketplace": true } }

# ~/Work/dex/.claude/settings.json
{ "enabledPlugins": { "superpowers@superpowers-marketplace": true } }

# ~/Documents/LocalKnowledge/AI-Brain/AI-Brain/.claude/settings.json
{ "enabledPlugins": { "obsidian@obsidian-skills": true } }
```

That’s it. Three files, six edits. Rust-analyzer-LSP stays global since it’s a language server with no skill overhead.
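After hand-editing, it's worth confirming each file still parses as valid JSON — a stray comma makes the settings silently useless. A minimal check (python3 used purely as a JSON validator; adjust the paths to your own setup):

```bash
# Validate every touched settings file parses as JSON
for f in ~/.claude/settings.json \
         ~/Work/trade/exchange-rs/.claude/settings.json \
         ~/Work/dex/.claude/settings.json; do
  if python3 -m json.tool "$f" > /dev/null 2>&1; then
    echo "OK:  $f"
  else
    echo "BAD: $f"
  fi
done
```

Whether Claude Code itself warns about malformed settings files is not something I've verified, so the explicit check is cheap insurance.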

3. Results

To verify the optimization actually worked, I ran controlled tests using --debug:

```bash
# Test in a neutral directory (no project settings)
cd /tmp && claude --debug --debug-file /tmp/after-debug.txt --print "hello"
```

Comparing debug outputs before and after:

| Metric | Before (Mar 19) | After (May 5) |
| --- | --- | --- |
| Enabled plugins | 4 | 1 |
| Plugin skills loaded | 18 | 0 |
| Skill files in prompt | 153 KB | 0 KB |
| Hook context | 5,430 chars | 0 chars |
| Effort level | max | high |
| Small model | pro[1m] | flash |

The key debug log line:

```
# Before:  getSkills returning: ... 18 plugin skills ...
# After:   getSkills returning: ... 0 plugin skills ...
```

| Metric | Before | After | Savings |
| --- | --- | --- | --- |
| Plugin skills loaded | 18 | 0 | -18 |
| Skill files in prompt | 153 KB | 0 | -153 KB |
| Est. tokens / session | ~63K | 0 | ~63K |
| Hook context | 5,430 chars | 0 | -5,430 |
| Effort level | max | high | thinking ↓30-50% |
| Small model | pro[1m] | flash | faster + cheaper |

The optimization is transparent for active projects — enter the directory and Claude Code automatically picks up the local settings.json, loading exactly the plugins needed. For one-off tasks: claude --plugin-dir <path>.
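That `getSkills` line also makes a handy regression check: pulling it out of both debug logs gives a one-command before/after comparison (log paths from the runs above):

```bash
# Compare the skill counts reported in the two debug logs
grep "getSkills returning" /tmp/debug.txt /tmp/after-debug.txt
```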

4. Takeaways

  • Profile first: find ... -name "SKILL.md" revealed 153KB hidden overhead. Can’t optimize what you can’t measure.
  • Lazy-load plugins: Project-level settings.json scopes plugins to specific directories. Same pattern as direnv / asdf.
  • Tune effort level: max dramatically increases thinking tokens. high is enough for most tasks — save max for hard problems.
  • Stratify models: Not every operation needs the flagship model. Use flash variants for background tasks.
  • Numbers compound: ~63K tokens saved per session × 20 sessions = 1.26M tokens per day, more than an entire context window.
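The compounding math, for the skeptical:

```bash
# tokens saved per session × sessions per day
echo $(( 63000 * 20 ))   # the 1.26M/day figure above
```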
