Skip to main content

Command Palette

Search for a command to run...

Three things that don't port between Claude Code, Codex, and Windsurf

Updated
3 min read
A
I'm Alex Voloshin. I help engineering teams operationalize AI coding agents. This blog is the long-form companion to ai-assets — a public repo of 26 agents, 53 skills, 45 eval rubrics with 270 calibration samples, 18 hooks, and 32 user-invocable workflows that work across Claude Code, Codex, and Windsurf. What you'll find here: - Tri-vendor parity write-ups. Engineering teams adopting AI coding agents shouldn't have to lock into one runtime to get production-grade tooling. Posts here track concrete differences between Claude Code, Codex, and Windsurf, and the patterns that survive across all three. - Eval methodology. Most public discourse on AI coding-agent evaluation is hand-wavy. I write about specific rubrics, calibration sample design, and the failure modes calibration data catches in production. - Formal-methods × agentic dev. LTL, model checking, and formal requirements review apply to agent specifications more directly than people think. Posts at the intersection of academic CS and practical agent orchestration. - Production war stories. Concrete failure modes, post-mortem-style write-ups, and the patterns that grew out of fixing them. What you won't find here: - Generic "AI is changing everything" takes - Tutorials for first-time agent users (better resources exist for that) - Vendor-locked content disguised as agnostic advice 15+ years building production software. Currently formalizing the CS foundation through an MS in Computer Science. Subscribe below if you want one well-edited post per ~2 weeks. No spam, no growth-hack frequency.

Claude Code, Codex, and Windsurf now look alike: all three have skills, hooks, and plugins. So it's natural to assume your setup ports between them. It mostly doesn't.

I maintain a repo that targets all three runtimes and keeps a parity matrix. Here are the three gaps that cost the most time, and what to do about each.

Feature Claude Code Codex Windsurf
Hooks PreToolUse runs before a tool and can block it command hooks run; prompt/agent hooks parsed but skipped Cascade Hooks fire after the action
Skill triggering model-invoked from the description implicit invocation is a toggle rules and workflows
Distribution plugin.json plus a marketplace git repo its own plugin format rules/workflows plus MDM

Will my hooks still block what they blocked before?

Not always. On Claude Code, a PreToolUse hook runs before the tool and can stop it. That is a real guardrail. On Windsurf, Cascade Hooks fire after the agent acts, so they can flag or revert, but they never prevent. Codex runs command hooks, but currently parses and skips prompt and agent hooks. A hook that blocks a dangerous command on one runtime only logs it on another.

Will my skills trigger the same way?

The SKILL.md body ports. The trigger does not. Claude Code auto-invokes a skill from its description. Codex makes implicit invocation a switch: turn allow_implicit_invocation off and the skill runs only when you type $skill yourself. Codex also reads skills from a different path. The skill still works. It just may never fire on its own.

Can I publish a plugin once and install it everywhere?

No. Claude Code, Codex, and Windsurf each have their own manifest, catalog, and versioning. One plugin means three manifests to maintain and three install paths to test. "Publish once, install everywhere" is a slogan, not a workflow. Budget distribution as a per-runtime cost.

What to do

  1. Keep a parity matrix. Write down what you have verified on each runtime. Treat the rest as unknown, not assumed.

  2. Re-test on every runtime, especially any hook that enforces a security rule.

  3. Budget distribution per runtime, not once.

FAQ

Do skills written for Claude Code work in Codex? The SKILL.md body usually does. The discovery path and the invocation policy need adjusting per runtime.

Which runtime has the strongest hook guarantee? Claude Code. Its PreToolUse hook can block a tool call before it runs. Treat hooks on the other runtimes as observe-and-react.

The parity matrix I keep is public, in the repo: github.com/alex-voloshin-dev/ai-skills. If you run more than one of these runtimes, start your own.