Coding Agent Workflow Research: Claude Code & OpenAI Codex CLI
Date: 2026-04-28
Focus: Practical workflow improvements, productivity patterns, and new features
1. Claude Code Workflow Improvements
1.1 Key Features (2025-2026)
Claude Code has evolved from a terminal-only tool into a multi-surface agentic coding platform available across terminal, VS Code, JetBrains, Desktop app, and web browser.
Multi-Surface Availability
- Terminal CLI (primary, full-featured)
- VS Code extension with inline diffs, @-mentions, plan review
- JetBrains plugin (IntelliJ, PyCharm, WebStorm)
- Desktop app (macOS, Windows) with visual diff review, multiple sessions
- Web-based (claude.ai/code) with no local setup required
- iOS app for mobile continuation of tasks
Source: Claude Code Overview
Routines (Scheduled Automation)
- Run Claude Code on autopilot with scheduled, API-triggered, or GitHub-event-triggered routines
- Execute on Anthropic-managed cloud infrastructure (works when laptop is closed)
- Triggers: cron schedule, HTTP POST endpoint, GitHub PR/release events
- Use cases: nightly PR reviews, alert triage, docs drift detection, deploy verification
- Available on Pro, Max, Team, and Enterprise plans
Source: Claude Code Routines
Agent Teams (Experimental)
- Coordinate multiple Claude Code instances working together
- One session acts as team lead, others as teammates with independent context windows
- Shared task list with self-coordination and inter-agent messaging
- Display modes: in-process (Shift+Down to cycle) or split panes (tmux/iTerm2)
- Use cases: parallel code review (security/performance/tests), competing hypothesis debugging, cross-layer coordination
- Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
Source: Agent Teams
Claude Agent SDK (formerly Claude Code SDK)
- Build production AI agents with Claude Code as a library
- Available in Python (claude-agent-sdk) and TypeScript (@anthropic-ai/claude-agent-sdk)
- Built-in tools: Read, Write, Edit, Bash, Monitor, Glob, Grep, WebSearch, WebFetch, AskUserQuestion
- Features: hooks (callbacks), subagents, MCP integration, permissions control, session management
- Use cases: CI/CD pipelines, custom applications, production automation
Source: Agent SDK Overview
1.2 Hooks System and Automation Patterns
Hooks are deterministic automation triggers that run shell commands, HTTP calls, MCP tools, LLM prompts, or subagents at specific lifecycle points.
Five Hook Types
1. Command hooks: Shell scripts receiving JSON on stdin, returning decisions via exit codes
2. HTTP hooks: POST requests to endpoints with JSON event data
3. MCP Tool hooks: Call tools on connected MCP servers
4. Prompt hooks: Send prompts to Claude model for yes/no decisions
5. Agent hooks: Spawn subagents for tool-based verification
Hook Events
| Event | When | Use Case |
|-------|------|----------|
| SessionStart | Session begins | Load context, set env vars |
| UserPromptSubmit | User submits prompt | Validate, block, add context |
| PreToolUse | Before tool executes | Block dangerous commands, modify input |
| PostToolUse | Tool succeeds | Run linters, validate output |
| Stop | Claude finishes | Validate final response |
| Notification | Claude needs attention | Desktop notifications |
| TeammateIdle | Teammate going idle | Quality gates for teams |
Practical Hook Patterns
- Auto-format after every file edit (PostToolUse on Edit/Write)
- Block destructive commands like rm -rf (PreToolUse on Bash)
- Run ESLint after edits (PostToolUse)
- Desktop notifications when Claude needs attention (Notification)
- Environment variable loading from .env files (SessionStart)
- Auto-approve safe operations (PermissionRequest)
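The "block destructive commands" pattern above can be sketched as a command hook. This is a minimal sketch, not the official implementation: it assumes the PreToolUse payload carries the tool name under `tool_name` and the shell command under `tool_input.command`, and that exit code 2 blocks the call while stderr is fed back to Claude. Check both against the hooks reference before relying on them. The regex and the helper names `decide` and `main` are illustrative.

```python
import io
import json
import re
import sys

# Illustrative pattern: `rm` followed by a single flag group containing both r and f.
DESTRUCTIVE = re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    if DESTRUCTIVE.search(command):
        # Assumption: exit code 2 tells Claude Code to block the tool call.
        return 2, f"Blocked destructive command: {command}"
    return 0, ""


def main(stdin=sys.stdin, stderr=sys.stderr) -> int:
    """Read one JSON event from stdin and report the decision."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)
    return code


# Demo with an in-memory payload; a real hook script would end with sys.exit(main()).
sample = io.StringIO(
    json.dumps({"tool_name": "Bash", "tool_input": {"command": "rm -rf build"}})
)
print(main(stdin=sample))  # prints 2
```

Keeping the decision in a pure function (`decide`) makes the rule easy to unit test without launching a session. The pattern only catches combined flags such as `-rf` or `-fr`; a stricter guard would parse the command properly.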
Source: Claude Code Hooks
1.3 MCP Server Integrations
MCP (Model Context Protocol) enables Claude Code to connect to external data sources and tools.
Key Integration Patterns
- Issue trackers (Jira, Linear) for implementing features from tickets
- Databases for querying and analyzing data
- Monitoring tools (Datadog, Sentry) for analyzing alerts
- Design tools (Figma) for implementing designs
- Communication (Slack) for sending updates
- Browsers (Playwright) for testing web applications
- Documentation servers (Context7) for library docs
Connectors for Routines: MCP connectors work with cloud routines, enabling scheduled automations that read from Slack, create Linear tickets, etc.
Source: Claude Code Overview
1.4 CI/CD Integration
GitHub Actions (anthropics/claude-code-action@v1)
- Trigger with @claude mention in any PR or issue comment
- Automated code review on every PR
- Custom automation with scheduled workflows
- Supports direct API, AWS Bedrock, and Google Vertex AI
- Skills integration for domain-specific workflows
GitLab CI/CD integration also available.
Non-interactive mode (claude -p "prompt")
- Integrates into build scripts, pre-commit hooks, pipelines
- Output formats: text, JSON, stream-json
- Composable with Unix pipes: cat error.log | claude -p "explain root cause"
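The non-interactive mode above can also be driven from a build script. The following is a hedged sketch: it assumes the `claude` binary is on `PATH` and that the output format is selected with `--output-format`; the helper names `build_command` and `ask` are hypothetical, and no particular shape of the JSON result is assumed.

```python
import json
import subprocess

OUTPUT_FORMATS = ("text", "json", "stream-json")


def build_command(prompt: str, output_format: str = "json") -> list[str]:
    """Build the argv for a non-interactive run."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return ["claude", "-p", prompt, "--output-format", output_format]


def ask(prompt: str, piped_input: str = "", timeout: float = 300.0):
    """Run claude once, piping `piped_input` on stdin, and parse the JSON it prints."""
    completed = subprocess.run(
        build_command(prompt, "json"),
        input=piped_input,
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
        timeout=timeout,
    )
    return json.loads(completed.stdout)


# Only the argv is built here, so the sketch runs without the CLI installed.
print(build_command("explain root cause"))
```

Calling `ask("explain root cause", piped_input=log_text)` would mirror the `cat error.log | claude -p ...` pipe, with the log contents passed on stdin.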
Source: GitHub Actions
1.5 Context and Memory Management
CLAUDE.md Files (Hierarchical)
- ~/.claude/CLAUDE.md: Global instructions for all sessions
- ./CLAUDE.md: Project-level, shared via git
- ./CLAUDE.local.md: Personal project notes (gitignored)
- Child directories: Loaded on demand when working in those areas
- Supports @path/to/import syntax for including other files
Auto Memory: Claude builds memory as it works, saving learnings like build commands and debugging insights across sessions without manual configuration.
Context Window Management
- /clear between unrelated tasks
- /compact <instructions> for targeted summarization
- /rewind to checkpoint and restore
- /btw for side questions that don't enter conversation history
- Subagents for investigation (separate context, reports back summaries)
- Extended thinking with adaptive reasoning (configurable via effort levels)
Session Management
- claude --continue to resume most recent conversation
- claude --resume <name> to pick from named sessions
- /rename for descriptive session names
- claude --from-pr 123 to resume sessions linked to a PR
- Session picker with search, preview, branch filtering
Source: Best Practices
1.6 Parallel Development with Worktrees
Git Worktree Integration
- claude --worktree feature-auth creates isolated worktree with new branch
- Each session gets its own copy of the codebase
- Worktrees created at <repo>/.claude/worktrees/<name>
- Subagents can use worktree isolation with isolation: worktree frontmatter
- .worktreeinclude file copies gitignored files (.env) to new worktrees
- Automatic cleanup: no changes = auto-remove; changes = prompt to keep/remove
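To make the layout above concrete, here is a rough approximation of what a worktree setup involves, using plain `git worktree` commands. It is a sketch under assumptions, not a description of how Claude Code implements `--worktree`: the branch is assumed to share the worktree name, and `.worktreeinclude` is treated as a list of plain relative paths (the real file may accept patterns). All function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def worktree_path(repo: Path, name: str) -> Path:
    """Location described above: <repo>/.claude/worktrees/<name>."""
    return repo / ".claude" / "worktrees" / name


def worktree_add_command(repo: Path, name: str) -> list[str]:
    """Plain-git equivalent: create the worktree on a new branch (assumed to be `name`)."""
    return ["git", "-C", str(repo), "worktree", "add", "-b", name,
            str(worktree_path(repo, name))]


def copy_included_files(repo: Path, worktree: Path) -> list[Path]:
    """Copy files listed in .worktreeinclude (assumed: one relative path per line)."""
    listing = repo / ".worktreeinclude"
    copied: list[Path] = []
    if not listing.exists():
        return copied
    for line in listing.read_text().splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        source = repo / entry
        if source.is_file():
            target = worktree / entry
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(target)
    return copied


# Demo in a throwaway directory; git itself is not invoked.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    repo.mkdir()
    (repo / ".env").write_text("API_KEY=dummy\n")
    (repo / ".worktreeinclude").write_text(".env\n")
    target = worktree_path(repo, "feature-auth")
    print(worktree_add_command(repo, "feature-auth")[3:7])  # ['worktree', 'add', '-b', 'feature-auth']
    print([p.name for p in copy_included_files(repo, target)])  # ['.env']
```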
Source: Common Workflows
2. OpenAI Codex CLI Workflow Improvements
2.1 Overview
OpenAI Codex CLI is a lightweight local coding agent written primarily in Rust (96.2% of the codebase). It is licensed under Apache-2.0 and has 78.5k+ GitHub stars.
Installation
npm install -g @openai/codex
# or
brew install --cask codex
Access Methods
- Terminal CLI (codex command)
- IDE extensions (VS Code, Cursor, Windsurf)
- Desktop app (codex app)
- Web version (chatgpt.com/codex - cloud-based, separate product)
Authentication
- ChatGPT account sign-in (Plus, Pro, Business, Edu, Enterprise plans)
- API key authentication (alternative)
Source: GitHub - openai/codex
2.2 Key Features (Based on Available Documentation)
Local Execution Model
- Runs entirely on your machine
- Sandboxed execution for safety
- Works with your local filesystem and tools
Cloud Companion (Codex Web)
- Separate cloud-based agent at chatgpt.com/codex
- Can run tasks in background, clone repos
- Produces PRs and artifacts independently
Multi-Model Support
- Access to OpenAI models (GPT-4o, o3, o4-mini)
- Model selection based on task complexity
IDE Integration
- VS Code, Cursor, Windsurf extensions
- Desktop application experience
2.3 Known Capabilities (from community reports)
- File reading and editing
- Command execution in sandboxed environment
- Git operations
- Code generation and refactoring
- Multi-file changes
- Approval workflows before modifications
3. Cross-cutting Workflow Patterns
3.1 The Plan-Execute-Verify Loop
Both tools benefit from separating research, planning, and execution:
- Explore: Use read-only mode to understand codebase (Claude Code's Plan Mode)
- Plan: Create detailed implementation plan before coding
- Execute: Implement with the plan as guide
- Verify: Run tests, check output, validate results
Claude Code specific: Shift+Tab to toggle Plan Mode; Ctrl+G to edit plan in text editor
3.2 Context Window Management
The context window is the single most important resource to manage across all coding agents.
Strategies
- Clear context between unrelated tasks
- Use subagents/separate sessions for research (keeps main context clean)
- Scope investigations narrowly
- Provide verification criteria upfront (reduces back-and-forth)
- Reference files with @ instead of pasting content
- Pipe data in rather than describing it
Anti-pattern: The "kitchen sink session" where unrelated tasks accumulate in one context.
3.3 Verification-Driven Development
Giving the agent a way to check its own work is the highest-leverage practice across all coding agents:
- Always provide tests, screenshots, or expected outputs
- Let the agent verify its own work
- Include specific test cases in prompts
- Use visual verification for UI changes (screenshots, browser tools)
- Run linters and type checkers as verification gates
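The "verification gates" idea can be sketched as a small runner that executes each check and records whether it passed. This is an illustrative sketch: the gate names and the `run_gates` helper are hypothetical, and the demo uses stand-in commands so it runs without any linter installed.

```python
import subprocess
import sys


def run_gates(gates: dict[str, list[str]]) -> dict[str, bool]:
    """Run each named command; a gate passes when its exit code is 0."""
    results: dict[str, bool] = {}
    for name, argv in gates.items():
        completed = subprocess.run(argv, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results


# Stand-in commands so the sketch runs anywhere; real gates might be
# ["pytest", "-q"], ["ruff", "check", "."], or ["mypy", "src"].
demo_gates = {
    "passing-gate": [sys.executable, "-c", "raise SystemExit(0)"],
    "failing-gate": [sys.executable, "-c", "raise SystemExit(1)"],
}
results = run_gates(demo_gates)
print(results)  # {'passing-gate': True, 'failing-gate': False}
```

Listing the failing gate names in the follow-up prompt gives the agent a concrete, checkable target rather than a vague "it is still broken".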
3.4 Parallel Session Patterns
Writer/Reviewer Pattern
- Session A implements a feature
- Session B reviews Session A's output with fresh context
- Session A addresses feedback
Test-First Pattern
- Session A writes tests
- Session B writes code to pass them
Multi-file Fan-out
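# Read one path per line so filenames containing spaces stay intact
while IFS= read -r file; do
  # stdin is redirected so the command cannot consume the rest of files.txt
  claude -p "Migrate $file from React to Vue. Return OK or FAIL." \
    --allowedTools "Edit,Bash(git commit *)" < /dev/null
done < files.txt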
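The same fan-out can run several files concurrently. The sketch below is one possible approach, not a recommended configuration: `migrate_command`, `run_claude`, and `fan_out` are hypothetical helpers, and the runner is injectable so the demo works without the CLI installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def migrate_command(path: str) -> list[str]:
    """Build the same non-interactive invocation as the shell loop."""
    return [
        "claude", "-p",
        f"Migrate {path} from React to Vue. Return OK or FAIL.",
        "--allowedTools", "Edit,Bash(git commit *)",
    ]


def run_claude(argv: list[str]) -> str:
    """Run one invocation with stdin closed and return its trimmed stdout."""
    completed = subprocess.run(
        argv, capture_output=True, text=True, stdin=subprocess.DEVNULL
    )
    return completed.stdout.strip()


def fan_out(
    paths: list[str],
    runner: Callable[[list[str]], str] = run_claude,
    max_workers: int = 4,
) -> dict[str, str]:
    """Run one migration per path on a thread pool and map path -> output."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = pool.map(lambda p: runner(migrate_command(p)), paths)
        return dict(zip(paths, outputs))


def fake_runner(argv: list[str]) -> str:
    """Stand-in for the real CLI, used only for the demo."""
    return "OK" if "Button" in argv[2] else "FAIL"


print(fan_out(["src/Button.jsx", "src/Modal.jsx"], runner=fake_runner))
# {'src/Button.jsx': 'OK', 'src/Modal.jsx': 'FAIL'}
```

Concurrent sessions that edit and commit in the same checkout can collide, so pairing a parallel fan-out with one worktree per task is likely safer than sharing a single working directory.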
3.5 Interview-Driven Requirements
For larger features, have the AI interview you first:
I want to build [brief description]. Interview me in detail using the
AskUserQuestion tool. Ask about technical implementation, UI/UX, edge
cases, concerns, and tradeoffs.
Start a fresh session to execute the resulting spec (clean context for implementation).
3.6 Git Workflow Integration
- Commit frequently with descriptive messages
- Use feature branches for all work
- Create PRs directly from the agent
- Use worktrees for parallel development
- Name sessions after branches/tasks for easy resume
4. New Ecosystem Tools
4.1 Claude Agent SDK Demo Agents
Available at github.com/anthropics/claude-agent-sdk-demos:
| Demo | Pattern |
|---|---|
| Email Agent | IMAP integration, agentic email search |
| Research Agent | Multi-agent coordination, parallel processing, web search |
| Resume Generator | Multi-source data gathering, document generation |
| Simple Chat App | React + Express WebSocket interface |
| AskUserQuestion Previews | HTML preview rendering for branding decisions |
4.2 Claude Code Plugins
Plugin marketplace (/plugin to browse) bundles skills, hooks, subagents, and MCP servers:
- Code intelligence plugins for language-specific symbol navigation
- Automatic error detection after edits
- Community-contributed workflow extensions
4.3 Routines Infrastructure
Cloud-based automation that replaces many custom CI/CD workflows:
- Scheduled code reviews
- Alert triage with auto-fix PRs
- Documentation drift detection
- Deploy verification
- Library porting between SDKs
4.4 Remote Control and Cross-Device Workflows
- Remote Control: Continue local sessions from phone/browser
- Channels: Push events from Telegram, Discord, iMessage, webhooks into sessions
- Teleport: claude --teleport to pull web/mobile tasks into terminal
- Dispatch: Message from phone, get desktop session created
- Slack Integration: Mention @Claude in Slack with bug report, get PR back
4.5 Notable Community Tools (from ecosystem)
- oh-my-claudecode: Plugin framework with autopilot, team orchestration, ultrawork parallel execution
- claude-mem: Persistent cross-session memory with search, timeline, knowledge corpora
- session-wrap: Session lifecycle management, history insights, documentation updates
- skill-creator: Create, modify, and benchmark custom skills
5. Community Tips and Power User Techniques
5.1 Session Hygiene (from Anthropic best practices)
- Clear aggressively: /clear between unrelated tasks is the single most impactful habit
- Two-strike rule: If you've corrected Claude twice on the same issue, /clear and write a better prompt
- Name everything: /rename sessions for easy resume; treat sessions like branches
- Checkpoint before risk: Claude auto-checkpoints; use /rewind to restore code + conversation
5.2 Prompt Engineering for Agents
High-leverage prompt patterns:
- Include verification criteria: "run the tests after implementing"
- Reference patterns: "look at how HotDogWidget.php is implemented, follow that pattern"
- Scope explicitly: "fix the TypeError in user.ts, not the warning in config.js"
- Address root causes: "fix the root cause, don't suppress the error"
Low-leverage patterns to avoid:
- Vague requests without success criteria
- Over-specified CLAUDE.md files that get ignored
- Unscoped investigations that fill context
5.3 Subagent Delegation
Use subagents for anything that reads many files:
Use subagents to investigate how our authentication system handles token
refresh, and whether we have any existing OAuth utilities I should reuse.
The subagent explores in separate context and reports back a summary, keeping your main conversation clean.
5.4 Auto Mode for Uninterrupted Execution
claude --permission-mode auto -p "fix all lint errors"
5.5 Skills for Repeatable Workflows
Create .claude/skills/fix-issue/SKILL.md:
---
name: fix-issue
description: Fix a GitHub issue
disable-model-invocation: true
---
Analyze and fix the GitHub issue: $ARGUMENTS.
1. Use `gh issue view` to get details
2. Search codebase for relevant files
3. Implement fix
4. Write and run tests
5. Create PR
Invoke with /fix-issue 1234.
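To show how the pieces of a skill file fit together, here is a small sketch that splits a SKILL.md into frontmatter and body and fills in `$ARGUMENTS`. It only illustrates the file layout: the parser handles flat `key: value` lines rather than full YAML, and `parse_skill` and `render` are hypothetical helpers, not part of Claude Code.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md document into (frontmatter, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- line")
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
    )
    if end is None:
        raise ValueError("missing closing --- line")
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def render(body: str, arguments: str) -> str:
    """Substitute the text typed after the slash command for $ARGUMENTS."""
    return body.replace("$ARGUMENTS", arguments)


SKILL = """---
name: fix-issue
description: Fix a GitHub issue
disable-model-invocation: true
---
Analyze and fix the GitHub issue: $ARGUMENTS.
"""

meta, body = parse_skill(SKILL)
print(meta["name"])          # fix-issue
print(render(body, "1234"))  # Analyze and fix the GitHub issue: 1234.
```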
5.6 Hooks for Guaranteed Behaviors
Unlike CLAUDE.md instructions (advisory), hooks are deterministic:
- Post-edit formatting that never gets skipped
- Notification when Claude needs attention
- Blocking writes to protected directories
- Auto-running tests after changes
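As an illustration of "blocking writes to protected directories", the sketch below pairs a guard function with a settings fragment that would register it. Several details are assumptions to verify against the hooks reference: the payload fields `tool_input.file_path` and `cwd`, the exit-code-2 blocking convention, and the `matcher` / `hooks` layout of the settings file. The directory names and script paths are hypothetical.

```python
import json
from pathlib import PurePosixPath

PROTECTED = ("migrations", "infra/prod")  # hypothetical protected directories


def guard(event: dict) -> int:
    """Exit code for a PreToolUse event: 2 blocks the write, 0 allows it."""
    if event.get("tool_name") not in ("Edit", "Write"):
        return 0
    path = PurePosixPath(event.get("tool_input", {}).get("file_path", ""))
    root = PurePosixPath(event.get("cwd", "/"))
    # Normalise absolute paths to be relative to the project root.
    if path.is_absolute() and path.is_relative_to(root):
        path = path.relative_to(root)
    return 2 if any(path.is_relative_to(d) for d in PROTECTED) else 0


def hook_entry(matcher: str, command: str) -> dict:
    """One registration entry: which tools to match and which command to run."""
    return {"matcher": matcher, "hooks": [{"type": "command", "command": command}]}


# Assumed settings layout; the script paths are placeholders.
settings = {
    "hooks": {
        "PreToolUse": [hook_entry("Edit|Write", "./scripts/protect_dirs.py")],
        "PostToolUse": [hook_entry("Edit|Write", "./scripts/format_changed.sh")],
    }
}
print(json.dumps(settings, indent=2))

blocked = guard({
    "tool_name": "Write",
    "tool_input": {"file_path": "/repo/migrations/0001.sql"},
    "cwd": "/repo",
})
print(blocked)  # 2
```

Comparing whole path components means a sibling such as `migrations_old/` is not blocked by the `migrations` rule.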
6. Comparison Matrix
| Feature/Capability | Claude Code | Codex CLI |
|---|---|---|
| Runtime | Node.js binary | Rust binary |
| License | Proprietary (Anthropic subscription) | Apache-2.0 (open source) |
| Models | Claude Opus, Sonnet, Haiku (4.5-4.7) | GPT-4o, o3, o4-mini |
| IDE Integration | VS Code, JetBrains | VS Code, Cursor, Windsurf |
| Desktop App | Yes (macOS, Windows) | Yes |
| Web/Cloud Version | Yes (claude.ai/code) | Yes (chatgpt.com/codex) |
| Mobile Access | iOS app, Remote Control | ChatGPT mobile |
| Hooks System | 5 hook types, 8+ events | Not documented publicly |
| Agent SDK | Python + TypeScript SDK | Not available (open source repo) |
| Multi-Agent Teams | Native (experimental) | Not available |
| Subagents | Built-in with isolation | Not documented |
| MCP Support | Full (connectors, tools, resources) | Not documented |
| Scheduled Routines | Yes (cloud infrastructure) | Not available |
| CI/CD Integration | GitHub Actions, GitLab CI | GitHub integration |
| Worktree Support | Built-in (--worktree flag) | Manual |
| Plan Mode | Yes (Shift+Tab toggle) | Not documented |
| Permission Modes | Default, Auto, Plan, custom rules | Sandboxed execution |
| Context Management | /clear, /compact, /rewind, subagents | Not documented in detail |
| Session Persistence | Full (resume, fork, named sessions) | Not documented |
| Non-interactive Mode | claude -p with output formats | Available |
| Pipe/Unix Composability | Full (stdin/stdout, pipes) | Available |
| Plugin System | Marketplace with /plugin | Not available |
| Custom Commands/Skills | .claude/skills/ with SKILL.md | Not documented |
| Memory System | CLAUDE.md + auto memory | Not documented |
| Provider Flexibility | Anthropic, Bedrock, Vertex, Foundry | OpenAI API only |
| Pricing | Subscription (Pro/Max/Team/Enterprise) or API | ChatGPT subscription or API |
| GitHub Integration | @claude in PRs, code review, routines | Repository cloning, PRs |
7. Workflow Anti-patterns
7.1 The Kitchen Sink Session
Problem: Starting with one task, asking something unrelated, going back to first task. Context fills with irrelevant information.
Fix: /clear between unrelated tasks. Treat sessions as single-purpose.
7.2 Correction Spiral
Problem: Claude does something wrong, you correct it, still wrong, correct again. Context polluted with failed approaches.
Fix: After two corrections, /clear and write a better initial prompt with what you learned.
7.3 Over-specified CLAUDE.md
Problem: Too many instructions cause Claude to ignore the important ones.
Fix: Ruthlessly prune. If Claude already does something correctly without the instruction, delete it. Convert deterministic rules to hooks instead.
7.4 Trust-then-Verify Gap
Problem: Agent produces plausible code that doesn't handle edge cases.
Fix: Always provide verification (tests, scripts, screenshots). If you can't verify it, don't ship it.
7.5 Infinite Exploration
Problem: Asking to "investigate" without scoping. Agent reads hundreds of files, filling context.
Fix: Scope investigations narrowly or use subagents so exploration doesn't consume main context.
7.6 Premature Implementation
Problem: Jumping straight to coding without understanding the problem.
Fix: Use Plan Mode to separate exploration from execution. Research first, then implement.
7.7 Ignoring Context Costs
Problem: Not tracking how much context is being consumed.
Fix: Use custom status lines to monitor context usage. Run /compact proactively. Use subagents for research.
7.8 Monolithic Sessions for Parallel Work
Problem: Trying to do everything in one session when tasks are independent.
Fix: Use worktrees, agent teams, or multiple terminal sessions. Fan out with claude -p loops.
Key Takeaways
- Context window management is the meta-game. Every other optimization serves this goal. Clear aggressively, delegate to subagents, and keep sessions focused.
- Verification is the highest-leverage investment. Agents with clear success criteria dramatically outperform agents without them. Tests > screenshots > manual review.
- Hooks beat instructions for deterministic behavior. CLAUDE.md is advisory; hooks are guaranteed. Use hooks for formatting, linting, notifications, and safety gates.
- Parallel execution multiplies throughput. Worktrees, agent teams, fan-out scripts, and writer/reviewer patterns all enable horizontal scaling.
- Claude Code has a significantly deeper feature set for workflow automation (hooks, routines, SDK, teams, MCP), while Codex CLI's open-source nature and Rust implementation offer different tradeoffs.
- The ecosystem is maturing rapidly. Plugins, skills, routines, and the Agent SDK are creating an extensibility layer that goes beyond simple chat-with-code.