OpenAI Codex CLI: What Developers Can Actually Do With It Today
You're mid-sprint, deep in a codebase, and you want an AI that can actually touch your files — not just paste suggestions into a chat window. OpenAI Codex CLI is designed for exactly that: a terminal-native agent that reads your project, runs commands, and makes edits, all without leaving your shell. Here's what it can genuinely do right now and what you should watch out for before trusting it with anything important.
What Codex CLI Actually Is
Codex CLI is an open-source, terminal-based coding agent released by OpenAI in early 2025. It wraps the gpt-4o model (and optionally o3) in a local command-line interface that can read your filesystem, execute shell commands, write and edit files, and iterate on the output — all in a feedback loop driven by your natural-language instructions.
It is not a VS Code extension or a chat UI. It runs inside your existing terminal, which means it fits naturally into shell scripts, CI pipelines, and any workflow where a GUI feels like friction. The tool is available on macOS and Linux, with Windows Subsystem for Linux (WSL) as the supported Windows path.
What You'll Learn
- How to install and authenticate Codex CLI in under five minutes.
- What the three trust modes do and which one to use for which task.
- Concrete use cases that work well today: running code, editing files, generating docs.
- Where the tool still has rough edges and how to work around them.
- How Codex CLI compares to similar tools you may already use.
Prerequisites
Before you start, you'll need: Node.js 18 or later (Codex CLI is distributed as an npm package), an OpenAI API key with access to gpt-4o or o3, and a basic comfort with the terminal. You don't need any special Python environment or Docker setup for basic usage, though a sandboxed environment is recommended for the more autonomous trust modes.
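If you'd rather check those prerequisites with a script than by hand, a small preflight along these lines may help. It is only a sketch: the function names are made up for illustration, and the minimum Node.js version is simply the figure quoted above.

```python
import os
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 18  # assumption: the minimum version quoted in the prerequisites


def parse_node_major(version_text):
    """Return the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


def preflight(env=None):
    """Collect human-readable problems; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    problems = []
    node = shutil.which("node")
    if node is None:
        problems.append("node not found on PATH")
    else:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        major = parse_node_major(result.stdout)
        if major is None or major < MIN_NODE_MAJOR:
            problems.append(
                f"Node.js {MIN_NODE_MAJOR}+ required, found {result.stdout.strip()!r}"
            )
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling preflight() with no arguments inspects your real environment; passing a dictionary lets you try it out without touching your shell settings.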
Installing and Authenticating Codex CLI
Installation is a single npm command. You'll typically want it globally so it's available in any project directory:
npm install -g @openai/codex
Once installed, export your API key so the CLI can find it:
export OPENAI_API_KEY="sk-..."
You can add that line to your ~/.zshrc or ~/.bashrc to avoid repeating it. Then run a quick sanity check:
codex --version
If that returns a version number, you're ready. Launch an interactive session with:
codex
Or pass the task as a string on the command line so the session starts with it:
codex "List all TODO comments in this repo and summarize what's left to do"
The Three Trust Modes: Suggest, Auto-Edit, and Full-Auto
Before you run anything substantive, you need to understand how Codex CLI handles permissions. There are three modes that control what the agent can do without asking you first.
Suggest (default)
In Suggest mode, Codex proposes changes and shell commands but does nothing until you explicitly approve each step. This is the safest mode and the right choice when you're exploring a new codebase or running the tool for the first time. Every proposed file write or shell execution shows up as a diff or command preview, and you press y to confirm or n to skip.
Auto-Edit
Auto-Edit allows Codex to write and modify files automatically but still asks for approval before running shell commands. This is the mode most experienced users settle on for day-to-day tasks: you get fast file edits without giving the agent free rein over your environment.
Full-Auto
Full-Auto lets Codex execute commands and write files without any approval prompts. This is powerful but genuinely risky outside a sandbox. OpenAI's own documentation strongly recommends running Full-Auto only inside Docker or a VM where a runaway command can't damage your real system. For the vast majority of tasks, Auto-Edit is the better choice.
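One way to keep the three modes straight is to write them down as a lookup table. The sketch below models only the behavior described above; the Mode and Action names are invented for illustration and are not taken from Codex CLI's source.

```python
from enum import Enum


class Mode(Enum):
    SUGGEST = "suggest"
    AUTO_EDIT = "auto-edit"
    FULL_AUTO = "full-auto"


class Action(Enum):
    WRITE_FILE = "write_file"
    RUN_COMMAND = "run_command"


# Actions each mode performs without asking, per the descriptions above.
AUTO_APPROVED = {
    Mode.SUGGEST: set(),
    Mode.AUTO_EDIT: {Action.WRITE_FILE},
    Mode.FULL_AUTO: {Action.WRITE_FILE, Action.RUN_COMMAND},
}


def needs_approval(mode, action):
    """True when the agent would stop and wait for a yes/no from you."""
    return action not in AUTO_APPROVED[mode]
```

Read this way, Auto-Edit is the only mode that treats file writes and shell commands differently, which is why it works well as a middle ground.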
What You Can Do With It Today
The capabilities that work reliably right now fall into a few clear categories. Each one involves a different kind of interaction between the agent, your filesystem, and the shell.
Running Code and Reading Output
Codex CLI can write a small script, run it, read the stdout and stderr, and adjust the code based on what it sees. This is useful for tasks where you'd normally iterate manually in the terminal.
For example, you can ask it to write a Python script that fetches data from an API, run it, and fix any import errors or HTTP exceptions it encounters in the output:
codex "Write a Python script that calls the JSONPlaceholder API, fetches the first 10 posts, and prints each title. Run it and fix any errors."
The agent writes the file, executes it with python3, reads the output, and corrects issues in the next turn. In Suggest mode you approve each shell call; in Auto-Edit it handles the file changes automatically but still asks before running the script.
This loop is where Codex CLI earns its place. It's not just autocomplete — it actually observes runtime behavior and responds to it.
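To make the "observe" half of that loop concrete, here is a rough sketch of what running a script and reading its output looks like from Python. It illustrates the idea only and does not show how Codex CLI is implemented; both helper names are hypothetical.

```python
import subprocess
import sys


def run_and_observe(source):
    """Run a Python snippet in a child interpreter and return what an agent would see."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode, result.stdout, result.stderr


def exception_summary(stderr):
    """Return the last non-empty line of a traceback, which names the exception."""
    lines = [line for line in stderr.splitlines() if line.strip()]
    return lines[-1] if lines else ""
```

A non-zero return code plus the exception summary is roughly the signal an agent needs to decide that its next turn should, say, fix a missing import.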
Editing Files Across a Project
Codex CLI reads your directory tree and can make coordinated changes across multiple files. This makes it practical for refactoring tasks that would otherwise mean opening a dozen files manually.
A realistic example: renaming a function that's used in several modules, updating all call sites, and adjusting the docstring. You can describe the task in plain English and let the agent produce diffs for each affected file:
codex "Rename the function `get_user_data` to `fetch_user_profile` across the entire src/ directory and update all imports."
In Suggest mode, you'll see a file-by-file diff before anything is written. Review it carefully: the model is generally accurate on straightforward renames but can miss dynamic references or string-based lookups (e.g., getattr(obj, 'get_user_data')).
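Because string-based lookups are the likeliest thing to slip through, it can be worth scanning for leftovers once the rename is done. The following is a minimal sketch that assumes Python sources; find_leftovers is a hypothetical helper, not part of any tool.

```python
import re
from pathlib import Path


def find_leftovers(root, old_name, suffixes=(".py",)):
    """Return (path, line_number, line) for every remaining mention of old_name."""
    # Word boundaries keep longer names such as old_name + "_v2" from matching.
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Running find_leftovers("src", "get_user_data") after the agent finishes should return an empty list; anything it does return is a call site, string, or comment that still needs a human decision.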
If you're evaluating AI-powered coding tools and want to see how Codex CLI stacks up against editor-integrated options, the comparison in GitHub Copilot vs Cursor AI: which cuts dev time more in 2025 gives a useful frame for thinking about where terminal agents fit versus IDE assistants.
Explaining and Documenting Existing Code
Point Codex CLI at a file or directory and ask it to generate docstrings, README sections, or inline comments. This is one of the lowest-risk uses because it's purely additive — the agent reads code and produces text, and you review the result before it touches anything.
codex "Add Google-style docstrings to all public functions in utils/data_processing.py"
The output quality is generally solid for standard Python, JavaScript, and TypeScript. For less common languages or highly idiomatic code, expect to edit the generated docs rather than accept them wholesale.
You can also ask for a plain-English explanation of an unfamiliar module:
codex "Explain what the auth middleware in middleware/auth.js does and what edge cases it might not handle"
The prompt asks for an explanation rather than a change, so it's low-risk in any trust mode: the agent only needs to read files to answer it.
Common Pitfalls and Rough Edges
Codex CLI is genuinely useful today, but it's not production-hardened in every scenario. Here are the failure modes you're most likely to hit.
Context Window Limits on Large Codebases
The agent reads your files, but it has a finite context window. On a large monorepo, files that don't fit are left out, often without any warning. You'll get better results by scoping tasks to a specific directory or set of files rather than asking it to reason about the whole project at once.
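A quick size estimate can tell you whether a directory is a sensible unit of work before you point the agent at it. This sketch uses the common rule of thumb of about four characters per token, which is an approximation and not a real tokenizer; the helper names and the default file suffixes are assumptions.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(root, suffixes=(".py", ".js", ".ts")):
    """Approximate the token count of source files under root."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total_chars += len(path.read_text(encoding="utf-8", errors="replace"))
    return total_chars // CHARS_PER_TOKEN


def fits_in_budget(root, token_budget):
    """True when the estimate for root is within the budget you chose."""
    return estimate_tokens(root) <= token_budget
```

If a directory comes out far larger than the budget you have in mind, that's a hint to split the task by subdirectory.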
Destructive Edits in Full-Auto Mode
Full-Auto with an ambiguous instruction is a real risk. Phrases like:
Clean up this repository.
leave enormous room for interpretation.
The agent might:
- Delete unused files
- Remove configuration it believes is obsolete
- Rewrite working code
- Reformat hundreds of files
The solution isn't avoiding Full-Auto altogether.
It's writing precise, bounded instructions.
Instead of:
Refactor everything.
write:
Refactor only the authentication module.
Do not modify public APIs.
Do not change tests.
Limit changes to src/auth/.
The narrower the scope, the more reliable the results.
Commands Can Have Real Side Effects
One of Codex CLI's biggest strengths is also one of its biggest risks.
It can execute shell commands.
Examples include:
- Running migrations
- Installing dependencies
- Deleting build artifacts
- Creating files
- Running test suites
This means prompts like:
Fix all failing tests.
may result in:
npm install
pytest
rm -rf build
or other commands you didn't explicitly request.
Even in Auto-Edit mode, review proposed shell commands carefully before approving them.
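If you want a reminder of which proposed commands deserve a slower read, a tiny checker like the one below can help. The watch lists are hypothetical and deliberately conservative, so treat this as a sketch of the habit rather than a security control.

```python
import shlex

# Hypothetical watch lists; adjust them for your own environment.
RISKY_PROGRAMS = {"rm", "sudo", "dd", "chmod", "chown"}
RISKY_GIT_SUBCOMMANDS = {"push", "reset", "clean"}
CHAIN_OPERATORS = ("&&", "||", ";", "|")


def risk_reasons(command):
    """Return reasons a proposed shell command deserves a second look."""
    tokens = shlex.split(command)
    reasons = []
    # Checking every token also catches risky programs later in a chain.
    for program in sorted(set(tokens) & RISKY_PROGRAMS):
        reasons.append(f"'{program}' can modify or delete data")
    if tokens[:1] == ["git"] and len(tokens) > 1 and tokens[1] in RISKY_GIT_SUBCOMMANDS:
        reasons.append(f"'git {tokens[1]}' can discard work or publish changes")
    if any(op in command for op in CHAIN_OPERATORS):
        reasons.append("chained command: read every part before approving")
    return reasons
```

An empty list does not mean a command is safe, only that nothing on the watch list matched.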
Working With Git Makes Everything Safer
Before giving Codex CLI permission to modify files, commit your current work.
Example:
git add .
git commit -m "Checkpoint before Codex"
Now every change becomes easy to inspect:
git diff
or undo:
git restore .
Treat Codex as another developer on your team.
You wouldn't merge another developer's work without reviewing the diff.
The same principle applies here.
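If you find yourself forgetting the checkpoint, you could wrap it in a helper. This is a sketch built on the same git commands shown above; the function names are hypothetical, and it assumes git is installed and that you run it from inside a repository.

```python
import subprocess


def git(*args, cwd="."):
    """Run a git command and return its stdout; raises if git exits non-zero."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout


def has_uncommitted_changes(porcelain_output):
    """`git status --porcelain` prints one line per changed path, or nothing."""
    return bool(porcelain_output.strip())


def checkpoint(message="Checkpoint before Codex", cwd="."):
    """Commit pending work so later agent edits show up as a clean diff."""
    if not has_uncommitted_changes(git("status", "--porcelain", cwd=cwd)):
        return False  # nothing to commit; HEAD already serves as the checkpoint
    git("add", ".", cwd=cwd)
    git("commit", "-m", message, cwd=cwd)
    return True
```

Calling checkpoint() before each Codex session means git diff afterwards shows only what the agent changed.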
Refactoring Large Codebases
One area where Codex CLI performs surprisingly well is mechanical refactoring.
Examples include:
- Renaming functions
- Updating imports
- Replacing deprecated APIs
- Standardizing logging
- Adding type hints
- Modernizing syntax
Example prompt:
Replace every use of requests
with httpx inside services/.
Keep the public API unchanged.
Run the existing tests afterward.
This kind of repository-wide transformation is significantly faster than performing the edits manually.
Running Test Suites
Another practical workflow:
Run the unit tests.
Identify failures.
Fix only the failures.
Do not modify passing tests.
Codex CLI can:
- Execute the tests
- Read stack traces
- Edit source files
- Re-run tests
- Repeat until clean
This feedback loop is where terminal-native agents have a clear advantage over chat-only assistants.
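The "identify failures" step comes down to reading the test runner's summary. As an illustration, this sketch pulls failing test IDs out of pytest's short test summary. It assumes the usual "FAILED path::test - message" line format and test IDs without spaces; the file names in any sample output are made up.

```python
import re

# Matches short-summary lines such as:
# FAILED tests/test_auth.py::test_login - AssertionError: ...
FAILED_LINE = re.compile(r"^FAILED (\S+)(?: - (.*))?$")


def failed_tests(pytest_output):
    """Map each failing test ID to its one-line failure message (may be empty)."""
    failures = {}
    for line in pytest_output.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            failures[match.group(1)] = match.group(2) or ""
    return failures
```

Comparing this mapping before and after a session is also a simple way to confirm the agent fixed the failures it was asked to fix and did not introduce new ones.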
Generating Documentation
Documentation generation is one of the safest capabilities.
Examples:
Generate a README
for this package.
or:
Document every public REST endpoint.
or:
Add docstrings to every exported function.
Since documentation changes rarely affect runtime behavior, they're excellent candidates for Auto-Edit mode.
Creating Small Utilities
Codex CLI is also useful for writing project-specific scripts.
Examples:
Create a Python script
that finds duplicate images.
Write a Bash script
that rotates log files older than 30 days.
Generate a SQL migration
for adding indexes.
The ability to immediately execute and verify those scripts reduces iteration time considerably.
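For a sense of scale, the duplicate-image prompt might yield something along these lines. This is a hand-written sketch, not actual Codex output, and it takes "duplicate" to mean byte-identical files; resized or re-encoded copies would need perceptual hashing instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def find_duplicate_images(root):
    """Group image files under root whose bytes are identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only digests shared by two or more files count as duplicates.
    return [group for group in by_digest.values() if len(group) > 1]
```

A script this small is easy to review line by line, which is part of why utilities are a good fit for the tool.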
Understanding Unknown Projects
Sometimes you don't want changes.
You want understanding.
Example:
Explain this repository.
Identify:
- Entry point
- Main services
- Authentication flow
- Database layer
- Build process
Codex CLI traverses the repository and summarizes its findings.
This makes onboarding to unfamiliar projects significantly faster.
CI/CD Assistance
Because Codex CLI operates inside the terminal, it's naturally suited to DevOps tasks.
Examples:
Review this GitHub Actions workflow.
Identify:
- Broken steps
- Security issues
- Slow jobs
or:
Explain why this Docker build fails.
or:
Analyze this Kubernetes manifest.
The agent can inspect configuration files directly rather than requiring you to paste them into a chat window.
Where Human Review Is Still Essential
Despite its capabilities, there are areas where you should never accept changes blindly.
Always review:
Database Migrations
Check:
- Data loss
- Index changes
- Locking behavior
Security Logic
Examples:
- Authentication
- Authorization
- Encryption
- Secrets handling
Infrastructure
Examples:
- Terraform
- Kubernetes
- CloudFormation
- Networking
Financial Calculations
Verify:
- Currency handling
- Precision
- Tax logic
Concurrency
Review:
- Locks
- Async code
- Race conditions
- Thread safety
These areas require engineering judgment beyond what current AI agents reliably provide.
Best Prompting Practices
Instead of:
Fix my project.
use:
Project:
Python Django API
Goal:
Replace deprecated logging.
Requirements:
- No API changes
- Preserve behavior
- Run tests
- Explain every modification.
Providing:
- Technology stack
- Constraints
- Scope
- Success criteria
consistently produces better results.
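If you reuse the same structure often, you could generate it instead of retyping it. The helper below is a hypothetical convenience that mirrors the template above; nothing about it is specific to Codex CLI.

```python
def build_prompt(project, goal, requirements, scope=None):
    """Assemble a structured task prompt from its parts."""
    lines = ["Project:", project, "Goal:", goal]
    if scope:
        lines += ["Scope:", scope]
    lines.append("Requirements:")
    lines += [f"- {item}" for item in requirements]
    return "\n".join(lines)
```

The resulting string can be passed to codex as its prompt argument, so every task you hand over states its stack, goal, scope, and constraints the same way.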
Productivity Tips
Keep Tasks Small
Instead of:
Modernize the repository.
break work into:
- Update dependencies
- Replace deprecated APIs
- Improve tests
- Improve documentation
Smaller tasks are easier to review.
Use Directory Scoping
Prefer:
Only modify:
src/payments/
rather than the entire repository.
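Scoping is easy to verify after the fact. Given the paths reported by git diff --name-only, a check like this sketch flags anything outside the directory you allowed. The function name is hypothetical, and it assumes repository-relative paths with forward slashes.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_paths, allowed_dir):
    """Return the changed paths that fall outside allowed_dir."""
    allowed = PurePosixPath(allowed_dir)
    outside = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Comparing path components avoids treating src/payments_old as in scope.
        if allowed != path and allowed not in path.parents:
            outside.append(raw)
    return outside
```

An empty result means the agent stayed inside the boundary; anything else is worth a look before you commit.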
Ask for a Plan First
Example:
Before making changes,
describe your implementation plan.
Reviewing the plan often catches misunderstandings before edits begin.
Require Explanations
Prompt:
Explain why every modification
is necessary.
This makes code review much easier.
Common Mistakes New Users Make
Giving Ambiguous Instructions
Broad prompts create broad edits.
Skipping Git Commits
Always create a recovery point.
Using Full-Auto Too Early
Start with Suggest mode.
Move to Auto-Edit only after building confidence.
Ignoring Command Output
Shell output often explains why the model made a particular change.
Read it.
Expecting Perfect Repository Awareness
Very large repositories may exceed the model's effective context.
Work in logical modules whenever possible.
Should You Use Codex CLI Today?
For many developers, yes.
It's particularly valuable for:
- Bug fixing
- Mechanical refactoring
- Documentation
- Test-driven iteration
- Repository exploration
- Automation scripts
It's less appropriate for:
- Security-critical changes
- Large architectural redesigns
- Financial systems
- Autonomous production operations
Think of it as an extremely capable junior engineer that works incredibly fast but still benefits from senior code review.
Final Thoughts
OpenAI Codex CLI marks an important shift in how developers interact with AI. Instead of copying snippets from a browser into an editor, you can describe a task in natural language and let an agent inspect your project, edit files, run commands, observe the results, and iterate—all from inside the terminal. That tighter feedback loop is what makes Codex CLI genuinely useful rather than just another autocomplete tool.
At the same time, it's important to keep expectations realistic. Codex CLI is most effective when given precise, well-scoped tasks and clear constraints. It excels at repetitive engineering work, project navigation, documentation, and iterative debugging, but it still requires human judgment for architectural decisions, security-sensitive code, and infrastructure changes. Used thoughtfully—with Git checkpoints, careful review, and appropriate trust modes—it can become a valuable part of a modern development workflow without replacing the engineer at the keyboard.
Frequently Asked Questions
Does OpenAI Codex CLI work offline or does it require an internet connection?
Codex CLI requires an active internet connection because it sends your prompts and file context to OpenAI's API for processing. There is no local model bundled with the tool, so you need both an internet connection and a valid OpenAI API key with sufficient quota.
Which OpenAI models does Codex CLI support?
Codex CLI supports gpt-4o by default and can also use o3 when you specify it via the --model flag. The choice of model affects both response quality and API cost, with o3 generally producing stronger results on complex reasoning tasks.
Is it safe to run Codex CLI in Full-Auto mode on my main development machine?
Running Full-Auto mode directly on your main machine carries real risk because the agent can execute shell commands without approval. OpenAI recommends using Full-Auto only inside a Docker container or virtual machine where runaway commands are contained and reversible.
Can Codex CLI work with non-JavaScript and non-Python projects?
Yes, Codex CLI is language-agnostic at the tool level — it reads files and runs shell commands regardless of the language. The underlying model's code quality varies by language, with strong results for Python, JavaScript, TypeScript, and Go, and more variable results for less common languages.
How does Codex CLI compare to using the OpenAI API directly for coding tasks?
Codex CLI adds a filesystem and shell execution layer on top of the raw API, which means it can read your actual project files, run commands, and observe the output rather than just generating text. For one-off code generation you might not need it, but for iterative or multi-file tasks it saves significant manual effort.