Setting up an Agentic (AI) Environment
Ever meet a carpenter with only one hammer? No?

Everyone keeps saying there’s no ROI on Agentic AI. I understand the problem of receiving AI slop (workslop) from your coworkers that takes extra time to sift through. That sounds like a nightmare. Like proofreading someone else’s emails.
In the last two weeks, however, as a solo project, I’ve built a watchOS app with a lightweight FastAPI backend, without any prior experience in either System Design or SwiftUI (the UI framework Apple effectively requires for Apple Watch apps).
I’m no genius, but I’m not a Vibe Coder either. For the record: despite seven years of professional experience in Python, I’ve only held the title of Software Engineer for the last year and a half, so I don’t have many credentials to wave around.
Still, I’ve been working as part of an Agile team long enough to understand how Scrums, Git, pre-commit, and Pull Request Reviews work. And yet actually building a working application would have been miles outside my comfort zone before Agentic AI showed up.
Agentic AI may not be useful if you’re a total novice. It might make you slower if you’re an experienced engineer.
But if you’re an entry-to-mid-level coder, all the hype about these tools boosting productivity actually makes perfect sense.
I’ll be using this blog to document what I’m learning as I build apps from scratch using Agentic AI. What better place to start than with a rundown of the current iteration of the AI environment I’m working with?
Here’s the setup.
ChatGPT Projects
An excellent learning tool. A bit like getting an answer from an overly encouraging Stack Overflow user. I like to use the “Projects” feature to keep my conversations organized and to work from a shared context window, so the bot knows what we’ve already built and can distinguish it from the other apps I’m working on simultaneously.
ChatGPT is good for:
Figuring out frameworks, requirements, languages, and overall System Design. Think of it as working through the kinds of questions you’d face in a System Design interview.
Asking questions about decisions made by the Agentic models so you actually understand the code that goes up on your GitHub.
Coming up with more thorough and effective prompts for Agentic models (I’ll have a whole article about this at some point, I promise.)
OpenAI Codex
At some point I’m going to have to write a number of articles about the truly cosmic, powerful feeling you get when you spin up fifteen “Tasks” for Codex at the same time. It feels a bit like being the manager of a team of junior Engineers (especially when you find yourself slogging through their sloppy code in Reviews) mixed with straight-up gambling (flashy colors, probabilistic chance of success, an illusory sense of control, triggering addictive dopaminergic systems...)
Codex (cloud) is good for:
Coding in type-friendly ecosystems such as Python, or React Native with Expo and TypeScript
Iterating on ESLint, Ruff, Mypy, or pytest until everything passes green. Setting up a robust CI pipeline and suite of pre-commit hooks is essential and pays off massively when you are coding with Codex (there’s a sketch of what I mean just after this list). I have a repo that is a good starting point for an Agentic Python Project.
Following the instructions you put in your AGENTS.md file (again, I have a good example of a robust version of this in the GitHub repo linked above).
Coding many parallelizable tasks at once
Asking questions about your codebase! (“Explain to me how the frontend is constructed using class, function and variable names as appropriate” or “Describe and explain the CRUD endpoints and underlying Data Models available via our API” or “Create an Epic Roadmap to introduce [massive feature]”)
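Since the pre-commit hooks do so much of the heavy lifting here, here’s a minimal sketch of the kind of .pre-commit-config.yaml I have in mind. The rev versions are placeholders for illustration; pin whatever is current, and my starter repo has a more robust version.

```yaml
# A minimal sketch of a .pre-commit-config.yaml for an Agentic Python project.
# The rev values below are illustrative placeholders; pin current releases.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff            # linting, with autofix so the agent converges faster
        args: [--fix]
      - id: ruff-format     # formatting
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy            # static type checks catch the agent's loose typing
```

Tell Codex to iterate on pre-commit run --all-files until every hook passes; the hooks become its feedback loop.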
Unfortunately, the cloud version of Codex is not great at coding SwiftUI. It operates in a Linux VM, so it can’t run things like brew install swiftformat or xcodebuild to verify that what it’s building actually compiles and passes checks.
Claude Code
That’s where Claude comes in. Whereas Codex cloud only interacts with code in a virtual machine, and you can’t actually touch that code yourself unless you open a pull request and check it out using gh, Claude Code works from your terminal and can be used with all the Mac tooling your heart desires.
I do find that Claude Code is not as diligent about following system prompt instructions like “iterate on pre-commit until everything passes green.” (This was written prior to the release of Sonnet 4.5. I’m still evaluating the new model.)
However, the upside is that you have much more control. Programming with Claude Code is more like pair programming: you can steer it in different directions while it codes, approve the commands it runs, and watch its chain of thought to learn from the strategies it uses to tackle problems.
Being able to redirect Claude actually resulted in a massive value add on one of my projects. A tricky problem that neither Claude nor Codex had been able to solve (as a result of a poorly named column in my SQL database) might have persisted if I hadn’t seen Claude’s thought process and jumped in. And then Claude happily refactored the entire codebase to use a more descriptive name for the object in question, a tedious job from which I was grateful to be liberated.
Undoubtedly this is yet another story that deserves its own article. Stay tuned!
Claude Code Reviews
Claude can be set up to automatically review every push to a PR on GitHub, running in the same manner as other CI Actions. I’ve found these reviews essential when coding in areas where I’m unfamiliar; I have learned innumerable things from reading Claude Code Reviews.
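For the curious, here’s a rough sketch of what that wiring can look like as a GitHub Actions workflow. I’m hedging here: the action reference and its inputs are assumptions on my part, so verify them against Anthropic’s current documentation rather than treating this as a copy-paste recipe.

```yaml
# .github/workflows/claude-review.yml
# Sketch: trigger a Claude review on every push to a PR.
# The action name and inputs are assumptions; check Anthropic's docs.
name: Claude Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write   # lets Claude post its review as a PR comment
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: "Review this pull request for bugs, security issues, and scope creep."
```

You’ll need to add an ANTHROPIC_API_KEY secret to the repo’s settings for anything like this to run.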
My strategy has been to set Claude up in opposition to Codex, and I often find myself copying Claude’s feedback directly into Codex’s prompt. This is a step up from Vibe Coding, but definitely a step on the same staircase: that practice famously encourages us to paste our error messages into the prompt without any commentary of our own.
Unlike with that strategy, you have to be aware of what you’re doing with Code Reviews; discernment is required. Some of Claude’s feedback is total bullshit: unrelated to the issue you are currently working on, outside the scope of the current task, or simply inaccurate. An error message is precise and, typically, prescriptive of a specific solution. A code review has more wiggle room, more ambiguity, more subjective taste (even when written by a being with no subjective experience of reality).
A human has to be in the loop.
More on this to come.
But here’s the thing…
As long as you have a basic grasp of programming fundamentals, are willing to learn, actually pay attention to what you are doing, and set up your Agentic Environment in a sensible, deliberate way, then as a mid-level engineer you can absolutely build an app that looks and feels like it came from a much more seasoned expert.
Let me know what you think!

