This post traces my ride with LLMs, from my first contact with them, mostly via ChatGPT, to what I'm doing with them now. I wrote it to share my journey with colleagues and friends and to give a personal account of my experience from 2023 to the present. Some parts may already be outdated, but it's exciting to see how quickly this field is evolving, and I believe the learning process itself matters. I hope I haven't forgotten anything important; if I have, I'll add it later.
First experiments
It all started in 2023, when a good friend invited me to try ChatGPT. I had helped him with several projects, so he wanted to return the favor and offered to pay for my ChatGPT subscription for a few months. That was in April 2023. The latest models available were GPT-3.5 and the early version of GPT-4. Like many others, I started by asking silly questions and laughing at the silly answers.
After a while, I started using it for programming: simple things like a file converter, where I didn't care what the code looked like. Back then, the code was often buggy. You'd copy the entire code, or parts of it, back into the chat window and usually get the end result, a script to convert a file. Sometimes the code would just crash and you'd have to start over from scratch, because the whole context had fallen apart.
Back then, I also experimented with Richard Socher's "you.com," where I generated my first pieces of text. During the COVID-19 pandemic, Richard recorded a great podcast with "Alles Gesagt," in which he predicted much of what has since happened with LLMs.
In July 2023, OpenAI introduced custom instructions to improve the output. Later, custom GPTs were added, and the GPT Store launched in 2024. From then on, you could see and understand how other people optimized their chat output.
The model's knowledge cutoff was a major issue back then, since the bots had no way to search the internet. For example, I used a custom GPT for the latest Angular version because its documentation wasn't included in ChatGPT's training data. The challenge was to provide the model with as much relevant information as possible. The output got slightly better.
In the summer of 2024, I found these nice articles on how to use LLMs: What We Learned from a Year of Building with LLMs — Part I, Part II, Part III.
One tip that stuck with me was giving the model a persona. I still do this when using ChatGPT, and I sometimes include it in my AGENTS.md file to tell the agent what kind of person it should be. I'm not sure whether that still makes a difference these days, but I hope it doesn't make the output worse.
In 2024, I also heard about Ollama for the first time and the possibility of running small models on local hardware; it's still a cool way to run models locally. I experimented with a fairly small Mistral model and LangChain, and while the results weren't great, they weren't too bad for a model running on a MacBook Air, though it was slow. I think with the latest developments in llama.cpp, there's a good chance we'll be able to run a sufficiently good model on powerful Apple hardware by the end of 2026. After all, why would I need a frontier-lab model that possesses knowledge I don't need at all? Maybe it's time to add a new Mac to the family and start playing again. :)
The slot machine
I first read about this cool new thing called Claude Code on Twitter in 2025, where Mario Zechner had started tweeting about it. I followed him because of his work tracking food prices in Austria (Heisse Preise) and his project Cards for Ukraine, which helps Ukrainian refugees in Austria.
After following the development of Claude Code, I decided >> I NEED THIS THING <<. So I invested in another subscription and sold out to Claude Code. It was incredible :D. I was completely hooked. The results were insane. I can’t remember where I first read that it’s like playing a slot machine. That’s exactly how it feels. One more prompt, and another, and another. Oh, it’s already 2 a.m. :D At some point, Anthropic introduced stricter rate limits, which improved my sleep schedule again, since I didn’t want to spend money on another subscription.
While I was following Mario, he began working on VibeTunnel alongside Armin Ronacher, the developer of the Flask web framework, and Peter Steinberger, the founder of PSPDFKit and OpenClaw 🦞. To be honest, these three have had the greatest influence on my thinking about LLMs and the workflows I use. Here's an old post by Peter from August 2025, Optimal AI Development Workflow, which evolved into Shipping at Inference Speed in December 2025. Thanks to better models and harnesses, workflows have changed significantly over the past year. There's also an interesting podcast called Skill Issue. Worth watching, too, is Dex Horthy's talk No Vibes Allowed: Solving Hard Problems in Complex Codebases, in which he demonstrates how to solve problems in complex codebases, and his follow-up, Everything We Got Wrong About Research-Plan-Implement, which shows how his approach has changed since then. The whole field feels like the Wild West because there's a new idea or concept almost every day.
During that time, I increasingly encountered the limitations of the slot machine approach. I had created a codebase that was completely unmaintainable. I was exhausted because it was already late at night and the whole machine was running like crazy. The next day, I reviewed what I had done and saw the chaos. I never tried to fix the whole thing, but instead started over with more restrictions and better control, paying closer attention to the context window and the output. That was a very important learning experience. The project wasn’t large, so I was able to rewrite it in a few days, but it showed me that you still need to understand the fundamentals of your codebase and set strict guidelines.
Pi
At the end of 2025, I canceled my Claude Code subscription and switched entirely to OpenAI's Codex. There, I immediately started using Pi as my harness (for a good explanation of what a harness is, see The Anatomy of an Agent Harness), and I've never regretted it. Check out Pi. It took me a while to grasp the power of this harness's simplicity. I liked Mario's idea of modifying the harness rather than changing my own workflow; with its extensions and features, it's very easy to adapt to your own needs. The tree functionality is also fantastic: if you realize you're going in the wrong direction, you can jump back with /tree and regain your context. The small system prompt is simply great compared to enterprise harnesses, e.g. Copilot CLI, which takes up 11% of my context window immediately after starting a session with Opus, and I don't know why.
Workflows
At the beginning of the year, I started using Plan.md as a changelog and Architecture.md to document my architectural notes. Both files were referenced from AGENTS.md, so that agents would pick up the latest state on startup. I chose this approach to avoid the "Shit Zone", which starts at about 40% of the used context. Normally, I would restart the session when I approached this limit and tell the agents to read the markdown files to find out what was going on. With longer projects, however, you still run into problems, because Plan.md and Architecture.md keep growing. You can split them into multiple markdown files, but in my opinion that ultimately makes things very confusing.
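As a rough sketch, the wiring in AGENTS.md looked something like this (the exact wording is illustrative, only the file names are from my actual setup):

```markdown
# AGENTS.md

On session start, before doing anything else, read:

- Plan.md: running changelog of what has been done and what is next
- Architecture.md: architectural notes and decisions

Keep both files up to date after every significant change.
```

Since most harnesses load AGENTS.md automatically, this was enough to get a fresh session back up to speed without me repeating myself.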
Last month, I changed my workflow again and stopped writing markdown files for planning and architecture. Since agents are extremely efficient at reading code, I wrote an extension for Pi that thoroughly examines the codebase I want to work on. I got the idea from Lukas Meijer's talk Love letter to pi and expanded on it with ideas from Matt Pocock's great talk at AI Engineer Europe, Software Fundamentals Matter More Than Ever. I've also started making heavy use of a to-do extension and an answer extension, both custom extensions in Pi. Once things are clear and the agent and I are on the same page, I begin implementation in new sessions and have them work through the todos. You have to be aware that all of this comes with problems and drawbacks: the amount of generated code you have to review is enormous and can be very taxing, at least on my brain, and I sometimes tend to accept what the agent tells me, which I really shouldn't do, but my brain can't resist.
Slowing down
Another challenge will be the pressure we'll face: code can be written quickly, but decision-making takes time, and points of friction are crucial for achieving a good result (Some Things Just Take Time). Ultimately, people are still responsible. Currently, I'm observing more and more people being sucked into this slot-machine vortex. I'd like to conclude this post with another excellent blog post by Mario.
Thoughts on Slowing the Fuck Down