First Steps in Vibe Coding - "look Ma! No hands!"
Here's what I know about vibe coding. There's a lot more to know but here's what I'm confident in right now.
First, what do I mean by vibe coding? I mean the process of describing what you want to a coding agent using language and having that coding agent turn your words into working software (for some definitions of 'working' and 'software').
So, with that out of the way, here's what I know about vibe coding.
I know it's got a terrible name (no criticism here, naming things is hard). But 'vibe coding' makes it sound casual, as though you could do it on the beach in board shorts in between catching waves. It's only casual once you know how to do it well enough to get reliable, repeatable results.
Vibe coding only makes coding look easy.
Vibe coding and coding agents are not a 'no-code' solution. You need to have some understanding of how to solve problems in code.
Why? Because it's not as simple as saying 'build a free-to-play game with paid-for add-ons'. You'll need to give your coding agent step-by-step instructions and you'll need to be explicit about what's expected in each of those steps.
If you can't break the work down into the individual steps needed to achieve it, you don't understand the work well enough to tell a coding agent what you want. You are now an architect/designer rather than an engineer. Your core skill is decomposing problems into unambiguous tasks with clear success criteria.
Everything you know about problem solving still stands. Coding agents might remove the need to fully understand syntax, libraries, environment setup etc, but they will insist on deploying approaches that are inefficient, unworkable or just not sensible. That's where you need to step in and question their approach, redirect their focus.
You are still responsible for the coding agent's output. It's on you and no one else to ensure that, for example, the '6 Pillars of the Well-Architected Framework' (operational excellence, reliability, security, sustainability, performance efficiency and cost optimisation) are implemented by your coding agent of choice. Its work goes out in your name, and it's your job to protect your name.
You need to think like a product manager. Start small and iterate on a core value loop. What's the smallest implementation of your product that delivers on the key outcome?
You will fail if you start with the whole thing and expect the coding agent to deliver. For example:
Build a subscription-based productivity app with to-do lists, deadlines and a time tracker.
might describe something someone would pay for, but as an instruction to a coding agent it's so lacking in specificity as to be useless.
Here's how I put the above into practice.
This is a relatively trivial single-function application but it demonstrates the journey vibe coding enables: from the initial idea ('How could I do X?') to working code.
Case Study: Image Text to Kindle
I recently found this article on how to understand anything on Substack. But its multi-column newspaper-style format was hard to read on my mobile phone.
Too much zooming in and out. Too much like hard work.
I wondered, could I code something that would take the images and convert them to kindle format so I could read it there?
Spoiler alert: yes I could if I vibe coded it.
Method:
Let's break down how I achieved this.
Before I started:
- I set up Roocode and Visual Studio to work with OpenAI.
- (this decision is more about me having credit in my OpenAI account that I need to use; Anthropic's Claude would be a great alternative)
- I logged into my OpenAI account and generated an API key so my code can access OpenAI's models
- I created a project directory called 'ImageToKindle' on my laptop.
- I saved the 6 images that make up the article to a directory called 'understandanything' on my laptop within the project directory.
- I named the files sequentially in the order I wanted the app to process them: 001.jpg, 002.jpg, etc.
- I opened Visual Studio in the project directory.
Here's an example image that I want to convert:

And here's my original instruction to convert the images (there are 6 in total). You'll see I'm only interested in converting the images to plain text. That's my core value loop right there.
use the openai API. Choose model GPT-4o. Write the python code to extract the text from the images in ./understandanything. save the extracted text as a single plain text file. Make sure to get the text in the correct order. Each page is divided into columns, make sure to process each column from top to bottom. In each image, process the outside left column first then each column left to right. Assume there is a local file called '.secret' containing the required API key.
Key points in my instruction:
- Architectural: I've told it to use the OpenAI API; as previously mentioned, this is because I have credit there.
- Architectural: Use the Python language. Because a) I understand it enough to know how it's using it and b) it's a good fit for this kind of task.
- Clear outcome: 'save the extracted text as a plain text file'.
- Instructional: I've explicitly broken down how to process the column layout. This is a pre-emptive strike. I don't trust it to not unthinkingly process the text left to right which would result in an unreadable block of text.
- Enablement: I hadn't generated the .secret file containing the API key and I wanted it to get to work without waiting for me to generate it.
- Decomposing the problem: I only want to see if I can convert the images to plain text. The conversion to Kindle-friendly ePub format is a solved problem - once I've got plain text, I'll just use the Calibre app on my desktop then email the ePub to my Kindle.
I then sent the instruction to Roocode.
It complained immediately:
"The file '.secret' containing the API key is not found in the workspace directory."
That's on me, I suppose, for not telling it explicitly that the file didn't exist yet but that it should act as if it did.
So I gave it further instruction:
I have not created it yet. Write the required application and I will create the .secret file. Write the code as if the .secret file already exists.
Then Roocode generated a single file of Python code to convert the images to plain text.
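Roocode's actual output isn't reproduced here, but a minimal sketch of what such a script typically looks like, assuming the official `openai` Python package and the file layout described above (`.secret`, `./understandanything`, `gpt-4o`), is:

```python
import base64
from pathlib import Path


def encode_image(path: Path) -> str:
    """Base64-encode an image so it can be embedded in a chat request."""
    return base64.b64encode(path.read_bytes()).decode("utf-8")


def extract_page_text(client, image_path: Path) -> str:
    """Ask GPT-4o to transcribe one page: columns top to bottom,
    outside left column first, then left to right."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract the text from this page. Process each column "
                         "from top to bottom. Process the outside left column "
                         "first, then each column left to right."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/jpeg;base64,"
                                      + encode_image(image_path)}},
            ],
        }],
    )
    return response.choices[0].message.content


def main() -> None:
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key=Path(".secret").read_text().strip())
    # sorted() is what makes the sequential naming (001.jpg, 002.jpg, ...) pay off
    pages = sorted(Path("understandanything").glob("*.jpg"))
    text = "\n\n".join(extract_page_text(client, p) for p in pages)
    Path("output.txt").write_text(text, encoding="utf-8")
```

Note how the processing order falls straight out of the file names: that's the sequential-naming decision from the setup steps doing its work.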
I created the .secret file and Roocode updated the code to include the specified API key.
When I ran the code, it didn't work.
And here is a beautiful thing about vibe coding: I just pasted the error message from the console into Roocode with no context and let it work out the answer.
Roocode came back almost immediately: I was sending too much data to the API; the data for a single image was too big to fit inside OpenAI's pre-defined 'context window'.
The tl;dr for understanding 'context window' is that it refers to the size of your input to the AI model. This includes your prompt text, any core system instructions ('you are a helpful assistant') and any and all other data you need it to process as part of the prompt: images, spreadsheets, PDFs etc.
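To make the size difference concrete: binary images travel inside an API request as base64 text, which inflates them by about a third, while the extracted plain text of the same page is tiny. The figures below are illustrative stand-ins, not measurements:

```python
import base64

# A full-page scan might be roughly 500 KB of JPEG data (illustrative figure).
fake_page = b"\x00" * 500_000
encoded = base64.b64encode(fake_page)
print(len(fake_page), len(encoded))  # base64 adds ~33% overhead

# The same page as extracted plain text might be only a few KB.
fake_text = "x" * 4_000
print(len(fake_text))
```

That difference in payload size is the whole motivation for the change of approach that follows.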
So, to get around that, Roocode suggested an alternative: extract the text locally using Python's pytesseract package, then send the extracted text to the API, reducing the input size massively.
It rewrote the code to reflect this new approach.
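Again, the rewritten code isn't reproduced here, but the local-OCR half of the new approach, sketched under the assumption that the `pytesseract` and `Pillow` packages are available, looks something like this:

```python
from pathlib import Path


def ordered_pages(image_dir: str) -> list:
    """Return the page images in processing order (001.jpg, 002.jpg, ...)."""
    return sorted(Path(image_dir).glob("*.jpg"))


def ocr_page(image_path) -> str:
    """Extract text from one page locally; no API call, no context window."""
    import pytesseract     # pip install pytesseract
    from PIL import Image  # pip install pillow
    return pytesseract.image_to_string(Image.open(image_path))


def ocr_all(image_dir: str) -> str:
    """OCR every page and join the results. This text, not the raw images,
    is what then gets sent to the API for any further processing."""
    return "\n\n".join(ocr_page(p) for p in ordered_pages(image_dir))
```

One wrinkle worth knowing: pytesseract is only a Python wrapper around the Tesseract OCR engine, which has to be installed separately (e.g. `brew install tesseract` on macOS or `apt-get install tesseract-ocr` on Debian/Ubuntu), which is exactly the kind of missing dependency that bites next.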
I was cursing not having thought of that myself in the first place. But Roocode soon re-established my dominance.
When I tried to run the modified script, it errored again. This time, I didn't need to paste the error into Roocode: I could immediately see that, whilst Roocode had adapted the code to use the pytesseract package, it hadn't installed the package the code now depended on.
This is one of the reasons I say coding agents are not yet a 'no-code' solution. Yes, you could keep asking Roocode until it worked out that it hadn't installed the packages, but it's so much faster if you know what it should be doing.
Anyway, I manually installed pytesseract and re-ran the script.
This time it worked like a charm.
I reviewed the text output to make sure it was readable and had followed my instructions around parsing the column layout correctly.
Some things you just need a human for. 'Does it make sense?' is one of them.
I took the plain text output and converted it to Kindle ePub format using Calibre.
And here's the finished result:

Final Thoughts
End to end, this process took just 30 minutes and cost less than $1.
Now I have an app that can loop through a directory of images, extract the text from them and render as a single plain text file. This isn't the last time I'll need that.
Yes, it is something I could have hand-coded myself, but it would've taken me much longer, perhaps half a day to a day, to be confident in it.
If you asked me 'would you pay $1 to talk your idea through and have someone deliver it in 30 minutes?' then yes, yes I would. Every day of the week.
As I said before, this is a relatively trivial app. Would I put it into a production environment as is? No, I probably wouldn't, but production wasn't the goal. I only needed to solve a specific problem I had: 'how can I make this potentially interesting article readable?'
Next steps: can I automate the process from start to finish?
- download images [from a link]
- convert to text
- convert text to [ePub] format
- send the [ePub] to [a nominated send-to-Kindle email address].
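As a sketch of what the middle two steps of that pipeline might look like in Python (assuming Calibre's `ebook-convert` command-line tool is on the PATH, and leaving the actual SMTP send, with its credentials, out), the glue code is short:

```python
import subprocess
from email.message import EmailMessage
from pathlib import Path


def convert_to_epub(txt_path: str, epub_path: str) -> None:
    """Shell out to Calibre's ebook-convert CLI (Calibre must be installed)."""
    subprocess.run(["ebook-convert", txt_path, epub_path], check=True)


def build_kindle_email(epub_path: str, sender: str, kindle_addr: str) -> EmailMessage:
    """Build an email with the ePub attached, addressed to a send-to-Kindle
    address. Actually sending it (smtplib plus SMTP credentials) is omitted."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_addr
    msg["Subject"] = "convert"
    msg.set_content("Sent by ImageToKindle.")
    msg.add_attachment(Path(epub_path).read_bytes(),
                       maintype="application", subtype="epub+zip",
                       filename=Path(epub_path).name)
    return msg
```

Which is to say: the remaining automation is mostly plumbing, and plumbing is exactly what a coding agent is good at being told to build, one explicit step at a time.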