
AI agents are forgetful humans

[Image: an AI fish being directed by another fish]

Made with Gemini. But you knew that already: look at the hands!

Oh god not another article about AI

I apologize in advance. If you've somehow stumbled upon this dark corner of the internet (thanks for being here) I'm sure you don't want to hear yet another insufferable tech bro spout about AI.

But it's... kind of a big deal.

Vibes are code

Take today for example. I have a problem: my product's usage metrics live across two PostgreSQL instances and a third-party controlled API. (How that happened is the topic of a different post altogether.)

If I want to answer a question like "what is the most successful campaign's Clay table URL, and what are some sample messages sent from that campaign?", it's not a simple SQL query or API call. I have to query each database one at a time, join their data models together using cross-db foreign keys I've stored, and zipper everything up into a single unified object I can analyze. Basically, a massive pain in the ass.
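To make the "zipper" concrete, here's a minimal sketch of that join. The real data lives in two Postgres instances and a third-party API; stand-in lists and dicts play those roles below so the stitching logic is visible on its own. Every table, field, and URL here is hypothetical, not the actual schema.

```python
# Stand-ins for the three origins. All names are made up for illustration.
campaigns_db_a = [  # Postgres instance A: campaign-level stats
    {"campaign_id": 1, "name": "spring_launch", "replies": 42},
    {"campaign_id": 2, "name": "cold_outreach", "replies": 7},
]
messages_db_b = [  # Postgres instance B: messages, carrying the stored
    # cross-db foreign key (campaign_id) that links back to instance A
    {"campaign_id": 1, "body": "Hey! Quick question about your team."},
    {"campaign_id": 1, "body": "Following up on my last note."},
    {"campaign_id": 2, "body": "Are you the right person for this?"},
]
clay_api = {  # third-party API response: campaign_id -> Clay table URL
    1: "https://app.clay.com/tables/t_abc123",
    2: "https://app.clay.com/tables/t_def456",
}

def unify(campaigns, messages, api_urls):
    """Zipper the three origins into one object per campaign."""
    by_id = {}
    for c in campaigns:
        by_id[c["campaign_id"]] = {
            **c,
            "messages": [],
            "clay_url": api_urls.get(c["campaign_id"]),
        }
    for m in messages:
        by_id[m["campaign_id"]]["messages"].append(m["body"])
    return list(by_id.values())

unified = unify(campaigns_db_a, messages_db_b, clay_api)
best = max(unified, key=lambda c: c["replies"])
```

Once the unified objects exist, "most successful campaign's Clay URL plus sample messages" is just `best["clay_url"]` and `best["messages"]` — the pain is entirely in the stitching, which is exactly the part that's tedious to hand-write.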

I knew exactly what I wanted, and how I'd do it step-by-step. But there's no way to justify hand-writing all the queries and stitching them together.

Enter: Cursor. I've been a casual Cursor enjoyer for a while now, and it definitely saved me some time when writing features.

But Agent mode has gotten really good. It was genuinely mind-boggling how well it performed. All I had to do was point to a schema file here, toss in an example SQL query there, and out came a properly joined in-memory data model from three distinct origins. The only remotely challenging part was securely passing the DB and API secrets around without embedding them in source code (which AI is all too ready to do).
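One simple way to keep those secrets out of source code is to read them from the environment at startup and fail loudly if anything is missing. A minimal sketch, with a made-up variable name (the `setdefault` line only exists so the demo runs standalone; in real use the value comes from your shell or secret manager):

```python
import os

def require_env(name: str) -> str:
    """Read a secret from the environment, failing fast if it's absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value

# Demo only: seed a fake value so this sketch runs without real setup.
os.environ.setdefault("DEMO_ANALYTICS_DB_DSN", "postgres://example")
dsn = require_env("DEMO_ANALYTICS_DB_DSN")
```

The nice property is that the AI-generated code never sees a literal secret, only a variable name, so there's nothing sensitive to accidentally commit.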

It took me 2.5 hours to complete a task that would have taken me days before. And AI today is the worst it will ever be.

It is magical; it's not magic

I'm skeptical of most things by nature. Yet every time AI does stuff like that, I have to admit it feels like magic.

Of course, it's not actually magic- it's just well-assembled engineering hidden behind a chat interface. It took some of the brightest minds on the planet working tirelessly for years to make it so amateurs can spin up mediocre web apps in an afternoon. Yet deep learning techniques and generative models had been around long before ChatGPT crashed onto the scene- so what gives?

LLMs are a game changer because they transact in natural language, just like every human on the planet does. For a long time, coding was only accessible to those who could think like a computer; good luck hand-building a deep learning model in TensorFlow unless you understand how matrix multiplication works. That skillset is actually quite rare amongst the population.

Now, all you need to do is talk to a computer like you would talk to a human, and out comes... something. At the end of the day, AI is just a tool- the wielder determines its utility. A knife in the hands of a sushi chef creates sushi; a knife duct-taped to a Roomba creates chaos.

It's just that using AI has suddenly become as easy as picking up a knife.

Armies at your command

As an undergrad, I briefly worked on analogical reasoning models.

Ok, I ported a grad student's Python scripts to Java for $12/hr. Pretty sweet gig though.

The theory goes that a lot of human cognition relies on analogies to simplify the problem. You take one known structure and map it onto another to draw a conclusion.

Well, a lot of your brain is hardwired to interact with other people. So it's not surprising that many users treat chat interfaces and AI agents like they would their work colleagues. I bet you've caught yourself saying "please" and "thank you" to the machines out of sheer habit.

Humanizing agents, ethical questions notwithstanding, is actually a pretty useful analogy. When you run an agent it's like telling someone to go do something and checking up on them a bit later. You'll find out what happened but not necessarily how they did it. And just like anyone who's managed interns will tell you, even if you give very detailed instructions on how to go about the task, your output quality will vary dramatically.

I've found agents do best on tasks I would trust an unskilled intern to attempt and let me review. One-off scripting is a great example because a) it's easy to tell if they got it right or not and b) I don't care if they ship garbage code since we can always throw it away.

Unlike actual humans though, agents forget everything they learned once their context window is exhausted. Sure, retrieval augmentation and context-compression techniques help a bit, but AFAICT nobody's really cracked long-term memory yet- especially the ability to "forget" obsolete knowledge.

So when you spin up a bunch of agents, it's like summoning an army that vanishes into the night right after depositing the spoils at your feet. Spooky.

Making agents useful

I think we're all still figuring out how to make agents useful. Having an army of untrained interns is awesome, but it's not like they'll magically figure out what they should be doing. And they're prone to create more headache than they're worth.

Reading news and social media hasn't helped either, since everyone wants to exaggerate how much they're using AI. I assure you, anyone who's actually cracked the code is too busy making millions off their handiwork to shill 100-step n8n agentic-workflow courses on LinkedIn.

At the end of the day, I'm not sure any of us know what we'd do if we had an army of unskilled digital workers at our disposal. But at the rate we're going, we'll find out the hard way soon enough.