
Vibecoded to Hell: The Tools Nobody Asked For, Nobody Maintains, and Nobody Can Turn Off


How many vibecoded tools does your company have now?

Count them.

You can’t. Nobody can. That’s the first problem.

Somewhere between January and now, your organization acquired an indeterminate number of internal tools built by people who cannot explain how they work, connected to data sources that may or may not still exist, running on architectures that were selected by an LLM at 11pm on a Tuesday because someone described what they wanted and Claude said “sure, here you go.”

Your company is now running on vibes and JSON.

The Weekend Industrial Complex

Every Slack channel has someone who “built a little something over the weekend.” This announcement always comes with a screenshot. The screenshot always looks incredible. The tool always has a name. Not a good name. A name like “InsightFlow” or “DataPulse” or “Synthia.” Someone always names it. That’s how you know they’re serious.

Twelve people reply with the fire emoji. A director who has never spoken to this person forwards the thread to their staff meeting with “this is the kind of initiative I love to see.” The person who built the tool has now been publicly praised by leadership for a thing they made in nine hours while watching a show they can’t remember the name of. They are now emotionally invested in this tool’s survival. The tool does not share this investment.

Week three: the tool is in a deck. Not a casual deck. A strategy deck. Someone is presenting its output to people who make resource allocation decisions. The slide says “AI-powered insights” and has a chart. The chart has labels. The labels are correct. The underlying data is from a source that was last updated in February. Nobody checks because the chart has labels and labels mean legitimacy.

Week six: something breaks. The API changed. Or the Airtable got restructured. Or the Google Sheet it was reading from got moved to a different folder by someone who was “organizing the drive.” The tool doesn’t crash. That would be too kind. The tool continues to function. It continues to produce output. The output is now wrong. Not obviously wrong. Subtly wrong. Wrong in a way that looks exactly like right but with different numbers.

Week twelve: someone asks “didn’t we already build something for this?” Yes. You did. It’s in a Notion page titled “v2 FINAL (use this one) (2).” It’s dead. The person who built it is on a different team now. They left no documentation because documentation is what people did before vibecoding, back when we lived in caves and wrote unit tests.

Everyone Builds the Same Thing

Here’s the part nobody talks about at the company all-hands when they celebrate “grassroots AI adoption.”

Nobody vibecodes something hard. Nobody vibecodes a novel algorithm. Nobody vibecodes a new analytical framework or a statistically rigorous data pipeline or anything that requires understanding the domain before you build the tool. Those things are hard. Hard things require knowing things. Knowing things takes time. Time is what vibecoding was invented to eliminate.

So everyone vibecodes the easy thing. And the easy thing is always the same thing. A document summarizer. A dashboard. A workflow connector that takes data from one place and puts it in another place with an LLM in the middle doing something that nobody can precisely describe but everyone agrees is “AI.”

Your org now has fourteen tools that read documents and summarize them. Three tools that generate weekly reports from data sources. Five dashboards that visualize metrics that are also visualized by four other dashboards. Two Slack bots that answer questions, one of which is slightly broken and responds to everything with a confidence score that is itself fabricated.

None of these tools knows the others exist. All of them produce slightly different answers to the same questions. The org chart says you have one analytics function. The reality is you have thirty-seven analytics functions, most of them unmanned, all of them confident.

The Maintenance Graveyard

The LinkedIn post says: “I replaced three SaaS subscriptions with tools I vibecoded myself! Saving the company $36,000/year!”

Four hundred likes. Twelve comments saying “this is the way.” A recruiter slides into the DMs.

The LinkedIn post does not mention that those three SaaS tools had engineering teams. People whose job it was to wake up at 3am when something broke and fix it. People who ran security audits. People who handled compliance. People who updated the tool when the underlying APIs changed. People who wrote documentation that other people could read if the original team got hit by a bus, which in corporate terms means “went to a competitor.”

All of those jobs now belong to the person who wrote the LinkedIn post. They just don’t know it yet. Nothing has broken at 3am yet. The word “yet” is doing a tremendous amount of structural work in this paragraph.

Your company is running on an unknown number of vibecoded tools built by people who have moved teams, changed roles, gone on parental leave, or simply forgotten that the tool exists. The tools persist. The tools produce output. The output gets read. Whether the output is still connected to reality is a question that exists in a philosophical space that nobody in your organization is paid to occupy.

The CRM that someone built in a weekend? It needs updating when the data model changes. It will not get updated. The data model will change. The CRM will continue to function with the old model. Somewhere, a sales report will start being subtly wrong. Someone will notice in about four months. They will spend two days trying to figure out what happened. They will eventually discover that the CRM is reading from a field that was renamed in March. The fix will take eleven minutes. The investigation will have cost roughly $4,000 in employee time. This will happen again in six months with a different tool. Nobody will connect the two events.
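The renamed-field failure is worth seeing in miniature, because the whole point is that nothing errors. This is a hypothetical sketch, not anyone's actual CRM; the field names are invented.

```python
# Hypothetical sketch of the renamed-field failure mode.
# "deal_size" was the original field; it was renamed to "deal_value" in March.

old_record = {"deal_size": 48000, "stage": "closed_won"}
new_record = {"deal_value": 48000, "stage": "closed_won"}  # post-rename

def weekly_pipeline_total(records):
    # .get() with a default never complains when the field disappears.
    # The report quietly becomes a column of zeros instead of crashing.
    return sum(r.get("deal_size", 0) for r in records)

print(weekly_pipeline_total([old_record]))  # 48000: correct
print(weekly_pipeline_total([new_record]))  # 0: subtly wrong, no error raised
```

One `r["deal_size"]` instead of `r.get("deal_size", 0)` and the tool would have crashed in March instead of lying until July. That's the eleven-minute fix and the $4,000 investigation, in two characters.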

The Daily Insights Email That Was Hallucinating for Twelve Weeks

I need to tell you this story because it’s the most important thing that has happened in corporate AI adoption and nobody seems to care.

This was confided to me last week.

A team had a vibecoded tool that generated a daily insights email. Every morning, the tool would pull data from a source, run it through Claude, and produce a summary of what was happening. Trends, anomalies, things to watch. The email was well-formatted. It had sections. It had a “key takeaways” box at the top. People read it. People liked it. People referenced it in meetings.

In January, the data connection failed silently. The tool could no longer reach its data source. This is the part where a professionally built tool would send an error notification, or produce an empty report, or at minimum stop producing reports entirely.

This tool did none of those things. The error handling, and I need you to understand that I am not being hyperbolic, consisted of Claude generating plausible-sounding insights when the data fetch returned null. The tool was designed to always produce output. So it did. It produced output that was entirely fabricated. Not maliciously. Not even inaccurately in an obvious way. Just generatively. Plausibly. Fluently.

The team read hallucinated insights every morning for twelve weeks.

People discussed them in meetings. Action items were created based on them. A product decision was influenced by a trend that was identified in the email that did not exist in reality because reality had been disconnected from the pipeline three months earlier and nobody noticed because the email kept showing up and the email looked exactly the same as it always had.

The insights weren’t real. The action items were.

I don’t know what to do with this story except tell it at every opportunity and watch the color drain from people’s faces. The standard response is “well that’s an edge case.” It’s not an edge case. It’s the default case for any vibecoded tool whose error handling was also vibecoded, which is all of them, because nobody vibecodes error handling on purpose. Error handling is the thing that happens when you know what can go wrong. Vibecoding is the thing that happens when you don’t know and you ship it anyway.
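The control flow behind that story is almost insultingly simple. Here's a minimal sketch with the LLM call stubbed out and every name invented; the only real thing is the shape of the bug: no null check between the fetch and the prompt.

```python
# Sketch of the hallucinating-insights pipeline. All names are hypothetical
# and the LLM is a stub that always returns fluent, confident prose.

def fetch_metrics():
    """Simulates a data connection that failed silently: no exception, no data."""
    return None

def call_llm(prompt):
    """Stub for the model call. Real or fake input, the output reads the same."""
    return "Key takeaway: engagement is trending up week over week."

def daily_insights_vibecoded():
    data = fetch_metrics()
    # No null check. The prompt literally says "the data: None" and the
    # model fills the gap with plausible fiction, every morning, for weeks.
    return call_llm(f"Summarize today's metrics: {data}")

def daily_insights_guarded():
    data = fetch_metrics()
    if data is None:
        # Fail loudly: no email is better than a fabricated one.
        raise RuntimeError("Data fetch returned nothing; refusing to send.")
    return call_llm(f"Summarize today's metrics: {data}")

print(daily_insights_vibecoded())  # fluent output, zero real data behind it
```

The guarded version is four lines longer. Those four lines are the difference between an outage and twelve weeks of fiction with a "key takeaways" box.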

The Governance Shrug

Who approved the vibecoded tool before it touched production data?

The answer at most companies is a shrug so profound it’s practically a yoga certification.

The tool was never in a sprint. Never in a roadmap. Never in a security review. Never in a data handling assessment. It was never on anyone’s radar because it was built in the space between official work, in the cracks, during lunch, over the weekend, in the liminal space where corporate governance does not reach because corporate governance was designed for a world where building things required resources and resources required approval.

Vibecoded tools require no resources. They require no approval. They require a Claude subscription and an idea that sounded good at 11pm. The entire governance model of your organization is predicated on the assumption that building things is expensive enough to create a natural approval checkpoint. That assumption is no longer true. The checkpoint is gone. The tools are flowing directly from imagination to production with nothing in between except Claude being helpful.

PII handling? The tool stores customer data somewhere. Where? In whatever Claude decided was the default storage option when it generated the code. With what access controls? The ones that were there when the tool was created. Which were the defaults. Which were probably “anyone with the link can edit.” Someone’s name, email address, and verbatim complaint about your product are sitting in a database that nobody administers, next to a row that says “test test test 123” because someone was checking if the form worked.

I am not making a security argument, though the security argument is screaming. I am making a governance argument. Your organization does not know what tools exist. It does not know who built them. It does not know what data they access, what data they store, where they store it, or whether they currently function. This is not an organization that has “embraced AI.” This is an organization that has lost the plot and is feeling good about it.

The Competence Inversion

An intern with a Claude subscription can now produce a strategy document that reads like McKinsey wrote it during a particularly inspired offsite. The formatting is clean. The executive summary is crisp. There’s a 2x2 matrix. There’s a “key implications” section. The phrase “north star metric” appears exactly once, which is the correct number of times for a strategy document to mention north star metrics if you want to be taken seriously but not seem like you’re trying too hard.

The recommendation is wrong. But it’s wrong in a way that uses the word “leverage” correctly. In most corporate environments this is functionally indistinguishable from being right.

Five years ago, a polished deliverable was a signal. It meant the person who produced it had spent time with the material. They understood it well enough to organize it clearly. The visual quality correlated with the cognitive quality. That correlation is completely, irreversibly broken.

The intern’s deck looks better than the director’s 2019 output. The intern doesn’t know what half the words in the deck mean. The deck uses them correctly because Claude uses them correctly. Claude uses them correctly because Claude has read every strategy document ever written and has identified the patterns that make them look legitimate. Claude cannot tell the difference between a pattern that looks legitimate and a pattern that is legitimate. Neither can the people reading the deck.

Every evaluation framework your company uses to assess the quality of work was calibrated to the old ratio. The ratio where effort correlated with output quality. That ratio no longer exists. The frameworks are all broken. Nobody has noticed because the frameworks themselves were also vibecoded into a dashboard that looks great.

Speed Ate Quality and Called It Efficiency

The thirty-minute meeting is the great equalizer. The person who spent two weeks on a methodologically sound analysis and the person who spent forty minutes prompting Claude will both present in the same slot. Both get a deck. Both get twenty minutes. Both have “data.” Both have charts. One of them represents genuine expertise applied over time. The other represents an afternoon of enthusiastic typing. The meeting cannot tell the difference.

The meeting was never designed to tell the difference. The meeting was designed for thirty minutes because the room was booked at 2:30.

“We already looked at this.” Did you? “We tested it.” With whom? “We have data.” From where?

The answers are usually: “I asked Claude to analyze our NPS comments,” “I sent a survey to the team Slack channel,” and “I ran the usage data through a tool I made.” That last one. You made a tool. You, the person who last week asked IT how to set up a mail merge, made a tool. And the tool has data. And the data is in a deck. And the deck is in a meeting. And the meeting is deciding whether to invest $2M in a product direction based on something a person who cannot explain how their tool works built in an afternoon.

Speed became the proxy for quality and there is no institutional mechanism for challenging that. Nobody has ever gotten promoted for saying “I think we should slow down and check this.” People get promoted for shipping. Shipping now means prompting. Prompting takes an afternoon. Checking takes a week. The incentive structure does its work.

The Synthetic Ouroboros

This one is newer and I find it mesmerizing in a watching-a-washing-machine kind of way.

Follow this chain with me.

A PM uses Claude to write survey questions. The survey goes to respondents, a meaningful percentage of whom use Claude or ChatGPT to write their responses because who has time to type full sentences in 2026. The PM uses Claude to analyze those responses. Claude is now analyzing text that Claude wrote, looking for patterns in its own output, and finding them, because of course it finds patterns, that’s the only thing it does. The analysis produces themes. The themes go into a vibecoded dashboard. The dashboard feeds a quarterly review. The quarterly review shapes strategy. The strategy determines what gets built. What gets built gets evaluated by users who give feedback through another vibecoded tool.

LLMs all the way down. An ouroboros made of autocomplete eating its own tail and producing a very professional-looking slide about it.

At no point in this chain did a human sit with the data long enough to notice it smelled funny. Every link is optimized for speed and fluency. Every output is coherent, well-structured, and plausible. Whether any of it is true is a different kind of question. The kind that requires the slow, careful thinking that vibecoding was specifically invented to skip.

How many layers of AI-mediated analysis can you stack before the output becomes purely decorative? Three? Five? I genuinely don’t know. We’re going to find out. We’re probably already past it in places.

The output still looks great though.

Building Something Good Is Actually Hard

Notion has an entire team working on AI knowledge retrieval. Google built NotebookLM and has been iterating on it for over a year with some of the best ML engineers on the planet. Glean raised $200M to solve enterprise knowledge search. Guru. Dovetail. Confluence. Every knowledge management platform in the market is trying to crack retrieval and interpretation at scale.

None of them have fully solved it.

These are companies with hundreds of engineers, dedicated ML teams, research divisions, and years of development. They have published papers on the problems they’re solving. They have benchmarks. They have error rates they’re trying to improve by fractions of a percent because the difference between 91% accuracy and 94% accuracy in knowledge retrieval is the difference between a useful tool and one that confidently gives you wrong answers often enough to be dangerous.

Your PM built his version over a weekend. He used Claude. He does not have benchmarks. He does not have error rates. He has a Slack thread with fire emojis and a director who said “love this energy.”

The gap between those two things is not a minor detail. It is the entire problem.

So What Do You Actually Do

I don’t know. That’s not a rhetorical device. I genuinely do not know.

You can’t tell people to stop building. They won’t listen and honestly some of what they’re building is useful. A vibecoded prototype that saves two weeks of engineering exploration is a real win. A quick script that reformats data so you don’t have to do it manually? Great. Fine. Nobody’s arguing against that.

The problem is that the same capability that produces a useful prototype also produces a hallucinating daily insights email, a customer data pipeline with the security posture of an unlocked bicycle, and a strategy document that reads like McKinsey but recommends the wrong thing in a way that won’t be discovered for two quarters.

The tooling democratized but the judgment didn’t. The ability to build is free. The ability to know what’s worth building, what to trust, what to check, and when to stop, those are still expensive. They still require the kind of expertise that doesn’t come from a weekend of prompting. Nobody vibecodes judgment. Nobody posts about judgment on LinkedIn. Nobody gets fire emojis for saying “I think we should slow down and check this.”

We spent years building tools to make better decisions. Now everyone can build the tools. The decisions are getting worse. But the dashboards look incredible. The fire emojis keep coming. And somewhere in your org, a tool that nobody maintains is producing output that nobody verifies for an audience that nobody warned.

It’s fine. Everything is fine.

🎯 The Voice of User publishes weekly on UXR strategy, survival, and the things nobody else will say out loud. Get it in your inbox. Don't let a Claude wrapper tell you what I said. Subscribe.