When AI Starts Learning From AI
Most of the AI models we use today were trained on the internet. For all its noise, toxicity, and chaos, that “old” internet had one thing going for it: it was mostly written by people who were actually trying to figure something out in the physical world. There was a direct link between the words on the screen and a human being experiencing friction in real life.
Think about a developer documenting a fix at 3 AM after finally cracking a long-standing bug. Think about a researcher publishing a finding they spent years chasing, or a teacher explaining a concept for the hundredth time to a student who just wasn’t getting it. Even a manager writing a candid email because something went wrong and they had to own it carries a certain weight. That mix of effort, human mistakes, and lived context is a big part of why AI feels so useful today. It learned from people who were doing real work, facing real consequences, and leaving a digital trail of that experience behind.
But let’s be honest with ourselves. The internet was never a perfect library. It has always been part classroom, part diary, and part shouting match. It is full of “facts” that are actually propaganda, and “career advice” that is often just confident guessing. We’ve always had to filter the signal from the noise.
However, the nature of that noise is shifting. A massive and growing share of what we read, from blog posts and FAQs to supposedly “personal” stories, is now being generated by AI. It is starting to feel like we are stepping into a room full of mirrors, where the web is increasingly composed of machine-written echoes of other machines.
This leads to a question that matters more than it sounds: what happens to AI when it mainly eats what other AI has already chewed?
In technical circles, researchers call the resulting decay “model collapse,” but I think of it as the photocopy problem. If you photocopy an original document once, the copy looks fine. But if you take that copy and put it back in the machine to make another, and then use that second copy to make a third, the sharpness begins to fade. The edges blur. The contrast shifts. Tiny errors, little digital artifacts, start to become a permanent part of the image. The final result might look clean at a distance, but it slowly loses detail.
AI faces a similar risk of a slow drift. If these models ingest too much AI-generated content that isn’t grounded in reality, the language stays smooth and the confidence gets louder, but the specificity quietly vanishes. It still reads well, and that is the tricky part. It can feel like knowledge even when it’s mostly resemblance. A loop of “average” thoughts that keeps reinforcing the most common ways of saying things, until the unique, the messy, and the truly insightful get smoothed away.
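The drift toward "average" is easy to see in a toy experiment. This is a deliberately simplified sketch, not how any real model is trained: we start with a vocabulary where a few phrasings are common and a few are rare, then repeatedly "train" each generation on a finite sample of the previous generation's output. The token names are invented for illustration.

```python
import random
from collections import Counter

def resample_generation(weights, n, rng):
    """Fit-then-sample: draw n tokens from the current distribution,
    then use the empirical counts as the next generation's distribution."""
    tokens = list(weights)
    drawn = rng.choices(tokens, weights=[weights[t] for t in tokens], k=n)
    return Counter(drawn)

rng = random.Random(0)

# Generation 0: ten "ways of saying things", a few common, a few rare.
dist = Counter({f"tok{i}": c for i, c in
                enumerate([40, 20, 15, 10, 5, 4, 3, 1, 1, 1])})

history = [dist]
for _ in range(30):
    dist = resample_generation(dist, n=100, rng=rng)
    history.append(dist)

# Count how many distinct phrasings survive each generation.
survivors = [len(d) for d in history]
print("distinct phrasings, gen 0 ->", survivors[0],
      "| gen 30 ->", survivors[-1])
```

The key property is that the loss is one-way: once a rare token fails to appear in a sample, it has zero weight in every later generation and can never come back. The common phrasings survive; the unusual ones quietly disappear, which is the photocopy effect in miniature.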
If you have spent enough time in corporate life, you have seen this movie before. The beautiful slide decks, the crisp language, the strong air of certainty that has almost zero contact with reality. It’s the kind of professional polish that sounds great in a boardroom but falls apart the moment it hits the factory floor or a customer’s hands.
I’m not an AI scientist. I’m coming at this as a professional who is curious and trying to make sense of the change.
It seems to me that if the internet becomes an AI echo chamber, we have to ask where the fresh reality is going to come from. Humans usually write after they have done something. There is a before and an after. If I give you advice on how to manage a team, it’s coming from years of making mistakes and seeing the look on a person’s face when I got it wrong. AI can produce a neat, ten-point list on leadership without ever having tested a thing, sounding experienced without ever having been in the room where the work actually happened.
I suspect the future of AI will depend more on the source of information than the sheer volume of it. For the last decade, we’ve been obsessed with big data. Going forward, provenance and verification will matter more. Where did this come from? Who checked it? What did it cost to produce? Real-world ground truth, the kind you can audit and reproduce, starts to look like the thing everything else depends on.
Even synthetic data isn’t necessarily junk. It can be powerful for teaching a model logic or math. But it needs an anchor. A reliable check. A human reviewer who says, “Yes, this actually holds up.” Without an anchor, synthetic data is just more text added to an ever-growing pile of noise.
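The anchor idea fits in a few lines: generate freely, but keep only what an independent check confirms. Everything below (the samples, the checker) is invented for illustration; the point is simply that the filter is grounded in something outside the generator.

```python
def check(sample):
    """The anchor: an independent verifier. Here the 'ground truth'
    is just re-doing the arithmetic a generator claimed to solve."""
    a, b, claimed = sample
    return a + b == claimed

# Pretend a model produced these; two of the four claims are wrong.
synthetic = [(2, 3, 5), (10, 4, 15), (7, 7, 14), (1, 9, 11)]

# Only samples that survive the check enter the training pile.
anchored = [s for s in synthetic if check(s)]
print(anchored)
```

For arithmetic the anchor is trivial; for prose, the anchor is a human who has actually done the thing being described, which is exactly what the echo chamber lacks.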
In this landscape, I don’t think humans become irrelevant. If anything, our work becomes valuable in a different way. AI is excellent at drafting, compressing, and structuring. It can take a messy pile of notes and turn them into a coherent summary in seconds. But the raw signal, the original observation, the story with actual consequences, still has to come from lived reality.
And maybe that’s what I’m still sitting with.
If the internet becomes an echo chamber, the question isn’t just what models learn. It’s what we stop noticing.
If the future web is mostly machines talking to machines, I don’t think it ends in disaster. I think it ends in something quieter. A world that sounds smarter than it is.

