• andallthat@lemmy.world
    22 days ago

    Basically, model collapse happens when the training data no longer matches real-world data

    I’m more concerned about LLMs collapsing the whole idea of “real-world”.

    I’m not a machine learning expert, but I do get the basic concept of training a model and then evaluating its output against real data. But the whole thing rests on the idea that you have a model trained on relatively small samples of the real world and a big, clearly distinct “real world” to check the model’s performance against.
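
    A toy sketch of that idea, assuming scikit-learn and with nothing to do with how LLMs are actually built: a small slice of a dataset plays the training sample, and the big held-out rest plays the “real world” you score against.

    ```python
    # Toy illustration only: a small classifier stands in for "the model",
    # and the held-out split stands in for "the real world". Assumes scikit-learn.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X = X / 16.0  # simple scaling so the solver converges quickly

    # Train on a relatively small sample...
    X_train, X_real, y_train, y_real = train_test_split(
        X, y, train_size=0.2, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # ...then check performance against the big, clearly distinct held-out set.
    print("accuracy on held-out data:", model.score(X_real, y_real))
    ```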

    If LLMs have already ingested basically all the information in the “real world”, and their output is so pervasive that you can’t easily tell what’s true and what’s AI-generated slop, then “how do we train our models now?” is not my main concern.

    As an example, take the judges who found made-up cases in court filings because the lawyers had used an LLM. What happens if those made-up cases get referenced in several other places, including some legal textbooks used in law schools? Don’t they become part of the “real world”?

    • WanderingThoughts@europe.pub
      22 days ago

      LLMs are not going to be the future. The tech companies know it and are working on reasoning models that can look things up to fact-check themselves. These are slower, use more power, and are still a work in progress.
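
      Roughly the shape of that, as a hypothetical sketch: draft_answer(), search_web() and supported() are stand-in stubs, not any vendor’s real API; the point is just the draft → check claims against retrieved sources → retry loop.

      ```python
      # Hypothetical sketch of "look things up to fact-check yourself".
      # All three helpers are stubs, not a real product API.
      def draft_answer(question: str, feedback: str = "") -> str:
          return f"Draft answer to {question!r} (feedback considered: {feedback!r})"

      def search_web(claim: str) -> list[str]:
          return [f"Snippet that may or may not support: {claim}"]

      def supported(claim: str, snippets: list[str]) -> bool:
          return any(claim in s for s in snippets)  # crude stand-in for real verification

      def fact_checked_answer(question: str, max_rounds: int = 3) -> str:
          answer = draft_answer(question)
          for _ in range(max_rounds):
              claims = answer.split(". ")  # naive claim extraction
              bad = [c for c in claims if not supported(c, search_web(c))]
              if not bad:
                  break  # every claim had some retrieved support
              answer = draft_answer(question, feedback=f"unsupported: {bad[0]}")
          return answer

      print(fact_checked_answer("Is the Moon 80% cheese?"))
      ```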

      • andallthat@lemmy.world
        22 days ago

        Look up stuff where? Some things are verifiable more or less directly: the Moon is not 80% made of cheese, adding glue to pizza is not healthy, the average human hand does not have seven fingers. A “reasoning” model might do better with those than current LLMs.

        But for a lot of our knowledge, verifying means “I say X because here are two reputable sources that say X”. For that kind of claim, having AI-generated text creep into everything (including peer-reviewed scientific papers, which tend to be considered reputable) is blurring the line between truth and “hallucination” for both LLMs and humans.

        • Aux@feddit.uk
          22 days ago

          Who said that adding glue to pizza is not healthy? Meat glue is used in restaurants all the time!

  • kate@lemmy.uhhoh.com
    22 days ago

    Surely if they start to get worse we’d just use the models that already exist? Didn’t click the link though.

    • Maestro@fedia.io
      22 days ago

      If you do that, then models won’t know any new information. For example, a model may think Biden is still president.

      • 3abas@lemm.ee
        22 days ago

        This is already a solved problem; we’re well past single-model systems, and any competitive AI offering can augment its information from the Internet.
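
        Something like the following, as a hypothetical sketch (the search backend and model call are stubs, not any specific product’s API): pull fresh snippets first, then answer from that context instead of from stale training data.

        ```python
        # Hypothetical sketch of retrieval augmentation; search_web() and
        # call_model() are stubs standing in for a real search API and LLM.
        def search_web(query: str) -> list[str]:
            return ["Snippet from a current news page...",
                    "Snippet from an up-to-date encyclopedia entry..."]

        def call_model(prompt: str) -> str:
            return f"(answer grounded in the provided context)\n{prompt[:120]}..."

        def answer_with_retrieval(question: str) -> str:
            context = "\n".join(search_web(question))
            prompt = (
                "Answer using only the context below.\n\n"
                f"Context:\n{context}\n\n"
                f"Question: {question}"
            )
            return call_model(prompt)

        print(answer_with_retrieval("Who is the current US president?"))
        ```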

          • Aux@feddit.uk
            22 days ago

            The Internet was always full of mental diarrhea. If you can’t reason about which content is correct and which is not, AI won’t change anything in your life.