OpenAI just admitted it can’t identify AI-generated text. That’s bad for the internet and it could be really bad for AI models.::In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

  • @Womble@lemmy.world
    2 years ago

    This is not the case. Model collapse is a studied phenomenon for LLMs: quality deteriorates when models are trained on their own generated output. It might not be an issue if there were thousands of independent models out there, but IIRC there are only 3-5 base models that all the others are derivatives of.
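    A toy illustration of the mechanism (my own sketch, not from the article): repeatedly fit a distribution to samples drawn from the previous generation's fit. Finite-sample estimation error compounds across generations, and the fitted variance tends to shrink — a much-simplified analogue of how diversity degrades when models train on their own output. All names and parameters here are made up for illustration.

    ```python
    import random
    import statistics

    random.seed(0)

    mu, sigma = 0.0, 1.0  # generation 0: the "human data" distribution
    N = 200               # samples per generation (finite-sample noise matters)

    history = []
    for generation in range(20):
        # Each generation trains only on data sampled from the previous model.
        samples = [random.gauss(mu, sigma) for _ in range(N)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # MLE fit: biased low, so spread decays
        history.append(sigma)

    print(f"std after 20 generations: {history[-1]:.3f} (started at 1.0)")
    ```

    In expectation the fitted variance loses a factor of (N-1)/N per generation, on top of random drift — with many models cross-training on each other's data the effect would be diluted, which is the commenter's point about there being only a handful of base models.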

    • @lily33@lemmy.world
      2 years ago

      I don’t see how that affects my point.

      • Today’s AI detectors can’t distinguish the output of today’s LLMs from human text.
      • Future AI detectors WILL be able to distinguish the output of today’s LLMs.
      • Of course, future AI detectors won’t be able to distinguish the output of future LLMs.

      So at any point in time, only recent text could be “contaminated”. The claim that “all text after 2023 is forever contaminated” just isn’t true. Researchers would simply have to be a bit more careful when including it.