• Pekka@feddit.nl
    English
    arrow-up
    18
    ·
    edit-2
    2 years ago

    I already had to use the cached version of a Reddit thread today to solve a technical issue I had with the Rust compiler. There is so much valuable content there that is well indexed by search engines. Let’s hope they don’t lock down the site even further to prevent AIs from training on their data.

    • jmp242@sopuli.xyz
      English
      arrow-up
      4
      ·
      2 years ago

      If they actually want to restrict AI training, they also have to restrict search engines. I may be behind the times, but I would have thought those kinds of questions usually went to a Stack Overflow sort of site.

      • ram@lemmy.ca
        English
        arrow-up
        2
        ·
        2 years ago

        If they want to restrict AI training, they’ll need to prevent AIs from viewing the website at all. Removing the API just removes the low-bandwidth, low-impact way of gathering the data. Scripts can scrape the pages over HTTP just as easily as they can use an API, but that’s a lot more resource-intensive on Reddit’s side. Heck, this is the whole reason free public APIs became a thing in the first place.

    • BobQuasit@beehaw.org
      English
      arrow-up
      3
      ·
      2 years ago

      I wonder if the Internet Archive has preserved many of Reddit’s old posts and comments? No one seems to have mentioned it.

      • Osayidan@social.vmdk.ca
        English
        arrow-up
        4
        ·
        2 years ago

        I know there were at least a few projects not affiliated with IA that were basically mirror copies of Reddit. No idea what has happened to them at this point; I haven’t checked in a long time.

    • generalpotato@lemmy.world
      English
      arrow-up
      2
      ·
      2 years ago

      I’m pretty sure it’s only a matter of time till an LLM can solve any sort of obscure compiler issue. If organic data growth happens outside of Reddit, it’s not going to be of much use once search engines catch up to those other sources.