• Saik0
    English
    24 days ago

    They can also crawl this publicly accessible social media source for their data sets.

    Crawling would be silly. They can simply set up a Lemmy node and subscribe to every other server. Reading the ActivityPub updates would be much more efficient than crawling: they wouldn’t accidentally re-fetch things that haven’t changed, because the updates only cover what’s new or edited.
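A rough sketch of the idea in the comment above: instead of re-fetching pages, a subscriber applies pushed activities to a local copy. This is only an illustration, not how Lemmy does it. The function name `apply_activities` and the simplified activity dictionaries are assumptions; real ActivityPub activities carry many more fields, though `Create`, `Update`, and `Delete` are genuine activity types.

```python
from typing import Dict, Iterable


def apply_activities(store: Dict[str, str], activities: Iterable[dict]) -> int:
    """Apply pushed activities to a local copy; return how many objects changed.

    `store` maps an object id to its content. The activity shape used here
    is a simplified, hypothetical subset of ActivityPub.
    """
    changed = 0
    for activity in activities:
        kind = activity.get("type")
        obj = activity.get("object", {})
        if kind in ("Create", "Update") and isinstance(obj, dict):
            obj_id = obj["id"]
            content = obj.get("content", "")
            # Only touch the store when the content actually differs.
            if store.get(obj_id) != content:
                store[obj_id] = content
                changed += 1
        elif kind == "Delete":
            # A Delete may reference the object by bare id or by embedded object.
            obj_id = obj["id"] if isinstance(obj, dict) else obj
            if store.pop(obj_id, None) is not None:
                changed += 1
    return changed
```

The work done is proportional to the number of activities received, not to the total number of posts on the servers being followed, which is the efficiency gain the comment is pointing at.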

    • @Strawberry@lemmy.blahaj.zone
      English
      24 days ago

      Sure, but we’re in the comments section of an article about Wikipedia being crawled, which is silly because they could just download a snapshot of Wikipedia.

      • TXL
        English
        12 days ago

        That’s right. It’s not humans making careful decisions about what to download. It’s a program that follows links and saves files.
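For illustration, "a program that follows links and saves files" can be as small as the sketch below. It is a minimal breadth-first crawler under stated assumptions, not any particular company's crawler. The names `crawl` and `LinkExtractor` are hypothetical, and the `fetch` callable is injected so the example runs without network access. Note that nothing in it checks for a bulk download or snapshot; it just keeps following `href` attributes.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> Dict[str, str]:
    """Follow links breadth-first from start_url and save every page fetched.

    `fetch` returns the page HTML, or None if the page is unavailable.
    """
    saved: Dict[str, str] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        saved[url] = html  # "saves files"
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:  # "follows links"
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return saved
```

Passing a dictionary's `get` method as `fetch` is enough to try it against a small in-memory site.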