• asudox
    18 months ago

    Block? Nope, robots.txt does not block the bots. It’s just a text file that says: “Hey robot X, please do not crawl my website. Thanks :>”
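To illustrate that compliance is voluntary, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the `BotX` / `OtherBot` agent names, and `example.com` are made up for the example. The parser only answers "may I fetch this?"; a crawler that never asks, or ignores the answer, is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: BotX is asked to stay out entirely,
# every other agent is asked to avoid /private/.
ROBOTS_TXT = """\
User-agent: BotX
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks before each request...
print(rp.can_fetch("BotX", "https://example.com/"))               # False
print(rp.can_fetch("OtherBot", "https://example.com/"))           # True
print(rp.can_fetch("OtherBot", "https://example.com/private/x"))  # False

# ...but nothing forces it to: fetching a URL without calling
# can_fetch() first is entirely up to the crawler.
```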

    • ɐɥO
      18 months ago

      I disallow a page in my robots.txt and IP-ban everyone who goes there. That's pretty effective.
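A minimal Python sketch of that trap, assuming a hypothetical trap path `/secret-trap-page` and an in-memory ban set; a real deployment would hook this into the web server or firewall rather than an application object like the hypothetical `RobotsTrap` below. The trap path appears only in robots.txt, so crawlers that honor the file never request it, and anything that does is treated as a misbehaving bot.

```python
# Hypothetical trap path; it is listed in robots.txt and linked from nowhere else.
TRAP_PATH = "/secret-trap-page"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


class RobotsTrap:
    """Bans any client IP that requests the path disallowed in robots.txt."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.banned: set[str] = set()

    def handle(self, client_ip: str, path: str) -> tuple[int, str]:
        """Return an (HTTP status, body) pair for a single request."""
        if client_ip in self.banned:
            return 403, "Forbidden"
        if path == "/robots.txt":
            return 200, ROBOTS_TXT
        if path.rstrip("/") == self.trap_path:
            # Nothing links here, so the client most likely took the path
            # from robots.txt and ignored the Disallow line.
            self.banned.add(client_ip)
            return 403, "Forbidden"
        return 200, "OK"
```

For example, `RobotsTrap().handle("203.0.113.5", "/")` returns `(200, "OK")`, but once that address has requested `/secret-trap-page`, every later request from it gets a 403.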

        • bountygiver [any]
          8 months ago

          Humans typically don’t visit [website]/fdfjsidfjsidojfi43j435345 when there’s no button that links to it.

          • @Avatar_of_Self@lemmy.world
            18 months ago

            I used to do this on one of my sites that was moderately popular in the ’00s. I had a link hidden via JavaScript, so a user couldn’t click it (unless they disabled JavaScript and clicked it), though it was hidden pretty well even then.

            Each hit’s IP would go into a log, and my script would add that IP’s /24 subnet to my firewall. I allowlisted specific IP ranges for some search engines.
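The log-to-firewall step might look roughly like the Python sketch below. It assumes each log line starts with the client IPv4 address (as in common web-server access logs), uses documentation-only addresses, and treats the allowlist as a placeholder: real search-engine ranges would have to come from the engines’ own published lists. The function names `subnets_to_block` and `iptables_commands` are hypothetical, and the sketch only builds the `iptables` command lines rather than running them.

```python
import ipaddress

# Placeholder allowlist (a documentation range) standing in for search-engine ranges.
ALLOWLIST = (ipaddress.ip_network("198.51.100.0/24"),)


def subnets_to_block(log_lines, allowlist=ALLOWLIST):
    """Map each logged IPv4 address to its /24, skipping allowlisted and malformed entries."""
    blocked = []
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            ip = ipaddress.ip_address(fields[0])  # assumes the IP is the first field
        except ValueError:
            continue
        if ip.version != 4 or any(ip in net for net in allowlist):
            continue
        # strict=False masks off the host bits: 203.0.113.77 -> 203.0.113.0/24
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        if net not in blocked:
            blocked.append(net)
    return blocked


def iptables_commands(subnets):
    """Build (but do not run) one DROP rule per subnet."""
    return [["iptables", "-A", "INPUT", "-s", str(net), "-j", "DROP"] for net in subnets]
```

Each command list could then be applied as root with `subprocess.run(cmd, check=True)`; keeping generation separate from execution makes it easy to review the rules first.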

            Anyway, it caught a lot of bots. I really just wanted to stop automated attacks and spambots on the web front.

            I also had a honeypot port that basically did the same thing. If you sent packets to it, your /24 was added to the firewall for a week or so. I think I just used netcat to write to yet another log and wrote a script to add those /24s to iptables.
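The honeypot-port idea could be sketched in Python as below, swapping netcat for a small listener built on the standard `socket` module. The seven-day ban length (standing in for "a week or so") and the names `TimedBanList` and `serve_honeypot` are assumptions for illustration; the firewall commands are printed rather than executed. Note that `accept()` only sees completed TCP connections, so a scan that sends a SYN and then resets would not be recorded this way; catching that would need packet-level logging in the firewall itself.

```python
import ipaddress
import socket
import time

BAN_SECONDS = 7 * 24 * 60 * 60  # assumed value for "a week or so"


class TimedBanList:
    """Tracks banned /24 subnets and when each ban should be lifted."""

    def __init__(self, ttl: float = BAN_SECONDS) -> None:
        self.ttl = ttl
        self.expires_at: dict[ipaddress.IPv4Network, float] = {}

    def record_hit(self, ip: str, now: float):
        """Ban (or re-ban) the /24 containing ip; return the subnet if it is newly banned."""
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        is_new = net not in self.expires_at
        self.expires_at[net] = now + self.ttl  # a repeat hit extends the ban
        return net if is_new else None

    def pop_expired(self, now: float):
        """Remove and return subnets whose ban has run out."""
        expired = [net for net, t in self.expires_at.items() if t <= now]
        for net in expired:
            del self.expires_at[net]
        return expired


def serve_honeypot(port: int, bans: TimedBanList) -> None:
    """Listen forever on an otherwise unused TCP port and ban every client that connects."""
    with socket.create_server(("", port)) as server:  # IPv4 by default
        while True:
            conn, (addr, _src_port) = server.accept()
            conn.close()
            now = time.time()
            net = bans.record_hit(addr, now)
            if net is not None:
                print(f"iptables -A INPUT -s {net} -j DROP")
            # Expiry is only checked when a connection arrives, which keeps
            # this sketch single-threaded.
            for old in bans.pop_expired(now):
                print(f"iptables -D INPUT -s {old} -j DROP")
```

Calling `serve_honeypot` blocks forever, so in practice it would run as its own service on whatever unused port is chosen as the honeypot.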

            I did it because my logs were full of noise and I had so many spambots; it was pretty crazy.