This discussion has been archived.
No new comments can be posted.
by SlashbotAgent (6477336) writes:
Destroying websites? No, that's bullshit.
Destroying website page views by giving the user the data without attribution, or even a visit to the site? Yeah, that's totally happening.
It's not damaging any sites. It's damaging the revenue of a few sites, and they're pissed. Perhaps rightly so. But the horses have left the barn and the barn has burned down.
by christerk (141744) writes:
As someone who's actively fighting this type of traffic, let me share my perspective.
I have been running a small-ish website with user peaks at around 50 requests per second. Over the last couple of months, my site has been getting hit with loads of up to 300 requests per second by these kinds of bots. They're using distributed IPs and random user agents, making them hard to block.
My site has a lot of data and pages to scan, and despite an appropriate robots.txt, these things ignore it and just scan endlessly. My website isn't designed to be for profit, I run it more or less as a hobby, so it has trouble handling a nearly 10x increase in traffic. My DNS costs have gone up significantly, with 150 or so million DNS requests this month.
The net effect is that my website slows down and becomes unresponsive under these scans, and I am looking at spending more money just to manage the excess traffic.
Is it destroying my site? No, not really. But it absolutely increases costs and forces me to spend more money and hours on infrastructure than I would otherwise have needed to. These things are hurting smaller communities, imposing significant cost increases on people who may have difficulty covering them, so calling it bullshit isn't exactly accurate.
by reanjr (588767) writes:
I know you're saying it's coming from lots of IP addresses, but I wonder if anyone has looked into geofencing to throttle requests coming out of major data center cities. Normal users would get full-speed access, but anyone in the Valley or in Ashburn, VA would have a hard time scraping.
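As an illustration of that idea (not something any poster here deployed), a minimal GeoIP-throttling sketch as Python WSGI middleware might look like this. It assumes MaxMind's free GeoLite2 City database and the geoip2 package; the city list and delay are invented.

```python
# Hypothetical sketch only: throttle requests whose source IP geolocates
# to a known data-center city. Assumes MaxMind's GeoLite2-City database
# and the geoip2 package; the city list and delay are invented.
import time

import geoip2.database
import geoip2.errors

THROTTLED_CITIES = {"Ashburn", "Santa Clara", "The Dalles"}  # illustrative
reader = geoip2.database.Reader("GeoLite2-City.mmdb")

class GeoThrottle:
    """WSGI middleware that adds artificial latency for throttled cities."""

    def __init__(self, app, delay=5.0):
        self.app = app
        self.delay = delay

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        try:
            city = reader.city(ip).city.name
        except (geoip2.errors.AddressNotFoundError, ValueError):
            city = None
        if city in THROTTLED_CITIES:
            time.sleep(self.delay)  # slow, don't block: real users still get in
        return self.app(environ, start_response)
```

Note that sleeping ties up a server worker per throttled request, so in practice you would return a 429 or do this at the load balancer; and as the replies below point out, much of this traffic doesn't come from data-center IPs anyway.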
by Halo1 (136547) writes:
It's not just data centres; many of the requests come from regular broadband IP addresses. I think they're using the "services" of bottom feeders like Scraper API [scraperapi.com], or buying access from the authors of malicious web browser extensions [arstechnica.com].
by h33t l4x0r (4107715) writes:
Yeah, and it just gets worse if you try to block them, because instead of something like Python requests they move to Selenium/Playwright to get around those blocks, which means loading your CSS/images/whatever as well, like a regular visitor would.
by sound+vision (884283) writes:
It's really no holds barred, given the amount of money we're talking about. This is an industry that's spent the last three or four decades telling us how terrible unauthorized copying and systems access are. (Don't copy that floppy!) Those rules get thrown right out the window when they're eyeing the kind of cash they think AI will bring.
Working with malware distributors and botnet admins would not surprise me at all, particularly in this Project 2025 era where the government's been purchased, whole-hog, by tech bros.
by Cigamit (200871) writes:
I have had to fight off several of these; in one case I recorded over 1 million unique IPs, all random and coming out of nearly every Vietnamese and Indonesian subnet, mostly residential. My site normally gets 5-10 requests per second and was suddenly getting over 1,000, for 12-14 hours per day, 3 weeks straight. It always started at the same time of day, almost like it was on a timer. Luckily, that one always used a User-Agent with the same old version of Chrome in the string and was easily blocked. But the attack continued.
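As a sketch of that kind of User-Agent blocking, here is hypothetical WSGI middleware pinned to one stale Chrome build; the version string below is invented, not the one from the actual attack.

```python
# Hypothetical sketch: reject any request whose User-Agent pins the one
# stale Chrome build the botnet used. The version string here is invented.
import re

BLOCKED_UA = re.compile(r"Chrome/109\.0\.0\.0")  # made-up stale version

class UABlock:
    """WSGI middleware returning 403 for the blocked User-Agent."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if BLOCKED_UA.search(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```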
by sound+vision (884283) writes:
"Major data center cities" are also generally major population cities, with a minor in geography. Rate-limiting by GeoIP would also rate-limit big chunks of real users. It might provide a marginal windfall to users outside of those locations - at least until the bots start using rural IP addresses.
There is a boom in rural areas spinning up data centers. That is to say, any random small city may now or in the near future suddenly become a "major data center city" at the behest of a singular tech bro.
by reanjr (588767) writes:
""Major data center cities" are also generally major population cities"
Yes and no. Data centers are usually NEAR cities. But the economics of data centers keep them out in the suburbs. Data centers are more likely to be surrounded by fields than a high rise apartment.
But it sounds like from other comments that the requests are actually much more diffuse.
by Halo1 (136547) writes:
Anubis [github.com] has worked well for us to get rid of most of the scrapers from our wiki, including the ones faking regular user agents.
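Anubis works by making the browser solve a small proof-of-work puzzle before the page is served. As a rough illustration of that concept, not Anubis's actual code, a server-side check might look like this (parameter names and difficulty are made up):

```python
# Conceptual proof-of-work gate, not Anubis's actual code: the client
# must find a nonce such that sha256(challenge + nonce) has DIFFICULTY
# leading zero hex digits before it is allowed through.
import hashlib
import os

DIFFICULTY = 4  # illustrative; tune so a fresh session costs seconds of CPU

def new_challenge() -> str:
    """Random per-session challenge handed to the client."""
    return os.urandom(16).hex()

def verify(challenge: str, nonce: str) -> bool:
    """Check the nonce the client's JavaScript found by brute force."""
    digest = hashlib.sha256((challenge + nonce).encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)
```

The point is the asymmetry: a real browser pays the cost once and keeps a cookie, while a scraper farm cycling IPs and sessions pays it over and over.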
by allo (1728082) writes:
Anubis has the side effect that it stops the Internet Archive crawler.
by Halo1 (136547) writes:
Anubis has the side effect that it stops the Internet Archive crawler.
Even though it whitelists [github.com] the IA crawlers by default?
by allo (1728082) writes:
I'm not sure what they're whitelisting, but I've seen pages in the archive that only show the Anubis girl. Maybe it was from an older version, but I saw the page less than 5 months ago.
by h33t l4x0r (4107715) writes:
Have you considered offering an RSS feed? Bots would rather consume that than HTML. It tastes better.
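For what it's worth, a feed like that can be hand-rolled in a few lines; this sketch uses made-up site details and no feed library:

```python
# Minimal hand-rolled RSS 2.0 feed; site name and URLs are placeholders.
from xml.sax.saxutils import escape

def rss(items):
    entries = "".join(
        f"<item><title>{escape(title)}</title><link>{escape(link)}</link></item>"
        for title, link in items
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0"><channel>'
        "<title>Example Hobby Site</title>"
        "<link>https://example.org/</link>"
        "<description>Recent pages</description>"
        f"{entries}</channel></rss>"
    )

print(rss([("New page", "https://example.org/new-page")]))
```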
by larryjoe (135075) writes:
Someone should build an AI tool to detect these AI web crawlers and then send back corrupted information (not misspelling but actual falsehoods). The only way to stop the unneighborly actions is to eliminate the expectation of a reward.
by sound+vision (884283) writes:
Cloudflare built it, and it's called "AI Labyrinth". I'd like to deploy a similar webpage generator on my Apache server, without Cloudflare. If you know of any such scripts, link me and I'll check them out.
by Cigamit (200871) writes:
I built something like this a decade ago with PHP and a dictionary file. The problem you run into is that the more bots you trap in the labyrinth, the more CPU you end up using, because they will blindly keep slurping up whatever you give them.
In the end I shut it down, as I would rather just block them to begin with than waste CPU cycles for no real gain on my part.
by WidjettyOne (10203247) writes:
Someone should build an AI tool to detect these AI web crawlers and then send back corrupted information (not misspelling but actual falsehoods). The only way to stop the unneighborly actions is to eliminate the expectation of a reward.
There's Nepenthes [zadzmo.org], and it's open source, though it sends back slow, Markov-chain nonsense rather than actual falsehoods.
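In the same spirit, here is a hypothetical sketch of the slow Markov-chain approach; the training corpus is a stub and all names are invented:

```python
# Hypothetical tarpit in the spirit of Nepenthes / AI Labyrinth: serve
# endless bigram-Markov nonsense, dribbled out slowly. Corpus is a stub.
import random
import time
from collections import defaultdict

def train(text):
    """Build a bigram chain: word -> list of observed successors."""
    chain = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def babble(chain, delay=1.0):
    """Yield nonsense forever, sleeping between words to slow the bot."""
    word = random.choice(list(chain))
    while True:
        yield word + " "
        time.sleep(delay)
        word = random.choice(chain.get(word) or list(chain))

chain = train("the quick brown fox jumps over the lazy dog and the fox naps")
gen = babble(chain, delay=0.0)  # delay=0 just for this demo
print("".join(next(gen) for _ in range(12)))
```

Because the text is generated lazily, a word at a time, the CPU cost per trapped bot stays small, which speaks to the CPU concern raised above, though each open connection still occupies a worker.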