Since I put a bunch of ZIM files online at zim.pollux.casa, many bots have been crawling them. I don't understand why they bother scanning my Wikipedia copies; the original sites are certainly more efficient.

Some clearly identify themselves in their user agent, but others try to hide behind bogus user agent strings...

Here are some examples:

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/4.0.212.0 Safari/532.0

Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20130405 Firefox/22.0

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/10.10 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11

Who is still using Chrome 4.0, Firefox 22.0, or Ubuntu 10.10???

Unfortunately, kiwix-serve (which serves the ZIM files on my machine) does not provide a robots.txt to deter these crawlers, so I had to block access at the web server level (lighttpd) based on the user agent string.

$HTTP["useragent"] =~ "(?i)spider|tiktokspider|claudebot|googlebot|meta-external|scrapy|sogou|petalbot|dotbot|mj12bot|crawl|bingbot|yandex|baidu|duckduckbot|facebook|amazon|grok|facebot|slurp|exabot|ahrefs|semrush|perplexity|gptbot|chatgpt|ccbot" {
    url.access-deny = ("")
}
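For the crawlers that do behave, lighttpd can also answer /robots.txt itself in front of kiwix-serve. A minimal sketch, assuming kiwix-serve is proxied locally on port 8080 (the paths and port are hypothetical):

```
# Proxy everything to kiwix-serve...
proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )

# ...except /robots.txt, served as a static file instead
$HTTP["url"] == "/robots.txt" {
    proxy.server = ()                          # disable proxying for this path
    server.document-root = "/var/www/static"   # contains robots.txt with "User-agent: *" / "Disallow: /"
}
```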

but crawlers with spoofed user agents keep coming 😐​
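The deny rule only catches bots that declare themselves; the spoofed browser strings above sail straight through it. A quick check in Python (the GPTBot string is illustrative, not a captured log line):

```python
import re

# Same case-insensitive pattern as the lighttpd rule
pattern = re.compile(
    r"spider|tiktokspider|claudebot|googlebot|meta-external|scrapy|sogou|"
    r"petalbot|dotbot|mj12bot|crawl|bingbot|yandex|baidu|duckduckbot|"
    r"facebook|amazon|grok|facebot|slurp|exabot|ahrefs|semrush|perplexity|"
    r"gptbot|chatgpt|ccbot",
    re.IGNORECASE,
)

honest_bot = "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
spoofed = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.0 "
           "(KHTML, like Gecko) Chrome/4.0.212.0 Safari/532.0")

print(bool(pattern.search(honest_bot)))  # True: the bot declares itself
print(bool(pattern.search(spoofed)))     # False: the fake browser UA slips through
```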