BYB | Block Your Bots

History & quick facts about bots

Key milestones

1966 — ELIZA (MIT): early chatbot by Joseph Weizenbaum simulating a therapist.*
1993 — World Wide Web Wanderer: first web crawler (Matthew Gray), led to Wandex.
1993 — Eggdrop: earliest widely used IRC bot (Robey Pointer).
1994 — WebCrawler: first full-text search engine crawler (Brian Pinkerton).
1994 — robots.txt proposed by Martijn Koster.
2000s — CAPTCHA popularized (Luis von Ahn & team).
2005 — XML Sitemaps introduced (Google; later adopted by others).
2022 — robots.txt standardized as RFC 9309.

*ELIZA — explanation

ELIZA belongs to the bot family, but to a different species than crawlers/scanners.

What ELIZA is:
An early chatbot (1966, MIT, Joseph Weizenbaum) that simulated conversation using simple rules/patterns.

Why it is a 'bot':
A bot is an automated program that performs tasks. Chatbots automate dialogue; crawlers automate exploration/indexing. Both are bots, just with different purposes.

Why this is relevant:
ELIZA is considered a starting point for conversational bots (later PARRY, A.L.I.C.E., etc.). It shows how long automation has imitated human interaction—and why we use mechanisms like CAPTCHA/verification to distinguish humans from machines today.

Fast facts

Bots aren’t all bad — search engines and monitors rely on them; harmful ones include scrapers, scanners and DDoS bots.
User-Agent strings are easy to spoof; a reverse DNS lookup confirmed by a forward lookup is a more reliable way to verify good bots.
robots.txt is voluntary; malicious bots may ignore it — server-side controls are essential.
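`crawl-delay` is not supported by Google; use server-side throttling instead (Googlebot slows down when it receives 429, 500, or 503 responses). The Search Console crawl-rate limiter was retired in January 2024.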
Sitemaps help discovery, not rankings; good internal linking remains crucial.
CAPTCHAs evolve as bots get better; expect periodic changes in anti-bot methods.
Early crawlers mainly measured the Web’s size — the landscape changed once full-text indexing arrived.
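
The reverse and forward DNS check mentioned in the fast facts above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `verify_crawler_ip`, the `TRUSTED_SUFFIXES` list, and the injectable resolver parameters are assumptions made for this example, and the hostname suffixes should be checked against each search engine's own documentation before use.

```python
import socket
from typing import Callable, Iterable

# Assumed suffix list for illustration only; confirm with each operator's docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def verify_crawler_ip(
    ip: str,
    trusted_suffixes: Iterable[str] = TRUSTED_SUFFIXES,
    reverse_lookup: Callable = socket.gethostbyaddr,
    forward_lookup: Callable = socket.gethostbyname_ex,
) -> bool:
    """Return True only if ip -> hostname -> ip round-trips under a trusted suffix."""
    try:
        # Step 1: reverse lookup (PTR record) gives the claimed hostname.
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 2: the hostname must sit under a domain the operator controls.
    if not any(hostname.endswith(suffix) for suffix in trusted_suffixes):
        return False
    try:
        # Step 3: forward lookup must lead back to the original address.
        # Note: gethostbyname_ex resolves IPv4 only.
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses
```

With the default resolvers, each call performs live DNS lookups, so caching the result per IP address is usually worthwhile. The resolver parameters exist only so the logic can be exercised without network access.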

What is a bot?

A bot is a computer program that performs repetitive tasks largely automatically, without direct human intervention.
Well-known examples include spiders, crawlers, or web bots that browse websites or collect data.
The term bot is derived from robot, but it has nothing to do with physical machines: bots are usually scripts or more complex programs running in the background.

Are all bots bad?

There are useful and harmful bots.

Useful bots include search engine bots, chatbots, or monitoring bots.
Harmful bots include spam bots, scraper bots, brute-force bots, or DDoS bots.

How do bots work?

Bots were once simple scripts. Today many are equipped with AI and machine learning to mimic human behavior and exploit security vulnerabilities.

How to recognize bot traffic?

Warning signs: many spam comments, sudden traffic spikes, mass form submissions, suspicious logins.
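
One way to spot the traffic spikes listed above is to count requests per client and minute in the web server's access log. The sketch below assumes the log uses the Common or Combined Log Format; the function name `busiest_clients` and the default limit of 60 requests per minute are illustrative assumptions, not recommended values.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches the start of a Common/Combined Log Format line, for example:
# 203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\]')


def busiest_clients(lines: Iterable[str], per_minute_limit: int = 60) -> Dict[str, int]:
    """Map each client IP to its peak requests-per-minute, if above the limit."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        # "10/Oct/2024:13:55:36 +0000" -> "10/Oct/2024:13:55" (seconds dropped)
        minute = match.group("ts")[:17]
        hits[(match.group("ip"), minute)] += 1

    peaks: Dict[str, int] = {}
    for (ip, _minute), count in hits.items():
        if count > per_minute_limit:
            peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks
```

A high request rate alone does not prove that a client is a bot, so the output is best treated as a list of candidates to inspect further (user agent, requested paths, login attempts).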

What types of bots exist?

Category	Description
Search engine bots	Crawl pages for Google, Bing, Yandex etc. to index them.
SEO / Backlink bots	Analyze site structure to evaluate rankings or backlinks (e.g. Ahrefs, Semrush).
Monitoring bots	Check your website's availability (e.g. UptimeRobot).
Security scanners	Look for vulnerabilities (e.g. sqlmap, Acunetix, Nikto).
Scrapers	Steal content, prices, or data through automated access.
Fake bots / Malware	Pretend to be legitimate bots but perform scans or attacks.

Why should some bots be blocked?

Server load: Some bots generate a massive number of requests and slow down your website.
Data theft: Scrapers automatically copy your texts, images, or prices.
Security risk: SQL injection bots or exploit scanners try to hack your site.
Traffic distortion: They produce fake visits in logs or analytics.
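
To limit the server load described above, requests can be throttled per client on the server side. The following is a minimal sketch of a token-bucket limiter; the class name `TokenBucketLimiter`, its parameters, and the in-memory dictionary are assumptions for illustration. Production setups usually rely on the web server's or CDN's built-in rate limiting instead.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: `rate` requests per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # client key (e.g. IP address) -> (remaining tokens, time of last request)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(client, (float(self.burst), now))
        # Refill in proportion to the elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[client] = (tokens - 1.0, now)
            return True
        self._state[client] = (tokens, now)
        return False
```

A request for which `allow()` returns `False` would typically be answered with HTTP status 429 (Too Many Requests). The sketch never evicts idle clients, so a long-running process would need to prune `_state` periodically.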

This page provides an overview of known bots accessing your website, including SQL injection scanners and crawlers.
You can review each bot's purpose and decide whether to block or allow them based on your site's needs.
SQL injection scanners are security tools often used for attacks and should be blocked immediately to protect your site.
Other crawlers and bots can be selectively blocked depending on your server capacity, relevance, and your audience.

SQL Injection Scanners

Bot-Name	Description	Recommendation
acunetix	Commercial website scanner that tests for SQL injection, XSS, and more.	Yes – block immediately
arachni	Security scanner framework used for automated SQLi and XSS discovery.	Yes – block immediately
BurpSuite	Intercepting proxy & scanner used in web pentests.	Yes – block immediately
CensysInspect	Censys HTTP/scan component that inventories services.	Yes – block immediately
commix	Specialized tool to detect command injection and SQL injection points.	Yes – block immediately
dirb	Command-line directory brute forcer similar to DirBuster.	Yes – block immediately
DirBuster	Directory/file brute forcer used to discover hidden paths.	Yes – block immediately
fimap	File inclusion (LFI/RFI) scanner.	Yes – block immediately
havij	Windows-based GUI tool for SQL injection attacks. Used by many amateurs.	Yes – block immediately
jSQL	Java-based SQL injection scanner, easy to use and often misused.	Yes – block immediately
Nessus	Vulnerability scanner that can probe web services.	Yes – block immediately
netsparker	Professional security scanner (now Invicti) that detects SQL injection and other flaws.	Yes – block immediately
nikto	Command-line web server scanner that checks for thousands of dangerous files, outdated software, and misconfigurations.	Yes – block immediately
OpenVAS	Open-source vulnerability scanner (Greenbone).	Yes – block immediately
Qualys	Enterprise vulnerability scanner; can trigger HTTP checks.	Yes – block immediately
Shodan	Internet-wide scanner; HTTP fingerprinting.	Yes – block immediately
sqlmap	Powerful open-source tool for automated SQL injection and database takeover.	Yes – block immediately
sqlninja	SQL injection exploitation tool (MSSQL focus).	Yes – block immediately
w3af	Web application attack framework – includes SQL injection modules.	Yes – block immediately
webinspect	Enterprise security scanner (Fortify WebInspect; formerly Micro Focus, now OpenText) – includes aggressive SQLi tests.	Yes – block immediately
WPScan	WordPress security scanner (enumeration, vulnerabilities).	Yes – block immediately
zap	OWASP ZAP scanner used for SQLi/XSS discovery during pentests.	Yes – block immediately
ZGrab	Banner/HTTP grabber used by Internet-wide scanners.	Yes – block immediately
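
The names in the table above can be matched against the User-Agent header of incoming requests. The sketch below shows one possible approach; `SCANNER_TOKENS` is a hand-picked subset of the table, and the function names are assumptions for this example. Because User-Agent strings are easy to spoof, this only catches tools that announce themselves and should complement, not replace, other server-side controls.

```python
import re
from typing import Iterable, Optional

# Illustrative subset of the scanner names listed in the table above.
SCANNER_TOKENS = ("acunetix", "arachni", "dirbuster", "havij", "nikto",
                  "sqlmap", "sqlninja", "w3af", "wpscan", "zgrab")


def build_matcher(tokens: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive pattern for all tokens."""
    alternation = "|".join(re.escape(token) for token in tokens)
    # The lookarounds reject tokens embedded in a longer alphanumeric word,
    # which reduces false positives for short names.
    return re.compile(r"(?<![a-z0-9])(?:%s)(?![a-z0-9])" % alternation,
                      re.IGNORECASE)


def matched_scanner(user_agent: str, matcher: re.Pattern) -> Optional[str]:
    """Return the matched scanner name in lowercase, or None."""
    match = matcher.search(user_agent or "")
    return match.group(0).lower() if match else None
```

A request handler could call `matched_scanner()` once per request and answer with HTTP status 403 when it returns a name. Short or generic names from the table (for example `zap`) are left out of the subset here because they are more likely to match unrelated user agents.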

Other Crawlers and Bots

Bot-Name	Description	Recommendation
Adagiobot	AdTech/revenue-optimization crawler (also seen as 'dagioBot').	Yes – block recommended
AhrefsBot	Backlink crawler from Ahrefs. Very active, often without real benefit.	Yes – block recommended
AhrefsSiteAudit	Technical SEO crawler (Ahrefs Site Audit).	Yes – block recommended
AI2Bot	Research crawler from the Allen Institute for AI (AI2); collects web content to train open models. UA: AI2Bot (and Ai2Bot-Dolma).	Yes – block recommended
Algolia Crawler	Algolia/DocSearch crawler for own indexes.	Optional – allow if you use it
amazon-kendra	Enterprise crawler (AWS Kendra) for internal search/KB.	Optional – allow if you use it
APIs-Google	Google APIs fetcher/push notifications; not an indexing crawler.	Allow – harmless
Apple Podcasts	Apple Podcasts/feeds fetcher; accesses podcast resources.	Optional – allow if you host podcasts
artemis-web-crawler	Quiet feed/web reader (Artemis).	Optional – allow
AspiegelBot	Bot from Huawei's Aspiegel platform. Sometimes aggressive crawling.	Yes – block recommended
Baiduspider	Chinese search engine bot. Often outside target audience.	Optional – depending on your target audience
Barkrowler	Open-source focused crawler; sometimes aggressive.	Selective blocking recommended
BLEXBot	SEO/crawling bot by WebMeUp; can be heavy.	Yes – block recommended
Brightbot	Bright Data crawler (UA: 'Brightbot 1.0').	Yes – block recommended
BrightEdge Crawler	SEO crawler from BrightEdge (incl. Autopilot/analytics).	Yes – block recommended
BUbiNG	Open-source crawler with high parallelism – can heavily load servers.	Yes – block recommended
Bytespider	ByteDance/TikTok crawler; collects content for TikTok/aggregations.	Yes – block recommended
CCBot	Common Crawl crawler that indexes the open web for datasets.	Optional – depending on your policy
ClaudeBot	Anthropic AI crawler for model/data collection (respects robots).	Optional – depending on your policy
crawler	Generic term. Used in many bot user agents.	Selective blocking recommended
curl	Command-line tool. Often used for automated page requests.	Selective blocking recommended
DataForSeoBot	SEO data collection bot.	Yes – block recommended
DeepSeekBot	Crawler for DeepSeek’s AI models/search; collects public web content.	Yes – block recommended
Diffbot	AI extraction/knowledge graph crawler (commercial service).	Optional – allow if you use it
Discordbot	Discord embed preview bot.	Optional – allow if previews matter
DotBot	Moz's link-index crawler. Often provides little real value unless you use Moz.	Yes – block recommended
DotNetDotCom	Poorly documented bot with noticeable traffic behavior.	Yes – block recommended
Facebookexternalhit	Facebook/Meta link preview fetcher.	Optional – allow if you need rich previews
facebot	Meta/FB bot variant for link previews.	Optional – allow if you need rich previews
FirecrawlAgent	AI agent/scraper (Firecrawl/Mendable) used to fetch and parse websites.	Yes – block recommended
Go-http-client	Generic Go HTTP client used by many scripts.	Selective blocking recommended
Google-Extended	Google robots.txt control token for AI training use; it is not a separate crawler and sends no User-Agent of its own, so it can only be controlled via robots.txt.	Optional – depending on your policy
GPTBot	OpenAI web crawler that fetches publicly available pages to improve models and safety systems. Identifies as 'GPTBot' and honors robots.txt (opt-out possible).	Optional – depending on your policy; many sites block (robots.txt: User-agent: GPTBot / Disallow: /)
GTmetrix	Performance testing bot.	Optional – allow when testing
Java	Generic Java HTTP client UA (e.g., Java/1.8).	Selective blocking recommended
libwww-perl	Technical library, often misused for automated requests.	Yes – block recommended
Lighthouse	Google Lighthouse/PageSpeed testing.	Optional – allow when testing
LinkedInBot	LinkedIn link preview fetcher.	Optional – allow if previews matter
masscan	Fast port scanner, often used for security or vulnerability scans.	Yes – block recommended
MJ12bot	Link analysis by Majestic. Often causes high server load.	Yes – block recommended
okhttp	Android/Java HTTP client common in scrapers.	Selective blocking recommended
PerplexityBot	Perplexity AI crawler (respects robots; opt-out via robots).	Optional – depending on your policy
PetalBot	Crawler from Huawei (Petal Search). Usually low relevance.	Optional – depending on your target audience
Pingdom	Uptime/performance monitor.	Optional – allow if you use it
python	General user agent for custom Python scripts.	Selective blocking recommended
python-requests	Automation tool for HTTP requests. Common in scraping.	Selective blocking recommended
python-urllib	Python urllib user agent often used in scripts.	Selective blocking recommended
scrapy	Python web scraping framework, frequently used by bots.	Yes – block recommended
SemrushBot	SEO crawler from Semrush. Very active and causes high server load.	Yes – block recommended
SenutoBot	SEO crawler from Senuto (Senuto Sp. z o.o.).	Yes – block recommended
SEOkicks-Robot	German SEO crawler for link analysis. Noticeable server impact.	Yes – block recommended
SeznamBot	Czech search engine bot from Seznam.	Optional – depending on your target audience
SiteCheckerBot	SEO audit/monitoring crawler from Sitechecker.	Yes – block recommended
SiteExplorer	Outdated bot from Yahoo or alternative sources.	Yes – block recommended
Siteimprove	Crawler of the Siteimprove platform (QA/Accessibility/Policy/SEO).	Optional – allow if you use it
Sogou	Crawler from Sogou. Mainly active in Asia.	Yes – block recommended
spider	Like 'crawler' – common term in many bot names.	Selective blocking recommended
StatusCake	Uptime monitor similar to Pingdom.	Optional – allow if you use it
TelegramBot	Telegram link preview fetcher.	Optional – allow if previews matter
Thunderbit	AI scraper/no-code tool; can fetch pages at scale.	Yes – block recommended
Twitterbot	X/Twitter link preview bot.	Optional – allow if previews matter
UptimeRobot	Monitoring service for website availability.	Optional – allow if you use it
Webhook	Generic term for incoming HTTP calls (not a specific bot).	N/A – depends on integration
wget	Tool for scripted content retrieval. Can load page content directly.	Selective blocking recommended
WhatsApp	WhatsApp link preview fetcher.	Optional – allow if previews matter
YandexBot	Russian search engine bot. Not always relevant.	Optional – depending on your target audience
ZoominfoBot	Crawler from ZoomInfo to collect data for company profiles.	Yes – block recommended
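
For the bots in the table above that are described as honoring robots.txt, an opt-out can be expressed as robots.txt rules. The sketch below generates such a file and checks it with Python's standard `urllib.robotparser`; the helper name `build_robots_txt` and the `OPT_OUT_AGENTS` selection are assumptions for this example, and the list should reflect your own policy.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Agents the table above describes as respecting robots.txt (illustrative choice).
OPT_OUT_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def build_robots_txt(disallowed_agents: Iterable[str]) -> str:
    """Return robots.txt content that blocks the given agents and allows the rest."""
    blocks = ["User-agent: %s\nDisallow: /" % agent for agent in disallowed_agents]
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    content = build_robots_txt(OPT_OUT_AGENTS)
    print(content)

    # Sanity check: parse the generated rules and query them.
    parser = RobotFileParser()
    parser.parse(content.splitlines())
    print(parser.can_fetch("GPTBot", "https://example.com/page"))
    print(parser.can_fetch("Googlebot", "https://example.com/page"))
```

Since robots.txt is voluntary, these rules only affect bots that choose to follow them; bots in the "block recommended" rows generally need server-side blocking as well.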

Search Engine Bots

Bot-Name	Description	Recommendation
360Spider	Qihoo 360 search crawler (CN).	Optional – only for CN audience
AdIdxBot	Microsoft Advertising (Bing Ads) crawler for destination URLs.	Allow – if you use Microsoft Ads
AdsBot-Google	Google Ads quality crawler.	Allow – if you use Google Ads
Amazonbot	Bot from Amazon, used for Alexa and product-related content.	Block – not relevant for most sites
Applebot	Crawler for Apple (Siri/Spotlight suggestions).	Allow – optional
Applebot-Image	Apple image crawler.	Allow – optional
Baiduspider	Crawler from Baidu (China’s leading search engine).	Optional – only needed for Chinese market
Bingbot	Microsoft’s main crawler for Bing search results.	Allow – if Bing is relevant to your audience
DuckDuckBot	Bot from DuckDuckGo, privacy-focused search engine.	Allow – minimal impact, privacy-oriented
Exabot	French search engine bot – low visibility and relevance.	Optional – rarely useful
Gigabot	Crawler from Gigablast, small-scale search engine.	Optional – rarely needed
Google-InspectionTool	Search Console/inspection fetcher.	Allow – recommended
Googlebot	Crawler from Google used for indexing websites and images.	Allow – important for visibility
Googlebot-Image	Google's image crawler – indexes images in Google Images.	Allow – if image search is desired
Mediapartners-Google	Google AdSense crawler (determines ad relevance).	Allow – if you use AdSense
MojeekBot	Crawler from independent privacy-based search engine Mojeek.	Optional – minimal impact
msnbot	Legacy Microsoft crawler (pre-bingbot).	Optional – mostly legacy
OAI-SearchBot	OpenAI search crawler used for ChatGPT Search (indexing; not model training).	Allow – if ChatGPT Search is relevant
PetalBot	Bot from Huawei’s Petal Search engine.	Optional – usage depends on your region
PhindBot	Crawler for Phind AI search.	Optional – depending on your target audience
Qwantify	Crawler from Qwant, a French privacy-respecting search engine.	Allow – optional
SeekportBot	German search engine bot (seekport).	Optional – regional relevance only
Slurp	Yahoo’s legacy crawler, sometimes inactive but still appearing.	Optional – depending on target market
Sogou Spider	Chinese search engine bot with aggressive behavior.	Optional – may cause high server load
YandexBot	Crawler from Yandex (Russian search engine).	Optional – only if targeting Russian audience
Yeti	Naver search crawler (Korea).	Optional – only for KR audience
YouBot	Crawler for You.com AI search.	Optional – depending on your target audience

General bot descriptions