General bot descriptions
Overview of known bots, SQL injection scanners and crawlers.
Download bot blocklists
Strict (recommended) combines Hard and Recommended.
Available formats: TXT, CSV, JSON, and a ready-to-use .htaccess snippet.
Matching is case-insensitive on User-Agent substrings. Test the list against your own traffic before going live.
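As a rough illustration of what case-insensitive substring matching means in practice, here is a minimal Python sketch. The function name `is_blocked` and the three sample entries are assumptions made for this example; they are not part of the downloadable lists.

```python
def is_blocked(user_agent: str, blocklist: list[str]) -> bool:
    """Return True if any blocklist entry occurs in the User-Agent, ignoring case."""
    ua = user_agent.casefold()
    # Skip blank entries: an empty string is a substring of everything.
    return any(entry.casefold() in ua for entry in blocklist if entry.strip())


# Hypothetical sample entries; a real list would be loaded from the TXT download.
SAMPLE_BLOCKLIST = ["sqlmap", "Nikto", "AhrefsBot"]

print(is_blocked("sqlmap/1.7.2#stable (https://sqlmap.org)", SAMPLE_BLOCKLIST))   # True
print(is_blocked("Mozilla/5.0 (compatible; ahrefsbot/7.0)", SAMPLE_BLOCKLIST))    # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", SAMPLE_BLOCKLIST))  # False
```

Substring matching is deliberately broad, which is why testing matters: a short entry such as `zap` would also match an unrelated User-Agent like `Zapier`.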
History & quick facts about bots
Key milestones
- 1966 — ELIZA (MIT): early chatbot by Joseph Weizenbaum simulating a therapist.*
- 1993 — World Wide Web Wanderer: first web crawler (Matthew Gray), led to Wandex.
- 1993 — Eggdrop: earliest widely used IRC bot (Robey Pointer).
- 1994 — WebCrawler: first full-text search engine crawler (Brian Pinkerton).
- 1994 — robots.txt proposed by Martijn Koster.
- 2000s — CAPTCHA popularized (Luis von Ahn & team).
- 2005 — XML Sitemaps introduced (Google; later adopted by others).
- 2022 — robots.txt standardized as RFC 9309.
*ELIZA — explanation
ELIZA belongs to the bot family, but to a different species than crawlers/scanners.
What ELIZA is:
An early chatbot (1966, MIT, Joseph Weizenbaum) that simulated conversation using simple rules/patterns.
Why it is a 'bot':
A bot is an automated program that performs tasks. Chatbots automate dialogue; crawlers automate exploration/indexing. Both are bots, just with different purposes.
Why this is relevant:
ELIZA is considered a starting point for conversational bots (later PARRY, A.L.I.C.E., etc.). It shows how long automation has imitated human interaction—and why we use mechanisms like CAPTCHA/verification to distinguish humans from machines today.
Fast facts
- Bots aren’t all bad — search engines and monitors rely on them; harmful ones include scrapers, scanners and DDoS bots.
- User-Agent strings are easy to spoof; reverse+forward DNS is more reliable to verify good bots.
- robots.txt is voluntary; malicious bots may ignore it — server-side controls are essential.
- `crawl-delay` is not supported by Google; rely on server-side throttling instead, since the Search Console crawl rate limiter has been retired.
- Sitemaps help discovery, not rankings; good internal linking remains crucial.
- CAPTCHAs evolve as bots get better; expect periodic changes in anti-bot methods.
- Early crawlers mainly measured the Web’s size — the landscape changed once full-text indexing arrived.
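The reverse+forward DNS check mentioned in the list above can be sketched as follows. This is a minimal example, not a hardened implementation: the function name `verify_bot_ip` and the injectable `reverse`/`forward` parameters are assumptions made for illustration, and the hostname suffixes shown are the ones Google documents for Googlebot. Check each operator's own documentation for the suffixes that apply to its crawler.

```python
import socket
from typing import Callable, Iterable, List

# Suffixes Google documents for Googlebot; other operators publish their own.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """PTR lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """A lookup: hostname -> list of IPv4 addresses (IPv4 only in this sketch)."""
    return socket.gethostbyname_ex(host)[2]


def verify_bot_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if the PTR name has an allowed suffix AND resolves back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    # A leading dot in each suffix prevents matches like "evilgooglebot.com".
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

A call such as `verify_bot_ip(client_ip, GOOGLE_SUFFIXES)` performs two live DNS lookups, so in production the result would normally be cached per IP. The forward step is what defeats spoofing: anyone can set a PTR record for their own IP, but they cannot make the operator's domain resolve back to it.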
What is a bot?
A bot is a computer program that performs repetitive tasks largely automatically, without direct human intervention. Well-known examples include spiders, crawlers, and web bots that browse websites or collect data.
The term "bot" is derived from "robot", but bots are not physical machines. They are usually scripts or more complex programs running in the background.
Are all bots bad?
There are useful and harmful bots.
- Useful bots include search engine bots, chatbots, or monitoring bots.
- Harmful bots include spam bots, scraper bots, brute-force bots, or DDoS bots.
How do bots work?
Bots were once simple scripts. Today many are equipped with AI and machine learning to mimic human behavior and to find and exploit security vulnerabilities.
How to recognize bot traffic?
Warning signs include a surge of spam comments, sudden traffic spikes, mass form submissions, and suspicious login attempts.
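One of these signs, a sudden traffic spike, can be checked with a few lines of Python. The sketch below assumes the access log has already been parsed into `(timestamp, ip)` pairs; the function name `find_spikes` and the threshold of 120 requests per minute are illustrative choices, not recommended values.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def find_spikes(
    requests: Iterable[Tuple[datetime, str]], max_per_minute: int = 120
) -> List[str]:
    """Return the IPs that exceeded max_per_minute in any single calendar minute."""
    per_minute = Counter(
        (ip, ts.replace(second=0, microsecond=0)) for ts, ip in requests
    )
    return sorted({ip for (ip, _minute), hits in per_minute.items() if hits > max_per_minute})


# Made-up sample data using documentation-only IP ranges.
start = datetime(2024, 1, 1, 12, 0, 0)
sample = [(start + timedelta(seconds=i), "198.51.100.7") for i in range(5)]
sample.append((start, "192.0.2.10"))

print(find_spikes(sample, max_per_minute=3))  # ['198.51.100.7']
```

Counting per calendar minute is the simplest approach. A burst that straddles a minute boundary is split across two buckets, so a rolling window would be more precise at the cost of a little more code.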
What types of bots exist?
Category | Description |
---|---|
Search engine bots | Crawl pages for Google, Bing, Yandex etc. to index them. |
SEO / Backlink bots | Analyze site structure to evaluate rankings or backlinks (e.g. Ahrefs, Semrush). |
Monitoring bots | Check your website's availability (e.g. UptimeRobot). |
Security scanners | Look for vulnerabilities (e.g. sqlmap, Acunetix, Nikto). |
Scrapers | Steal content, prices, or data through automated access. |
Fake bots / Malware | Pretend to be legitimate bots but perform scans or attacks. |
Why should some bots be blocked?
- Server load: Some bots generate a massive number of requests and slow down your website.
- Data theft: Scrapers automatically copy your texts, images, or prices.
- Security risk: SQL injection bots or exploit scanners try to hack your site.
- Traffic distortion: They produce fake visits in logs or analytics.
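The server-load problem in particular is often handled by throttling rather than outright blocking. The following is a minimal in-memory sliding-window limiter, shown only to illustrate the idea. The class name `SlidingWindowLimiter` and its parameters are assumptions for this sketch; real deployments usually rely on the web server, a reverse proxy, or a WAF for this.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller might answer with HTTP 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
print([limiter.allow("203.0.113.9", now=t) for t in (0, 1, 2, 10)])
# [True, True, False, True]
```

The `client_id` could be an IP address, a User-Agent, or a combination of both. Rejected requests are not recorded, so a client that keeps hammering the server regains access as soon as its earlier accepted requests age out of the window.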
This page provides an overview of known bots accessing your website, including SQL injection scanners and crawlers.
You can review each bot's purpose and decide whether to block or allow them based on your site's needs.
SQL injection scanners are security testing tools that attackers frequently misuse. Unless you are running an authorized test yourself, block them immediately to protect your site.
Other crawlers and bots can be selectively blocked depending on your server capacity, relevance, and your audience.
SQL Injection Scanners
Bot-Name | Description | Recommendation |
---|---|---|
acunetix | Commercial website scanner that tests for SQL injection, XSS, and more. | Yes – block immediately |
arachni | Security scanner framework used for automated SQLi and XSS discovery. | Yes – block immediately |
BurpSuite | Intercepting proxy & scanner used in web pentests. | Yes – block immediately |
CensysInspect | Censys HTTP/scan component that inventories services. | Yes – block immediately |
commix | Specialized tool to detect command injection and SQL injection points. | Yes – block immediately |
dirb | Command-line directory brute forcer similar to DirBuster. | Yes – block immediately |
DirBuster | Directory/file brute forcer used to discover hidden paths. | Yes – block immediately |
fimap | File inclusion (LFI/RFI) scanner. | Yes – block immediately |
havij | Windows-based GUI tool for SQL injection attacks. Used by many amateurs. | Yes – block immediately |
jSQL | Java-based SQL injection scanner, easy to use and often misused. | Yes – block immediately |
Nessus | Vulnerability scanner that can probe web services. | Yes – block immediately |
netsparker | Professional security scanner (now Invicti) that detects SQL injection and other flaws. | Yes – block immediately |
nikto | Command-line scanner that tests for 6,000+ vulnerabilities including SQLi. | Yes – block immediately |
OpenVAS | Open-source vulnerability scanner (Greenbone). | Yes – block immediately |
Qualys | Enterprise vulnerability scanner; can trigger HTTP checks. | Yes – block immediately |
Shodan | Internet-wide scanner; HTTP fingerprinting. | Yes – block immediately |
sqlmap | Powerful open-source tool for automated SQL injection and database takeover. | Yes – block immediately |
sqlninja | SQL injection exploitation tool (MSSQL focus). | Yes – block immediately |
w3af | Web application attack framework – includes SQL injection modules. | Yes – block immediately |
webinspect | Enterprise security scanner by OpenText (formerly Micro Focus) – includes aggressive SQLi tests. | Yes – block immediately |
WPScan | WordPress security scanner (enumeration, vulnerabilities). | Yes – block immediately |
zap | OWASP ZAP scanner used for SQLi/XSS discovery during pentests. | Yes – block immediately |
ZGrab | Banner/HTTP grabber used by Internet-wide scanners. | Yes – block immediately |
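To show how names from the table above could be turned into a server rule, here is a hedged Python sketch that builds an Apache `mod_rewrite` block returning 403 for matching User-Agents. It is not the ready-made .htaccess snippet offered for download; the function name `build_htaccess_block` is an assumption, and the output should be reviewed and tested before it goes anywhere near a live site.

```python
import re
from typing import Iterable


def build_htaccess_block(user_agents: Iterable[str]) -> str:
    """Build a mod_rewrite snippet that answers 403 to the given User-Agent substrings."""
    names = [name.strip() for name in user_agents if name.strip()]
    if not names:
        # An empty alternation "()" would match every request and block all visitors.
        raise ValueError("user_agents must contain at least one non-empty name")
    pattern = "|".join(re.escape(name) for name in names)
    return "\n".join(
        [
            "<IfModule mod_rewrite.c>",
            "RewriteEngine On",
            # [NC] makes the match case-insensitive.
            f'RewriteCond %{{HTTP_USER_AGENT}} "({pattern})" [NC]',
            # "-" means no substitution; [F] returns 403 Forbidden.
            "RewriteRule .* - [F,L]",
            "</IfModule>",
        ]
    )


print(build_htaccess_block(["sqlmap", "nikto", "w3af"]))
```

Because the condition is an unanchored, case-insensitive match, short names such as `zap` or `dirb` can catch unrelated clients. Longer, more specific tokens reduce that risk.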
Other Crawlers and Bots
Bot-Name | Description | Recommendation |
---|---|---|
Adagiobot | AdTech/revenue-optimization crawler (also seen as 'dagioBot'). | Yes – block recommended |
AhrefsBot | Backlink crawler from Ahrefs. Very active, often without real benefit. | Yes – block recommended |
AhrefsSiteAudit | Technical SEO crawler (Ahrefs Site Audit). | Yes – block recommended |
AI2Bot | Research crawler from the Allen Institute for AI (AI2); collects web content to train open models. UA: AI2Bot (and Ai2Bot-Dolma). | Yes – block recommended |
Algolia Crawler | Algolia/DocSearch crawler for own indexes. | Optional – allow if you use it |
amazon-kendra | Enterprise crawler (AWS Kendra) for internal search/KB. | Optional – allow if you use it |
APIs-Google | Google APIs fetcher/push notifications; not an indexing crawler. | Allow – harmless |
Apple Podcasts | Apple Podcasts/feeds fetcher; accesses podcast resources. | Optional – allow if you host podcasts |
artemis-web-crawler | Quiet feed/web reader (Artemis). | Optional – allow |
AspiegelBot | Bot from Huawei's Aspiegel platform. Sometimes aggressive crawling. | Yes – block recommended |
Baiduspider | Chinese search engine bot. Often outside target audience. | Optional – depending on your target audience |
Barkrowler | Open-source focused crawler; sometimes aggressive. | Selective blocking recommended |
BLEXBot | SEO/crawling bot by WebMeUp; can be heavy. | Yes – block recommended |
Brightbot | Bright Data crawler (UA: 'Brightbot 1.0'). | Yes – block recommended |
BrightEdge Crawler | SEO crawler from BrightEdge (incl. Autopilot/analytics). | Yes – block recommended |
BUbiNG | Open-source crawler with high parallelism – can heavily load servers. | Yes – block recommended |
Bytespider | ByteDance/TikTok crawler; collects content for TikTok/aggregations. | Yes – block recommended |
CCBot | Common Crawl crawler that indexes the open web for datasets. | Optional – depending on your policy |
ClaudeBot | Anthropic AI crawler for model/data collection (respects robots). | Optional – depending on your policy |
crawler | Generic term. Used in many bot user agents. | Selective blocking recommended |
curl | Command-line tool. Often used for automated page requests. | Selective blocking recommended |
DataForSeoBot | SEO data collection bot. | Yes – block recommended |
DeepSeekBot | Crawler for DeepSeek’s AI models/search; collects public web content. | Yes – block recommended |
Diffbot | AI extraction/knowledge graph crawler (commercial service). | Optional – allow if you use it |
Discordbot | Discord embed preview bot. | Optional – allow if previews matter |
DotBot | Crawler from Moz that feeds its link index. Often provides little real value. | Yes – block recommended |
DotNetDotCom | Poorly documented bot with noticeable traffic behavior. | Yes – block recommended |
Facebookexternalhit | Facebook/Meta link preview fetcher. | Optional – allow if you need rich previews |
facebot | Meta/FB bot variant for link previews. | Optional – allow if you need rich previews |
FirecrawlAgent | AI agent/scraper (Firecrawl/Mendable) used to fetch and parse websites. | Yes – block recommended |
Go-http-client | Generic Go HTTP client used by many scripts. | Selective blocking recommended |
Google-Extended | Google flag UA variant for AI training access (opt-out via robots). | Optional – depending on your policy |
GPTBot | OpenAI web crawler that fetches publicly available pages to improve models and safety systems. Identifies as 'GPTBot' and honors robots.txt (opt-out possible). | Optional – depending on your policy; many sites block (robots.txt: User-agent: GPTBot / Disallow: /) |
GTmetrix | Performance testing bot. | Optional – allow when testing |
Java | Generic Java HTTP client UA (e.g., Java/1.8). | Selective blocking recommended |
libwww-perl | Technical library, often misused for automated requests. | Yes – block recommended |
Lighthouse | Google Lighthouse/PageSpeed testing. | Optional – allow when testing |
LinkedInBot | LinkedIn link preview fetcher. | Optional – allow if previews matter |
masscan | Fast port scanner, often used for security or vulnerability scans. | Yes – block recommended |
MJ12bot | Link analysis by Majestic. Often causes high server load. | Yes – block recommended |
okhttp | Android/Java HTTP client common in scrapers. | Selective blocking recommended |
PerplexityBot | Perplexity AI crawler (respects robots; opt-out via robots). | Optional – depending on your policy |
PetalBot | Crawler from Huawei (Petal Search). Usually low relevance. | Optional – depending on your target audience |
Pingdom | Uptime/performance monitor. | Optional – allow if you use it |
python | General user agent for custom Python scripts. | Selective blocking recommended |
python-requests | Automation tool for HTTP requests. Common in scraping. | Selective blocking recommended |
python-urllib | Python urllib user agent often used in scripts. | Selective blocking recommended |
scrapy | Python web scraping framework, frequently used by bots. | Yes – block recommended |
SemrushBot | SEO crawler from Semrush. Very active and causes high server load. | Yes – block recommended |
SenutoBot | SEO crawler from Senuto (Senuto Sp. z o.o.). | Yes – block recommended |
SEOkicks-Robot | German SEO crawler for link analysis. Noticeable server impact. | Yes – block recommended |
SeznamBot | Czech search engine bot from Seznam. | Optional – depending on your target audience |
SiteCheckerBot | SEO audit/monitoring crawler from Sitechecker. | Yes – block recommended |
SiteExplorer | Outdated bot from Yahoo or alternative sources. | Yes – block recommended |
Siteimprove | Crawler of the Siteimprove platform (QA/Accessibility/Policy/SEO). | Optional – allow if you use it |
Sogou | Crawler from Sogou. Mainly active in Asia. | Yes – block recommended |
spider | Like 'crawler' – common term in many bot names. | Selective blocking recommended |
StatusCake | Uptime monitor similar to Pingdom. | Optional – allow if you use it |
TelegramBot | Telegram link preview fetcher. | Optional – allow if previews matter |
Thunderbit | AI scraper/no-code tool; can fetch pages at scale. | Yes – block recommended |
Twitterbot | X/Twitter link preview bot. | Optional – allow if previews matter |
UptimeRobot | Monitoring service for website availability. Not critical. | Optional – allow if you use it |
Webhook | Generic term for incoming HTTP calls (not a specific bot). | N/A – depends on integration |
wget | Tool for scripted content retrieval. Can load page content directly. | Selective blocking recommended |
WhatsApp | WhatsApp link preview fetcher. | Optional – allow if previews matter |
YandexBot | Russian search engine bot. Not always relevant. | Optional – depending on your target audience |
ZoominfoBot | Crawler from ZoomInfo to collect data for company profiles. | Yes – block recommended |
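Several entries in the table above mention opting out via robots.txt, for example the `User-agent: GPTBot` / `Disallow: /` rule. One way to check what such a rule permits is Python's standard `urllib.robotparser`. The sketch below parses an inline example file; the `example.com` URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt GPTBot out entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

This only shows what a compliant crawler would do. Because robots.txt is voluntary, bots that ignore it still need to be handled server-side.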
Search Engine Bots
Bot-Name | Description | Recommendation |
---|---|---|
360Spider | Qihoo 360 search crawler (CN). | Optional – only for CN audience |
AdIdxBot | Microsoft Advertising (Bing Ads) crawler for destination URLs. | Allow – if you use Microsoft Ads |
AdsBot-Google | Google Ads quality crawler. | Allow – if you use Google Ads |
Amazonbot | Bot from Amazon, used for Alexa and product-related content. | Block – not relevant for most sites |
Applebot | Crawler for Apple (Siri/Spotlight suggestions). | Allow – optional |
Applebot-Image | Apple image crawler. | Allow – optional |
Baiduspider | Crawler from Baidu (China’s leading search engine). | Optional – only needed for Chinese market |
Bingbot | Microsoft’s main crawler for Bing search results. | Allow – if Bing is relevant to your audience |
DuckDuckBot | Bot from DuckDuckGo, privacy-focused search engine. | Allow – minimal impact, privacy-oriented |
Exabot | French search engine bot – low visibility and relevance. | Optional – rarely useful |
Gigabot | Crawler from Gigablast, small-scale search engine. | Optional – rarely needed |
Google-InspectionTool | Search Console/inspection fetcher. | Allow – recommended |
Googlebot | Crawler from Google used for indexing websites and images. | Allow – important for visibility |
Googlebot-Image | Google's image crawler – indexes images in Google Images. | Allow – if image search is desired |
Mediapartners-Google | Google AdSense crawler (determines ad relevance). | Allow – if you use AdSense |
MojeekBot | Crawler from independent privacy-based search engine Mojeek. | Optional – minimal impact |
msnbot | Legacy Microsoft crawler (pre-bingbot). | Optional – mostly legacy |
OAI-SearchBot | OpenAI search crawler used for ChatGPT Search (indexing; not model training). | Allow – if ChatGPT Search is relevant |
PetalBot | Bot from Huawei’s Petal Search engine. | Optional – usage depends on your region |
PhindBot | Crawler for Phind AI search. | Optional – depending on your target audience |
Qwantify | Crawler from Qwant, a French privacy-respecting search engine. | Allow – optional |
SeekportBot | German search engine bot (seekport). | Optional – regional relevance only |
Slurp | Yahoo’s legacy crawler, sometimes inactive but still appearing. | Optional – depending on target market |
Sogou Spider | Chinese search engine bot with aggressive behavior. | Optional – may cause high server load |
YandexBot | Crawler from Yandex (Russian search engine). | Optional – only if targeting Russian audience |
Yeti | Naver search crawler (Korea). | Optional – only for KR audience |
YouBot | Crawler for You.com AI search. | Optional – depending on your target audience |