Website Bot Test Tool



Download bot blocklists
The Strict list (recommended) combines the Hard and Recommended lists.
Available formats: TXT, CSV, JSON, and a ready-to-use .htaccess snippet.
Matching is a case-insensitive substring match on the User-Agent header, so please test the lists before going live.
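The matching rule described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the TXT format holds one User-Agent substring per line and that lines starting with `#` are comments; the function names `load_txt_blocklist` and `matching_patterns` are hypothetical and not part of the downloads. The last sample shows why testing matters: a generic token such as `crawler` also hits any User-Agent that merely contains that word.

```python
from typing import Iterable, List


def load_txt_blocklist(text: str) -> List[str]:
    """Parse a TXT blocklist (assumed format: one User-Agent substring per line).

    Blank lines and lines starting with '#' are treated as comments (assumption).
    Patterns are lowercased so later matching is case-insensitive.
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.lower())
    return patterns


def matching_patterns(user_agent: str, patterns: Iterable[str]) -> List[str]:
    """Return every pattern contained in the User-Agent (case-insensitive substring match)."""
    ua = user_agent.lower()
    return [p for p in patterns if p.lower() in ua]


if __name__ == "__main__":
    blocklist = load_txt_blocklist("""
    # scanners
    sqlmap
    Nikto
    # generic tokens
    crawler
    """)
    samples = [
        "sqlmap/1.7 (https://sqlmap.org)",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (compatible; ExampleCrawler/1.0)",  # hypothetical UA
    ]
    for ua in samples:
        hits = matching_patterns(ua, blocklist)
        print("BLOCK" if hits else "allow", hits, ua)
```

Substring matching is cheap and tolerant of version numbers, but it cannot tell a real bot from a client that simply copies its User-Agent string.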



History & quick facts about bots


Key milestones

  • 1966 — ELIZA (MIT): early chatbot by Joseph Weizenbaum simulating a therapist.*
  • 1993 — World Wide Web Wanderer: first web crawler (Matthew Gray), led to Wandex.
  • 1993 — Eggdrop: earliest widely used IRC bot (Robey Pointer).
  • 1994 — WebCrawler: first full-text search engine crawler (Brian Pinkerton).
  • 1994 — robots.txt proposed by Martijn Koster.
  • 2005 — XML Sitemaps introduced (Google; later adopted by others).
  • 2000s — CAPTCHA popularized (Luis von Ahn & team).
  • 2022 — robots.txt standardized as RFC 9309.

*ELIZA — explanation

ELIZA belongs to the bot family, but to a different species than crawlers/scanners.

What ELIZA is:
An early chatbot (1966, MIT, Joseph Weizenbaum) that simulated conversation using simple rules/patterns.

Why it is a 'bot':
A bot is an automated program that performs tasks. Chatbots automate dialogue; crawlers automate exploration/indexing. Both are bots, just with different purposes.

Why this is relevant:
ELIZA is considered a starting point for conversational bots (later PARRY, A.L.I.C.E., etc.). It shows how long automation has been imitating human interaction, and why today we use mechanisms such as CAPTCHAs and verification to distinguish humans from machines.

Fast facts

  • Bots aren’t all bad — search engines and monitors rely on them; harmful ones include scrapers, scanners and DDoS bots.
  • User-Agent strings are easy to spoof; forward-confirmed reverse DNS (a reverse lookup of the IP, then a forward lookup of the returned hostname) is a more reliable way to verify good bots.
  • robots.txt is voluntary; malicious bots may ignore it — server-side controls are essential.
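  • `crawl-delay` is not supported by Google; use server-side throttling instead (Googlebot slows down when it receives 429, 500 or 503 responses; the Search Console crawl-rate setting has been retired).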
  • Sitemaps help discovery, not rankings; good internal linking remains crucial.
  • CAPTCHAs evolve as bots get better; expect periodic changes in anti-bot methods.
  • Early crawlers mainly measured the Web’s size — the landscape changed once full-text indexing arrived.
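Because robots.txt is voluntary, it only takes effect when a crawler checks it before fetching. The sketch below shows that check using Python's standard `urllib.robotparser`, parsing a hypothetical robots.txt from a string instead of downloading it. Note that this parser follows the original prefix-matching rules and may not implement every detail of RFC 9309; it also reads `Crawl-delay`, which Google ignores.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (illustrative only).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # A well-behaved crawler checks its own product token before each fetch.
    print(rp.can_fetch("GPTBot", "https://example.com/blog/"))       # False
    print(rp.can_fetch("ExampleBot", "https://example.com/blog/"))   # True
    print(rp.can_fetch("ExampleBot", "https://example.com/admin/"))  # False
    # Crawl-delay is parsed here, but Googlebot does not honor it.
    print(rp.crawl_delay("ExampleBot"))                              # 10
```

A malicious bot simply skips this step, which is why the server-side controls mentioned above remain necessary.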

What is a bot?

A bot is a computer program that performs repetitive tasks largely automatically, without direct human intervention.
Well-known examples include spiders, crawlers, or web bots that browse websites or collect data.
The term bot is derived from robot, but bots have nothing to do with physical machines: they are usually scripts or more complex programs running in the background.

Are all bots bad?

There are useful and harmful bots.

  • Useful bots include search engine bots, chatbots, and monitoring bots.
  • Harmful bots include spam bots, scraper bots, brute-force bots, and DDoS bots.

How do bots work?

Bots were once simple scripts; today they are often equipped with AI and machine learning to mimic human behavior and exploit security vulnerabilities.
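At its core, a simple crawler bot repeats one loop: fetch a page, extract its links, and queue the ones it has not seen yet. The following is a minimal sketch of that loop using only the standard library; `extract_links` and `crawl` are hypothetical names, and the `fetch` function is passed in so the example runs without network access. A real crawler would also check robots.txt, stay within allowed domains, and pace its requests.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 10) -> List[str]:
    """Breadth-first crawl: fetch a page, extract links, queue unseen ones."""
    seen = {start_url}
    queue = deque([start_url])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

In practice `fetch` might wrap `urllib.request.urlopen`; keeping it injectable also makes the loop easy to test against a fake site.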

How to recognize bot traffic?

Warning signs: many spam comments, sudden traffic spikes, mass form submissions, suspicious logins.
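Sudden traffic spikes are often easiest to spot per client. The sketch below counts requests per IP in fixed time windows and flags clients above a threshold; it assumes the access log has already been parsed into `(unix_timestamp, client_ip)` pairs, and the function name, the 60-second window, and the threshold of 120 are hypothetical values to tune for your own traffic.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def flag_heavy_clients(
    requests: Iterable[Tuple[float, str]],
    window_seconds: int = 60,
    threshold: int = 120,
) -> Set[str]:
    """Return client IPs that exceed `threshold` requests in any fixed time window.

    `requests` holds (unix_timestamp, client_ip) pairs, e.g. parsed from an
    access log. Fixed windows are a simplification of a sliding window.
    """
    counts: Counter = Counter()
    for ts, ip in requests:
        # Bucket each request by client and by window index.
        counts[(ip, int(ts // window_seconds))] += 1
    return {ip for (ip, _window), n in counts.items() if n > threshold}
```

A flagged IP is only a lead: legitimate crawlers and shared proxies can also produce bursts, so combine this with User-Agent and DNS checks before blocking.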

What types of bots exist?

Category | Description
Search engine bots | Crawl pages for Google, Bing, Yandex, etc. to index them.
SEO / Backlink bots | Analyze site structure to evaluate rankings or backlinks (e.g. Ahrefs, Semrush).
Monitoring bots | Check your website's availability (e.g. UptimeRobot).
Security scanners | Look for vulnerabilities (e.g. sqlmap, Acunetix, Nikto).
Scrapers | Steal content, prices, or data through automated access.
Fake bots / Malware | Pretend to be legitimate bots but perform scans or attacks.

Why should some bots be blocked?

  • Server load: Some bots generate a massive number of requests and slow down your website.
  • Data theft: Scrapers automatically copy your texts, images, or prices.
  • Security risk: SQL injection bots or exploit scanners try to hack your site.
  • Traffic distortion: They produce fake visits in logs or analytics.
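For the server-load case, throttling can be gentler than blocking. Below is a minimal sketch of a per-client token bucket: each client may burst up to `capacity` requests and is then refilled at `rate` requests per second; excess requests get HTTP 429 Too Many Requests, which well-behaved crawlers such as Googlebot treat as a signal to slow down. The class name `RateLimiter`, the helper `status_for`, and any rate values you plug in are hypothetical, and the injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Per-client token bucket: burst up to `capacity`, refill at `rate` per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # client key (e.g. IP address) -> (remaining tokens, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return True
        self.buckets[client] = (tokens, now)
        return False


def status_for(limiter: RateLimiter, client: str) -> int:
    """HTTP status a server could send: 200 to serve, 429 to throttle."""
    return 200 if limiter.allow(client) else 429
```

This in-memory version only works within a single process; a multi-server setup would need a shared store, or the equivalent rate-limiting feature of the web server or CDN.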

This page provides an overview of known bots accessing your website, including SQL injection scanners and crawlers.
You can review each bot's purpose and decide whether to block or allow them based on your site's needs.
SQL injection scanners are security tools often used for attacks and should be blocked immediately to protect your site.
Other crawlers and bots can be selectively blocked depending on your server capacity, relevance, and your audience.

SQL Injection & Vulnerability Scanners

Bot name | Description | Recommendation
acunetix | Commercial website scanner that tests for SQL injection, XSS, and more. | Yes – block immediately
arachni | Security scanner framework used for automated SQLi and XSS discovery. | Yes – block immediately
BurpSuite | Intercepting proxy & scanner used in web pentests. | Yes – block immediately
CensysInspect | Censys HTTP/scan component that inventories services. | Yes – block immediately
commix | Specialized tool to detect and exploit OS command injection vulnerabilities. | Yes – block immediately
dirb | Command-line directory brute forcer similar to DirBuster. | Yes – block immediately
DirBuster | Directory/file brute forcer used to discover hidden paths. | Yes – block immediately
fimap | File inclusion (LFI/RFI) scanner. | Yes – block immediately
havij | Windows-based GUI tool for SQL injection attacks; used by many amateurs. | Yes – block immediately
jSQL | Java-based SQL injection scanner, easy to use and often misused. | Yes – block immediately
Nessus | Vulnerability scanner that can probe web services. | Yes – block immediately
netsparker | Professional security scanner that detects SQL injection and other flaws. | Yes – block immediately
nikto | Command-line scanner that tests for 6,000+ vulnerabilities including SQLi. | Yes – block immediately
OpenVAS | Open-source vulnerability scanner (Greenbone). | Yes – block immediately
Qualys | Enterprise vulnerability scanner; can trigger HTTP checks. | Yes – block immediately
Shodan | Internet-wide scanner; HTTP fingerprinting. | Yes – block immediately
sqlmap | Powerful open-source tool for automated SQL injection and database takeover. | Yes – block immediately
sqlninja | SQL injection exploitation tool (MSSQL focus). | Yes – block immediately
w3af | Web application attack framework; includes SQL injection modules. | Yes – block immediately
webinspect | Enterprise security scanner by Micro Focus; includes aggressive SQLi tests. | Yes – block immediately
WPScan | WordPress security scanner (enumeration, vulnerabilities). | Yes – block immediately
zap | OWASP ZAP scanner used for SQLi/XSS discovery during pentests. | Yes – block immediately
ZGrab | Banner/HTTP grabber used by Internet-wide scanners. | Yes – block immediately

Other Crawlers and Bots

Bot name | Description | Recommendation
Adagiobot | AdTech/revenue-optimization crawler (also seen as 'dagioBot'). | Yes – block recommended
AhrefsBot | Backlink crawler from Ahrefs. Very active, often without real benefit. | Yes – block recommended
AhrefsSiteAudit | Technical SEO crawler (Ahrefs Site Audit). | Yes – block recommended
AI2Bot | Research crawler from the Allen Institute for AI (AI2); collects web content to train open models. UA: AI2Bot (and Ai2Bot-Dolma). | Yes – block recommended
Algolia Crawler | Algolia/DocSearch crawler for own indexes. | Optional – allow if you use it
amazon-kendra | Enterprise crawler (AWS Kendra) for internal search/KB. | Optional – allow if you use it
APIs-Google | Google APIs fetcher/push notifications; not an indexing crawler. | Allow – harmless
Apple Podcasts | Apple Podcasts/feeds fetcher; accesses podcast resources. | Optional – allow if you host podcasts
artemis-web-crawler | Quiet feed/web reader (Artemis). | Optional – allow
AspiegelBot | Bot from Huawei's Aspiegel platform. Sometimes crawls aggressively. | Yes – block recommended
Baiduspider | Chinese search engine bot. Often outside target audience. | Optional – depending on your target audience
Barkrowler | Open-source focused crawler; sometimes aggressive. | Selective blocking recommended
BLEXBot | SEO/crawling bot by WebMeUp; can be heavy. | Yes – block recommended
Brightbot | Bright Data crawler (UA: 'Brightbot 1.0'). | Yes – block recommended
BrightEdge Crawler | SEO crawler from BrightEdge (incl. Autopilot/analytics). | Yes – block recommended
BUbiNG | Open-source crawler with high parallelism; can heavily load servers. | Yes – block recommended
Bytespider | ByteDance/TikTok crawler; collects content for TikTok/aggregations. | Yes – block recommended
CCBot | Common Crawl crawler that indexes the open web for datasets. | Optional – depending on your policy
ClaudeBot | Anthropic AI crawler for model/data collection (respects robots.txt). | Optional – depending on your policy
crawler | Generic term used in many bot user agents. | Selective blocking recommended
curl | Command-line tool, often used for automated page requests. | Selective blocking recommended
DataForSeoBot | SEO data collection bot. | Yes – block recommended
DeepSeekBot | Crawler for DeepSeek’s AI models/search; collects public web content. | Yes – block recommended
Diffbot | AI extraction/knowledge graph crawler (commercial service). | Optional – allow if you use it
Discordbot | Discord embed preview bot. | Optional – allow if previews matter
DotBot | Moz's link-index crawler. Often provides little real value. | Yes – block recommended
DotNetDotCom | Poorly documented bot with noticeable traffic behavior. | Yes – block recommended
Facebookexternalhit | Facebook/Meta link preview fetcher. | Optional – allow if you need rich previews
facebot | Meta/FB bot variant for link previews. | Optional – allow if you need rich previews
FirecrawlAgent | AI agent/scraper (Firecrawl/Mendable) used to fetch and parse websites. | Yes – block recommended
Go-http-client | Generic Go HTTP client used by many scripts. | Selective blocking recommended
Google-Extended | robots.txt control token for whether Google may use your content for its AI models (opt-out via robots.txt). It has no separate User-Agent string, so User-Agent matching cannot catch it. | Optional – depending on your policy (set in robots.txt)
GPTBot | OpenAI web crawler that fetches publicly available pages to improve models and safety systems. Identifies as 'GPTBot' and honors robots.txt (opt-out possible). | Optional – depending on your policy; many sites block (robots.txt: User-agent: GPTBot / Disallow: /)
GTmetrix | Performance testing bot. | Optional – allow when testing
Java | Generic Java HTTP client UA (e.g., Java/1.8). | Selective blocking recommended
libwww-perl | Technical library, often misused for automated requests. | Yes – block recommended
Lighthouse | Google Lighthouse/PageSpeed testing. | Optional – allow when testing
LinkedInBot | LinkedIn link preview fetcher. | Optional – allow if previews matter
masscan | Fast port scanner, often used for security or vulnerability scans. | Yes – block recommended
MJ12bot | Link analysis by Majestic. Often causes high server load. | Yes – block recommended
okhttp | Android/Java HTTP client common in scrapers. | Selective blocking recommended
PerplexityBot | Perplexity AI crawler (respects robots.txt; opt-out via robots.txt). | Optional – depending on your policy
PetalBot | Crawler from Huawei (Petal Search). Usually low relevance. | Optional – depending on your target audience
Pingdom | Uptime/performance monitor. | Optional – allow if you use it
python | Generic token found in the user agents of custom Python scripts. | Selective blocking recommended
python-requests | Python library for HTTP requests; common in scraping. | Selective blocking recommended
python-urllib | Python urllib user agent often used in scripts. | Selective blocking recommended
scrapy | Python web scraping framework, frequently used by bots. | Yes – block recommended
SemrushBot | SEO crawler from Semrush. Very active and causes high server load. | Yes – block recommended
SenutoBot | SEO crawler from Senuto (Senuto Sp. z o.o.). | Yes – block recommended
SEOkicks-Robot | German SEO crawler for link analysis. Noticeable server impact. | Yes – block recommended
SeznamBot | Czech search engine bot from Seznam. | Optional – depending on your target audience
SiteCheckerBot | SEO audit/monitoring crawler from Sitechecker. | Yes – block recommended
SiteExplorer | Outdated bot from Yahoo or alternative sources. | Yes – block recommended
Siteimprove | Crawler of the Siteimprove platform (QA/Accessibility/Policy/SEO). | Optional – allow if you use it
Sogou | Crawler from Sogou. Mainly active in Asia. | Yes – block recommended
spider | Like 'crawler', a common term in many bot names (it also matches search engine bots such as Baiduspider). | Selective blocking recommended
StatusCake | Uptime monitor similar to Pingdom. | Optional – allow if you use it
TelegramBot | Telegram link preview fetcher. | Optional – allow if previews matter
Thunderbit | AI scraper/no-code tool; can fetch pages at scale. | Yes – block recommended
Twitterbot | X/Twitter link preview bot. | Optional – allow if previews matter
UptimeRobot | Monitoring service for website availability. | Optional – allow if you use it
Webhook | Generic term for incoming HTTP calls (not a specific bot). | N/A – depends on integration
wget | Tool for scripted content retrieval; can also mirror entire sites recursively. | Selective blocking recommended
WhatsApp | WhatsApp link preview fetcher. | Optional – allow if previews matter
YandexBot | Russian search engine bot. Not always relevant. | Optional – depending on your target audience
ZoominfoBot | Crawler from ZoomInfo to collect data for company profiles. | Yes – block recommended

Search Engine Bots

Bot name | Description | Recommendation
360Spider | Qihoo 360 search crawler (CN). | Optional – only for CN audience
AdIdxBot | Microsoft Advertising (Bing Ads) crawler for destination URLs. | Allow – if you use Microsoft Ads
AdsBot-Google | Google Ads quality crawler. | Allow – if you use Google Ads
Amazonbot | Bot from Amazon, used for Alexa and product-related content. | Block – not relevant for most sites
Applebot | Crawler for Apple (Siri/Spotlight suggestions). | Allow – optional
Applebot-Image | Apple image crawler. | Allow – optional
Baiduspider | Crawler from Baidu (China’s leading search engine). | Optional – only needed for Chinese market
Bingbot | Microsoft’s main crawler for Bing search results. | Allow – if Bing is relevant to your audience
DuckDuckBot | Bot from DuckDuckGo, a privacy-focused search engine. | Allow – minimal impact, privacy-oriented
Exabot | French search engine bot; low visibility and relevance. | Optional – rarely useful
Gigabot | Crawler from Gigablast, a small-scale search engine. | Optional – rarely needed
Google-InspectionTool | Search Console/inspection fetcher. | Allow – recommended
Googlebot | Crawler from Google used for indexing websites and images. | Allow – important for visibility
Googlebot-Image | Google's image crawler; indexes images in Google Images. | Allow – if image search is desired
Mediapartners-Google | Google AdSense crawler (determines ad relevance). | Allow – if you use AdSense
MojeekBot | Crawler from the independent, privacy-based search engine Mojeek. | Optional – minimal impact
msnbot | Legacy Microsoft crawler (pre-bingbot). | Optional – mostly legacy
OAI-SearchBot | OpenAI search crawler used for ChatGPT Search (indexing; not model training). | Allow – if ChatGPT Search is relevant
PetalBot | Bot from Huawei’s Petal Search engine. | Optional – usage depends on your region
PhindBot | Crawler for Phind AI search. | Optional – depending on your target audience
Qwantify | Crawler from Qwant, a French privacy-respecting search engine. | Allow – optional
SeekportBot | German search engine bot (seekport). | Optional – regional relevance only
Slurp | Yahoo’s legacy crawler, sometimes inactive but still appearing. | Optional – depending on target market
Sogou Spider | Chinese search engine bot with aggressive behavior. | Optional – may cause high server load
YandexBot | Crawler from Yandex (Russian search engine). | Optional – only if targeting Russian audience
Yeti | Naver search crawler (Korea). | Optional – only for KR audience
YouBot | Crawler for You.com AI search. | Optional – depending on your target audience
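For the search engine bots above, a matching User-Agent proves nothing on its own, since the string is easy to spoof. The sketch below implements forward-confirmed reverse DNS: look up the hostname for the client IP, check that it belongs to the operator's domain, then resolve that hostname and confirm it maps back to the same IP. The suffix table is a hypothetical example; Google documents `googlebot.com` and `google.com` for Googlebot and Microsoft documents `search.msn.com` for Bingbot, but check each operator's current documentation. The lookup functions are injectable so the logic can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable, List, Tuple

# Example suffixes only; verify against each operator's documentation.
KNOWN_SUFFIXES: Dict[str, Tuple[str, ...]] = {
    "googlebot": ("googlebot.com", "google.com"),
    "bingbot": ("search.msn.com",),
}


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """A/AAAA lookup: hostname -> list of IP address strings."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_verified_bot(
    ip: str,
    allowed_suffixes: Iterable[str],
    ptr: Callable[[str], str] = reverse_lookup,
    resolve: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed search engine bot."""
    try:
        host = ptr(ip).rstrip(".").lower()
    except OSError:  # socket.herror / gaierror are OSError subclasses
        return False
    # The hostname must be the operator's domain or a subdomain of it.
    # The leading "." prevents e.g. "fakegooglebot.com" from matching.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        addresses = resolve(host)
    except OSError:
        return False
    # The hostname must resolve back to the original IP.
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == target for a in addresses)
```

Usage might look like `is_verified_bot(client_ip, KNOWN_SUFFIXES["googlebot"])` for a request whose User-Agent claims to be Googlebot. DNS lookups are slow, so results are usually cached per IP; some operators, such as Google, also publish IP range lists that can be checked instead.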