IPASIS - IP Reputation and Risk Intelligence API

How to Stop Bot Traffic on Your Website

The complete 2026 guide to identifying, blocking, and managing automated traffic — with real code examples for every approach.

April 2, 2026 · 20 min read
🛡️
Stop Bot Traffic — The Complete Guide

Nearly half of all internet traffic is bots. Some are harmless — search engine crawlers, uptime monitors, RSS readers. But a growing share is malicious: scrapers stealing your content, credential stuffers attacking login pages, fake signup bots polluting your database, and click fraud draining your ad budget.

If you're seeing unexplained traffic spikes, high bounce rates from datacenter IPs, or suspicious patterns in your analytics, you have a bot problem. This guide walks you through every practical technique to stop bot traffic — from quick wins you can deploy in minutes to production-grade IP intelligence that catches even sophisticated residential proxy bots.

1. How to Identify Bot Traffic

Before you can stop bots, you need to see them. Most analytics tools (Google Analytics, PostHog, Plausible) filter out known bots by default, which means your dashboard might look clean while your server logs tell a different story.

Red Flags in Your Data

  • Traffic spikes at odd hours — Bots don't sleep. A 3 AM traffic spike with 100% bounce rate from a single ASN is almost always automated.
  • Datacenter IP addresses — Real users browse from residential ISPs. Traffic from AWS, Google Cloud, DigitalOcean, or Hetzner is almost certainly automated.
  • Identical request patterns — Same pages hit in the same order, at the same interval. Humans don't browse like metronomes.
  • Zero JavaScript execution — If your analytics show page views with no JS events (no scrolls, no clicks, no time-on-page), those "visitors" aren't running a browser.
  • Missing or fake User-Agent strings — python-requests/2.31.0, Go-http-client/1.1, or suspiciously outdated Chrome versions.
  • High form submission rates — 500 signups in an hour, all from different IPs but the same /24 subnet.
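That last pattern is easy to check mechanically. Here's a minimal sketch (IPv4 only, illustrative threshold) that groups signup IPs by /24 subnet:

```javascript
// Count signup IPs per /24 subnet to surface the
// "different IPs, same subnet" pattern described above.
function topSubnets(ips, threshold = 50) {
  const counts = new Map();
  for (const ip of ips) {
    const subnet = ip.split('.').slice(0, 3).join('.') + '.0/24';
    counts.set(subnet, (counts.get(subnet) || 0) + 1);
  }
  // Return subnets at or above the threshold, busiest first
  return [...counts.entries()]
    .filter(([, n]) => n >= threshold)
    .sort((a, b) => b[1] - a[1]);
}
```

Run it over the source IPs of the last hour's signups; a single /24 accounting for hundreds of them is a strong bot signal.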
⚠️

Don't Block All Bots

Googlebot, Bingbot, and other search engine crawlers are bots too — but blocking them tanks your SEO. Same with uptime monitors, payment webhooks, and legitimate API integrations. The goal is to block malicious bots while allowing good ones through. This guide focuses on that distinction.

Check Your Server Logs

The fastest way to confirm a bot problem is to check your access logs directly. Here's a quick one-liner to find the top IPs hitting your site:

# Top 20 IPs by request count (Nginx access log)
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Filter to only POST requests (signups, logins, forms)
grep "POST " /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -20

# Check a suspicious IP against IPASIS
curl -s "https://api.ipasis.com/v1/lookup?ip=203.0.113.42" | jq '.risk_score, .is_vpn, .is_datacenter'

2. Quick Wins: Deploy in 5 Minutes

These techniques won't stop sophisticated bots, but they'll filter out the lazy ones — which account for 60-70% of malicious bot traffic.

robots.txt (Polite Bots Only)

robots.txt asks bots to follow rules. Legitimate crawlers (Googlebot, Bingbot) respect it. Malicious bots ignore it completely. Still, it's the baseline.

# robots.txt — block aggressive crawlers, allow search engines
User-agent: *
Disallow: /api/
Disallow: /admin/
Disallow: /internal/

# Allow search engines explicitly
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Block known bad bots
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: MJ12bot
Disallow: /

Rate Limiting (Nginx)

Rate limiting at the reverse proxy layer is fast and free. This catches the most obvious automated traffic:

# nginx.conf — rate limit by IP
http {
    # 10 requests/second per IP, burst up to 20
    limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;

    # Stricter for auth endpoints: 3 requests/second
    limit_req_zone $binary_remote_addr zone=auth:10m rate=3r/s;

    server {
        # General rate limit
        location / {
            limit_req zone=general burst=20 nodelay;
            proxy_pass http://backend;
        }

        # Strict rate limit on login/signup
        location ~ ^/(login|signup|api/auth) {
            limit_req zone=auth burst=5 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}

Block Known Bad ASNs

Some Autonomous System Numbers (ASNs) are almost exclusively used by bots. You can block or challenge traffic from hosting providers that don't serve real users:

# Block or challenge traffic from known bot-heavy ASNs
# (Use with caution — some legitimate users use VPNs)
# Better approach: use IP intelligence to get real-time risk scores

# iptables example — drops one known-bad /24 range
# (to block a full ASN, enumerate its prefixes from an ASN-to-prefix list)
iptables -A INPUT -s 198.51.100.0/24 -j DROP

# Better: Nginx geo block with risk scoring
# See Section 3 for the IP intelligence approach
💡

Quick Wins Are Not Enough

Rate limiting and robots.txt stop simple bots. But modern bots rotate IPs, use residential proxies, and mimic real browser fingerprints. To catch these, you need real-time IP intelligence — which is what the rest of this guide covers.

3. IP Intelligence — The Most Effective Bot Filter

Every request to your website comes from an IP address. That IP carries a massive amount of signal: Is it a datacenter or residential ISP? Is it a known VPN or proxy? Has it been involved in abuse before? What country and ASN does it belong to?

IP intelligence turns every incoming IP into a risk profile in real-time. Instead of maintaining static blocklists that go stale within hours, you query a service that aggregates data from millions of sources and returns a risk score, proxy/VPN status, abuse history, and geolocation — all in under 20ms.

What IP Intelligence Tells You

🏢 Connection Type

Is the IP from a residential ISP (real user), datacenter/cloud provider (likely bot), or mobile carrier? Datacenter traffic to a consumer website is 95%+ automated.

🔒 VPN / Proxy / Tor

Is the user hiding behind a VPN, proxy, or Tor exit node? While privacy-conscious users use VPNs, bots use them to rotate IPs and evade blocks.

⚠️ Abuse History

Has this IP been reported for spam, brute-force attacks, or fraud across other websites? IPs with recent abuse history are 40x more likely to be malicious.

📊 Risk Score

A composite score (0-100) that combines all signals. Set thresholds: allow below 30, challenge between 30-70, block above 70. Tune based on your tolerance.
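Those thresholds map directly to a tiny policy function. The cutoffs below are the suggested defaults from above; tune them for your traffic:

```javascript
// Map a 0-100 risk score to an action using the thresholds above.
function riskAction(score) {
  if (score < 30) return 'allow';
  if (score <= 70) return 'challenge'; // e.g. serve a CAPTCHA
  return 'block';
}
```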

Why It Works Better Than Everything Else

Static blocklists go stale in hours. CAPTCHAs add friction for every user. User-Agent checks are trivially spoofed. Rate limiting catches volume but misses distributed attacks.

IP intelligence works because the IP can't be spoofed at the TCP level. A bot can fake its User-Agent, run headless Chrome, solve CAPTCHAs with AI, and rotate cookies — but it can't change the fact that its IP belongs to a datacenter, has been flagged for abuse, or is routing through a residential proxy network. The IP is the one signal the attacker can't fully control.

// Quick IP check with IPASIS — one API call, all signals
const response = await fetch('https://api.ipasis.com/v1/lookup?ip=' + clientIp, {
  headers: { 'Authorization': 'Bearer ' + process.env.IPASIS_API_KEY }
});
const data = await response.json();

// data returns:
// {
//   "risk_score": 82,
//   "is_vpn": true,
//   "is_proxy": false,
//   "is_tor": false,
//   "is_datacenter": true,
//   "is_crawler": false,
//   "abuse_score": 75,
//   "country": "US",
//   "asn": { "number": 14061, "name": "DIGITALOCEAN-ASN" },
//   "connection_type": "datacenter"
// }

if (data.risk_score > 70) {
  // High risk — block or serve CAPTCHA
  return res.status(403).json({ error: 'Access denied' });
}

if (data.is_datacenter && !isKnownBot(req)) {
  // Datacenter IP that isn't Googlebot — suspicious
  return serveCaptcha(req, res);
}

4. User-Agent Analysis and Header Fingerprinting

The User-Agent header is the easiest signal to check and the easiest to fake. Still, it catches a surprising number of bots that don't bother spoofing it.

Obvious Bot User-Agents

# These User-Agents are definitely bots
python-requests/2.31.0
Go-http-client/1.1
axios/1.6.0
curl/8.4.0
Java/11.0.2
Scrapy/2.11.0
Wget/1.21
libwww-perl/6.67

Beyond obvious bot strings, look for header inconsistencies:

  • Missing headers — Real browsers send Accept, Accept-Language, Accept-Encoding, and Sec-Fetch-* headers. Bots often skip them.
  • Header order — Chrome sends headers in a specific order. Bots using raw HTTP libraries send them differently.
  • TLS fingerprint mismatch — A request claiming to be Chrome 120 but with a Go TLS fingerprint (JA3/JA4) is a bot.
  • Outdated browser versions — Chrome 89 hasn't been current since 2021. Any request with it in 2026 is almost certainly automated.
// Header consistency check
function checkHeaderConsistency(req) {
  const ua = req.headers['user-agent'] || '';
  const signals = [];

  // No User-Agent at all
  if (!ua) signals.push('missing_ua');

  // Known bot libraries
  if (/python-requests|Go-http-client|axios|curl|wget|scrapy/i.test(ua)) {
    signals.push('bot_library');
  }

  // Claims to be a browser but missing browser headers
  if (/Chrome|Firefox|Safari/.test(ua)) {
    if (!req.headers['accept-language']) signals.push('missing_accept_language');
    if (!req.headers['accept-encoding']) signals.push('missing_accept_encoding');
    if (!req.headers['sec-fetch-mode'])  signals.push('missing_sec_fetch');
  }

  // Outdated Chrome (v110 shipped in early 2023; anything older is years stale)
  const chromeMatch = ua.match(/Chrome\/(\d+)/);
  if (chromeMatch && parseInt(chromeMatch[1]) < 110) {
    signals.push('outdated_chrome');
  }

  return signals;
}

5. Behavioral Analysis: Catching Bots That Look Human

Sophisticated bots use real browsers (Playwright, Puppeteer), rotate residential proxy IPs, and even simulate mouse movements. To catch these, you need to analyze behavior rather than static signals.

Server-Side Behavioral Signals

  • Request velocity — A user reading a blog post takes 30-120 seconds. A bot scraping pages takes <1 second between requests.
  • Navigation patterns — Humans browse nonlinearly (homepage → pricing → blog → pricing again). Bots traverse sequentially or alphabetically.
  • Session depth — A bot visiting 200 pages in 5 minutes with zero JavaScript events is not a human.
  • Form timing — Humans take 5-30 seconds to fill a login form. Bots submit in <1 second.
  • Honeypot fields — Hidden form fields that real users never see or fill. Bots parsing HTML will fill them.
<!-- Honeypot technique — add a hidden field that bots will fill -->
<form action="/signup" method="POST">
  <input type="email" name="email" required />
  <input type="password" name="password" required />

  <!-- Honeypot: invisible to humans, bots will fill it -->
  <div style="position: absolute; left: -9999px;" aria-hidden="true">
    <input type="text" name="website_url" tabindex="-1" autocomplete="off" />
  </div>

  <!-- Timing: server-rendered timestamp of when the form was generated -->
  <input type="hidden" name="_form_rendered" value="1711972800" />

  <button type="submit">Sign Up</button>
</form>
// Server-side: check honeypot and timing
app.post('/signup', (req, res) => {
  // Honeypot check — if filled, it's a bot
  if (req.body.website_url) {
    console.log('Bot detected: honeypot filled', req.ip);
    return res.status(200).json({ success: true }); // Fake success to fool the bot
  }

  // Timing check — form submitted too fast
  // (in production, sign _form_rendered server-side so bots can't forge it)
  const renderTime = parseInt(req.body._form_rendered, 10);
  const submitTime = Math.floor(Date.now() / 1000);
  if (submitTime - renderTime < 3) {
    console.log('Bot detected: form submitted in < 3 seconds', req.ip);
    return res.status(200).json({ success: true }); // Fake success
  }

  // Combine with IP intelligence for layered defense
  // ... proceed with real signup logic
});

6. CAPTCHAs: When to Use Them (and When Not To)

CAPTCHAs are the most visible anti-bot tool — and the most controversial. They work, but at a cost: every CAPTCHA challenge loses 10-20% of legitimate users who abandon the form rather than prove they're human.

When CAPTCHAs Make Sense

  • As a fallback for medium-risk traffic (risk score 30-70) — don't show it to everyone
  • On high-value forms: password reset, payment, account recovery
  • After multiple failed attempts: login failures, rate limit hits
  • For anonymous actions: public API signups, free trial registration

When CAPTCHAs Are Wrong

  • On every page load — Kills user experience and SEO (Google can't crawl through CAPTCHAs)
  • For logged-in users — You already know who they are
  • As the only defense — AI CAPTCHA solvers have 90%+ success rates in 2026. CAPTCHAs alone don't stop modern bots.
  • For API endpoints — APIs don't have a UI to show challenges in

Best Practice: Risk-Based CAPTCHAs

Use IP intelligence to score every request first. Low risk? Let them through. High risk? Block silently. Medium risk? That's when you show a CAPTCHA. This approach reduces CAPTCHA challenges by 80% while catching more bots than showing it to everyone. See CAPTCHA vs Bot Detection: Which Is Better?

7. WAF and CDN-Level Bot Protection

Web Application Firewalls (WAFs) and CDNs like Cloudflare, AWS WAF, and Fastly offer built-in bot management. They're useful as a first layer but have significant limitations:

  • Cloudflare Bot Management — Uses JS challenges and ML scoring. Good for high-volume DDoS but expensive ($$$) for bot management. Catches obvious bots, struggles with residential proxies.
  • AWS WAF — Rule-based blocking (rate limiting, geo-blocking, IP sets). No native bot intelligence — you write rules manually.
  • Akamai Bot Manager — Enterprise-grade, expensive. Strong but requires months of tuning.

The problem with CDN-level protection: it's either too aggressive (blocking real users) or too permissive (missing sophisticated bots). CDNs don't have your application context. They don't know that a POST to /api/signup should be treated differently than a GET to /blog.

The most effective approach is layered defense: CDN/WAF for DDoS and obvious bot traffic, plus application-level IP intelligence for nuanced, context-aware decisions. See our Cloudflare Bot Management comparison for details.

8. Code Examples: Node.js, Python, Go

Here are production-ready middleware examples that combine IP intelligence with rate limiting and header checks. Each one is a drop-in addition to your existing stack.

Node.js / Express

import express from 'express';
import { LRUCache } from 'lru-cache';

const app = express();
const ipCache = new LRUCache({ max: 10000, ttl: 5 * 60 * 1000 }); // 5 min TTL

// Bot detection middleware
async function botDetection(req, res, next) {
  const ip = req.headers['x-forwarded-for']?.split(',')[0]?.trim() || req.ip;

  // Check cache first
  let risk = ipCache.get(ip);
  if (!risk) {
    try {
      const resp = await fetch(`https://api.ipasis.com/v1/lookup?ip=${ip}`, {
        headers: { Authorization: `Bearer ${process.env.IPASIS_API_KEY}` },
        signal: AbortSignal.timeout(2000), // 2s timeout — fail open
      });
      risk = await resp.json();
      ipCache.set(ip, risk);
    } catch {
      return next(); // Fail open on timeout
    }
  }

  // Block high-risk
  if (risk.risk_score > 75) {
    return res.status(403).json({ error: 'Blocked' });
  }

  // Flag medium-risk for downstream handling
  req.ipRisk = risk;
  next();
}

// Apply to all routes
app.use(botDetection);

// Stricter on auth endpoints
app.post('/login', (req, res) => {
  if (req.ipRisk?.risk_score > 40 || req.ipRisk?.is_vpn) {
    return res.status(403).json({ error: 'Please verify your identity' });
  }
  // ... login logic
});

app.listen(3000);

Python / Flask

import os, time, requests
from flask import Flask, request, jsonify, abort

app = Flask(__name__)
IPASIS_KEY = os.environ["IPASIS_API_KEY"]

# Simple TTL cache (production: use Redis)
_cache = {}

def get_ip_risk(ip: str) -> dict:
    now = time.time()
    if ip in _cache and now - _cache[ip]["ts"] < 300:  # 5 min
        return _cache[ip]["data"]

    try:
        resp = requests.get(
            f"https://api.ipasis.com/v1/lookup?ip={ip}",
            headers={"Authorization": f"Bearer {IPASIS_KEY}"},
            timeout=2,
        )
        data = resp.json()
        _cache[ip] = {"data": data, "ts": now}
        return data
    except Exception:
        return {"risk_score": 0}  # Fail open

@app.before_request
def bot_detection():
    ip = request.headers.get("X-Forwarded-For", request.remote_addr).split(",")[0].strip()
    risk = get_ip_risk(ip)

    if risk.get("risk_score", 0) > 75:
        abort(403, description="Access denied")

    # Attach to request for downstream use
    request.ip_risk = risk

@app.route("/signup", methods=["POST"])
def signup():
    risk = getattr(request, "ip_risk", {})
    if risk.get("is_datacenter") or risk.get("risk_score", 0) > 40:
        abort(403, description="Suspicious signup blocked")
    # ... signup logic
    return jsonify({"success": True})

Go / net/http

package main

import (
    "encoding/json"
    "fmt"
    "net"
    "net/http"
    "os"
    "sync"
    "time"
)

type IPRisk struct {
    RiskScore    int    `json:"risk_score"`
    IsVPN        bool   `json:"is_vpn"`
    IsDatacenter bool   `json:"is_datacenter"`
    IsTor        bool   `json:"is_tor"`
    CachedAt     time.Time
}

var (
    cache   = sync.Map{}
    apiKey  = os.Getenv("IPASIS_API_KEY")
    client  = &http.Client{Timeout: 2 * time.Second}
)

func getIPRisk(ip string) *IPRisk {
    if cached, ok := cache.Load(ip); ok {
        risk := cached.(*IPRisk)
        if time.Since(risk.CachedAt) < 5*time.Minute {
            return risk
        }
    }

    url := fmt.Sprintf("https://api.ipasis.com/v1/lookup?ip=%s", ip)
    req, _ := http.NewRequest("GET", url, nil)
    req.Header.Set("Authorization", "Bearer "+apiKey)

    resp, err := client.Do(req)
    if err != nil {
        return &IPRisk{} // Fail open
    }
    defer resp.Body.Close()

    var risk IPRisk
    if err := json.NewDecoder(resp.Body).Decode(&risk); err != nil {
        return &IPRisk{} // Fail open on a malformed response
    }
    risk.CachedAt = time.Now()
    cache.Store(ip, &risk)
    return &risk
}

func botDetection(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ip, _, _ := net.SplitHostPort(r.RemoteAddr)
        // X-Forwarded-For can be "client, proxy1, proxy2" — take the first entry
        if fwd := r.Header.Get("X-Forwarded-For"); fwd != "" {
            for i := 0; i < len(fwd); i++ {
                if fwd[i] == ',' {
                    fwd = fwd[:i]
                    break
                }
            }
            ip = fwd
        }

        risk := getIPRisk(ip)
        if risk.RiskScore > 75 {
            http.Error(w, "Forbidden", http.StatusForbidden)
            return
        }

        next.ServeHTTP(w, r)
    })
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello, human!")
    })
    http.ListenAndServe(":8080", botDetection(mux))
}

For more detailed integration guides with caching, circuit breakers, and route-level protection, see our framework-specific guides: Next.js, Express.js, Django, Rails, Laravel, Go.

9. Advanced: Residential Proxies and Headless Browsers

The hardest bots to detect use residential proxy networks (real home IP addresses rented from services like Bright Data, Oxylabs, or SOAX) combined with headless browsers (Playwright, Puppeteer) that execute JavaScript and pass basic bot checks.

Detecting Residential Proxies

Residential proxies are hard because the IPs belong to real ISPs. Traditional datacenter detection won't catch them. However, residential proxy traffic has patterns:

  • Geographic inconsistency — IP geolocates to Lagos but browser timezone is set to America/New_York
  • Unusual usage patterns — A residential IP making 500 requests in 10 minutes isn't a human on their home Wi-Fi
  • Known proxy network ranges — Services like IPASIS maintain real-time databases of IPs currently being used as residential proxies
  • Open proxy ports — Scanning the connecting IP can reveal open proxy-service ports (SOCKS, HTTP proxy) that a normal home connection wouldn't expose
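The geographic-inconsistency check is one comparison, assuming your IP lookup returns an IANA timezone for the IP (a common geolocation field, though the exact name varies by provider) and your frontend reports Intl.DateTimeFormat().resolvedOptions().timeZone:

```javascript
// Flag a geographic inconsistency between IP geolocation and the browser.
// ipTimezone: IANA zone from your IP intelligence lookup (field name varies)
// browserTimezone: reported by the client via Intl.DateTimeFormat()
function timezoneMismatch(ipTimezone, browserTimezone) {
  if (!ipTimezone || !browserTimezone) return false; // not enough signal
  return ipTimezone !== browserTimezone;
}
```

Treat a mismatch as one signal among several: legitimate travelers and privacy-conscious VPN users mismatch too.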

Read more: Detecting Residential Proxies: Techniques That Actually Work

Detecting Headless Browsers

Modern headless Chrome looks almost identical to real Chrome. But there are still detectable differences:

  • WebGL renderer — Headless Chrome reports SwiftShader or Google SwiftShader instead of a real GPU
  • navigator.webdriver — Set to true in automated mode (though sophisticated bots override this)
  • Missing plugins/codecs — Real Chrome has PDF viewer, Widevine, and codec support that headless often lacks
  • Canvas/WebGL fingerprint anomalies — Headless rendering produces subtly different fingerprints than real GPU-accelerated rendering
  • CDP artifacts — Automation over the Chrome DevTools Protocol leaves traces, such as ChromeDriver's injected cdc_ variables and side effects of Runtime.evaluate
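If you collect these values client-side, scoring them server-side is a pure function. A sketch (the property names are whatever your own collector sends up, not a standard API):

```javascript
// Score common headless-browser tells from values collected in the client:
// navigator.webdriver, the WebGL UNMASKED_RENDERER string, navigator.plugins.length.
function headlessSignals({ webdriver, webglRenderer, pluginCount }) {
  const signals = [];
  if (webdriver === true) signals.push('webdriver_flag');
  if (/SwiftShader/i.test(webglRenderer || '')) signals.push('software_webgl');
  if (pluginCount === 0) signals.push('no_plugins');
  return signals;
}
```

No single signal here is conclusive; feed the result into the same layered scoring as your IP intelligence.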

The best defense against sophisticated bots is layered detection: combine IP intelligence (catches 90%+ of bots) with behavioral analysis (catches the remaining sophisticated ones) and device fingerprinting (catches multi-account fraud). No single signal catches everything.

Read more: Detecting Headless Chrome and Automated Browsers in 2025

10. Monitoring and Continuous Improvement

Bot defense isn't "set and forget." Attackers adapt. New proxy networks appear. Bypass techniques evolve. You need monitoring to know when your defenses are working and when they're not.

Key Metrics to Track

  • Bot-to-human ratio — What percentage of your traffic is bots? Track this weekly. If it's rising, your defenses are being evaded.
  • False positive rate — How many legitimate users are being blocked? Monitor support tickets and conversion drops after deploying new rules.
  • Block rate by source — Which ASNs, countries, or IP ranges are generating the most blocked requests? This tells you where attacks originate.
  • Request patterns — Track requests per second by endpoint. Sudden spikes on /login or /api/signup indicate targeted attacks.
  • CAPTCHA solve rate — If 99% of CAPTCHAs are solved instantly, bots are using AI solvers and your CAPTCHA isn't helping.
// Structured logging for bot detection events
const logBotEvent = (ip, risk, action, endpoint) => {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    event: 'bot_detection',
    ip,
    risk_score: risk.risk_score,
    is_vpn: risk.is_vpn,
    is_datacenter: risk.is_datacenter,
    is_tor: risk.is_tor,
    action, // 'allow', 'challenge', 'block'
    endpoint,
    country: risk.country,
    asn: risk.asn?.name,
  }));
};

// Dashboard query (e.g., in Grafana/Loki)
// sum(rate({event="bot_detection", action="block"}[5m])) by (asn)
// Shows block rate per ASN — identifies attack sources

Putting It All Together: The Bot Defense Stack

The most effective bot protection uses multiple layers, each catching what the previous one misses:

Layer 1 — CDN / WAF

DDoS protection, known bot signatures, basic rate limiting. Stops ~30% of bot traffic.

Layer 2 — IP Intelligence

Real-time risk scoring, VPN/proxy/Tor detection, abuse history. Stops ~60% of remaining bots. This is the highest-ROI layer.

Layer 3 — Behavioral Analysis

Request patterns, form timing, honeypots, navigation analysis. Catches sophisticated bots using residential proxies.

Layer 4 — Challenge (Risk-Based CAPTCHA)

Only shown to medium-risk traffic. Reduces CAPTCHA challenges by 80% while maintaining protection.

Start with IP intelligence — it's the single most effective technique with the least user friction. One API call per request tells you more about a visitor than any number of CAPTCHA challenges or User-Agent checks. Layer in behavioral analysis and CAPTCHAs for the traffic that gets past the first filter.

Stop Bot Traffic in Under 10 Minutes

IPASIS provides real-time IP intelligence, VPN/proxy detection, and risk scoring in a single API call. 1,000 free lookups/month. No credit card required.