
Edge Security Architecture: Detecting and Blocking Bot Traffic Before Origin

January 17, 2026 · 5 min read

Moving bot mitigation to the edge (CDNs, WAFs, or serverless edge workers) is no longer optional for high-scale applications. Processing malicious traffic at the origin wastes bandwidth, consumes compute resources, and leaves the origin directly exposed to DDoS vectors. This guide details the architecture and implementation of bot detection strategies at the network edge.

The Edge Mitigation Architecture

Effective edge security relies on layers of filtration that occur before a request reaches your primary infrastructure. The goal is to filter noise (automated scripts, scrapers, credential stuffers) while minimizing latency for legitimate users.

  1. Static Analysis: Blocking known bad user-agents and inconsistent headers.
  2. IP Intelligence: Querying external datasets to identify datacenter IPs, proxies, and known threat actors.
  3. Behavioral Analysis: Rate limiting based on IP or session tokens.

1. Header Validation and Fingerprinting

Basic scripts often fail to emulate a full browser TLS handshake or HTTP header order. At the edge, you should inspect:

  • TLS JA3 Fingerprints: Different clients (e.g., cURL vs. Chrome) produce distinct TLS handshake signatures.
  • HTTP Header Ordering: Browsers send headers in specific orders. Deviations suggest automation.
  • User-Agent: While easily spoofed, empty or malformed user-agents should be dropped immediately.
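As a first-pass illustration of these static checks, the sketch below rejects requests whose headers look scripted rather than browser-generated. The specific signatures and the length threshold are illustrative assumptions, not vendor recommendations.

// Returns true when the request's headers look scripted rather than browser-generated.
function looksAutomated(request) {
  const ua = (request.headers.get('User-Agent') || '').toLowerCase();

  // Empty or implausibly short User-Agent strings are dropped outright.
  if (ua.length < 10) return true;

  // Common scripted clients identify themselves directly.
  if (['curl/', 'wget/', 'python-requests', 'go-http-client'].some((sig) => ua.includes(sig))) {
    return true;
  }

  // Real browsers almost always send Accept and Accept-Language; their absence is a strong signal.
  if (!request.headers.get('Accept') || !request.headers.get('Accept-Language')) {
    return true;
  }

  return false;
}

// Usage inside a Worker fetch handler:
// if (looksAutomated(request)) return new Response('Forbidden', { status: 403 });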

2. Real-time IP Reputation Analysis

Static rules fail against sophisticated bots rotating through residential proxies. To catch these, you must integrate IP intelligence. This involves checking the incoming request's IP address against a reputation database to determine if it is a VPN, Proxy, or Tor exit node.

Implementation: Cloudflare Worker (JavaScript)

The following example demonstrates a Cloudflare Worker that intercepts a request, queries the IPASIS API to check the IP's reputation, and blocks high-risk traffic.

export default {
  async fetch(request, env, ctx) {
    const clientIP = request.headers.get('CF-Connecting-IP');
    const API_KEY = env.IPASIS_API_KEY;

    // 1. Allowlist specific paths (e.g., static assets)
    const url = new URL(request.url);
    if (url.pathname.match(/\.(jpg|css|js|png)$/)) {
      return fetch(request);
    }

    // 2. Query IPASIS Intelligence API
    try {
      const response = await fetch(`https://api.ipasis.com/json/${clientIP}?key=${API_KEY}`);
      const data = await response.json();

      // 3. Logic: Block Data Center IPs and known Proxies
      // 'is_crawler' usually denotes benign bots (Googlebot), so we exclude those from blocking if verified.
      if (data.is_proxy || (data.is_datacenter && !data.is_crawler)) {
        return new Response(JSON.stringify({
          error: 'Access Denied',
          reason: 'Automated traffic detected via Proxy/Datacenter IP',
          ip: clientIP
        }), { status: 403, headers: { 'Content-Type': 'application/json' } });
      }
    } catch (err) {
      // Fail open to prevent blocking legitimate traffic during API outages
      console.error("IP Intelligence Lookup Failed", err);
    }

    // 4. Forward to Origin
    return fetch(request);
  }
};

3. Rate Limiting with State

Blocking IPs is a binary decision; rate limiting handles the gray area. Implement a sliding window counter (using Redis or edge key-value storage) keyed on the client IP address.

  • Standard Users: 60 requests / minute.
  • Datacenter IPs (if allowed): 10 requests / minute.

If a user exceeds the threshold, return a 429 Too Many Requests response or serve a challenge (e.g., a CAPTCHA or proof-of-work puzzle).
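As a rough sketch, assuming a Workers KV namespace bound as RATE_LIMIT (a hypothetical binding name), a fixed-window approximation of this counter could look like the following. KV writes are not atomic, so counts are approximate; Durable Objects or Redis are a better fit when a strict sliding window is required.

async function isRateLimited(env, clientIP, limit = 60) {
  // Key the counter on IP plus the current minute to get a fixed one-minute window.
  const windowKey = `rl:${clientIP}:${Math.floor(Date.now() / 60000)}`;

  // KV reads/writes are eventually consistent, so this count is approximate.
  const current = parseInt(await env.RATE_LIMIT.get(windowKey)) || 0;
  if (current >= limit) {
    return true;
  }

  // Expire the key shortly after the window closes (KV requires a minimum TTL of 60 seconds).
  await env.RATE_LIMIT.put(windowKey, String(current + 1), { expirationTtl: 120 });
  return false;
}

// Usage inside the fetch handler, with a stricter limit for datacenter IPs:
// if (await isRateLimited(env, clientIP, data.is_datacenter ? 10 : 60)) {
//   return new Response('Too Many Requests', { status: 429 });
// }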

FAQ

Q: Will querying an IP API add significant latency?

A: It depends on the provider. IPASIS is designed for sub-millisecond lookups. Furthermore, you can cache the IP analysis result in your edge cache (KV store) for 5-10 minutes to eliminate network calls for repeat requests from the same IP.
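As an illustration of that caching pattern, assuming a KV namespace bound as IP_CACHE (a hypothetical name), the reputation lookup from the Worker above could be wrapped like this:

async function lookupReputation(env, clientIP) {
  const cacheKey = `ip:${clientIP}`;

  // Serve repeat visitors from the edge KV cache to avoid an extra network hop.
  const cached = await env.IP_CACHE.get(cacheKey, { type: 'json' });
  if (cached) return cached;

  const response = await fetch(`https://api.ipasis.com/json/${clientIP}?key=${env.IPASIS_API_KEY}`);
  const data = await response.json();

  // Cache for 10 minutes; reputation data rarely changes faster than that.
  await env.IP_CACHE.put(cacheKey, JSON.stringify(data), { expirationTtl: 600 });
  return data;
}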

Q: Should I block all Datacenter IPs?

A: For B2C applications, blocking datacenter IPs is often safe, as human users typically arrive via ISP (residential) or mobile networks. However, ensure you allowlist verified crawlers (Googlebot, Bingbot) to preserve SEO.

Q: How do I handle false positives?

A: Never block "suspected" traffic silently. Either serve a 403 with a clear error code or, preferably, redirect to a challenge page. If the user solves the challenge, issue a temporary signed cookie bypassing the checks.
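One way to issue such a cookie is with the Web Crypto API available in Workers; the cookie name, the env.COOKIE_SECRET binding, and the one-hour lifetime below are illustrative assumptions.

// Issue a short-lived HMAC-signed cookie after a successful challenge.
async function issueBypassCookie(env, clientIP) {
  const expires = Date.now() + 60 * 60 * 1000; // valid for 1 hour
  const payload = `${clientIP}:${expires}`;

  const key = await crypto.subtle.importKey(
    'raw', new TextEncoder().encode(env.COOKIE_SECRET),
    { name: 'HMAC', hash: 'SHA-256' }, false, ['sign']
  );
  const sig = await crypto.subtle.sign('HMAC', key, new TextEncoder().encode(payload));
  const sigHex = [...new Uint8Array(sig)].map((b) => b.toString(16).padStart(2, '0')).join('');

  // Requests presenting this cookie can skip the reputation and rate-limit checks,
  // provided the signature and expiry are re-verified at the edge.
  return `bypass=${payload}:${sigHex}; Max-Age=3600; Path=/; Secure; HttpOnly`;
}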

Secure Your Perimeter with IPASIS

Building your own bot detection dataset is resource-intensive and prone to error. IPASIS provides enterprise-grade IP intelligence, allowing you to detect VPNs, proxies, and Tor exit nodes with a single API call.

Stop bots before they hit your database. Get your free API Key today and start filtering traffic at the edge.
