Advanced Detection of Headless Browsers: Playwright & Puppeteer

February 13, 2026

Modern bot detection requires moving beyond simple User-Agent blocking. Automation frameworks like Playwright and Puppeteer have evolved to mimic legitimate user behavior with increasing sophistication. However, they introduce subtle inconsistencies in the browser environment, network stack, and connection origins.

This guide outlines technical strategies for identifying headless browsers and automation tools.

JavaScript Environment Anomalies

The most direct method of detection lies in client-side execution. While flags like navigator.webdriver are easily overwritten by plugins (e.g., puppeteer-extra-plugin-stealth), deeper property consistency checks are harder to spoof.

1. Prototype and Property Overrides

When automation tools attempt to hide their presence, they often inject scripts to redefine native properties. You can detect these overrides by examining the string representation of native functions.

Client-Side JavaScript (runs in the browser, not in Node.js):

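async function detectOverrides() {
    // Basic check: automation-controlled browsers set navigator.webdriver to true
    if (navigator.webdriver) return true;

    // Check for inconsistent permissions
    // Headless Chrome often has inconsistent permission states compared to headful
    try {
        // permissions.query() returns a Promise, so it must be awaited;
        // testing the Promise object directly would always be truthy
        const status = await navigator.permissions.query({ name: 'notifications' });
        if (window.Notification.permission === 'denied' && status.state === 'prompt') {
            return true;
        }
    } catch (e) {
        // Permissions or Notification API unavailable; skip this check
    }

    // Check if native functions have been monkey-patched
    const toString = Function.prototype.toString;
    const documentProto = Object.getPrototypeOf(document);

    // A native function stringifies to "function createElement() { [native code] }";
    // a script-defined replacement exposes its own source instead
    if (!toString.call(documentProto.createElement).includes('native code')) {
        return true;
    }

    return false;
}

// The function is async, so callers must await the result:
// detectOverrides().then(isBot => { /* report the signal */ });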

2. Rendering Engine Consistency

Headless environments rely on software rendering (Mesa, SwiftShader) or specific GPU configurations that differ from standard consumer hardware. WebGL fingerprinting allows you to extract the renderer string.

If the renderer contains "SwiftShader" or "llvmpipe," the traffic is likely originating from a headless server environment rather than a user device.
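As a rough illustration, the renderer check can run server-side once the client has reported its WebGL renderer string (typically read in the browser through the WEBGL_debug_renderer_info extension). The Python sketch below assumes that string has already been collected; the function name and the marker list are hypothetical and not an exhaustive ruleset.

```python
# Substrings that commonly indicate software rendering.
# Illustrative list only (assumption); extend it from your own traffic data.
SOFTWARE_RENDERER_MARKERS = ("swiftshader", "llvmpipe")


def is_software_renderer(renderer: str) -> bool:
    """Return True if the reported WebGL renderer string looks like software rendering."""
    normalized = renderer.lower()
    return any(marker in normalized for marker in SOFTWARE_RENDERER_MARKERS)


if __name__ == "__main__":
    samples = [
        "Google SwiftShader",
        "llvmpipe (LLVM 15.0.7, 256 bits)",
        "ANGLE (NVIDIA, NVIDIA GeForce RTX 3060 Direct3D11 vs_5_0 ps_5_0, D3D11)",
    ]
    for sample in samples:
        print(sample, "->", is_software_renderer(sample))
```

A match is best treated as one signal among several rather than an automatic block, since virtual machines without GPU access can also fall back to software rendering.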

TLS Fingerprinting (JA3)

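Playwright and Puppeteer traffic reaches your server through either the automated browser's own network stack or, when the script issues requests directly, the Node.js network stack. Requests sent through Node.js produce a TLS handshake fingerprint (JA3) that differs significantly from a standard Chrome or Firefox instance running on Windows or macOS. Traffic from the automated browser itself is much closer to a regular browser's, so there the useful signal is a mismatch between the fingerprint and the claimed User-Agent.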

Automated scripts often lack the specific GREASE (Generate Random Extensions And Sustain Extensibility) values or the cipher suite and extension ordering found in legitimate browsers. Implementing JA3 hashing at your ingress controller or load balancer allows you to drop connections where the TLS signature matches known automation libraries rather than standard browser distributions.
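To make the hashing step concrete, here is a minimal Python sketch of how a JA3 value is derived once the ClientHello fields have been parsed. JA3 joins five fields (TLS version, cipher suites, extensions, elliptic curves, point formats) as decimal values, drops GREASE values, and takes the MD5 of the result. Parsing the handshake itself is out of scope here, and the function names and sample inputs are hypothetical.

```python
import hashlib

# GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa; JA3 ignores them.
GREASE_VALUES = {0x0A0A + 0x1010 * i for i in range(16)}


def _join(values):
    """Dash-join decimal values, skipping GREASE entries."""
    return "-".join(str(v) for v in values if v not in GREASE_VALUES)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields taken from the ClientHello."""
    return ",".join([
        str(version),
        _join(ciphers),
        _join(extensions),
        _join(curves),
        _join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical, shortened ClientHello values for illustration only.
    fields = (771, [0x0A0A, 4865, 4866], [0, 23, 65281], [0x1A1A, 29, 23], [0])
    print(ja3_string(*fields))
    print(ja3_hash(*fields))
```

The resulting hash would then be compared against a list of fingerprints you have observed for automation libraries; that list is something you would need to maintain yourself.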

IP Intelligence: The Infrastructure Layer

Even the most sophisticated stealth plugin cannot hide the origin of the request. Automation at scale requires IP rotation. Attackers utilize two primary proxy types:

  1. Datacenter IPs: AWS, DigitalOcean, Linode. These should rarely browse consumer endpoints.
  2. Residential Proxies: Rotated IPs from legitimate ISPs.

Using IP intelligence is a performant way to filter automation at the edge, before the request reaches your backend and before any client-side JavaScript has to run.

Python Example (IPASIS API):

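import requests

def check_ip_reputation(client_ip):
    # Query IPASIS for detailed IP metadata
    try:
        response = requests.get(f"https://api.ipasis.com/v1/{client_ip}", timeout=5)
        response.raise_for_status()
        data = response.json()
    except (requests.RequestException, ValueError):
        # Lookup failed or returned invalid JSON. This sketch fails open so an
        # API outage does not block real users; fail closed if that suits you better.
        return {"allow": True, "reason": "IP lookup unavailable"}

    # Block if the IP is a known proxy or belongs to a datacenter
    # (.get() avoids a KeyError if a field is missing from the response)
    if data.get('is_proxy') or data.get('is_datacenter'):
        return {
            "allow": False,
            "reason": "Automated infrastructure detected"
        }

    return {"allow": True}

# Example usage
result = check_ip_reputation("45.33.12.1")
if not result['allow']:
    print(f"Access Denied: {result['reason']}")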

Behavioral Analysis

Finally, analyze input events. Humans generate mousemove and keydown events with natural entropy: irregular timing and curved, jittery trajectories. Unless they are deliberately humanized, automation tools tend to generate:

  • Straight-line cursor movements.
  • Instantaneous clicks (0ms duration).
  • Bursts of input events faster than humanly possible.
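The three patterns above can be turned into simple server-side heuristics. The Python sketch below assumes the client has already reported pointer coordinates, click durations, and event timestamps; the function names and every threshold are illustrative assumptions that would need tuning against real traffic.

```python
import math


def path_straightness(points):
    """Ratio of straight-line distance to travelled distance (1.0 = perfectly straight)."""
    if len(points) < 2:
        return 0.0
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 0.0
    return math.dist(points[0], points[-1]) / travelled


def looks_automated(points, click_durations_ms, event_times_ms,
                    straightness_limit=0.999, min_click_ms=1, min_gap_ms=2):
    """Return True if any heuristic fires. All thresholds are assumptions."""
    # 1. Cursor path that is (almost) a perfect straight line.
    straight = len(points) >= 3 and path_straightness(points) >= straightness_limit
    # 2. Clicks whose press-to-release time is effectively zero.
    instant_click = any(d < min_click_ms for d in click_durations_ms)
    # 3. A sustained run of events spaced closer together than a human could manage.
    gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    burst = len(gaps) >= 5 and all(g < min_gap_ms for g in gaps)
    return straight or instant_click or burst


if __name__ == "__main__":
    scripted = looks_automated([(0, 0), (10, 10), (20, 20)], [0], [0, 1, 2, 3, 4, 5])
    human = looks_automated([(0, 0), (5, 9), (12, 11), (20, 20)], [85, 110], [0, 140, 310, 450])
    print("scripted session flagged:", scripted)
    print("human-like session flagged:", human)
```

Heuristics like these work best as inputs to a score rather than as hard blocks, since short or sparse sessions give very little data to judge.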

FAQ

Q: Can puppeteer-extra-plugin-stealth bypass all checks?
A: No. While it patches common leaks like navigator.webdriver, it struggles to mask deep TLS inconsistencies, font rendering discrepancies, and the IP reputation of the source connection.

Q: Should I block all headless browsers?
A: It depends on your business model. If your site depends heavily on SEO, you must whitelist legitimate crawlers (Googlebot, Bingbot) via reverse DNS verification while blocking unauthorized automation.

Q: How effective is CAPTCHA against Playwright?
A: Modern solvers (AI-based or human click farms) can bypass CAPTCHAs. Passive detection (fingerprinting + IP analysis) is preferred because it adds friction for bots without degrading UX for legitimate users.
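The reverse DNS verification mentioned in the crawler question above can be sketched in Python as a forward-confirmed lookup: resolve the IP to a hostname, check that the hostname belongs to the crawler operator, then resolve the hostname back and confirm it returns the same IP. The function name and the suffix list are assumptions; check the suffixes against the operators' current documentation. This sketch handles IPv4 only.

```python
import socket

# Hostname suffixes used by Googlebot and Bingbot (assumed current; verify before relying on them).
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for an IP that claims to be a crawler."""
    # Resolver functions are injectable so the logic can be exercised without network access.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(CRAWLER_SUFFIXES):
            return False
        # The hostname must resolve back to the same IP; a PTR record alone
        # is controlled by whoever owns the IP block and can claim any name.
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (no PTR record, lookup failure).
        return False
```

Because DNS lookups add latency, results are usually worth caching per IP for a while rather than repeating on every request.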

Secure Your Infrastructure with IPASIS

Code-level detection is an arms race. The most reliable signal remains the network origin. IPASIS provides enterprise-grade IP intelligence to detect VPNs, proxies, and datacenter traffic used by automation networks.

Integrate the IPASIS API today to block bots at the edge.
