Data, Privacy & Trust Infrastructure Β· Industry

Anubis Open Source AI Bot Firewall Gains Traction Among Small Web Operators

By Ze Research Writer Β· 8 min read

Anubis, an open source proof-of-work bot firewall developed by Xe Iaso and Techaro, emerges as a defensive tool for small website operators seeking protection from aggressive AI crawler traffic.

Executive Brief

A new open source project called Anubis has emerged as a defensive tool for website operators seeking protection from the growing volume of AI crawler and scraper traffic. Developed by Xe Iaso under the Techaro organization, Anubis functions as a "Web AI Firewall Utility" that implements proof-of-work challenges to filter automated requests from legitimate human visitors.

The project, released under the MIT license, positions itself as a lightweight solution specifically designed for operators of smaller websites and community platforms who lack the resources to deploy enterprise-grade bot management systems. According to the project documentation, Anubis "weighs the soul of your connection using one or more challenges in order to protect upstream resources from scraper bots."

Website operators affected by aggressive AI training data collection have expressed interest in the tool, with the project gaining significant attention on developer forums. The Hacker News discussion thread accumulated 319 points and 208 comments as of April 12, 2025, indicating substantial community engagement with the topic.

The tool acknowledges its aggressive approach in its own documentation, describing itself as "a bit of a nuclear response" that will block smaller scrapers and may inhibit beneficial automated services like the Internet Archive. Operators can configure bot policy definitions to explicitly allowlist specific crawlers they wish to permit.

Sponsors listed on the project include Raptor Computing Systems, Databento, Distrust, and Terminal Trove, suggesting early commercial interest in supporting the development effort.

What Happened

Xe Iaso, a developer known for technical writing and systems programming work, released Anubis through the Techaro organization on GitHub. The project appeared on Hacker News on April 12, 2025, where it generated substantial discussion about the ethics and practicality of blocking AI crawlers.

The timing coincides with ongoing industry debates about AI companies' data collection practices. Multiple website operators have reported significant increases in crawler traffic from AI training operations, with some describing the volume as disruptive to normal site operations.

Anubis implements a challenge-response system that presents visitors with a proof-of-work puzzle before granting access to protected resources. The challenge page displays the message "Making sure you're not a bot!" while the browser completes the computational task.

The project documentation indicates the tool is designed to be deployed as a reverse proxy in front of existing web applications. Configuration options allow operators to define policies for different types of automated traffic.
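A minimal sketch of such a reverse-proxy chain, assuming Anubis listens on a local port in front of the application; the port numbers and server names here are placeholders for illustration, not values from the project documentation:

```nginx
# Hypothetical request flow: client -> nginx -> Anubis -> application.
# All addresses below are illustrative; consult the Anubis docs for real settings.
server {
    listen 443 ssl;
    server_name example.org;

    location / {
        # Hand every request to Anubis first; it either serves a challenge
        # page or forwards verified traffic to the upstream application.
        proxy_pass http://127.0.0.1:9000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
# Anubis itself would then be configured to forward passing requests
# to the protected application, e.g. http://127.0.0.1:3000.
```

The key design point is that the web application never sees unverified traffic; Anubis sits entirely in the request path.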


Key Claims and Evidence

The Anubis project makes several technical claims about its approach to bot filtering:

Lightweight Resource Usage: The documentation states the tool is designed to be "as lightweight as possible to ensure that everyone can afford to protect the communities closest to them." The project targets operators who cannot afford commercial bot management services.

Proof-of-Work Challenge System: Anubis implements computational challenges that require browsers to perform work before accessing protected content. The challenge difficulty can be configured based on operator preferences.

Configurable Bot Policies: The system supports policy definitions that allow operators to create allowlists for specific user agents or IP ranges. The documentation specifically mentions the Internet Archive as an example of a "good bot" that operators might want to permit.

Self-Hosting Capability: The tool is designed for self-hosting, giving operators full control over their bot filtering infrastructure without relying on third-party services.

Pros and Opportunities

Website operators gain a free, open source option for addressing AI crawler traffic without subscription costs or vendor lock-in. The MIT license permits modification and redistribution, allowing the community to adapt the tool for specific use cases.

Small community forums, personal blogs, and independent publishers can implement bot protection without the technical complexity or cost of enterprise solutions. The self-hosted nature of the tool means operators retain full control over their traffic filtering decisions.

The configurable policy system allows operators to make nuanced decisions about which automated traffic to permit. Organizations that wish to support legitimate archival efforts can explicitly allowlist services like the Internet Archive while blocking commercial AI scrapers.

Open source development enables community review of the filtering logic, providing transparency about how traffic decisions are made. Operators can audit the code to understand exactly what criteria determine whether a request is challenged.


Cons, Risks, and Limitations

The project documentation explicitly acknowledges significant limitations. The tool is described as "a bit of a nuclear response" that will block legitimate automated services alongside unwanted scrapers.

Proof-of-work challenges impose computational costs on all visitors, including legitimate users. Devices with limited processing power, such as older smartphones or low-end computers, may experience delays or failures when completing challenges.

Accessibility concerns arise from challenge-based systems. Users relying on assistive technologies or automated accessibility tools may encounter barriers when accessing protected content.

The approach does not distinguish between AI crawlers collecting training data and AI-powered services that users might intentionally invoke, such as AI assistants helping users navigate websites.

False positives remain a risk. Legitimate users behind corporate proxies, VPNs, or unusual network configurations may be incorrectly flagged as automated traffic.

The tool requires technical expertise to deploy and configure correctly. Misconfiguration could result in blocking legitimate traffic or failing to stop unwanted crawlers.

How the Technology Works

Anubis operates as a reverse proxy that intercepts incoming HTTP requests before they reach the protected web application. When a request arrives, the system evaluates it against configured policies to determine whether to challenge the visitor.

For requests that require verification, Anubis serves a challenge page containing JavaScript code that performs proof-of-work computation. The browser must complete a computational puzzle, typically involving finding a hash value that meets certain criteria. The difficulty of this puzzle can be adjusted through configuration.
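The hash-search step described above can be sketched as follows. This is a generic proof-of-work illustration (SHA-256 with a leading-zero-bit target), not Anubis's actual algorithm or parameters:

```python
import hashlib

def solve(challenge: str, difficulty: int) -> int:
    """Find a nonce whose SHA-256(challenge + nonce) has `difficulty`
    leading zero bits. Expected work grows as ~2**difficulty hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        # The first `difficulty` bits of the digest must be zero.
        if int(digest, 16) >> (256 - difficulty) == 0:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Verification costs a single hash, so the server's work stays tiny
    while the client bears the search cost."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return int(digest, 16) >> (256 - difficulty) == 0

nonce = solve("example-challenge", 12)  # ~4096 hashes on average
assert verify("example-challenge", nonce, 12)
```

Raising the difficulty by one bit roughly doubles the client's expected work, which is how the configurable difficulty knob translates into cost for bulk scrapers.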

Once the browser completes the challenge, it submits the solution to Anubis, which verifies the work and issues a session token. Subsequent requests from the same browser session can proceed without additional challenges, provided the token remains valid.
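Token issuance after a solved challenge might look like the sketch below. The HMAC construction and payload fields are assumptions for illustration, not Anubis's actual cookie format:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-secret"  # illustrative; a real deployment generates this

def issue_token(client_id: str, ttl: int = 3600) -> str:
    """Sign a small payload so later requests can skip the challenge."""
    payload = json.dumps({"sub": client_id, "exp": int(time.time()) + ttl})
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{payload}|{sig}".encode()).decode()

def check_token(token: str) -> bool:
    """Verify signature and expiry; constant-time compare avoids timing leaks."""
    try:
        payload, sig = base64.urlsafe_b64decode(token).decode().rsplit("|", 1)
    except Exception:
        return False
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    return json.loads(payload)["exp"] > time.time()

tok = issue_token("session-123")
assert check_token(tok)
```

Because the token is signed server-side, the proxy can validate returning visitors statelessly instead of re-running the challenge on every request.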

The system supports multiple challenge types, with the default being a "metarefresh" challenge that uses browser-based computation. The challenge page includes metadata about the current Anubis version and challenge parameters.

Technical context: The proof-of-work approach draws from concepts used in cryptocurrency mining and email spam prevention systems like Hashcash. By requiring computational work, the system increases the cost of making large numbers of requests, making bulk scraping economically less attractive while imposing minimal burden on individual human visitors.

Bot policy definitions allow operators to specify rules based on user agent strings, IP addresses, or other request characteristics. Requests matching allowlist rules bypass the challenge system entirely.
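A policy check of this shape can be sketched as follows; the rule fields and the ALLOW/DENY/CHALLENGE outcomes are illustrative, not Anubis's actual configuration schema:

```python
import fnmatch
import ipaddress

# Hypothetical rules, evaluated top to bottom; first match wins.
POLICIES = [
    {"user_agent": "*archive.org_bot*", "action": "ALLOW"},  # e.g. Internet Archive
    {"user_agent": "*GPTBot*",          "action": "DENY"},
    {"cidr": "192.0.2.0/24",            "action": "ALLOW"},  # a trusted range
]

def decide(user_agent: str, ip: str, default: str = "CHALLENGE") -> str:
    addr = ipaddress.ip_address(ip)
    for rule in POLICIES:
        if "user_agent" in rule and fnmatch.fnmatch(user_agent, rule["user_agent"]):
            return rule["action"]
        if "cidr" in rule and addr in ipaddress.ip_network(rule["cidr"]):
            return rule["action"]
    return default  # unmatched traffic gets the proof-of-work challenge

assert decide("Mozilla/5.0 (compatible; archive.org_bot)", "203.0.113.9") == "ALLOW"
assert decide("GPTBot/1.0", "203.0.113.9") == "DENY"
assert decide("Mozilla/5.0 Firefox", "203.0.113.9") == "CHALLENGE"
```

Defaulting unmatched traffic to a challenge rather than a hard block is what lets the allowlist stay short: only explicitly trusted or explicitly unwanted clients need a rule.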

Industry Context

The emergence of tools like Anubis reflects broader tensions between website operators and AI companies over data collection practices. Multiple AI training operations have been documented crawling websites at high volumes, sometimes ignoring robots.txt directives or rate limiting requests.

Website operators have limited recourse when AI companies disregard voluntary compliance mechanisms. Legal frameworks around web scraping remain unsettled, with ongoing litigation in multiple jurisdictions examining the boundaries of permissible automated data collection.

The "small web" community, comprising personal blogs, independent forums, and community-run platforms, has been particularly vocal about the impact of AI crawler traffic. These operators often lack the technical resources or bandwidth to absorb high-volume automated requests.

Commercial bot management services exist but typically target enterprise customers with corresponding pricing. Anubis represents an attempt to provide similar functionality to operators who cannot afford commercial solutions.

The project's acknowledgment that it may block beneficial services like the Internet Archive highlights the difficulty of distinguishing between wanted and unwanted automated traffic. No technical solution can perfectly align with the diverse preferences of all website operators.

What Remains Unclear

The long-term effectiveness of proof-of-work challenges against determined scrapers remains unproven. Sophisticated operators could potentially solve challenges at scale using dedicated computing resources.

How AI companies will respond to widespread adoption of challenge-based systems is unknown. Technical countermeasures or legal challenges could emerge if such tools gain significant adoption.

The project's sustainability depends on continued volunteer development effort. The current sponsor list suggests some commercial support, but the long-term funding model for the project is not publicly documented.

Performance impact on protected websites under various traffic conditions has not been independently benchmarked. Operators considering deployment would benefit from testing in their specific environments.

The legal status of proof-of-work challenges as a bot filtering mechanism has not been tested in court. While blocking automated traffic is generally permissible, specific implementations could face legal scrutiny depending on jurisdiction and circumstances.

What to Watch Next

Adoption metrics for Anubis will indicate whether the tool addresses a genuine market need. GitHub stars, forks, and issue activity provide observable signals of community engagement.

Responses from AI companies to challenge-based filtering systems will shape the effectiveness of this approach. Technical adaptations or policy changes from major AI training operations would be significant developments.

The Internet Archive's position on proof-of-work challenges could influence how operators configure their allowlists. Statements from archival organizations about the impact of bot filtering on preservation efforts would be relevant.

Integration with existing web server software and content management systems would lower the barrier to adoption. Plugin or module development for popular platforms like WordPress, Nginx, or Caddy would expand the potential user base.

Community forks or alternative implementations could emerge if the project gains traction. The MIT license permits derivative works, and different operators may have different requirements for bot filtering logic.

Sources

  1. Anubis GitHub Repository - https://github.com/TecharoHQ/anubis (accessed April 12, 2025)
  2. Hacker News Discussion - https://news.ycombinator.com/item?id=43668433 (April 12, 2025)
  3. Techaro BotStopper Documentation - https://anubis.techaro.lol/docs/admin/botstopper (accessed April 12, 2025)


Related Topics

web-security Β· ai-scraping Β· open-source Β· bot-protection Β· proof-of-work