cloudflare-scrape
A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests. Python versions 2.6 - 3.7 are supported. Cloudflare changes their techniques periodically, so I will update this repo frequently.
This can be useful if you wish to scrape or crawl a website protected with Cloudflare. Cloudflare's anti-bot page currently just checks if the client supports JavaScript, though they may add additional techniques in the future.
Due to Cloudflare continually changing and hardening their protection page, cloudflare-scrape requires Node.js to solve JavaScript challenges. This allows the script to easily impersonate a regular web browser without explicitly deobfuscating and parsing Cloudflare's JavaScript.
Note: This only works when Cloudflare's regular anti-bot page is enabled (the "Checking your browser before accessing..." loading page). If there is a reCAPTCHA challenge, you're out of luck. Thankfully, the JavaScript check page is much more common.
For reference, this is the default message Cloudflare uses for these sorts of pages:
Checking your browser before accessing website.com.
This process is automatic. Your browser will redirect to your requested content shortly.
Please allow up to 5 seconds...
Any script using cloudflare-scrape will sleep for 5 seconds on the first visit to any site with Cloudflare's anti-bot page enabled; requests after the first are not delayed.
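The delay described above can be observed by timing two requests made with the same scraper. This is only a rough sketch: timed_get, first_and_repeat_timings and main are hypothetical helper names, not part of cfscrape, and the URL is a placeholder. It assumes the scraper object is reused between requests, since a fresh scraper would presumably have to pass the check again.

```python
import time


def timed_get(scraper, url):
    """Fetch `url` with `scraper` and return (response, elapsed_seconds)."""
    start = time.time()
    response = scraper.get(url)
    return response, time.time() - start


def first_and_repeat_timings(scraper, url):
    """Request `url` twice on the same scraper and return both elapsed times."""
    _, first = timed_get(scraper, url)
    _, repeat = timed_get(scraper, url)
    return first, repeat


def main():
    # Requires cfscrape, Node.js and network access; the URL is a placeholder.
    import cfscrape

    scraper = cfscrape.create_scraper()
    first, repeat = first_and_repeat_timings(scraper, "http://somesite.com")
    print("first request: %.1fs, repeat request: %.1fs" % (first, repeat))
```

Calling main() against a site that shows the anti-bot page should report a first request of roughly five seconds or more and a much shorter repeat request.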
Installation
Simply run pip install cfscrape. You can upgrade with pip install -U cfscrape. The PyPI package is at https://pypi.python.org/pypi/cfscrape/
Alternatively, clone this repository and run python setup.py install.
Node.js dependency
Node.js version 10 or above is required to interpret Cloudflare's obfuscated JavaScript challenge.
Your machine may already have Node installed (check with node -v). If not, you can install it with apt-get install nodejs on Ubuntu >= 18.04 and Debian >= 9 and brew install node on macOS. Otherwise, you can get it from Node's download page or their package manager installation page.
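The node -v check can also be done from Python before using the module. The following is a minimal sketch for Python 3.3 or later (it relies on shutil.which); parse_node_major and node_is_usable are hypothetical helpers written for this example, not functions provided by cfscrape, and the sketch assumes the executable is named node.

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 10  # minimum major version stated in this section


def parse_node_major(version_text):
    """Return the major version from `node -v` output such as 'v10.15.3'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError("unrecognized Node.js version string: %r" % version_text)
    return int(match.group(1))


def node_is_usable(executable="node"):
    """Return True if `executable` is on PATH and reports a new enough version."""
    path = shutil.which(executable)
    if path is None:
        return False
    try:
        output = subprocess.check_output([path, "-v"]).decode("utf-8", "replace")
        return parse_node_major(output) >= MIN_NODE_MAJOR
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Treat anything that fails to run or parse as "not usable".
        return False


if __name__ == "__main__":
    print("Node.js OK" if node_is_usable() else "Node.js 10+ not found on PATH")
```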
Updates
Cloudflare regularly modifies their anti-bot protection page and improves their bot detection capabilities.
If you notice that the anti-bot page has changed, or if this module suddenly stops working, please create a GitHub issue so that I can update the code accordingly.
Many issues are a result of users not updating to the latest release of this project. Before filing an issue, please run the following command to update cloudflare-scrape to the latest version:
pip install -U cfscrape
If you are still encountering a problem, please create a GitHub issue and include:
- The version number from pip show cfscrape.
- The relevant code snippet that's experiencing an issue or raising an exception.
- The full exception and traceback, if applicable.
- The URL of the Cloudflare-protected page which the script does not work on.
- A Pastebin or Gist containing the HTML source of the protected page.
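Some of the environment details for an issue can be gathered with a short script. This is a sketch only, assuming Python 3.8 or later for importlib.metadata; installed_version and report_header are hypothetical names, and the output is a convenience that does not replace the items listed above.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report_header(package="cfscrape"):
    """Build a few environment lines to paste at the top of an issue."""
    return "\n".join([
        "%s version: %s" % (package, installed_version(package) or "not installed"),
        "Python version: %s" % platform.python_version(),
        "Platform: %s" % platform.platform(),
    ])


if __name__ == "__main__":
    print(report_header())
```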
If you've upgraded and are still experiencing problems, create a GitHub issue and fill out the pertinent information.
Usage
The simplest way to use cloudflare-scrape is by calling create_scraper().
    import cfscrape

    scraper = cfscrape.create_scraper()  # returns a CloudflareScraper instance
    # Or: scraper = cfscrape.CloudflareScraper()  # CloudflareScraper inherits from requests.Session
    print(scraper.get("http://somesite.com").content)  # => "<!DOCTYPE html><html><head>..."
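Because CloudflareScraper inherits from requests.Session, the usual Requests idioms should carry over. The sketch below wraps a request with a timeout and a status check; fetch_text is a hypothetical helper, not part of cfscrape, and it assumes the scraper passes the timeout keyword through to Requests unchanged.

```python
def fetch_text(url, scraper=None, timeout=30):
    """Return the decoded body of `url`, raising on HTTP error statuses."""
    if scraper is None:
        # Imported lazily so the helper can also be used with any
        # Session-like object that is passed in explicitly.
        import cfscrape

        scraper = cfscrape.create_scraper()
    response = scraper.get(url, timeout=timeout)
    response.raise_for_status()  # standard Requests call; raises on 4xx/5xx
    return response.text
```

Passing an existing scraper, as in fetch_text("http://somesite.com", scraper=scraper), reuses the same session across calls instead of creating a new one for each request.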

