How To Bypass Cloudflare While Scraping?


My name is Ronnie, a fellow geek like you 😎. I am passionately tech curious.

I blog, tweet, write, code, vlog, discuss and eat anything about tech.

😍 Feel Free To Connect 😍

Hey 👋, welcome to my little world! Let's bypass Cloudflare easily with Python!

While scraping, you may come across sites such as OpenSea that use Cloudflare protection. This makes them much harder to scrape, because you can't fetch their content directly.


Today, we shall use the cloudscraper package, which is available on PyPI, to bypass Cloudflare.

🔸 What is Cloudflare?

Cloudflare, Inc. is an American web infrastructure and website security company that provides content delivery network and DDoS mitigation services.

Its services sit between a website's visitors and the Cloudflare customer's hosting provider, acting as a reverse proxy for websites.
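Because Cloudflare sits in front of the origin server, responses that pass through it usually carry Cloudflare's own headers. Here is a minimal sketch of a check, assuming the commonly seen `Server: cloudflare` and `CF-RAY` response headers. The helper name `served_by_cloudflare` is hypothetical and not part of any library:

```python
def served_by_cloudflare(headers):
    """Guess whether a response passed through Cloudflare's proxy.

    `headers` is any mapping of response header names to values.
    This is only a heuristic based on headers Cloudflare commonly adds.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): str(value).lower() for name, value in headers.items()}
    return "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered


# Made-up header values, for illustration only.
print(served_by_cloudflare({"Server": "cloudflare", "CF-RAY": "abc123-LHR"}))  # True
print(served_by_cloudflare({"Server": "nginx"}))  # False
```

With a Requests-style response object, you could pass its `headers` attribute to this helper.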

🔸 The Code

We shall demonstrate this on the OpenSea NFT collection stats page.

pip install cloudscraper beautifulsoup4

cloudscraper is a simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Requests.

I love this package because it is actively updated and developed, and it has over 1600 stargazers on GitHub!

Let's import cloudscraper and BeautifulSoup in our newly created cloudpass.py file:

from bs4 import BeautifulSoup as beauty
import cloudscraper

Let's create a cloudscraper instance and define our target URL:

scraper = cloudscraper.create_scraper(delay=10, browser='chrome') 
url = "https://opensea.io/rankings"

We initialised it with a browser argument and a delay. Both are optional, so you can omit them.

Let's now scrape our target:

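# Fetch the page; cloudscraper attempts to solve the Cloudflare challenge for us
info = scraper.get(url).text
soup = beauty(info, "html.parser")
# Keep only the <script> tags, which is where the data lives
soup = soup.find_all('script')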

The text attribute holds the body of the scraper response as a string. We then build a soup with html.parser so that we can find the script tags where our data resides.
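The snippet above parses whatever comes back, even when the site answers with a block or challenge page. Here is a minimal sketch of a guard, assuming the scraper returns Requests-style responses with `status_code` and `text`. The names `fetch_html` and `BlockedError` are hypothetical and introduced only for this example:

```python
class BlockedError(RuntimeError):
    """Raised when the site answers with a non-200 status."""


def fetch_html(scraper, url):
    """Return the page HTML, or raise BlockedError if the request was refused."""
    response = scraper.get(url)
    # Cloudflare block and challenge pages typically come back as 403 or 503,
    # so treat anything other than 200 as a failure instead of parsing it.
    if response.status_code != 200:
        raise BlockedError(f"{url} returned HTTP {response.status_code}")
    return response.text
```

You could then write `info = fetch_html(scraper, url)` in place of `scraper.get(url).text` and handle `BlockedError` however suits your script.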

And finally, we can loop through the script tags to print our scraped data, which is a nested dictionary. I will write another post on how you can dump this into a CSV file.

for data in soup:
    print(data.get_text())
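The loop above prints each script's raw text. Since the data is a nested dictionary, one way to work with it is to parse the scripts that contain JSON. Here is a minimal sketch, assuming the data is embedded as plain JSON inside a script tag (this is not confirmed for the OpenSea page). `parse_json_scripts` is a hypothetical helper:

```python
import json


def parse_json_scripts(script_texts):
    """Return the parsed content of every script whose text is valid JSON."""
    parsed = []
    for text in script_texts:
        try:
            parsed.append(json.loads(text))
        except json.JSONDecodeError:
            # Ordinary JavaScript is not valid JSON, so skip it.
            continue
    return parsed


# Made-up sample input, for illustration only.
sample = ['console.log("hi")', '{"collection": {"name": "demo", "volume": 12.5}}']
print(parse_json_scripts(sample))
```

With the variables from this post, the call would be `parse_json_scripts(data.get_text() for data in soup)`.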

GitHub Repo:

That's it!

🔸 Conclusion

Once again, I hope you learned something today from my little closet.

Please consider subscribing or following me for related content, especially about Tech, Python & General Programming.

You can show extra love by buying me a coffee to support this free content. I am also open to partnerships, technical writing roles, collaborations and Python-related training or roles.

Buy Ronnie A Coffee

📢 You can also follow me on Twitter. ♥ ♥ Waiting for you! 🙂

M

Thanks for the help, great article

P

Hi Ronnie, great article, but unfortunately this solution doesn't always work, because Cloudflare's anti-bot protection is constantly updated and websites are configured differently. I would suggest a headful configuration with Playwright to mimic a real user. Some examples can be found on my blog, The Web Scraping Club.

S

please send me a link

P

Sammy Hamdani https://substack.thewebscraping.club/p/scraping-cloudflare-websites-2023-q1-update?utm_source=%2Fsearch%2Fcloduflare&utm_medium=reader2

S

Hi Ronnie - should this work against any site using CloudFlare? It's not working against URLs on Faire.com for me.

F
Flo Rider, 4y ago

cloudscraper error.png Here's the screenshot :)

R

I think you can't create the scraper directly from the command line, because Cloudflare automatically detects headless scripts.

Sorry for the delayed reply.

F
Flo Rider, 4y ago

Hi Ronnie, very nice article! I tried running your script on my computer (windows) but I'm still getting denied access. Do you know why? Thanks!

R

Can I see a screenshot, please?


Ronnie Atuhaire's Blog 🤓


Well, I am a Pythonista and a Tech-Virtuoso. You are most welcome to my blog ✌.