Scrapy project

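Repositories

Each entry ends with a summary line: primary language, license (when listed), forks, stars, open issues, open pull requests, and date of last update.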

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

python crawler framework scraping crawling hacktoberfest

Python 8,910 38,879 448 (32 issues need help) 293 Updated Nov 12, 2020
itemloaders

Library to populate items using XPath and CSS with a convenient API

Python BSD-3-Clause 5 20 12 3 Updated Nov 12, 2020
scrapyd-client

Command line client for Scrapyd server

Python 107 562 19 10 Updated Nov 11, 2020
itemadapter

Common interface for data container classes

python metadata python3 scrapy hacktoberfest python-dataclasses python-attrs

Python BSD-3-Clause 2 15 1 3 Updated Nov 11, 2020
w3lib

Python library of web-related functions

python

Python 84 313 12 13 Updated Nov 6, 2020
cssselect

CSS Selectors for Python

css python selectors

Python 44 228 14 12 Updated Nov 6, 2020
protego

A pure-Python robots.txt parser with support for modern conventions.

python robots-txt robots-parser

Python BSD-3-Clause 12 16 0 1 Updated Oct 14, 2020
scrapy.org

The scrapy.org website

html

HTML 146 39 2 5 Updated Oct 11, 2020
parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

css python xml scraping selectors xpath lxml

Python 90 584 22 (1 issue needs help) 8 Updated Oct 7, 2020
scrapy-bench

A CLI for benchmarking Scrapy.

python web-crawler scrapy command-line-tool benchmark-suite scrapy-bench

Python MIT 15 25 6 1 Updated Sep 21, 2020
queuelib

Collection of persistent (disk-based) queues

Python 45 201 2 5 Updated Aug 26, 2020
booksbot
Forked from stummjr/books_crawler
A crawler for http://books.toscrape.com

Python 674 29 0 2 Updated Aug 8, 2020
scrapyd

A service daemon to run Scrapy spiders

Python 509 2,160 102 24 Updated Jul 31, 2020
quotesbot

This is a sample Scrapy project for educational purposes

Python MIT 639 957 0 6 Updated Jul 13, 2020
scrapy-itemloader Archived

[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API

Python BSD-3-Clause 7 5 2 0 Updated May 5, 2020
scrapely

A pure-python HTML screen-scraping library

HTML 254 1,714 25 5 Updated Nov 28, 2019
loginform

Fill HTML login forms automatically

Python 69 234 9 2 Updated Oct 18, 2019
scurl

Performance-focused replacement for Python urllib

python cython chromium gurl urlparse

Python Apache-2.0 6 16 10 (1 issue needs help) 1 Updated Oct 2, 2018
url-chromium

url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url

chromium gurl

C++ 2 0 0 0 Updated Aug 7, 2018
base-chromium

base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/

C++ 3 1 0 0 Updated Jul 31, 2018
dirbot

Scrapy project to scrape public web directories (educational) [DEPRECATED]

Python 1,131 1,605 0 0 Updated Oct 27, 2017
scrapy-bench-speedcenter
Forked from Parth-Vader/scrapy-bench-speedcenter
Codespeed for scrapy-bench

Python 2 2 0 0 Updated Aug 28, 2017
pypydispatcher

A fork of http://pydispatcher.sourceforge.net/ with PyPy support

Python 3 12 1 0 Updated Jul 3, 2017
slybot

60 221 5 0 Updated Apr 27, 2015
gsoc2014-integration-tests

GSoC2014 - Scrapy Integration tests project

Shell 3 3 0 0 Updated Mar 18, 2014

Most used topics

python hacktoberfest css scrapy selectors
