| | Linux | macOS | Windows |
|---|---|---|---|
| Chromium 86.0.4238.0 | ✅ | ✅ | ✅ |
| WebKit 14.0 | ✅ | ✅ | ✅ |
| Firefox 80.0b8 | ✅ | ✅ | ✅ |
```
pip install playwright
python -m playwright install
```

This installs Playwright and browser binaries for Chromium, Firefox and WebKit. Playwright requires Python 3.7+.
```
# Pass --help to see all options
python -m playwright codegen
```
Playwright offers both a sync (blocking) API and an async API. They are identical in capabilities and differ only in how one consumes them.
```py
from playwright import sync_playwright

with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch()
        page = browser.newPage()
        page.goto('http://whatsmyuseragent.org/')
        page.screenshot(path=f'example-{browser_type.name}.png')
        browser.close()
```
```py
import asyncio
from playwright import async_playwright

async def main():
    async with async_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.newPage()
            await page.goto('http://whatsmyuseragent.org/')
            await page.screenshot(path=f'example-{browser_type.name}.png')
            await browser.close()

asyncio.get_event_loop().run_until_complete(main())
```
```py
def test_playwright_is_visible_on_google(page):
    page.goto("https://www.google.com")
    page.type("input[name=q]", "Playwright GitHub")
    page.click("input[type=submit]")
    page.waitForSelector("text=microsoft/Playwright")
```
```py
>>> from playwright import sync_playwright
>>> playwright = sync_playwright().start()
# Use playwright.chromium, playwright.firefox or playwright.webkit
# Pass headless=False to see the browser UI
>>> browser = playwright.chromium.launch()
>>> page = browser.newPage()
>>> page.goto("http://whatsmyuseragent.org/")
>>> page.screenshot(path="example.png")
>>> browser.close()
>>> playwright.stop()
```
```py
from playwright import sync_playwright

with sync_playwright() as p:
    iphone_11 = p.devices['iPhone 11 Pro']
    browser = p.webkit.launch(headless=False)
    context = browser.newContext(
        **iphone_11,
        locale='en-US',
        geolocation={'longitude': 12.492507, 'latitude': 41.889938},
        permissions=['geolocation']
    )
    page = context.newPage()
    page.goto('https://maps.google.com')
    page.click('text="Your location"')
    page.screenshot(path='colosseum-iphone.png')
    browser.close()
```
```py
import asyncio
from playwright import async_playwright

async def main():
    async with async_playwright() as p:
        iphone_11 = p.devices['iPhone 11 Pro']
        browser = await p.webkit.launch(headless=False)
        context = await browser.newContext(
            **iphone_11,
            locale='en-US',
            geolocation={'longitude': 12.492507, 'latitude': 41.889938},
            permissions=['geolocation']
        )
        page = await context.newPage()
        await page.goto('https://maps.google.com')
        await page.click('text="Your location"')
        await page.screenshot(path='colosseum-iphone.png')
        await browser.close()

asyncio.get_event_loop().run_until_complete(main())
```
```py
from playwright import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.launch()
    page = browser.newPage()
    page.goto('https://www.example.com/')
    dimensions = page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio
        }
    }''')
    print(dimensions)
    browser.close()
```
```py
import asyncio
from playwright import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.firefox.launch()
        page = await browser.newPage()
        await page.goto('https://www.example.com/')
        dimensions = await page.evaluate('''() => {
            return {
                width: document.documentElement.clientWidth,
                height: document.documentElement.clientHeight,
                deviceScaleFactor: window.devicePixelRatio
            }
        }''')
        print(dimensions)
        await browser.close()

asyncio.get_event_loop().run_until_complete(main())
```
```py
from playwright import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.newPage()

    def log_and_continue_request(route, request):
        print(request.url)
        route.continue_()

    # Log and continue all network requests
    page.route('**', lambda route, request: log_and_continue_request(route, request))

    page.goto('http://todomvc.com')
    browser.close()
```
```py
import asyncio
from playwright import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.newPage()

        def log_and_continue_request(route, request):
            print(request.url)
            asyncio.create_task(route.continue_())

        # Log and continue all network requests
        await page.route('**', lambda route, request: log_and_continue_request(route, request))

        await page.goto('http://todomvc.com')
        await browser.close()

asyncio.get_event_loop().run_until_complete(main())
```
Playwright for Python uses JavaScript-style naming (camelCase instead of snake_case) for its methods. We recognize that this is not ideal, but it was done deliberately, so that you can rely upon Stack Overflow answers and existing documentation.
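The mapping between the two conventions is purely mechanical. As a sketch (this helper is not part of the library), the transformation looks like:

```python
def to_camel_case(snake: str) -> str:
    """Convert a snake_case name to the camelCase form this API uses."""
    head, *rest = snake.split('_')
    return head + ''.join(part.capitalize() for part in rest)

# The method names match the Node.js API one-to-one:
print(to_camel_case('new_page'))           # newPage
print(to_camel_case('wait_for_selector'))  # waitForSelector
```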
Instead of passing an `options` parameter into every call as in the Node.js API, Playwright for Python accepts keyword arguments. So when you see an example like this in JavaScript:
```js
await webkit.launch({ headless: false });
```

It translates into Python like this:
```py
webkit.launch(headless=False)
```

If you are using an IDE, it will suggest the parameters that are available in every call.
In JavaScript, `page.evaluate` accepts JavaScript functions, which would not make sense in the Python version; instead, the function source is passed as a string.
In JavaScript it will be documented as:
```js
const result = await page.evaluate(([x, y]) => {
  return Promise.resolve(x * y);
}, [7, 8]);
console.log(result); // prints "56"
```

And in Python that would look like:
```py
result = page.evaluate("""([x, y]) => {
    return Promise.resolve(x * y);
}""", [7, 8])
print(result)  # prints "56"
```

The library will detect that what you are passing it is a function and will invoke it with the given parameters. You can opt out of this function detection by passing `force_expr=True` to all evaluate functions, but you will probably never need to do that.
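How such function detection can work is easy to sketch. The heuristic below is an illustration only, not Playwright's actual detection logic:

```python
import re

def looks_like_js_function(source: str) -> bool:
    """Rough check: treat the string as a JS function if it starts with the
    `function` keyword or an arrow-function parameter list."""
    s = source.strip()
    if re.match(r'^(async\s+)?function\b', s):
        return True
    return bool(re.match(r'^\(?[\w\s,\[\]{}]*\)?\s*=>', s))

print(looks_like_js_function('([x, y]) => x * y'))  # True
print(looks_like_js_function('document.title'))     # False
```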
Instead of the `page.waitFor*` methods, we recommend using the corresponding `page.expect_*` context manager.
In JavaScript it will be documented as:
```js
const [ download ] = await Promise.all([
  page.waitForEvent('download'), // <-- start waiting for the download
  page.click('button#delayed-download') // <-- perform the action that directly or indirectly initiates it
]);
const path = await download.path();
```

And in Python that would look much simpler:
```py
with page.expect_download() as download_info:
    page.click("button#delayed-download")
download = download_info.value
path = download.path()
```

Similarly, for waiting for the network response:
```js
const [response] = await Promise.all([
  page.waitForResponse('**/api/fetch_data'),
  page.click('button#update'),
]);
```

Becomes:
```py
with page.expect_response("**/api/fetch_data"):
    page.click("button#update")
```
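The shape of the `expect_*` pattern — subscribe before acting, read the result after the block — can be sketched in plain Python with a context manager. This is a simplified illustration, not Playwright's implementation; `EventInfo` and `expect_event` are hypothetical names:

```python
from contextlib import contextmanager

class EventInfo:
    """Exposes the eventual result through .value, mirroring the access pattern above."""
    def __init__(self):
        self.value = None

@contextmanager
def expect_event(events):
    # Start listening *before* the triggering action runs inside the
    # with-block, so the event cannot be missed; read it out on exit.
    info = EventInfo()
    start = len(events)
    yield info
    info.value = events[start]  # first event fired inside the block

events = []
with expect_event(events) as info:
    events.append('download-started')  # the action that fires the event
print(info.value)  # download-started
```

Subscribing before the action is the whole point: if you clicked first and only then started waiting, a fast event could fire before the listener exists.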