Running pyspider on Windows 11 with Docker
2024-01-02

Installing pyspider directly on Windows 11 did not work for me: the installation failed with multiple errors.

I found that the official website offers a Docker-based installation method.

Directly via Docker

# mysql
docker run --name mysql -d -v /data/mysql:/var/lib/mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=yes mysql:latest
# rabbitmq
docker run --name rabbitmq -d rabbitmq:latest
# phantomjs
docker run --name phantomjs -d binux/pyspider:latest phantomjs
# result worker
docker run --name result_worker -m 128m -d --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider:latest result_worker
# processor, run multiple instances if needed.
docker run --name processor -m 256m -d --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider:latest processor
# fetcher, run multiple instances if needed.
docker run --name fetcher -m 256m -d --link phantomjs:phantomjs --link rabbitmq:rabbitmq binux/pyspider:latest fetcher --no-xmlrpc
# scheduler
docker run --name scheduler -d --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider:latest scheduler
# webui
docker run --name webui -m 256m -d -p 5000:5000 --link mysql:mysql --link rabbitmq:rabbitmq --link scheduler:scheduler --link phantomjs:phantomjs binux/pyspider:latest webui
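The commands above are listed in an order where every container named in a `--link` option has already been started, which the legacy `--link` option requires. The following is a minimal sketch, not part of the official instructions, that records those links in a dictionary and checks or computes a valid start order. The names `LINKS`, `start_order`, and `order_is_valid` are made up for illustration, and `graphlib` assumes Python 3.9 or newer.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Container name -> containers it references via --link,
# copied from the docker run commands above.
LINKS = {
    "mysql": [],
    "rabbitmq": [],
    "phantomjs": [],
    "result_worker": ["mysql", "rabbitmq"],
    "processor": ["mysql", "rabbitmq"],
    "fetcher": ["phantomjs", "rabbitmq"],
    "scheduler": ["mysql", "rabbitmq"],
    "webui": ["mysql", "rabbitmq", "scheduler", "phantomjs"],
}


def start_order(links):
    """Return one start order in which every link target comes first."""
    return list(TopologicalSorter(links).static_order())


def order_is_valid(order, links):
    """Return True if each container appears after all of its link targets."""
    started = set()
    for name in order:
        if any(dep not in started for dep in links[name]):
            return False
        started.add(name)
    return True


if __name__ == "__main__":
    print("one valid order:", start_order(LINKS))
    # The order used in this post is the insertion order of LINKS.
    print("order from the post is valid:", order_is_valid(list(LINKS), LINKS))
```

If you add another container later, adding it to the dictionary and rerunning the check is a quick way to see where its `docker run` line has to go.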

Using docker-compose

services:
  phantomjs:
    image: binux/pyspider:latest
    command: phantomjs
  result:
    image: binux/pyspider:latest
    external_links:
      - mysql
      - rabbitmq
    command: result_worker
  processor:
    image: binux/pyspider:latest
    external_links:
      - mysql
      - rabbitmq
    command: processor
  fetcher:
    image: binux/pyspider:latest
    external_links:
      - rabbitmq
    links:
      - phantomjs
    command: fetcher
  scheduler:
    image: binux/pyspider:latest
    external_links:
      - mysql
      - rabbitmq
    command: scheduler
  webui:
    image: binux/pyspider:latest
    external_links:
      - mysql
      - rabbitmq
    links:
      - scheduler
      - phantomjs
    command: webui
    ports:
      - "5000:5000"

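The mysql and rabbitmq containers appear only under external_links, so they are not started by this file and must already be running (for example, via the docker run commands in the previous section).

Then just run:

docker-compose up -d

Once the containers are up, visit http://localhost:5000/. If you see the page below, pyspider is running successfully.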

[Screenshot: 202401022235683.png]
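The containers can take a little while to come up, so the page may not load right away. As a rough sketch (not something pyspider provides), the script below polls the web UI address until it answers. The function name `wait_for_http` and the timeout values are my own assumptions; only the URL comes from the port mapping above.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=2.0):
    """Poll url until it returns any HTTP response or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait and try again.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_http("http://localhost:5000/"):
        print("pyspider web UI is reachable")
    else:
        print("web UI did not respond in time")
```

A successful result only shows that something is answering on port 5000; opening the page in a browser is still the real confirmation.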


Running pyspider on Windows 11 with Docker
https://dreaife.tokyo/en/posts/docker-pyspider-win/
Author: dreaife
Published at: 2024-01-02
License: CC BY-NC-SA 4.0
