News telegraph crawler. http://mrdx.cn/


News Telegraph Crawler

Initial version complete. It is single-threaded and crawls politely, pausing for 1-3 s after each request.
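A minimal sketch of the 1-3 s pause, assuming the crawler fetches pages one at a time in a loop. `polite_fetch` and its `fetch` callback are hypothetical names, not the project's actual code. The sleep function and random generator are injectable so the loop can be tested without real waiting.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def polite_fetch(
    items: Iterable[T],
    fetch: Callable[[T], R],
    min_delay: float = 1.0,
    max_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Iterator[R]:
    """Fetch items one by one (single-threaded), sleeping a random
    min_delay..max_delay seconds between consecutive requests."""
    rng = rng or random.Random()
    for i, item in enumerate(items):
        if i > 0:
            # Random pause before every request except the first,
            # so the server never sees back-to-back hits.
            sleep(rng.uniform(min_delay, max_delay))
        yield fetch(item)
```

For example, `for pdf in polite_fetch(urls, download_page): ...` would download one URL at a time, where `download_page` is a hypothetical function you supply.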

cd my_project_dir
virtualenv -p /opt/python/3.8.5/bin/python3 .venv
source .venv/bin/activate
pip install -r requirements.txt

# method 1
python main.py --start 20230822 --end 20230823

# method 2: configure conf/config.json first
python main.py
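Both methods come down to resolving a start and an end date. The sketch below rests on several assumptions that are not confirmed against main.py: command-line flags take priority, `conf/config.json` is assumed to hold `start` and `end` keys in the same YYYYMMDD format, and the end date is treated as inclusive. `resolve_range` and `days` are hypothetical helpers.

```python
import argparse
import json
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def parse_day(text: str) -> date:
    """Parse a YYYYMMDD string such as '20230822'."""
    return datetime.strptime(text, "%Y%m%d").date()


def resolve_range(argv: Optional[List[str]] = None,
                  config_path: Path = Path("conf/config.json")) -> Tuple[date, date]:
    """Method 1: --start/--end flags. Method 2: fall back to the config file."""
    parser = argparse.ArgumentParser(description="Crawl daily issues by date.")
    parser.add_argument("--start", help="first day, YYYYMMDD")
    parser.add_argument("--end", help="last day, YYYYMMDD")
    args = parser.parse_args(argv)
    start, end = args.start, args.end
    if start is None or end is None:
        # Key names "start"/"end" are an assumption about config.json.
        config = json.loads(config_path.read_text(encoding="utf-8"))
        start = start or str(config["start"])
        end = end or str(config["end"])
    return parse_day(start), parse_day(end)


def days(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, both inclusive (assumed)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```

With this sketch, `start, end = resolve_range()` followed by `for day in days(start, end): ...` visits each issue date exactly once.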

Packaging on Ubuntu:

pip install pyinstaller
pyinstaller -F -c main.py

Screenshots

Files are currently downloaded to paths such as ./data/20130822/07.pdf. So far this covers 2,275 days of daily issues, 16 GB in total.
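The example path suggests one directory per day (YYYYMMDD) holding numbered PDF files. The following sketch builds such paths and tallies the downloaded days and total size. It assumes the two-digit file name (07) is a zero-padded page number, which the README does not state. `pdf_path`, `count_days`, and `total_size_gb` are hypothetical helpers.

```python
from datetime import date
from pathlib import Path


def pdf_path(root: Path, day: date, page: int) -> Path:
    """Build e.g. data/20130822/07.pdf (page number assumed, zero-padded to 2 digits)."""
    return root / day.strftime("%Y%m%d") / f"{page:02d}.pdf"


def count_days(root: Path) -> int:
    """Number of per-day directories under root, i.e. days downloaded."""
    return sum(1 for d in root.iterdir() if d.is_dir())


def total_size_gb(root: Path) -> float:
    """Total size of all day/page PDFs under root, in GB of 1024**3 bytes."""
    return sum(p.stat().st_size for p in root.glob("*/*.pdf")) / 1024 ** 3
```

Running `count_days(Path("data"))` and `total_size_gb(Path("data"))` on the download directory would give figures comparable to the day count and size above.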

History

python main.py --start 20220822 --end 20230823