## News Telegraph Crawler

[![Version](https://img.shields.io/badge/version-v1.1.0-brightgreen)](https://git.yoqi.me/lyq/crawl_mrdx)
[![Python](https://img.shields.io/badge/Python-v3.8.5-brightgreen?style=plastic)](https://git.yoqi.me/lyq/crawl_mrdx)

Initial version complete. The crawler is single-threaded and polite: it rests 1-3 s after each fetch.

```
cd my_project_dir
virtualenv -p /opt/python/3.8.5/bin/python3 .venv
source .venv/bin/activate
pip install -r requirements.txt

# method 1
python main.py --start 20230822 --end 20230823

# method 2: configure conf/config.json first
python main.py
```

Packaging on Ubuntu:

```
pip install pyinstaller
pyinstaller -F -c main.py
```

Packaging with Docker:

```
docker build -t jianboy/crawl_mrdx:v1.0.5 .
docker run -it --rm -v /data/crawl-mrdx:/app jianboy/crawl_mrdx:v1.0.5
```

### Screenshot

![](screenshot/1.jpg)

Downloads so far reach ./data/20130822/07.pdf: 2275 days of the daily paper, 16 GB in total.

## History

Started in 2015.

```
python main.py --start 20210822 --end 20220822
python main.py --start 20220822 --end 20230418
```

No PDF editions are generated after that.

## License

Licensed under [Apache 2.0](LICENSE) © [liuyuqi.gov@msn.cn](https://github.com/jianboy)
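
## Appendix: polite crawl loop (sketch)

The `--start`/`--end` date range and the 1-3 s rest described above could be combined roughly as follows. This is a minimal sketch, not the code in `main.py`: the names `iter_days`, `polite_crawl`, and `fetch` are hypothetical, and treating `--end` as inclusive is an assumption.

```python
import random
import time
from datetime import date, datetime, timedelta
from typing import Callable, Iterator


def iter_days(start: str, end: str) -> Iterator[date]:
    """Yield every day from start to end, given YYYYMMDD strings.

    Assumption: the end date is inclusive; the real tool may differ.
    """
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    if first > last:
        raise ValueError("start must not be after end")
    day = first
    while day <= last:
        yield day
        day += timedelta(days=1)


def polite_crawl(
    start: str,
    end: str,
    fetch: Callable[[date], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch once per day, resting 1-3 s after each call.

    fetch is a hypothetical callback that downloads one day's pages.
    Returns the number of days processed.
    """
    count = 0
    for day in iter_days(start, end):
        fetch(day)
        count += 1
        # Random pause so requests are spaced out, single-threaded.
        sleep(random.uniform(1.0, 3.0))
    return count
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; for example, `polite_crawl("20230822", "20230823", print)` would print two dates with a pause after each.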