crawl_yuque

Export Yuque documents to Markdown with one click.

Develop

Copy the document URL and run the following command:

python main.py markdown -url https://www.yuque.com/burpheart/phpaudit

Alternatively, download the prebuilt executable and run it directly:

wget https://fileshare.yoqi.me/d/dl/c/Python/crawl_yuque/crawl_yuque
chmod +x crawl_yuque
./crawl_yuque markdown -url https://www.yuque.com/burpheart/phpaudit

For private documents, configure the .env file: copy your cookie from Chrome and paste it in. Any project you can see while logged in can be exported.
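
As a rough illustration of the cookie step, the sketch below builds request headers from a cookie string held in an environment variable. The variable name `YUQUE_COOKIE` and the helper `cookie_headers` are hypothetical and are not taken from this project; check .env.example for the key the tool actually reads.

```python
import os


def cookie_headers(env_var="YUQUE_COOKIE"):
    """Build HTTP headers, adding a Cookie header when the variable is set.

    NOTE: "YUQUE_COOKIE" is an assumed name used only for this example.
    """
    cookie = os.environ.get(env_var, "").strip()
    headers = {"User-Agent": "Mozilla/5.0"}
    if cookie:
        # The value is the raw cookie string copied from the browser.
        headers["Cookie"] = cookie
    return headers
```

The returned dictionary could then be passed as the `headers` argument of `requests.get`.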

Source code analysis

main.py reads the url argument, fetches the page source with requests, and looks for the following snippet in it:

<script nonce=wJM6HFxGFWlvqbg5UT1h>
(function() {
  window.appData = JSON.parse(decodeURIComponent("%7B%22me%22%3A%7B%xxxx7D"));
})();
</script>

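As this shows, Yuque stores the content in window.appData as a URL-encoded JSON string. Decoding that string and parsing it as JSON is enough to get all of the article content.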
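
The decoding step described above can be sketched as follows. This is a minimal example under stated assumptions, not the project's actual code: the function name `extract_app_data`, the regular expression, and the sample payload are all illustrative, and the real page may format the script differently.

```python
import json
import re
from urllib.parse import unquote

# Matches the percent-encoded string passed to decodeURIComponent("...").
APP_DATA_RE = re.compile(r'decodeURIComponent\("([^"]+)"\)')


def extract_app_data(html: str) -> dict:
    """Return the window.appData payload embedded in a page, as a dict."""
    match = APP_DATA_RE.search(html)
    if match is None:
        raise ValueError("window.appData payload not found")
    # unquote() reverses the percent-encoding; json.loads() parses the result.
    return json.loads(unquote(match.group(1)))


# Made-up sample input shaped like the snippet shown earlier.
sample = (
    '<script>(function() { window.appData = JSON.parse('
    'decodeURIComponent("%7B%22me%22%3A%7B%22id%22%3A1%7D%7D")); })();</script>'
)
print(extract_app_data(sample))
```

In practice the `html` argument would be the text of the response fetched with requests.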

License

Licensed under the Apache License 2.0 © liuyuqi.gov@msn.cn

Reference

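There are already scraping tools written in other languages, such as PHP and Node. This project mainly serves my own needs: exporting Markdown files so that they are easy to sync across platforms.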