proxy_pool

IP proxy pool used by spiders

Architecture diagram

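Crawler (crawler.py, fetcher.py)
Crawls proxy resources from the configured IP proxy websites. You can add crawl methods for additional proxy sites; the name of each custom method must start with "crawl_". Once a method has been added, the pool automatically crawls that site the next time it starts. fetcher.py invokes the crawler, and before each crawl it checks whether the number of IPs in the pool has reached the upper limit.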
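
The "crawl_" naming convention could be implemented in several ways. The following is only a minimal sketch of one of them, not the project's actual code: the class name `Crawler`, the two sample methods, the placeholder addresses they yield, and the `fetch_if_needed` helper with its `pool_size` and `limit` parameters are all hypothetical.

```python
class Crawler:
    """Hypothetical crawler: every method named crawl_* is one proxy source."""

    def crawl_site_a(self):
        # A real method would download and parse a proxy listing page.
        yield "192.0.2.1:8080"

    def crawl_site_b(self):
        yield "192.0.2.2:3128"
        yield "192.0.2.3:80"

    def source_methods(self):
        """Return the names of all crawl_* methods in a stable order."""
        return sorted(
            name for name in dir(self)
            if name.startswith("crawl_") and callable(getattr(self, name))
        )

    def run_all(self):
        """Call every discovered crawl_* method and collect its proxies."""
        proxies = []
        for name in self.source_methods():
            proxies.extend(getattr(self, name)())
        return proxies


def fetch_if_needed(crawler, pool_size, limit):
    """Crawl only while the pool holds fewer proxies than the limit."""
    if pool_size >= limit:
        return []
    return crawler.run_all()
```

With this approach, adding a new source only requires defining another `crawl_*` method; nothing has to be registered by hand.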
Data manager (db.py)
Handles the Redis connection, the storage of proxy IPs, and the ranking and sorting of those IPs.
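
To illustrate what score-based ranking might look like, here is a rough sketch that uses a plain dictionary as a stand-in for Redis. Everything in it is an assumption made for illustration: the class `ProxyStore`, its method names, and the scoring rules (new proxies start at 10, a valid proxy is set to 100, each failure subtracts 1, and a proxy is dropped at 0). The real db.py may work quite differently.

```python
import random

MAX_SCORE = 100      # assumed top score; the API example queries /score/100
INITIAL_SCORE = 10   # assumed score for a freshly crawled proxy
MIN_SCORE = 0        # assumed floor; proxies at or below it are dropped


class ProxyStore:
    """In-memory stand-in for the Redis-backed store (proxy -> score)."""

    def __init__(self):
        self._scores = {}

    def add(self, proxy):
        # Keep the existing score if the proxy is already known.
        self._scores.setdefault(proxy, INITIAL_SCORE)

    def mark_valid(self, proxy):
        self._scores[proxy] = MAX_SCORE

    def mark_invalid(self, proxy):
        score = self._scores.get(proxy, MIN_SCORE) - 1
        if score <= MIN_SCORE:
            self._scores.pop(proxy, None)
        else:
            self._scores[proxy] = score

    def by_score(self, score):
        """Proxies holding exactly this score, in alphabetical order."""
        return sorted(p for p, s in self._scores.items() if s == score)

    def ranked(self):
        """All proxies, best score first."""
        return sorted(self._scores, key=lambda p: (-self._scores[p], p))

    def random(self):
        """Prefer a top-scored proxy; fall back to any known proxy."""
        candidates = self.by_score(MAX_SCORE) or self.ranked()
        return random.choice(candidates) if candidates else None
```

In Redis itself, a sorted set keyed by score would be one natural way to hold the same data, though the source does not say which structure the project uses.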
Checker (proxy_checker.py)
Checks the validity of the proxies stored in Redis using asynchronous I/O.
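
The sketch below shows one way to check many proxies concurrently with `asyncio`. It is not taken from proxy_checker.py: `check_all`, its `probe` and `concurrency` parameters, and `fake_probe` are hypothetical names. The probe is injected so that the concurrency logic can be shown without real network traffic; an actual probe would send an HTTP request through the proxy.

```python
import asyncio


async def check_all(proxies, probe, concurrency=50):
    """Run probe(proxy) for every proxy, at most `concurrency` at a time.

    Returns a dict mapping each proxy to True (usable) or False.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def check_one(proxy):
        async with semaphore:
            try:
                return proxy, bool(await probe(proxy))
            except Exception:
                # Any error (timeout, refused connection, ...) counts as invalid.
                return proxy, False

    results = await asyncio.gather(*(check_one(p) for p in proxies))
    return dict(results)


async def fake_probe(proxy):
    """Hypothetical probe: stands in for a real HTTP request via the proxy."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    if proxy.endswith(":0"):
        raise ConnectionError("unreachable")
    return True
```

It can be driven with `asyncio.run(check_all(proxies, fake_probe))`. Note that `asyncio.run` requires Python 3.7 or later; on Python 3.6, `loop.run_until_complete` on an event loop does the same job.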
API (api.py)
Serves proxy data to external clients through a Flask web API. Users can define their own service methods.
Scheduler (scheduler.py)
Coordinates the whole system, launching each module in its own process.
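
As an illustration of process-based scheduling, the following sketch starts one process per worker with the standard `multiprocessing` module. The function names `build_processes` and `run_forever`, and the idea of passing the workers as a name-to-function mapping, are assumptions; scheduler.py may be organized differently.

```python
import multiprocessing


def build_processes(workers):
    """Create (but do not start) one daemon process per named worker."""
    return [
        # daemon=True so that workers are terminated when the scheduler exits.
        multiprocessing.Process(target=func, name=name, daemon=True)
        for name, func in workers.items()
    ]


def run_forever(workers):
    """Start every worker process and block until all of them exit."""
    processes = build_processes(workers)
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

A caller would pass something like `{"fetcher": run_fetcher, "checker": run_checker, "api": run_api}` (hypothetical function names) and invoke `run_forever` under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that spawn new interpreters.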
Configuration file (setting.py)
Defines the constants used across the project.

Python 3.6
Install Redis separately, then start the Redis service.
Other third-party libraries can be installed with pip: pip install -r requirements.txt

Once installation is complete, simply run run.py.

Get a single random proxy, e.g.:
import requests
r = requests.get("http://localhost:2018/get")
print(r.text)

Get proxies in bulk by proxy score, e.g.:
import requests
r = requests.get("http://localhost:2018/score/100")
proxies = r.text.split("<br>")[1:]
print(proxies)
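
Once a list of proxies has been fetched, it usually needs to be turned into something an HTTP client can use. The helpers below are a small sketch of that step. `parse_score_response` follows the splitting shown above (fields separated by "<br>", first field skipped); `to_requests_proxies` assumes each entry is a plain `ip:port` string for an HTTP proxy, which the source does not state explicitly. Both function names are hypothetical.

```python
def parse_score_response(text):
    """Split the /score/<n> response body into a list of proxy strings.

    Mirrors the example above: fields are separated by "<br>" and the
    first field is skipped. Blank entries are dropped as a precaution.
    """
    return [item.strip() for item in text.split("<br>")[1:] if item.strip()]


def to_requests_proxies(proxy):
    """Build the `proxies` mapping that requests expects for one ip:port.

    Assumes the proxy speaks plain HTTP; the pool's actual format may differ.
    """
    url = "http://" + proxy
    return {"http": url, "https": url}
```

A request through one of the returned proxies could then look like `requests.get(target_url, proxies=to_requests_proxies(proxy), timeout=5)`, where `target_url` is whatever page the spider needs.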
