Installing Scrapy
# Enter the virtual environment first
workon py3scrapy
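# Install Scrapy
pip3 install scrapy  # If the download is very slow (only a few KB/s), use a proxy or the Douban PyPI mirror
Once Scrapy is installed it is ready to use. Note that it is only available inside the virtual environment; after you leave the environment the scrapy command no longer works.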
(py3scrapy) ➜ ~ scrapy
Scrapy 1.6.0 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
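The note above about Scrapy only working inside the virtual environment can be checked from Python. The following is a minimal sketch using only the standard library; the helper names in_virtualenv and is_installed are made up for this illustration and are not part of Scrapy or virtualenvwrapper.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(module_name):
    """Return True when this interpreter can locate the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("scrapy importable:", is_installed("scrapy"))
```

Run inside py3scrapy, both lines should report True if the installation succeeded. Run after leaving the environment, the second line will typically report False unless Scrapy is also installed system-wide.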
scrapy startproject ArticleSpider
#created in:
# /Users/scottxiong/ArticleSpider
cd ArticleSpider
# Usage: scrapy genspider <name> <domain>, e.g. scrapy genspider example example.com
scrapy genspider cnblogs news.cnblogs.com
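To confirm that the two commands above did what was expected, a small check like the following can be run from inside the ArticleSpider directory. It is only a sketch: the file list assumes the default project template of this Scrapy version, and missing_files is a hypothetical helper, not a Scrapy function.

```python
from pathlib import Path

# Files expected after `scrapy startproject ArticleSpider` followed by
# `scrapy genspider cnblogs news.cnblogs.com`.
# Assumption: the default project template is used.
EXPECTED_FILES = [
    "scrapy.cfg",
    "ArticleSpider/__init__.py",
    "ArticleSpider/items.py",
    "ArticleSpider/middlewares.py",
    "ArticleSpider/pipelines.py",
    "ArticleSpider/settings.py",
    "ArticleSpider/spiders/__init__.py",
    "ArticleSpider/spiders/cnblogs.py",  # written by genspider
]


def missing_files(project_root):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the directory that contains scrapy.cfg.
    for name in missing_files("."):
        print("missing:", name)
```

If nothing is printed, the project skeleton and the cnblogs spider file are both in place.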
Running Scrapy
Method 1: start the spider from the terminal
# First cd into the Scrapy project directory
scrapy crawl spiderName
Method 2: start the spider from a script
Create a Python file in the project root; here it is named main.py
# -*- coding: utf-8 -*-
__author__ = 'scott'
from scrapy.cmdline import execute
import sys
import os
# Add the project root (the folder containing main.py) to sys.path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
execute(["scrapy","crawl","cnblogs"])
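A slightly more general variant of main.py is sketched below. It builds the argument list in a helper and also changes into the project directory before calling execute, on the assumption that Scrapy looks for scrapy.cfg starting from the current working directory. The function names crawl_argv and run_spider, and the page spider argument in the usage note, are hypothetical and only serve the illustration. The -a NAME=VALUE option is the regular way to pass spider arguments to scrapy crawl.

```python
import os
import sys


def crawl_argv(spider_name, **spider_args):
    """Build the argv list for `scrapy crawl <spider> [-a key=value ...]`."""
    argv = ["scrapy", "crawl", spider_name]
    for key, value in spider_args.items():
        argv += ["-a", "{}={}".format(key, value)]
    return argv


def run_spider(spider_name, project_dir, **spider_args):
    """Run a spider of the project located in project_dir."""
    # Imported here so that crawl_argv can be used without Scrapy installed.
    from scrapy.cmdline import execute

    sys.path.append(project_dir)
    # Assumption: scrapy.cfg is searched for from the working directory,
    # so switch to the project root before starting the crawl.
    os.chdir(project_dir)
    execute(crawl_argv(spider_name, **spider_args))
```

In main.py this would be called as run_spider("cnblogs", os.path.dirname(os.path.abspath(__file__))). Passing a keyword such as page=2 would add -a page=2 to the command, which is only useful if the spider actually accepts that argument.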