Installing and configuring Scrapy
# First, activate the virtual environment
(py3scrapy) ~ scrapy
Scrapy 2.1.0 - no active project
Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
(py3scrapy) ~/desktop/end/python scrapy startproject Article
New Scrapy project 'Article', using template directory '/Users/scottxiong/python/virtualenv/py3scrapy/lib/python3.8/site-packages/scrapy/templates/project', created in:
/Users/scottxiong/Desktop/end/python/Article
You can start your first spider with:
cd Article
scrapy genspider example example.com
cd Article
pycharm .
scrapy genspider cnblogs news.cnblogs.com
(py3scrapy) ~/desktop/end/python/Article scrapy genspider cnblogs news.cnblogs.com
Created spider 'cnblogs' using template 'basic' in module:
Article.spiders.cnblogs
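To confirm that both commands produced what they should, a small check like the following can help. It is only a sketch: the file list reflects the default Scrapy project template, and the helper name missing_files is made up for this illustration.

```python
from pathlib import Path

# Files that `scrapy startproject <name>` generates from the default template,
# relative to the project root. "{pkg}" is the inner package directory,
# which shares the project's name.
EXPECTED = [
    "scrapy.cfg",
    "{pkg}/__init__.py",
    "{pkg}/items.py",
    "{pkg}/middlewares.py",
    "{pkg}/pipelines.py",
    "{pkg}/settings.py",
    "{pkg}/spiders/__init__.py",
]


def missing_files(project_root, package, spiders=()):
    """Return the expected project files that do not exist under project_root."""
    root = Path(project_root)
    expected = [p.format(pkg=package) for p in EXPECTED]
    # Each `scrapy genspider <name> <domain>` call adds spiders/<name>.py.
    expected += [f"{package}/spiders/{name}.py" for name in spiders]
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from inside the Article directory; an empty list means nothing is missing.
    print(missing_files(".", "Article", spiders=["cnblogs"]))
```

An empty list means the project skeleton and the cnblogs spider are both in place.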
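After opening the project in PyCharm, you will see a new file, cnblogs.py (its generated contents are listed at the end of this section).
However, PyCharm cannot resolve the scrapy import yet, because the project has not been pointed at the environment where Scrapy is installed. Press Cmd+, to open Preferences,
search for "interpreter", and manually add the path to the Python 3 executable inside the virtual environment.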
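If you are not sure which path to enter, one option is to ask the interpreter itself. The sketch below is meant to be run with the virtual environment active; the function name interpreter_info is made up for this illustration, and the in_virtualenv check assumes an environment created by venv or a recent virtualenv release.

```python
import sys


def interpreter_info():
    """Describe the interpreter that is running this script."""
    return {
        # Full path of the running Python binary: the value PyCharm asks for.
        "executable": sys.executable,
        # Root directory of the active environment.
        "prefix": sys.prefix,
        # Inside a venv/virtualenv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

Given the template directory printed by startproject above, the executable should live somewhere under the py3scrapy environment directory.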
# -*- coding: utf-8 -*-
import scrapy


class CnblogsSpider(scrapy.Spider):
    name = 'cnblogs'
    allowed_domains = ['news.cnblogs.com']
    start_urls = ['http://news.cnblogs.com/']

    def parse(self, response):
        pass
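In the generated spider, allowed_domains restricts which links the crawler will follow: a request is kept only if its host is one of the listed domains or a subdomain of one. The standard-library sketch below illustrates that rule. It is not Scrapy's own implementation, and the function name is_allowed is made up for this illustration.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


if __name__ == "__main__":
    allowed = ["news.cnblogs.com"]
    for url in (
        "http://news.cnblogs.com/n/page/2/",  # same host as the allowed domain
        "https://www.cnblogs.com/",           # sibling host, not a subdomain
    ):
        print(url, is_allowed(url, allowed))
```

This is why genspider was given news.cnblogs.com rather than a full URL: the value names a domain to stay within, while start_urls holds the pages the crawl begins from.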