Scrapy installation and configuration

Scrapy official documentation

# First, activate the virtual environment
(py3scrapy)  ~  scrapy
Scrapy 2.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
(py3scrapy)  ~/desktop/end/python  scrapy startproject Article
New Scrapy project 'Article', using template directory '/Users/scottxiong/python/virtualenv/py3scrapy/lib/python3.8/site-packages/scrapy/templates/project', created in:
    /Users/scottxiong/Desktop/end/python/Article

You can start your first spider with:
    cd Article
    scrapy genspider example example.com
cd Article
pycharm .
scrapy genspider cnblogs news.cnblogs.com
(py3scrapy)  ~/desktop/end/python/Article  scrapy genspider cnblogs news.cnblogs.com
Created spider 'cnblogs' using template 'basic' in module:
  Article.spiders.cnblogs

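After opening the project in PyCharm, you will see a new file, cnblogs.py, but PyCharm does not recognize scrapy. The reason is that the project has not yet been pointed at the environment where Scrapy is installed: press Cmd+, to open the preferences, search for "interpreter", and manually add the path to the Python 3 executable inside the virtual environment.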

# -*- coding: utf-8 -*-
import scrapy


class CnblogsSpider(scrapy.Spider):
    name = 'cnblogs'
    allowed_domains = ['news.cnblogs.com']
    start_urls = ['http://news.cnblogs.com/']

    def parse(self, response):
        pass
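
The PyCharm step above needs the path to the Python 3 executable in the virtual environment. One way to find it is to run a small script with the environment activated. This is only a sketch: the function name `interpreter_info` is made up for illustration, and the virtual-environment check assumes the environment was created with venv or virtualenv.

```python
import sys


def interpreter_info():
    """Return (path of the running interpreter, whether it is in a virtual environment)."""
    executable = sys.executable
    # venv and recent virtualenv set sys.base_prefix to the base installation,
    # so it differs from sys.prefix inside an environment.
    # Older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != base_prefix
    return executable, in_virtualenv


if __name__ == "__main__":
    path, in_venv = interpreter_info()
    print(f"Interpreter path: {path}")
    print(f"Inside a virtual environment: {in_venv}")
```

If the second line prints False, the virtual environment is probably not activated, and the printed path is not the one to add in PyCharm.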
