
0. Crawl target

Scrape the anime list information shown in the screenshot below.
[Screenshot: image.png]

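1. Target analysis

On the initial page load, and again each time "More" is clicked, the page sends the API requests shown in the screenshots below.
[Screenshot: image.png]
[Screenshot: image.png]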
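
To make the shape of that request concrete, here is a minimal sketch of how the endpoint and query parameters used in section 2 combine into a single GET URL. The helper name `build_request_url` is hypothetical, and the parameter values are simply the ones this post uses; no network access is needed to run it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from section 2 of this post.
BASE_URL = "https://movie.douban.com/j/new_search_subjects"


def build_request_url(start, tag="动漫"):
    """Return the full GET URL for one request (hypothetical helper)."""
    query = {"sort": "U", "range": "0,10", "tags": tag, "start": start}
    # urlencode percent-encodes the comma and the non-ASCII tag value.
    return f"{BASE_URL}?{urlencode(query)}"


url = build_request_url(40)
print(url)
# Decoding the query string again recovers the original parameter values.
print(parse_qs(urlsplit(url).query))
```

Comparing the printed URL with the one captured in the browser's network panel is a quick way to confirm that the parameters were read correctly.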

2. Assembling the request

    # 1. Assemble the request
    # ---------------------------------------------
    # 1) URL
    url = "https://movie.douban.com/j/new_search_subjects"

    # 2) Query parameters ("动漫" means "anime")
    params = {
        "sort": "U",
        "range": "0,10",
        "tags": "动漫",
        "start": 40
    }

    # 3) User-Agent spoofing: present the crawler as a browser so that
    #    basic anti-crawler checks do not reject it
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    }
    # ---------------------------------------------
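
The snippet above hard-codes `"start": 40`. If each click on "More" advances `start` by a fixed amount, the parameters for several requests could be generated as in the sketch below. The function names are hypothetical, and the page size of 20 is an assumption that should be checked against the requests observed in the browser.

```python
def build_params(start, tag="动漫"):
    """Return the query parameters for one request (hypothetical helper)."""
    return {"sort": "U", "range": "0,10", "tags": tag, "start": start}


def page_params(pages, page_size=20):
    """Yield parameters for `pages` consecutive requests.

    page_size=20 is an assumption, not something the post confirms.
    """
    for page in range(pages):
        yield build_params(page * page_size)


for params in page_params(3):
    print(params["start"])  # prints 0, then 20, then 40
```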

3. Sending the request

    # 2. Send the request
    # ---------------------------------------------
    response = requests.get(url=url, params=params, headers=headers)
    # ---------------------------------------------
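
The call above waits indefinitely if the server never answers and does not check the HTTP status. A slightly more defensive variant might look like the sketch below. `fetch_json` and its `http_get` parameter are hypothetical names; `http_get` exists only so the function can be exercised with a stub instead of a real network call, and the 10-second timeout is an arbitrary choice.

```python
def fetch_json(url, params, headers, http_get=None, timeout=10):
    """Send a GET request and return the decoded JSON body (hypothetical helper)."""
    if http_get is None:
        # Imported lazily so the function can also be tested with a stub.
        import requests
        http_get = requests.get
    response = http_get(url, params=params, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.json()
```

With the variables from section 2 in scope, `fetch_json(url, params, headers)` would stand in for the `requests.get(...)` call and the later `response.json()` step.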

4. Parsing the response body

    # 3. Parse the response body
    # ---------------------------------------------
    # Decode the JSON body into Python objects
    response_json = response.json()
    # ---------------------------------------------
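
Once the body is decoded, individual fields can be picked out of it. The post does not show the payload, so the sketch below assumes a JSON object with a top-level `"data"` list whose items carry `"title"` and `"rate"` keys; those key names, the function name `extract_titles`, and the sample payload are all assumptions for illustration.

```python
def extract_titles(response_json):
    """Return (title, rate) pairs from a decoded response.

    Assumes a top-level "data" list of objects; anything else yields [].
    """
    if not isinstance(response_json, dict):
        return []
    items = response_json.get("data", [])
    return [
        (item.get("title"), item.get("rate"))
        for item in items
        if isinstance(item, dict)
    ]


# Made-up sample payload, not real Douban data.
sample = {"data": [{"title": "Example A", "rate": "9.0"}, {"title": "Example B"}]}
print(extract_titles(sample))  # [('Example A', '9.0'), ('Example B', None)]
```

Using `.get` keeps the extraction from raising `KeyError` when an item lacks a field.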

5. Persistence

    # 4. Persist the result
    # ---------------------------------------------
    # Output directory and file name ("动漫" means "anime")
    file_url = "./"
    file_name = "动漫.json"
    save_url = file_url + file_name
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
    with open(save_url, "w", encoding="utf-8") as fs:
        json.dump(response_json, fs, ensure_ascii=False)
    print("Crawl succeeded ^v^")
    # ---------------------------------------------
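
The same step can be wrapped in a pair of small functions so the saved file is easy to read back and verify. This is only a sketch: `save_json` and `load_json` are hypothetical names, and `indent=2` is an optional choice that makes the file easier to inspect by hand.

```python
import json
from pathlib import Path


def save_json(data, directory=".", file_name="动漫.json"):
    """Write `data` as UTF-8 JSON and return the path that was written."""
    path = Path(directory) / file_name
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fs:
        json.dump(data, fs, ensure_ascii=False, indent=2)
    return path


def load_json(path):
    """Read a JSON file written by save_json."""
    with Path(path).open(encoding="utf-8") as fs:
        return json.load(fs)
```

Calling `load_json(save_json(response_json))` and comparing the result with `response_json` confirms that nothing was lost on the way to disk.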

6. Full code

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import requests
    import json

    if __name__ == '__main__':
        # 1. Assemble the request
        # ---------------------------------------------
        # 1) URL
        url = "https://movie.douban.com/j/new_search_subjects"

        # 2) Query parameters ("动漫" means "anime")
        params = {
            "sort": "U",
            "range": "0,10",
            "tags": "动漫",
            "start": 40
        }

        # 3) User-Agent spoofing: present the crawler as a browser so that
        #    basic anti-crawler checks do not reject it
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
        }
        # ---------------------------------------------

        # 2. Send the request
        # ---------------------------------------------
        response = requests.get(url=url, params=params, headers=headers)
        # ---------------------------------------------

        # 3. Parse the response body
        # ---------------------------------------------
        # Decode the JSON body into Python objects
        response_json = response.json()
        # ---------------------------------------------

        # 4. Persist the result
        # ---------------------------------------------
        # Output directory and file name
        file_url = "./"
        file_name = "动漫.json"
        save_url = file_url + file_name
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
        with open(save_url, "w", encoding="utf-8") as fs:
            json.dump(response_json, fs, ensure_ascii=False)
        print("Crawl succeeded ^v^")
        # ---------------------------------------------

7. Test run

[Three screenshots of the test run: image.png]
