
A Python Crawler for the Maoyan Movie Ranking Board

Date: 2023-11-10 18:21:11


Contents

Preface
Steps
  1. Import the libraries
  2. Read in the data
Summary

Preface

This post records a small Python crawler for the Maoyan movie ranking board (猫眼电影排行榜): it requests the ranking page, follows each movie's detail link, extracts the title, score, genre, country, runtime, release date, region, and synopsis, and appends each record as a row to a CSV file.


The sections below are the main content of this post; the example code is provided for reference.

Steps

1. Import the libraries

The code is as follows (example):

import requests
from lxml import etree
import re
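As a quick illustration of how the lxml calls in the next section work, here is a self-contained sketch run against a hypothetical HTML fragment (the real Maoyan markup is more elaborate, and the film IDs below are made up):

```python
from lxml import etree

# A made-up fragment mimicking the ranking list's structure:
# one <dd> per ranked movie, each wrapping a detail-page link.
html = """
<dl class="board-wrapper">
  <dd><a href="/films/1203"></a></dd>
  <dd><a href="/films/1297"></a></dd>
</dl>
"""

selector = etree.HTML(html)
# An absolute xpath finds every movie entry; a relative xpath on each
# entry then pulls out its link, just like the crawler below does.
hrefs = [dd.xpath("a/@href")[0]
         for dd in selector.xpath("//dl[@class='board-wrapper']/dd")]
print(hrefs)  # ['/films/1203', '/films/1297']
```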

2. Read in the data

The code is as follows (example):

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    'Cookie': '__mta=143503882.1566656101761.1589157375069.1589157380961.135; _lxsdk_cuid=16cc3fafcbbc8-0893dde580380a-7373e61-144000-16cc3fafcbbc8; mojo-uuid=a783d1fd086922e6ef4c93ad91960d38; t_lxid=171e755494cc8-07d4f3607db09-d373666-144000-171e755494cc8-tid; uuid_n_v=v1; uuid=906BF3F0931D11EAAA7FE134502C33729B2886C38FEB40C6A3FDE6D09F483FEB; _csrf=cac7bf273f4e8e3f6191512bba148d9a845f52645e3154214e96e7d8145479f2; mojo-session-id={"id":"3a8290611cd60fc73e93addcd0af2e12","time":1589156578410}; Hm_lvt_703e94591e87be68cc8da0da7cbd0be2=1588937241,1588991422,1589034257,1589156578; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; _lxsdk=906BF3F0931D11EAAA7FE134502C33729B2886C38FEB40C6A3FDE6D09F483FEB; __mta=143503882.1566656101761.1589010781060.1589157341889.132; mojo-trace-id=20; Hm_lpvt_703e94591e87be68cc8da0da7cbd0be2=1589157381; _lxsdk_s=17d04ce-44-e08-034%7C%7C29',
}


def get_html_and_parse():
    # The site's base URL was elided in the original; this is a relative path.
    url = '/board/4?offset=30'
    response = requests.get(url, headers=headers)
    response.encoding = 'utf-8'
    selector = etree.HTML(response.text)
    # Each <dd> in the board list is one ranked movie.
    dds = selector.xpath("//dl[@class='board-wrapper']/dd")
    for dd in dds:
        # The detail-page URL prefix was likewise elided in the original.
        detail_url = '' + dd.xpath("a/@href")[0]
        score = dd.xpath("div[@class='board-item-main']/div[@class='board-item-content']/div[@class='taobao-item-number score-num']/p[@class='score']/i/text()")
        score = ''.join(score)
        yield (detail_url, score)


def get_item(detail_url, score):
    response = requests.get(detail_url, headers=headers)
    response.encoding = 'utf-8'
    selector = etree.HTML(response.text)
    name = selector.xpath("//div[@class='taobao-brief-container']/h1[@class='name']/text()")[0]
    # Renamed from `type` to avoid shadowing the built-in type().
    movie_type = ''.join(selector.xpath("//div[@class='taobao-brief-container']/ul/li[@class='ellipsis'][1]/a[@class='text-link']/text()"))
    # The second list item holds "country / runtime"; split it on the slash.
    country_minute = selector.xpath("//div[@class='taobao-brief-container']/ul/li[@class='ellipsis'][2]/text()")[0].strip().replace('\n', '').split('/')
    country = country_minute[0].strip()
    minute = country_minute[1].strip()
    # The third list item holds the release date followed by the region:
    # the greedy regex keeps everything up to the last digit as the date,
    # and the substitution strips that prefix to leave the region.
    date_place = selector.xpath("//div[@class='taobao-brief-container']/ul/li[@class='ellipsis'][3]/text()")[0]
    date = re.search(r'.*\d', date_place).group()
    place = re.sub(r'.*\d', '', date_place)
    introduce = selector.xpath("//div[@class='module'][1]/div[@class='mod-content']/span/text()")[0]
    return (name, score, movie_type, country, minute, date, place, introduce)


import csv


def write_csv(item):
    # Append one row per movie; 'gbk' keeps the file readable in Chinese Excel.
    with open('data.csv', 'a', encoding='gbk', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(item)
    print('success')


for url, score in get_html_and_parse():
    item = get_item(url, score)
    write_csv(item)
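Two details of the code above are easy to get wrong, so here is a small self-contained sketch of them, run on made-up sample strings (the movie record below is invented for illustration; the real scraped text may be formatted differently):

```python
import csv
import re
import tempfile

# 1. Splitting the "release date + region" line: the greedy regex r'.*\d'
#    matches everything up to the last digit (the date), and the
#    substitution deletes that prefix, leaving the region.
date_place = '1994-09-10美国'
date = re.search(r'.*\d', date_place).group()
place = re.sub(r'.*\d', '', date_place)
print(date, place)  # 1994-09-10 美国

# 2. Writing one record per movie as a CSV row, as write_csv() does
#    (a temporary file stands in for data.csv here).
item = ('肖申克的救赎', '9.5', '剧情', '美国', '142分钟', date, place, '...')
with tempfile.NamedTemporaryFile('w+', encoding='utf-8', newline='') as f:
    csv.writer(f).writerow(item)
    f.seek(0)
    row = f.read().strip()
print(row)
```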


Summary

The crawler chains three steps: get_html_and_parse() collects each movie's detail URL and score from the ranking page, get_item() scrapes the metadata from each detail page, and write_csv() appends the record to data.csv. Note that the URLs in the listing are relative; the site's base address (elided in the original) must be prepended before the requests will succeed.
