[Python Scraper] Scraping Sina Weibo comments with requests

Date: 2024-06-30 22:18:59


Environment: Windows 10 + Python 3.6
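The script drives Weibo's mobile-site comment endpoint. Each JSON response carries a data.max_id field that points to the next page of comments, so the first request is sent without max_id and every later request passes the max_id returned by the previous response. Before running the full scraper, it can help to probe a single page and look at that structure. The sketch below is a minimal probe under two assumptions not spelled out in the original post: that the endpoint is https://m.weibo.cn/comments/hotflow (the post only gives the path), and that a Cookie header from a logged-in browser session may additionally be needed in practice.

import requests

# Hypothetical probe: fetch one page of comments and inspect the pagination fields.
# Assumes the mobile-site endpoint https://m.weibo.cn/comments/hotflow; a Cookie
# header copied from a logged-in browser session may also be required in practice.
headers = {'User-Agent': 'Mozilla/5.0'}
weibo_id = '4363505468007923'
resp = requests.get(
    'https://m.weibo.cn/comments/hotflow',
    params={'id': weibo_id, 'mid': weibo_id, 'max_id_type': 0},
    headers=headers,
)
data = resp.json()['data']
print(data['max_id'])            # pass this as max_id in the next request
print(data['data'][0]['text'])   # raw comment text (HTML markup included)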

# Complete Weibo comment scraper; only the weibo_id needs to be changed.
import re
import requests

# Scrape comments for one post and append them to weibo_comment_<id>.txt
def get_comment(weibo_id, base_url, headers, number):
    count = 0
    max_id = 0
    # Strip HTML markup (Weibo wraps emoji and links in <span>/<img>/<a> tags)
    label_filter = re.compile(r'</?\w+[^>]*>', re.S)
    fp = open("weibo_comment_" + str(weibo_id) + ".txt", "a", encoding="utf8")
    # Keep requesting pages until enough comments have been collected
    while count < number:
        # Build the URL for this page; the first page carries no max_id
        url = base_url + weibo_id + '&mid=' + weibo_id + '&max_id_type=0'
        if count > 0:
            url += '&max_id=' + str(max_id)
        try:
            web_data = requests.get(url, headers=headers)
            js_con = web_data.json()
            # max_id links to the next page of comments
            max_id = js_con['data']['max_id']
            comments_list = js_con['data']['data']
            for comment_item in comments_list:
                comment = re.sub(label_filter, '', comment_item["text"])
                fp.write(comment + "\n")
                count += 1
                print("Fetched " + str(count) + " comments so far.")
            if max_id == 0:
                # No further pages are available
                break
        except Exception as e:
            print(str(count) + " - hit an exception: " + str(e))
            break
    fp.close()

if __name__ == "__main__":
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    }
    # Mobile-site comment API; the original post omitted the host,
    # which is presumably m.weibo.cn
    base_url = 'https://m.weibo.cn/comments/hotflow?id='
    weibo_id = '4363505468007923'   # Weibo post id
    number = 400                    # number of comments to fetch
    get_comment(weibo_id, base_url, headers, number)
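A couple of practical notes on running this: the output file is opened in append mode, so rerunning the script with the same weibo_id keeps adding to weibo_comment_<id>.txt rather than overwriting it. In practice the hotflow endpoint can start returning empty or non-JSON responses once it decides the client looks automated; if that happens, adding a Cookie copied from a logged-in browser session to headers and pausing briefly between pages (for example time.sleep(1) inside the while loop) usually helps. Both points are assumptions about the live API rather than something the original post covers.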

Feel free to follow along and learn together. If this was useful, please give it a like!
