
python3 [Crawler in Practice] Scraping the Sina Weibo account 京东客服 (JD.com Customer Service) with requests

Date: 2021-01-24 15:04:31


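What gets scraped: the Weibo posts of 京东客服 and the comments on them.

Approach: access the Sina Weibo API through the mobile site, then filter the returned data.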

The URL looks like this: /u/5650743478?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%40京东客服&featurecode=20000320

This is the URL of the account's Weibo page once you are logged in.

You can also log in to Sina Weibo at:

/signin/welcome?entry=mweibo&r=http%3A%2F%%2F

[Screenshot: the account page after login]

What gets scraped here:

the posts, and the comment entries under each post.

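It sounds simple, but honestly the crawl was a bumpy ride, and I am still stuck on the IP problem. From my own experiments, sleeping 30 seconds between requests does not help much; 10 seconds is enough, and your IP may well get banned anyway. I am fairly sure an IP ban is what I ran into.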
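The fixed pause between requests can be sketched as below. This is only a minimal illustration, not the original code: the helper name `crawl_with_delay`, the injectable `fetch` and `sleep` arguments, and the small random jitter are all assumptions added here. As noted above, a delay alone may not prevent an IP ban.

```python
import random
import time


def crawl_with_delay(urls, fetch, delay=10.0, jitter=2.0, sleep=time.sleep):
    """Call fetch(url) for every URL, pausing between consecutive requests.

    delay is the base pause in seconds (10 s, as suggested above);
    jitter adds up to that many extra seconds at random (an assumption).
    """
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause before the very first request
            sleep(delay + random.uniform(0, jitter))
        results.append(fetch(url))
    return results
```

In real use, `fetch` would be a function that performs the HTTP request; passing it in keeps the pacing logic separate and easy to try out without touching the network.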

The method is to fetch JSON data from the mobile API and then extract the fields of interest.

A Firefox add-on is handy for inspecting the responses:

Main APIs:

Posts API:

First page of posts:

/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D京东客服&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478

Second page of posts:

/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D京东客服&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478&page=2

and so on for later pages.
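Since the two URLs differ only in the trailing `page` parameter, the page URLs can be generated. A minimal sketch: the helper name `posts_page_url` is made up for illustration, and the path is copied from above without a host, because the host is missing in this copy of the post.

```python
# Path of the first page, copied from above (the host is missing in this post).
FIRST_PAGE = (
    "/api/container/getIndex?uid=5650743478&luicode=10000011"
    "&lfid=100103type%3D1%26q%3D京东客服&featurecode=20000320"
    "&type=uid&value=5650743478&containerid=1076035650743478"
)


def posts_page_url(page, first_page=FIRST_PAGE):
    """Return the URL path for the given 1-based page of posts."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 has no page parameter; later pages append &page=N.
    return first_page if page == 1 else f"{first_page}&page={page}"


for page in range(1, 4):
    print(posts_page_url(page))
```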

Comment detail API:

Each post carries an idstr, for example 4137390568546147.

The post's detail page is then:

/status/4137390568546147

The comment pages are built like this:

/api/comments/show?id=4137390568546147&page=1

/api/comments/show?id=4137390568546147&page=2
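The comment URLs for one post can be derived from its idstr and its comment count. The sketch below is an illustration under assumptions: `comment_page_urls` is a hypothetical helper, and the page size of 10 comes from a comment in the code later in this post, not from any API documentation. Unlike the original code, it rounds up exactly and returns nothing for a post without comments.

```python
import math


def comment_page_urls(idstr, comments_count, per_page=10):
    """Return the comment-page URL paths for one post (host omitted, as above)."""
    pages = math.ceil(comments_count / per_page)
    return [
        f"/api/comments/show?id={idstr}&page={page}"
        for page in range(1, pages + 1)
    ]


# The post with idstr 4137390568546147 reports 19 comments in the sample JSON.
for url in comment_page_urls("4137390568546147", 19):
    print(url)
```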

Here is what the posts API returns for page 2, in JSON format:

{"cardlistInfo": {"containerid": "1076035650743478","v_p": 42,"show_style": 1,"total": 3264,"page": 2},"cards": [{"card_type": 9,"itemid": "1076035650743478_-_4137858652321796","scheme": "/status/FfSSl9K0k?mblogid=FfSSl9K0k&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "2小时前","id": "4137858652321796","mid": "4137858652321796","idstr": "4137858652321796","text": "明天又要上班了,用四个字描述下你现在的心情吧<span class=\"url-icon\"><img src=\"///m/emoticon/icon/others/d_erha-0d2bea3a7d.png\" style=\"width:1em;height:1em;\" alt=\"[二哈]\"></span> ​​​","textLength": 50,"source": "微博 ","favorited": false,"thumbnail_pic": "/thumbnail/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg","bmiddle_pic": "/bmiddle/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg","original_pic": "/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg","user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"reposts_count": 0,"comments_count": 4,"attitudes_count": 2,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"bid": "FfSSl9K0k","pics": [{"pid": "006apWvQgy1fi7tkjguy4j309q09qt8q","url": "/orj360/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg","size": "orj360","geo": {"width": "350","height": "350","croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi7tkjguy4j309q09qt8q.jpg","geo": {"width": "350","height": "350","croped": false}}}]},"show_type": 0,"openurl": ""},{"card_type": 9,"itemid": 
"1076035650743478_-_4137692553365577","scheme": "/status/FfOyre7xv?mblogid=FfOyre7xv&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "13小时前","id": "4137692553365577","mid": "4137692553365577","idstr": "4137692553365577","text": "你觉得举办哪种《中国有_____》比赛,你能进入决赛? ​​​","textLength": 49,"source": "微博 ","favorited": false,"thumbnail_pic": "/thumbnail/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg","bmiddle_pic": "/bmiddle/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg","original_pic": "/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg","user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"reposts_count": 0,"comments_count": 13,"attitudes_count": 1,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"bid": "FfOyre7xv","pics": [{"pid": "006apWvQgy1fi7ul9n9rfj30k00lsgnj","url": "/orj360/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg","size": "orj360","geo": {"width": 360,"height": 392,"croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi7ul9n9rfj30k00lsgnj.jpg","geo": {"width": "720","height": "784","croped": false}}}]},"show_type": 0,"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4137390568546147","scheme": "/status/FfGHmzRf5?mblogid=FfGHmzRf5&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "昨天 14:24","id": "4137390568546147","mid": "4137390568546147","idstr": "4137390568546147","text": "周末就是买买买,吃吃吃<span class=\"url-icon\"><img 
src=\"///m/emoticon/icon/default/d_huaixiao-bb5966dcc6.png\" style=\"width:1em;height:1em;\" alt=\"[坏笑]\"></span> ​​​","textLength": 28,"source": "微博 ","favorited": false,"thumbnail_pic": "/thumbnail/006apWvQgy1fi7taijr9pg307e05kgvl.gif","bmiddle_pic": "/bmiddle/006apWvQgy1fi7taijr9pg307e05kgvl.gif","original_pic": "/large/006apWvQgy1fi7taijr9pg307e05kgvl.gif","user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"reposts_count": 0,"comments_count": 19,"attitudes_count": 1,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"bid": "FfGHmzRf5","pics": [{"pid": "006apWvQgy1fi7taijr9pg307e05kgvl","url": "/orj360/006apWvQgy1fi7taijr9pg307e05kgvl.gif","size": "orj360","geo": {"width": "266","height": "200","croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi7taijr9pg307e05kgvl.gif","geo": {"width": "266","height": "200","croped": false}}}]},"show_type": 0,"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4137278329132849","scheme": "/status/FfDMkCjS1?mblogid=FfDMkCjS1&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "昨天 06:58","id": "4137278329132849","mid": "4137278329132849","idstr": "4137278329132849","text": "周六早呀,今天有比我起的还早的吗<span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_wabishi-f5765407f7.png\" style=\"width:1em;height:1em;\" alt=\"[挖鼻]\"></span> ​​​​","textLength": 47,"source": "微博 ","favorited": 
false,"thumbnail_pic": "/thumbnail/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg","bmiddle_pic": "/bmiddle/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg","original_pic": "/large/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg","user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"reposts_count": 0,"comments_count": 8,"attitudes_count": 2,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"bid": "FfDMkCjS1","pics": [{"pid": "006apWvQgy1fi7tiv5e5qj30dc0d5dfz","url": "/orj360/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg","size": "orj360","geo": {"width": 273,"height": 270,"croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi7tiv5e5qj30dc0d5dfz.jpg","geo": {"width": "480","height": "473","croped": false}}}]},"show_type": 0,"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4137054743266182","scheme": "/status/FfxXIdHGm?mblogid=FfxXIdHGm&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "08-04","id": "4137054743266182","mid": "4137054743266182","idstr": "4137054743266182","text": "就问一句,这样人美心善的90后小哥你们要不要?<span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span>","source": "微博 ","favorited": false,"user": {"id": 5650743478,"screen_name": 
"京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"retweeted_status": {"created_at": "08-04","id": "4137016583280831","mid": "4137016583280831","idstr": "4137016583280831","text": "<span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span><span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_tian-52ea252705.png\" style=\"width:1em;height:1em;\" alt=\"[舔屏]\"></span> <a data-url=\"/R9S6VWV\" href=\"/article?object_id=1022%3A2309404137016584472707&url_type=39&object_type=article&pos=1&luicode=10000011&lfid=1076035650743478&featurecode=20000320&id=2309404137016584472707&ep=FfwYadLuD%2C1717871843%2CFfwYadLuD%2C1717871843\" data-hide=\"\"><span class=\"url-icon\"><img src=\"/upload//09/25/3/timeline_card_small_article_default.png\"></span></i><span class=\"surl-text\">90后小哥征婚启事</a> ​​​","textLength": 38,"source": "微博 ","favorited": false,"user": {"id": 1717871843,"screen_name": "京东","profile_image_url": "/crop.0.0.480.480.180/6664a4e3ly8fffaxrnv8fj20dc0dcmy4.jpg","profile_url": "/u/1717871843?uid=1717871843&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 19903,"verified": true,"verified_type": 2,"verified_type_ext": 50,"verified_reason": "京东网上商城","description": 
"中国最大的自营电商企业京东商城集团在线销售家电、数码通讯、电脑、家居百货、服装服饰、母婴、图书、食品等13大类数万个品牌上千万种优质商品。","gender": "m","mbtype": 12,"urank": 43,"mbrank": 5,"follow_me": false,"following": false,"followers_count": 4025036,"follow_count": 258,"cover_image_phone": "/crop.0.0.640.640.640/6664a4e3ly1fffb8torrtj20ku0ku409.jpg"},"reposts_count": 12,"comments_count": 24,"attitudes_count": 16,"isLongText": false,"visible": {"type": 0,"list_id": 0},"page_info": {"page_pic": {"url": "/crop.0.0.617.347.1000/6664a4e3ly1fi7khoua7dj20hk09nn45.jpg"},"page_url": "/article?object_id=1022%3A2309404137016584472707&url_type=39&object_type=article&pos=2&luicode=10000011&lfid=1076035650743478&featurecode=20000320&id=2309404137016584472707","page_title": "京东","content1": "90后小哥征婚启事","content2": "","icon": "/upload//12/28/14/feed_headlines_icon_flash1228_2.png","type": "article"},"bid": "FfwYadLuD"},"reposts_count": 0,"comments_count": 30,"attitudes_count": 1,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"raw_text": "就问一句,这样人美心善的90后小哥你们要不要?[舔屏][舔屏]","bid": "FfxXIdHGm"},"show_type": 0,"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4136952959746775","scheme": "/status/FfvjxETA3?mblogid=FfvjxETA3&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "08-04","id": "4136952959746775","mid": "4136952959746775","idstr": "4136952959746775","text": "周五早上上班的你和下班的你<span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_xiaoku-7430606cb7.png\" style=\"width:1em;height:1em;\" alt=\"[笑cry]\"></span> ​​​","textLength": 33,"source": "微博 ","favorited": false,"thumbnail_pic": "/thumbnail/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg","bmiddle_pic": "/bmiddle/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg","original_pic": "/large/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg","user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": 
"/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"reposts_count": 0,"comments_count": 14,"attitudes_count": 1,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"bid": "FfvjxETA3","pics": [{"pid": "006apWvQgy1fi7fkqpatfj30j60j6jsg","url": "/orj360/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg","size": "orj360","geo": {"width": 360,"height": 360,"croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi7fkqpatfj30j60j6jsg.jpg","geo": {"width": "690","height": "690","croped": false}}},{"pid": "006apWvQgy1fi7fkuj1tvg308c0fkmxy","url": "/orj360/006apWvQgy1fi7fkuj1tvg308c0fkmxy.gif","size": "orj360","geo": {"width": "300","height": "560","croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi7fkuj1tvg308c0fkmxy.gif","geo": {"width": "300","height": "560","croped": false}}}]},"show_type": 0,"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4136663145262324","scheme": "/status/FfnM6m4Yc?mblogid=FfnM6m4Yc&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "08-03","id": "4136663145262324","mid": "4136663145262324","idstr": "4136663145262324","text": "输入法,你们喜欢用哪种?<span class=\"url-icon\"><img src=\"///m/emoticon/icon/others/d_doge-d903433c82.png\" style=\"width:1em;height:1em;\" alt=\"[doge]\"></span> ​​​","textLength": 30,"source": "微博 ","favorited": false,"thumbnail_pic": "/thumbnail/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg","bmiddle_pic": "/bmiddle/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg","original_pic": "/large/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg","user": {"id": 
5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"reposts_count": 4,"comments_count": 40,"attitudes_count": 6,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"bid": "FfnM6m4Yc","pics": [{"pid": "006apWvQgy1fi6i8tkspqj30ku0i7mz4","url": "/orj360/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg","size": "orj360","geo": {"width": 309,"height": 270,"croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi6i8tkspqj30ku0i7mz4.jpg","geo": {"width": "750","height": "655","croped": false}}},{"pid": "006apWvQgy1fi6i8z010xj30ku0h6jte","url": "/orj360/006apWvQgy1fi6i8z010xj30ku0h6jte.jpg","size": "orj360","geo": {"width": 327,"height": 270,"croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi6i8z010xj30ku0h6jte.jpg","geo": {"width": "750","height": "618","croped": false}}},{"pid": "006apWvQgy1fi6i988w7pj30kt0hbgms","url": "/orj360/006apWvQgy1fi6i988w7pj30kt0hbgms.jpg","size": "orj360","geo": {"width": 324,"height": 270,"croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi6i988w7pj30kt0hbgms.jpg","geo": {"width": "749","height": "623","croped": false}}},{"pid": "006apWvQgy1fi6i9bnkgfj30ku0gwgmj","url": "/orj360/006apWvQgy1fi6i9bnkgfj30ku0gwgmj.jpg","size": "orj360","geo": {"width": 333,"height": 270,"croped": false},"large": {"size": "large","url": "/large/006apWvQgy1fi6i9bnkgfj30ku0gwgmj.jpg","geo": {"width": "750","height": "608","croped": false}}}]},"show_type": 
0,"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4136613988263792","scheme": "/status/FfmuOyFMY?mblogid=FfmuOyFMY&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "08-03","id": "4136613988263792","mid": "4136613988263792","idstr": "4136613988263792","text": "<a class='k' href='/k/%E5%BC%A0%E8%8B%A5%E6%98%80%E5%94%90%E8%89%BA%E6%98%95%E5%85%AC%E5%BC%80%E6%81%8B%E6%83%85?from=feed'>#张若昀唐艺昕公开恋情#</a> 恭喜呀<span class=\"url-icon\"><img src=\"///m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span><span class=\"url-icon\"><img src=\"///m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span><span class=\"url-icon\"><img src=\"///m/emoticon/icon/others/l_xin-8e9a1a0346.png\" style=\"width:1em;height:1em;\" alt=\"[心]\"></span>,大家就默默干了这碗狗粮吧,狗粮够吃吗?不够吃的话,你(jing)们(dong)懂(you)的(shou)<span class=\"url-icon\"><img src=\"///m/emoticon/icon/default/d_wabishi-f5765407f7.png\" style=\"width:1em;height:1em;\" alt=\"[挖鼻]\"></span>","source": "微博 ","favorited": false,"user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"retweeted_status": {"created_at": "08-02","id": "4136423907632073","mid": "4136423907632073","idstr": "4136423907632073","text": "时光赐给我们盗不走的爱人,而你赐给我时光。<a href='/n/唐艺昕'>@唐艺昕</a> ​​​","textLength": 49,"source": "iPhone 6s","favorited": false,"thumbnail_pic": 
"/thumbnail/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg","bmiddle_pic": "/bmiddle/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg","original_pic": "/large/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg","user": {"id": 1827683445,"screen_name": "张若昀","profile_image_url": "/crop.9.0.494.494.180/6cf03c75jw8fajncv51lvj20e80dq74i.jpg","profile_url": "/u/1827683445?uid=1827683445&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 1199,"verified": true,"verified_type": 0,"verified_type_ext": 1,"verified_reason": "演员张若昀","description": "Per Aspera Ad Astra 循此苦旅,以达天际。 工作邮箱:ruoyunwork@","gender": "m","mbtype": 12,"urank": 37,"mbrank": 6,"follow_me": false,"following": false,"followers_count": 13527839,"follow_count": 195,"cover_image_phone": "/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg"},"picStatus": "0:1,1:1","reposts_count": 283896,"comments_count": 325438,"attitudes_count": 2380726,"isLongText": false,"visible": {"type": 0,"list_id": 0},"cardid": "star_183","bid": "Ffhyew1rX","pics": [{"pid": "6cf03c75ly1fi5qtg3z8fj20hs0nqq46","url": "/orj360/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg","size": "orj360","geo": {"width": 360,"height": 480,"croped": false},"large": {"size": "large","url": "/large/6cf03c75ly1fi5qtg3z8fj20hs0nqq46.jpg","geo": {"width": "640","height": "854","croped": false}}},{"pid": "6cf03c75ly1fi5qtfv90rj20c80c6dgs","url": "/orj360/6cf03c75ly1fi5qtfv90rj20c80c6dgs.jpg","size": "orj360","geo": {"width": 271,"height": 270,"croped": false},"large": {"size": "large","url": "/large/6cf03c75ly1fi5qtfv90rj20c80c6dgs.jpg","geo": {"width": "440","height": "438","croped": false}}}]},"reposts_count": 3,"comments_count": 13,"attitudes_count": 6,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"raw_text": "#张若昀唐艺昕公开恋情# 恭喜呀[心][心][心],大家就默默干了这碗狗粮吧,狗粮够吃吗?不够吃的话,你(jing)们(dong)懂(you)的(shou)[挖鼻]","bid": "FfmuOyFMY"},"show_type": 0,"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4136598981629551","scheme": 
"/status/Ffm6C6PV5?mblogid=Ffm6C6PV5&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "08-03","id": "4136598981629551","mid": "4136598981629551","idstr": "4136598981629551","text": "仿佛看到了自己<span class=\"url-icon\"><img src=\"///m/emoticon/icon/others/d_erha-0d2bea3a7d.png\" style=\"width:1em;height:1em;\" alt=\"[二哈]\"></span>","source": "微博 ","favorited": false,"user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"retweeted_status": {"created_at": "08-02","id": "4136434165892638","mid": "4136434165892638","idstr": "4136434165892638","text": "我在张若昀和唐艺昕公开恋情的微博里看到了你唉~~<span class=\"url-icon\"><img src=\"///m/emoticon/icon/others/d_doge-d903433c82.png\" style=\"width:1em;height:1em;\" alt=\"[doge]\"></span> ​​​","textLength": 54,"source": "","favorited": false,"thumbnail_pic": "/thumbnail/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg","bmiddle_pic": "/bmiddle/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg","original_pic": "/large/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg","user": {"id": 3147292215,"screen_name": "草图君","profile_image_url": "/crop.0.0.511.511.180/bb97de37jw8f57ewfuqt9j20e70e8q37.jpg","profile_url": "/u/3147292215?uid=3147292215&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 5980,"verified": true,"verified_type": 0,"verified_type_ext": 1,"verified_reason": "直播红人 微博知名综艺博主","description": "一个得罪了半个娱乐圈的少年","gender": "m","mbtype": 12,"urank": 44,"mbrank": 
6,"follow_me": false,"following": false,"followers_count": 6192418,"follow_count": 433,"cover_image_phone": "/crop.0.0.640.640.640/bb97de37jw1ewysfmiioyj20yi0ykqe7.jpg"},"picStatus": "0:1,1:1,2:1,3:1","reposts_count": 3832,"comments_count": 7349,"attitudes_count": 65785,"isLongText": false,"visible": {"type": 0,"list_id": 0},"bid": "FfhOMoIWy","pics": [{"pid": "bb97de37ly1fi5s0g76jrj20yi0p1n0m","url": "/orj360/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg","size": "orj360","geo": {"width": 372,"height": 270,"croped": false},"large": {"size": "large","url": "/large/bb97de37ly1fi5s0g76jrj20yi0p1n0m.jpg","geo": {"width": "1242","height": "901","croped": false}}},{"pid": "bb97de37ly1fi5s0goz0nj20hs0nq0tw","url": "/orj360/bb97de37ly1fi5s0goz0nj20hs0nq0tw.jpg","size": "orj360","geo": {"width": 360,"height": 480,"croped": false},"large": {"size": "large","url": "/large/bb97de37ly1fi5s0goz0nj20hs0nq0tw.jpg","geo": {"width": "640","height": "854","croped": false}}},{"pid": "bb97de37ly1fi5s0h69g3j20c80c7juk","url": "/orj360/bb97de37ly1fi5s0h69g3j20c80c7juk.jpg","size": "orj360","geo": {"width": 270,"height": 270,"croped": false},"large": {"size": "large","url": "/large/bb97de37ly1fi5s0h69g3j20c80c7juk.jpg","geo": {"width": "440","height": "439","croped": false}}},{"pid": "bb97de37ly1fi5s0fg68mj202g02g3yo","url": "/orj360/bb97de37ly1fi5s0fg68mj202g02g3yo.jpg","size": "orj360","geo": {"width": "88","height": "88","croped": false},"large": {"size": "large","url": "/large/bb97de37ly1fi5s0fg68mj202g02g3yo.jpg","geo": {"width": "88","height": "88","croped": false}}}]},"reposts_count": 2,"comments_count": 21,"attitudes_count": 7,"isLongText": false,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"raw_text": "仿佛看到了自己[二哈]","bid": "Ffm6C6PV5"},"show_type": 0,"openurl": ""},{"card_type": 11,"show_type": 0,"card_group": [],"openurl": ""},{"card_type": 9,"itemid": "1076035650743478_-_4136407577953610","scheme": 
"/status/Ffh7Txn62?mblogid=Ffh7Txn62&luicode=10000011&lfid=1076035650743478&featurecode=20000320","mblog": {"created_at": "08-02","id": "4136407577953610","mid": "4136407577953610","idstr": "4136407577953610","text": "<a class='k' href='/k/%E4%B8%80%E4%B8%AA%E6%84%9F%E4%BA%BA%E7%9A%84%E6%95%85%E4%BA%8B?from=feed'>#一个感人的故事#</a>去年暑假,8岁的小明特意坐了三个多小时车去奶奶家;奶奶为了小明也愿意去县城的超市买小明爱的薯片和巧克力等零食,但是奶奶家没有WiFi和智能手机,奶奶可以陪他一起看古装电视剧;讲他最爱听的神话故事,唱小曲哄他睡觉……奶奶家有吃不完的零食,也不会&quot;太无聊了&quot;<br/>今年,奶奶提前做 ​​​...<a href=\"/status/4136407577953610\">全文</a>","textLength": 393,"source": "微博 ","favorited": false,"user": {"id": 5650743478,"screen_name": "京东客服","profile_image_url": "/crop.38.7.206.206.180/006apWvQjw8f9dwuejt68j307y0630sz.jpg","profile_url": "/u/5650743478?uid=5650743478&luicode=10000011&lfid=1076035650743478&featurecode=20000320","statuses_count": 3245,"verified": true,"verified_type": 2,"verified_type_ext": 0,"verified_reason": "北京京东世纪贸易有限公司","description": "订单咨询、问题反馈、意见建议……获取专业贴心服务,尽在京东客服","gender": "f","mbtype": 2,"urank": 29,"mbrank": 2,"follow_me": false,"following": false,"followers_count": 18427,"follow_count": 235,"cover_image_phone": "/crop.0.0.640.640.640/006apWvQjw1f2g20q03tbj30e80e8t93.jpg"},"reposts_count": 6,"comments_count": 17,"attitudes_count": 2,"isLongText": true,"visible": {"type": 0,"list_id": 0},"mblogtype": 0,"page_info": {"page_pic": {"url": "/thumb180/74f67c55jw9ey0hrixq57j2050050t92.jpg"},"page_url": "/p/index?containerid=100808f50fb5741ffd610570b92baf2cc3b342&extparam=%E4%B8%80%E4%B8%AA%E6%84%9F%E4%BA%BA%E7%9A%84%E6%95%85%E4%BA%8B&luicode=10000011&lfid=1076035650743478&featurecode=20000320","page_title": "#一个感人的故事#","content1": "","content2": "3人关注","type": "topic"},"bid": "Ffh7Txn62"},"show_type": 0,"openurl": ""}],"ok": 1,"showAppTips": 0,"scheme": "sinaweibo://cardlist?containerid=1076035650743478&luicode=10000011&lfid=100103type=1&q=京东客服&featurecode=20000320"}

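The above is just one page of posts. Whoever writes the mobile front end must get dizzy from it. It is nasty, especially when a field comes back as null or empty.

The JSON above can be pretty-printed directly in JSONView.

The fields to scrape are under cards, then mblog: text, and idstr (used to build the comment page URL).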
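Pulling those two fields out of a response can be sketched as follows. The sample dictionary is a cut-down version of the JSON shown earlier, and `extract_posts` is a hypothetical helper. It strips every HTML tag with a simple regular expression, which also drops the emoticon markup; that is a simplification and differs from the original code, which cuts the text at the first `<span`.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # matches any HTML tag such as <span ...> or <img ...>


def extract_posts(payload):
    """Return a list of {'idstr', 'text'} dicts, one per card that has an mblog."""
    posts = []
    for card in payload.get("cards", []):
        mblog = card.get("mblog")
        if not mblog:
            # Some cards (card_type 11 in the sample) have no mblog at all.
            continue
        text = TAG_RE.sub("", mblog.get("text", "")).strip()
        posts.append({"idstr": mblog["idstr"], "text": text})
    return posts


sample = {
    "cards": [
        {
            "card_type": 9,
            "mblog": {
                "idstr": "4137390568546147",
                "text": '周末就是买买买,吃吃吃<span class="url-icon"><img alt="[坏笑]"></span> ',
            },
        },
        {"card_type": 11, "show_type": 0, "card_group": []},
    ]
}

print(extract_posts(sample))
```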

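Comment entries: /api/comments/show?id=4137390568546147&page=2

The id here is the idstr.

The detail data is the JSON returned by the comment URL above. It is another big blob, much like the one shown earlier, so I will not paste it here; the JSON above already takes up enough space.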

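Because Weibo bans IP addresses, the first run died after a bit more than 40,000 records. On the second night I slept 30 seconds per request and found that it made no difference at all, so I just kept going: once the ban was lifted I changed the cookie, the start URL and the page index, and continued with a 10-second sleep, since sleeping longer did not help anyway. In the end I had roughly 220,000 rows of junk data. After deduplication and dropping what I did not need, perhaps 4,000 remain; I did not count.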
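The deduplication step is not shown in this post. A minimal sketch of one way to do it, assuming the scraped data is a list of text lines and that two lines count as duplicates when they are equal after trimming whitespace (`dedupe_keep_order` is a hypothetical helper):

```python
def dedupe_keep_order(lines):
    """Drop blank lines and repeated lines, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(key)
    return unique


raw = ["first comment\r\n", "second comment\r\n", "first comment\r\n", "\r\n"]
print(dedupe_keep_order(raw))
```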

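[Screenshots taken during the crawl]

[Screenshot of the final Weibo data]

The code is on GitHub; download it if you need it. I also wrote a similar simple crawler for the Sina Weibo account @京东客服.

One gripe: the JSON is huge, there is a lot of it, and it is unstable; the responses are not consistent.

Part of the code: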

# encoding=utf8
import json
import time

import requests

# The scheme and host are missing from every URL in this copy of the post.
# Set BASE_URL to the mobile Weibo site before running.
BASE_URL = ''

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/0101 Firefox/54.0',
    'Cookie': 'ALF=1504709445; SCF=Ag0epa_4tyFCglnCwHJiaRDznUy645wpqEhg-dG3Sv0cbfGX1wNmqXPnHQroard1FW2nn3RdCnmux4VZ7bFRuMo.; SUHB=0ebt4qVvtKU1d7; _T_WM=22bb4d80315608a0e9bd3bf92b3c1dac; SUB=_2A250jA4VDeRhGeBN6FsT8i7MyTyIHXVXjpJdrDV6PUJbktBeLXjBkW1oTOqmqg0rff3UmekP4TzhMFYtsw..; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WFNrBkhSeVrfPGckwnaFCcy5JpX5o2p5NHD95Qce0e4eoz7ehz7Ws4DqcjBIcHVdr.peoepeoefeK5Ee5tt; M_WEIBOCN_PARAMS=luicode%3D10000011%26lfid%3D100103type%253D1%2526q%253D%2540%25E4%25BA%25AC%25E4%25B8%259C%25E5%25AE%25A2%25E6%259C%258D%26featurecode%3D20000320%26fid%3D1076035650743478%26uicode%3D10000011',
    # The 'Host' value is empty in the source; requests sets Host from the URL.
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': BASE_URL + '/u/5650743478?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D%40%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320',
}

# Post texts
textList = []
# Comment counts; the comment API returns 10 comments per page
numSizeList = []


def getJsonData(url):
    # Fetch one API URL and return the response body as text.
    req = requests.get(BASE_URL + url, headers=headers)
    return req.text


def parseDetailListdata(listdata):
    # Write the text and reply_text of every comment on one comment page.
    for detailData in listdata:
        text = detailData.get('text', '')
        reply_text = detailData.get('reply_text', '')
        print(text)
        print(reply_text)
        f.write(text + '\r\n')
        f.write(reply_text + '\r\n')


def getTextInfo(textStr):
    # Keep only the text before the first <span ...> (emoticon) tag.
    textStr = str(textStr)
    index = textStr.find('<span')
    return textStr if index == -1 else textStr[:index]


def getpageSize(comments, idstr):
    # Build the comment page links for one post.
    detaiLinks = []  # local list, so links of earlier posts are not fetched again
    for i in range(1, int(comments / 10) + 2):
        detaiLinks.append('/api/comments/show?id=' + str(idstr) + '&page=' + str(i))
    return detaiLinks


def parseJsonData(jsonData):
    jsondata = json.loads(jsonData)
    for datainfo in jsondata.get('cards', []):
        mblog = datainfo.get('mblog')
        if not mblog:
            # Cards without an mblog (e.g. card_type 11) carry no post; skip them.
            continue
        descText = getTextInfo(mblog['text'])
        print('Post start:')
        print('Post text: ' + descText)
        f.write('Post start:\r\n')
        f.write('Post text: ' + descText + '\r\n')
        textList.append(descText)
        comments = mblog['comments_count']  # number of comments
        numSizeList.append(comments)
        idstr = mblog['idstr']
        pagedetail = 1
        for detaillink in getpageSize(comments, idstr):
            jsonData2 = getJsonData(detaillink)
            print('Comment page: ' + str(pagedetail) + '.......')
            f.write('Comment page: ' + str(pagedetail) + '.......\r\n')
            pagedetail = pagedetail + 1
            jsonDatadetail = json.loads(jsonData2)
            parseDetailListdata(jsonDatadetail.get('data', []))
        print('Post end...')
        f.write('Post end...\r\n')
    print('Crawl finished......')


f = open('微博京东说说跟评论.txt', 'a', encoding='utf-8')


def main_start():
    for inde in range(11, 50):
        startUrl = '/api/container/getIndex?uid=5650743478&luicode=10000011&lfid=100103type%3D1%26q%3D@%E4%BA%AC%E4%B8%9C%E5%AE%A2%E6%9C%8D&featurecode=20000320&type=uid&value=5650743478&containerid=1076035650743478&page=' + str(inde)
        print('startUrl index ' + str(inde) + ' ' + startUrl)
        f.write('Page: ' + str(inde) + '\r\n')
        data = getJsonData(startUrl)
        parseJsonData(data)
        time.sleep(2)
    f.close()


main_start()

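For now you can borrow this code as it is: swap in your own URL and cookie, using your own account. Anyone writing crawlers should also learn Fiddler or a similar packet-capture tool; it really is the right tool for finding these requests.

My company runs on a 996 schedule and my own ability is limited, so this is as deep as my learning goes for now. Later I want to look into cookie pools, proxy pools and the like.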

GitHub address:

/643435675/PyStudy

end
