300字范文,内容丰富有趣,生活中的好帮手!
300字范文 > 爬虫:requests BeautifulSoup 实战案例

爬虫:requests BeautifulSoup 实战案例

时间:2020-07-07 21:59:01

相关推荐

爬虫:requests  BeautifulSoup 实战案例

爬取猫途鹰旅游网站:/Attractions-g60763-Activities-New_York_City_New_York.html景点信息

from bs4 import BeautifulSoupimport requestsurl_saves = '/Saves#37685322'url = '/Attractions-g60763-Activities-New_York_City_New_York.html'urls = ['/Attractions-g60763-Activities-oa{}-New_York_City_New_York.html#ATTRACTION_LIST'.format(str(i)) for i in range(30,930,30)]headers = {'User-Agent':'','Cookie':''}def get_attractions(url,data=None):wb_data = requests.get(url)time.sleep(4)soup = BeautifulSoup(wb_data.text,'html.parser')titles = soup.select('div.property_title > a[target="_blank"]')imgs= soup.select('img[width="160"]')cates= soup.select('div.p13n_reasoning_v2')if data == None:for title,img,cate in zip(titles,imgs,cates):data = {'title' :title.get_text(),'img' :img.get('src'),'cate' :list(cate.stripped_strings),}print(data)def get_favs(url,data=None):wb_data = requests.get(url,headers=headers)soup= BeautifulSoup(wb_data.text,'lxml')titles = soup.select('a.location-name')imgs= soup.select('div.photo > div.sizedThumb > img.photo_image')metas = soup.select('span.format_address')if data == None:for title,img,meta in zip(titles,imgs,metas):data = {'title' :title.get_text(),'img' :img.get('src'),'meta' :list(meta.stripped_strings)}print(data)for single_url in urls:get_attractions(single_url)

PC端爬取信息容易受到限制,若爬取失败,可尝试移动端

headers = {'User-Agent':'', #mobile device user agent from chrome}mb_data = requests.get(url,headers=headers)soup = BeautifulSoup(mb_data.text,'lxml')imgs = soup.select('div.thumb.thumbLLR.soThumb > img')for i in imgs:print(i.get('src'))

headers 提供网页爬取时的头部信息,让对方识别为人的操作。

在谷歌浏览器里输入chrome://version,就可以看到用户代理,将用户代理添加到头部信息。

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。