Implemented in Python; the results are saved to douban.txt in the same directory.
The page parsing is done with BeautifulSoup.
#coding=utf-8
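import urllib2
from bs4 import BeautifulSoup

# Spoofed request headers; not sure whether they actually help
sendHeaders = {
    'User-Agent': 'Mozilla/5.3 (Windows NT 7.2; rv:18.0) Gecko/0101 Firefox/19.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Connection': 'keep-alive',
}
urlTmep = '/top250?start='
saveFile = open('douban.txt', 'a')  # append mode, so repeated runs add to the file
k = 1
for i in range(10):
    # The page is selected through the GET parameter start: 10 pages of 25 entries,
    # so start takes the values 0, 25, 50, ..., 225
    url = urlTmep + str(i * 25)
    request = urllib2.Request(url, headers=sendHeaders)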
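The pagination step can be sketched in Python 3 (where urllib2 became urllib.request) without touching the network. This is a minimal sketch under stated assumptions: BASE_URL is a hypothetical placeholder because only the path /top250?start= appears above, and the layout of 10 pages with 25 entries each is assumed from the 250-entry list. The helper names page_urls and build_requests are illustrative, not part of the original script.

```python
# Python 3 sketch of building the paginated requests (nothing is sent here).
from urllib.request import Request

# Hypothetical base URL: only the path "/top250?start=" is given in the original.
BASE_URL = "https://example.invalid/top250?start="

# Assumption: 10 pages x 25 entries = 250 entries.
PAGES = 10
PER_PAGE = 25

# Illustrative browser-like headers, mirroring the idea of sendHeaders above.
SEND_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def page_urls(base=BASE_URL, pages=PAGES, per_page=PER_PAGE):
    """Return one URL per page; the start offset is i * per_page."""
    return [base + str(i * per_page) for i in range(pages)]


def build_requests(urls, headers=SEND_HEADERS):
    """Wrap each URL in a Request object that carries the given headers."""
    return [Request(url, headers=headers) for url in urls]


if __name__ == "__main__":
    for req in build_requests(page_urls()):
        print(req.full_url)
```

Running it only prints the ten page URLs; actually fetching a page would mean passing each Request to urllib.request.urlopen and handing the response body to BeautifulSoup.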