
Scraping the Songs and Comment Counts of a Given Artist on NetEase Cloud Music with Python

Date: 2019-03-12 00:12:11


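I had learned a little Python but never built anything with it. On a whim I had an idea and wrote a script that fetches the songs of a given artist on NetEase Cloud Music together with the number of comments on each song.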

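The code is below. If a package is missing, install it with pip. The AES encryption uses the pycrypto package, and building it from source requires a C++ build environment, so I recommend downloading a precompiled version. I am on Python 3.5, so I used: /nsrathjen/pycrypto-py3.5-win64-binary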
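Before running the script it can help to confirm that the three third-party packages are importable. The following is only a small sketch using the standard library; `has_module` is a hypothetical helper name, and the check does not install anything or verify package versions.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "Crypto" is the import name provided by the pycrypto package.
    for mod in ("Crypto", "requests", "prettytable"):
        status = "OK" if has_module(mod) else "missing, install it with pip"
        print(mod + ": " + status)
```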

Code

    import os
    import json
    import hashlib
    import base64
    import binascii

    from Crypto.Cipher import AES
    import requests
    import prettytable

    default_timeout = 100  # request timeout in seconds
    modulus = ('00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7'
               'b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280'
               '104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932'
               '575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b'
               '3ece0462db0a22b8e7')
    nonce = '0CoJUm6Qyw8W8jud'
    pubKey = '010001'


    # Song ID encryption, based on the /yanunon/NeteaseCloudMusic script
    def encrypted_id(id):
        magic = bytearray('3go8&$8*3*3h0k(2)2', 'utf-8')
        song_id = bytearray(id, 'utf-8')
        magic_len = len(magic)
        # XOR every byte of the ID with the repeating magic string
        for i, sid in enumerate(song_id):
            song_id[i] = sid ^ magic[i % magic_len]
        m = hashlib.md5(song_id)
        result = m.digest()
        result = base64.b64encode(result)
        result = result.replace(b'/', b'_')
        result = result.replace(b'+', b'-')
        return result.decode('utf-8')


    # Request encryption, based on the /stkevintan/nw_musicbox script
    def encrypted_request(text):
        text = json.dumps(text)
        secKey = createSecretKey(16)
        # AES twice: first with the fixed nonce, then with the random key
        encText = aesEncrypt(aesEncrypt(text, nonce), secKey)
        encSecKey = rsaEncrypt(secKey, pubKey, modulus)
        data = {'params': encText, 'encSecKey': encSecKey}
        return data


    def aesEncrypt(text, secKey):
        # pad the text to a multiple of the 16-byte AES block size
        pad = 16 - len(text) % 16
        text = text + chr(pad) * pad
        # the cipher needs bytes; the nonce is passed in as str
        if isinstance(secKey, str):
            secKey = secKey.encode('utf-8')
        encryptor = AES.new(secKey, AES.MODE_CBC, b'0102030405060708')
        ciphertext = encryptor.encrypt(text.encode('utf-8'))
        ciphertext = base64.b64encode(ciphertext).decode('utf-8')
        return ciphertext


    def rsaEncrypt(text, pubKey, modulus):
        text = text[::-1]
        rs = pow(int(binascii.hexlify(text), 16), int(pubKey, 16), int(modulus, 16))
        return format(rs, 'x').zfill(256)


    def createSecretKey(size):
        return binascii.hexlify(os.urandom(size))[:16]


    # This class searches for songs with POST requests.
    # NOTE: the scheme and host of the URLs below are missing from this copy
    # of the script; requests needs absolute URLs, so they must be restored.
    class NetEase:
        def __init__(self):
            self.header = {
                'Accept': '*/*',
                'Accept-Encoding': 'gzip,deflate,sdch',
                'Accept-Language': 'zh-CN,zh;q=0.8,gl;q=0.6,zh-TW;q=0.4',
                'Connection': 'keep-alive',
                'Content-Type': 'application/x-www-form-urlencoded',
                'Host': '',
                'Referer': '/search/',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) '
                              'AppleWebKit/537.36 (KHTML, like Gecko) '
                              'Chrome/33.0.1750.152 Safari/537.36'
            }
            self.cookies = {'appver': '1.5.2'}

        # Search type (stype): song (1), artist (100), album (10),
        # playlist (1000), user (1002)
        def search(self, s, stype=1, offset=0, total='true', limit=100):
            action = '/api/search/get/web'
            data = {
                's': s,
                'type': stype,
                'offset': offset,
                'total': total,
                'limit': limit
            }
            return self.httpRequest('POST', action, data)

        # Send an HTTP request and return the decoded JSON response
        def httpRequest(self, method, action, query=None, urlencoded=None, callback=None, timeout=None):
            if method == 'GET':
                url = action if query is None else (action + '?' + query)
                connection = requests.get(url, headers=self.header, timeout=default_timeout)
            else:
                connection = requests.post(action, data=query, headers=self.header, timeout=default_timeout)
            connection.encoding = "UTF-8"
            connection = json.loads(connection.text)
            return connection

        # Get the comment data of a song; the count is in the 'total' field
        def getCommentNum(self, id):
            action = '/weapi/v1/resource/comments/R_SO_4_' + str(id) + '?csrf_token='
            csrf = ''
            req = {'csrf_token': csrf}
            data = encrypted_request(req)
            return self.httpRequest('POST', action, data)


    # ---------------- Configuration ----------------
    singer = '周杰伦'  # artist to search for
    page = 2           # request at most two pages
    limit = 100        # number of songs per page

    # ---------------- Main ----------------
    netEase = NetEase()
    musics = netEase.search(singer, stype=1)
    songCount = musics['result']['songCount']
    # total number of pages, rounded up
    pageNum = (songCount + limit - 1) // limit
    songs = []
    for currentPage in range(min(page, pageNum)):
        print('Processing page ' + str(currentPage + 1) + '...')
        offset = currentPage * limit
        musics = netEase.search(singer, stype=1, offset=offset, limit=limit)
        # fetch the comment count of every song on this page
        for song in musics['result'].get('songs', []):
            songId = song['id']
            songName = song['name']
            result = netEase.getCommentNum(songId)
            commentNum = result['total']
            songs.append([songId, songName, singer, commentNum])
    songs.sort(key=lambda x: x[3], reverse=True)  # most comments first

    # print the result as a table
    table = prettytable.PrettyTable()
    table.field_names = ["ID", "Song", "Artist", "Comments"]
    for row in songs[:20]:  # show only the 20 most-commented songs
        table.add_row(row)
    print(table)
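The padding step inside aesEncrypt can be studied on its own, without the crypto package. Below is a minimal sketch in plain Python, assuming the `chr(pad) * pad` line is meant as standard PKCS#7 padding to the 16-byte AES block size. The names `pkcs7_pad` and `pkcs7_unpad` are hypothetical helpers for illustration and are not part of the script.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad = block_size - len(data) % block_size
    return data + bytes([pad]) * pad


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Remove the padding added by pkcs7_pad, rejecting malformed input."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input is not a whole number of blocks")
    pad = padded[-1]
    if not 1 <= pad <= block_size or padded[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid padding")
    return padded[:-pad]


if __name__ == "__main__":
    sample = b'{"csrf_token": ""}'
    padded = pkcs7_pad(sample)
    print(len(sample), len(padded))
    print(pkcs7_unpad(padded) == sample)
```

Note that input whose length is already a multiple of 16 still gains a full block of padding, which is what lets the receiver strip the padding unambiguously.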

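Result

The script prints a table of the 20 most-commented songs, listing each song's ID, title, artist, and comment count.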
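The paging and ranking logic behind the output can be tried offline, without calling the site. The sketch below uses made-up rows in the script's [ID, name, artist, comment count] layout; `page_count` and `top_n` are hypothetical helper names, and slicing is used so that fewer than N rows does not raise an error.

```python
def page_count(total, per_page):
    """Number of pages needed to hold `total` items (ceiling division)."""
    return (total + per_page - 1) // per_page


def top_n(rows, n, key_index=3):
    """Return up to n rows with the largest value in column key_index."""
    return sorted(rows, key=lambda row: row[key_index], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up sample rows, not real data from the site.
    rows = [
        [1, "song A", "artist", 120],
        [2, "song B", "artist", 4500],
        [3, "song C", "artist", 87],
    ]
    print(page_count(250, 100))
    for row in top_n(rows, 2):
        print(row)
```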
