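Here is my level-grinding journey through learning Python web scraping. It was full of hardship and full of joy. I haven't beaten the final boss yet, but the scenery along the way is the biggest reward, isn't it? I hope everyone finds what they came for!
Warning: lots of images ahead!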
<img src="https://pic4.zhimg.com/55e8bc9324234bc88b354821ce005bc3_b.png" data-rawwidth="288" data-rawheight="179" class="content_image" width="288">
<img src="https://pic3.zhimg.com/af1baba1052c2cd49cea5ea6986eb30a_b.png" data-rawwidth="242" data-rawheight="268" class="content_image" width="242">
<img src="https://pic2.zhimg.com/5ec82828ba71e96a7d86b7e88254ccd9_b.png" data-rawwidth="254" data-rawheight="230" class="content_image" width="254">
<img src="https://pic3.zhimg.com/c60bde3fec9e5f791b1a217613879b46_b.png" data-rawwidth="278" data-rawheight="320" class="content_image" width="278">
<img src="https://pic3.zhimg.com/974b3d7c1c50bac62c14afe58ff0ed26_b.png" data-rawwidth="309" data-rawheight="318" class="content_image" width="309">
<img src="https://pic2.zhimg.com/2c3e1e5f18d6e6cc8758337663c548f5_b.png" data-rawwidth="313" data-rawheight="264" class="content_image" width="313">
<img src="https://pic4.zhimg.com/b65ad1e407e0335107eca80e4a0bdac3_b.png" data-rawwidth="266" data-rawheight="240" class="content_image" width="266">
<img src="https://pic2.zhimg.com/70067cc590378e31676ed48192633d7d_b.png" data-rawwidth="269" data-rawheight="246" class="content_image" width="269">
<img src="https://pic4.zhimg.com/2cecf7ef8b19f24a2fb287403a51142b_b.png" data-rawwidth="299" data-rawheight="254" class="content_image" width="299">
<img src="https://pic3.zhimg.com/b2867a2ddb861a04a91fde5d34ed5982_b.png" data-rawwidth="212" data-rawheight="266" class="content_image" width="212">
<img src="https://pic3.zhimg.com/ae5a6594ab77bfdeaaa9e45b9420c93e_b.png" data-rawwidth="313" data-rawheight="266" class="content_image" width="313">
<img src="https://pic4.zhimg.com/5f65be4b49e5f84ab99efc92ab6ea61b_b.png" data-rawwidth="304" data-rawheight="232" class="content_image" width="304">
<img src="https://pic2.zhimg.com/506899fbbe618e05cbe1e2768665b17d_b.png" data-rawwidth="287" data-rawheight="234" class="content_image" width="287">
<img src="https://pic1.zhimg.com/009fcaa5d4a08f4eda54fb38b88e575c_b.png" data-rawwidth="325" data-rawheight="354" class="content_image" width="325">
<img src="https://pic3.zhimg.com/b93fbe0719c946b1a68a3f0b33937942_b.png" data-rawwidth="289" data-rawheight="243" class="content_image" width="289">
<img src="https://pic2.zhimg.com/ded59bb8038a10b3bfb4e65fd14db631_b.png" data-rawwidth="309" data-rawheight="189" class="content_image" width="309">
<img src="https://pic2.zhimg.com/8d8337c43a58a5386227e037891f9d61_b.png" data-rawwidth="266" data-rawheight="346" class="content_image" width="266">
<img src="https://pic2.zhimg.com/e5dbb6f838f6532b0d0a481c69a79ddd_b.png" data-rawwidth="338" data-rawheight="269" class="content_image" width="338">
<img src="https://pic4.zhimg.com/5e1b525feb212ff0b860481ecb67288b_b.png" data-rawwidth="255" data-rawheight="175" class="content_image" width="255">
Here is a piece of code that scrapes Zhihu avatars:
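import random
import re
import urllib.request
from time import sleep

import requests


def main():
    url = ' '  # left blank by the author; this topic seemed to have a lot of pretty girls
    headers = {}  # request headers omitted by the author
    i = 1
    # Zhihu uses offset to control how much is loaded; each response loads 20 items
    for x in range(20, 3600, 20):
        data = {'start': '0',
                'offset': str(x),
                '_xsrf': 'a128464ef225a69348cef94c38f4e428'}
        # submit the form data via POST
        content = requests.post(url, headers=headers, data=data, timeout=10).text
        # extract image URLs from the returned JSON with a regex; dropping _m gives the large image
        imgs = re.findall(r'<img src=\\"(.*?)_m\.jpg', content)
        for img in imgs:
            try:
                img = img.replace('\\', '')  # strip the interfering backslash characters
                pic = img + '.jpg'
                path = 'd:\\bs4\\zhihu\\jpg\\' + str(i) + '.jpg'  # storage path and image file name
                urllib.request.urlretrieve(pic, path)  # download the image
                print('Downloaded image %d' % i)
                i += 1
                sleep(random.uniform(0.5, 1))  # sleep to avoid crawling too fast and getting the IP banned
            except Exception:
                print('Missed 1 image')
        sleep(random.uniform(0.5, 1))


if __name__ == '__main__':
    main()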
Result:
<img src="https://pic2.zhimg.com/b1fc67ee3e290376fe882113ff7d44fd_b.png" data-rawwidth="710" data-rawheight="744" class="origin_image zh-lightbox-thumb" width="710" data-original="https://pic2.zhimg.com/b1fc67ee3e290376fe882113ff7d44fd_r.png">
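If the regex step in the crawler looks a bit magical, here is a small standalone sketch of just that part, which you can try without touching the network. It pulls the avatar URLs out of a JSON-escaped HTML string, strips the escape backslashes, and drops the `_m` suffix to get the large image. The helper name `extract_avatar_urls` and the sample string are made up for illustration; the sample is not a real Zhihu response, so the real payload may well look different.

```python
import re

# Matches <img src=\"..._m.jpg inside a JSON-escaped HTML fragment.
AVATAR_PATTERN = re.compile(r'<img src=\\"(.*?)_m\.jpg')


def extract_avatar_urls(content):
    """Return large-image URLs found in a JSON-escaped HTML string (hypothetical helper)."""
    urls = []
    for prefix in AVATAR_PATTERN.findall(content):
        # Drop the JSON escape backslashes, then append .jpg without the _m suffix.
        urls.append(prefix.replace('\\', '') + '.jpg')
    return urls


# Hypothetical sample, invented for this sketch; not taken from Zhihu.
sample = r'{"msg": ["<img src=\"https:\/\/example.com\/abc123_m.jpg\" class=\"avatar\">"]}'
print(extract_avatar_urls(sample))
```

Keeping the extraction in its own function makes it easy to check the pattern against a saved response before running the full download loop.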
Finally, please follow me. I'll take good care of your timeline.
\( ^▽^ )/