Web Scraping in Practice: Scraping Company Data from Aiqicha
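This tutorial is for learning purposes only. Do not use it for anything illegal!
1. Introduction
Aiqicha (爱企查) is a company-information lookup tool from Baidu that makes it quick to look up details about a company. Testers often need to collect company information in bulk during routine testing, and scraping the Aiqicha website with a crawler is one way to do that.
When hunting for generic-type (通用型) CNVD vulnerabilities, the affected company generally needs registered capital above 50 million (5000万). This article demonstrates how to scrape information on companies whose registered capital exceeds 50 million.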
2. Analysis
1. Log in to Aiqicha and filter companies by:
- Registered capital above 50 million
- Capital type: RMB
- Company status: open (开业)
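2. Once the results load, try to find the company name 广州市荔湾区华强电动设备行越秀分行 in the page source
The search finds nothing, which suggests the list is not part of the static HTML.
Open the developer tools (F12) and capture the network requests
The capture contains too much traffic to inspect all at once.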
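3. Narrow the target to the company name and registered capital
Click the page index at the bottom of the page:
Look at the payload of the captured request
Visit a few more pages
The payloads are essentially identical:
p: the page number
f: the filter conditions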
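The payload described above can be sketched as a small helper that builds the query parameters for a given page. The filter values mirror the params used in the code later in this article; the function name `build_params` is hypothetical, and reading `s` as the page size and `o` as an ordering flag is an assumption, since the article only explains `p` and `f`.

```python
import json


def build_params(page, page_size=10):
    """Build the query parameters for one result page (hypothetical helper)."""
    # Filter conditions: registered capital >= 5000 (in units of 10,000),
    # capital type "1" (RMB) and status "开业" (open).
    filters = {
        "regCap": [{"start": 5000, "end": 0}],
        "regCapType": ["1"],
        "openStatus": ["开业"],
    }
    return {
        "p": str(page),       # page number
        "s": str(page_size),  # assumed to be the page size
        # Compact JSON without ASCII escaping, matching the captured payload.
        "f": json.dumps(filters, ensure_ascii=False, separators=(",", ":")),
        "o": "0",             # meaning not documented in the article
    }


params = build_params(3)
print(params["p"])
print(params["f"])
```

Building `f` from a dict rather than a hand-written string makes it harder to break the JSON when the filters change.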
4. Inspect the response (partial)
{
  "status": 0,
  "msg": "",
  "data": {
    "qType": 111,
    "queryStr": "",
    "pageNum": 3,
    "resultList": [
      {
        "pid": "91770835364651",
        "entName": "\u6d59\u6c5f\u4e49\u94ed\u5efa\u7b51\u52b3\u52a1\u6709\u9650\u516c\u53f8",
        "entType": "\u6709\u9650\u8d23\u4efb\u516c\u53f8(\u81ea\u7136\u4eba\u6295\u8d44\u6216\u63a7\u80a1)",
        "validityFrom": "2018-08-22",
        "domicile": "\u6d59\u6c5f\u7701\u676d\u5dde\u5e02\u94b1\u5858\u65b0\u533a\u4e07\u4e9a\u540d\u57ce2\u5e62715\u5ba4",
        "entLogo": "",
        "openStatus": "\u5f00\u4e1a",
        "legalPerson": "\u5415\u7956\u5f3a",
        "tags": [],
        "logoWord": "\u4e49\u94ed\u5efa\u7b51",
        "hkLable": [],
        "isHkComp": 0,
        "isClaim": 0,
        "titleName": "\u6d59\u6c5f\u4e49\u94ed\u5efa\u7b51\u52b3\u52a1\u6709\u9650\u516c\u53f8",
        "titleLegal": "\u5415\u7956\u5f3a",
        "titleDomicile": "\u6d59\u6c5f\u7701\u676d\u5dde\u5e02\u94b1\u5858\u65b0\u533a\u4e07\u4e9a\u540d\u57ce2\u5e62715\u5ba4",
        "regCap": "1,011,000,000.0\u4e07",
        "scope": "\u8bb8\u53ef\u9879\u76ee\uff1a\u623f\u5c4b\u5efa\u7b51\u548c\u5e02\u653f\u57fa\u7840\u8bbe\u65bd\u9879\u76ee\u5de5\u7a0b\u603b\u627f\u5305\uff1b\u5404\u7c7b\u5de5\u7a0b\u5efa\u8bbe\u6d3b\u52a8\uff1b\u7535\u6c14\u5b89\u88c5\u670d\u52a1\uff1b\u5efa\u7b51\u52b3\u52a1\u5206\u5305\uff1b\u4eba\u9632\u5de5\u7a0b\u9632\u62a4\u8bbe\u5907\u5236\u9020(\u4f9d\u6cd5\u987b\u7ecf\u6279\u51c6\u7684\u9879\u76ee\uff0c\u7ecf\u76f8\u5173\u90e8\u95e8\u6279\u51c6\u540e\u65b9\u53ef\u5f00\u5c55\u7ecf\u8425\u6d3b\u52a8\uff0c\u5177\u4f53\u7ecf\u8425\u9879\u76ee\u4ee5\u5ba1\u6279\u7ed3\u679c\u4e3a\u51c6)\u3002\u4e00\u822c\u9879\u76ee\uff1a\u5de5\u7a0b\u7ba1\u7406\u670d\u52a1\uff1b\u4f4f\u5b85\u6c34\u7535\u5b89\u88c5\u7ef4\u62a4\u670d\u52a1\uff1b\u516c\u8def\u6c34\u8fd0\u5de5\u7a0b\u8bd5\u9a8c\u68c0\u6d4b\u670d\u52a1\uff1b\u8f68\u9053\u4ea4\u901a\u4e13\u7528\u8bbe\u5907\u3001\u5173\u952e\u7cfb\u7edf\u53ca\u90e8\u4ef6\u9500\u552e\uff1b\u627f\u63a5\u603b\u516c\u53f8\u5de5\u7a0b\u5efa\u8bbe\u4e1a\u52a1\uff1b\u571f\u77f3\u65b9\u5de5\u7a0b\u65bd\u5de5\uff1b\u56ed\u6797\u7eff\u5316\u5de5\u7a0b\u65bd\u5de5\uff1b\u57ce\u5e02\u7eff\u5316\u7ba1\u7406\uff1b\u82b1\u5349\u7eff\u690d\u79df\u501f\u4e0e\u4ee3\u7ba1\u7406\uff1b\u9632\u8150\u6750\u6599\u9500\u552e\uff1b\u4e94\u91d1\u4ea7\u54c1\u96f6\u552e\uff1b\u4e94\u91d1\u4ea7\u54c1\u6279\u53d1\uff1b\u91d1\u5c5e\u6750\u6599\u9500\u552e\uff1b\u5efa\u7b51\u6750\u6599\u9500\u552e\uff1b\u6d82\u6599\u5236\u9020\uff08\u4e0d\u542b\u5371\u9669\u5316\u5b66\u54c1\uff09\uff1b\u91d1\u5c5e\u7ed3\u6784\u9500\u552e\uff1b\u5efa\u7b51\u7528\u94a2\u7b4b\u4ea7\u54c1\u9500\u552e\uff1b\u91d1\u5c5e\u7ed3\u6784\u5236\u9020(\u9664\u4f9d\u6cd5\u987b\u7ecf\u6279\u51c6\u7684\u9879\u76ee\u5916\uff0c\u51ed\u8425\u4e1a\u6267\u7167\u4f9d\u6cd5\u81ea\u4e3b\u5f00\u5c55\u7ecf\u8425\u6d3b\u52a8)\u3002",
        "regNo": "91320322MA1X36PF3D",
        "appJumpUrl": "aiqicha:\/\/open.app?params={\"naModule\":\"\/aqc\/detail\",\"naParam\":\"{\\\"pid\\\":\\\"91770835364651\\\"}\"}",
        "labels": {
          "opening": {
            "text": "\u5f00\u4e1a",
            "style": "blue",
            "fontColor": "#1EA830",
            "bgColor": "#EBF7EC"
          }
        },
        "personTitle": "\u6cd5\u5b9a\u4ee3\u8868\u4eba",
        "personId": "0fa179597fa8aaccf6f6ae44062f8a24",
        "newLabels": [
          {
            "key": "opening",
            "value": {
              "text": "\u5f00\u4e1a",
              "style": "blue",
              "fontColor": "#1EA830",
              "bgColor": "#EBF7EC"
            }
          }
        ]
      },
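Pick a few fields, for example
"entName": "\u946b\u6e90\u9e3f\u660a(\u5929\u6d25)\u79d1\u6280\u6709\u9650\u516c\u53f8",
"regCap": "1,011,000,000.0\u4e07",
Decode the \u escapes (this is Unicode escaping, not encryption):
The decoded values match the information on the page that we want to scrape.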
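The decoding step can be reproduced in a few lines. This is a minimal sketch using only the standard library: `json.loads` turns the `\uXXXX` escapes back into Chinese characters, which is also what `response.json()` does for the whole response. The two fields are the ones quoted above; wrapping them in braces to form a JSON object is done here only for illustration.

```python
import json

# Raw string so the backslashes reach json.loads untouched.
raw = r'''{
    "entName": "\u946b\u6e90\u9e3f\u660a(\u5929\u6d25)\u79d1\u6280\u6709\u9650\u516c\u53f8",
    "regCap": "1,011,000,000.0\u4e07"
}'''

fields = json.loads(raw)
print(fields["entName"])  # company name in readable Chinese
print(fields["regCap"])   # registered capital, ending in 万 (ten thousand)
```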
3. Writing the program
1. Save the request headers:
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive
Cookie: <your captured cookie>
Host: aiqicha.baidu.com
Referer: https://aiqicha.baidu.com/advancesearch/list
Sec-Ch-Ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
X-Requested-With: XMLHttpRequest
Ymg_ssr: 1686132225976_1686209020456_Krd2ws59qA1TALmBzJb0wdzDayRK14xrDbs2NsxGZghPtW2AaG8K4Now6K/zTfBecE9i1aegJL5KTT+cxrK6j9JK2ix9MoT8+qWljBqcj0i1PEU+RTVvgGjdlkPqamJydND6fQSkBidmHrZVhBXKSiqByC1knHzxmFJl0d+FYKOIK5Yue9P/KBLU2Q3FIF1YsrkslH8qYriHAg5MNXzRTlrU4gdLir4fXP++E+JwOUR2jkA8Mv4vTuUOG7crcKk2W+omKed78e4P6vi6j3qgdk6Pc4jel6aItYE8iTIAZCAcJNP4raFja3p6Eb/KAqre
Zx-Open-Url: https://aiqicha.baidu.com/advancesearch/list
Before writing the crawler, turn the request headers into a headers dict
headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Connection": "keep-alive",
    "Cookie": "<your captured cookie>",
    "Host": "aiqicha.baidu.com",
    "Referer": "https://aiqicha.baidu.com/advancesearch/list",
    "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    "Ymg_ssr": "1686132225976_1686209020456_Krd2ws59qA1TALmBzJb0wdzDayRK14xrDbs2NsxGZghPtW2AaG8K4Now6K/zTfBecE9i1aegJL5KTT+cxrK6j9JK2ix9MoT8+qWljBqcj0i1PEU+RTVvgGjdlkPqamJydND6fQSkBidmHrZVhBXKSiqByC1knHzxmFJl0d+FYKOIK5Yue9P/KBLU2Q3FIF1YsrkslH8qYriHAg5MNXzRTlrU4gdLir4fXP++E+JwOUR2jkA8Mv4vTuUOG7crcKk2W+omKed78e4P6vi6j3qgdk6Pc4jel6aItYE8iTIAZCAcJNP4raFja3p6Eb/KAqre",
    "Zx-Open-Url": "https://aiqicha.baidu.com/advancesearch/list",
}
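Retyping the headers by hand is error-prone, so the conversion can be automated. The following is only a sketch: `parse_raw_headers` is a hypothetical helper, and it assumes the headers were copied from the browser as one `Name: value` pair per line. The sample uses a few of the headers captured above.

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, so URLs in the value stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


raw = """
Accept: application/json, text/plain, */*
Host: aiqicha.baidu.com
Referer: https://aiqicha.baidu.com/advancesearch/list
Sec-Ch-Ua-Mobile: ?0
"""

headers = parse_raw_headers(raw)
for name, value in headers.items():
    print(name, "=>", value)
```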
2. Build the params as follows
params = {
    "p": "1",
    "s": "10",
    "f": '{"regCap":[{"start":5000,"end":0}],"regCapType":["1"],"openStatus":["开业"]}',
    "o": "0",
}
3. Crawler implementation
import requests


def main():
    # Query parameters: page 1, with the filter conditions in "f"
    params = {
        "p": "1",
        "s": "10",
        "f": '{"regCap":[{"start":5000,"end":0}],"regCapType":["1"],"openStatus":["开业"]}',
        "o": "0",
    }
    # Request headers copied from the browser
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Cookie": "<your captured cookie>",
        "Host": "aiqicha.baidu.com",
        "Referer": "https://aiqicha.baidu.com/advancesearch/list",
        "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": '"Windows"',
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Ymg_ssr": "1686132225976_1686209020456_Krd2ws59qA1TALmBzJb0wdzDayRK14xrDbs2NsxGZghPtW2AaG8K4Now6K/zTfBecE9i1aegJL5KTT+cxrK6j9JK2ix9MoT8+qWljBqcj0i1PEU+RTVvgGjdlkPqamJydND6fQSkBidmHrZVhBXKSiqByC1knHzxmFJl0d+FYKOIK5Yue9P/KBLU2Q3FIF1YsrkslH8qYriHAg5MNXzRTlrU4gdLir4fXP++E+JwOUR2jkA8Mv4vTuUOG7crcKk2W+omKed78e4P6vi6j3qgdk6Pc4jel6aItYE8iTIAZCAcJNP4raFja3p6Eb/KAqre",
        "Zx-Open-Url": "https://aiqicha.baidu.com/advancesearch/list",
    }
    url = "https://aiqicha.baidu.com/s/advanceSearchAjax"
    response = requests.get(url, headers=headers, params=params)
    # response.json() also decodes the \u escapes
    resultList = response.json()["data"]["resultList"]
    for item in resultList:
        print(item)
        print(item['entName'])


if __name__ == '__main__':
    main()
4. The crawl succeeds:
The output matches what the browser shows.
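The code above indexes straight into `["data"]["resultList"]`, which raises an unhelpful error if the server returns something else (for example after the cookie expires). Below is a small defensive sketch. The sample response in this article has `"status": 0`; treating any other status as a failure is an assumption, not documented API behavior, and `extract_result_list` is a hypothetical helper name.

```python
def extract_result_list(payload):
    """Return the resultList from a decoded response, or raise on failure."""
    # Assumption: status 0 means success, as in the sample response.
    if payload.get("status") != 0:
        raise ValueError(
            f"unexpected status {payload.get('status')!r}: {payload.get('msg')!r}"
        )
    data = payload.get("data") or {}
    return data.get("resultList") or []


# A trimmed-down payload shaped like the sample response.
sample = {
    "status": 0,
    "msg": "",
    "data": {"pageNum": 3, "resultList": [{"pid": "91770835364651"}]},
}
print(extract_result_list(sample))
```

In the crawler this would be called as `extract_result_list(response.json())`.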
5. Save the scraped content to a file
import requests


def main():
    params = {
        "p": "1",
        "s": "10",
        "f": '{"regCap":[{"start":5000,"end":0}],"regCapType":["1"],"openStatus":["开业"]}',
        "o": "0",
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Cookie": "<your captured cookie>",
        "Host": "aiqicha.baidu.com",
        "Referer": "https://aiqicha.baidu.com/advancesearch/list",
        "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": '"Windows"',
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Ymg_ssr": "1686132225976_1686209020456_Krd2ws59qA1TALmBzJb0wdzDayRK14xrDbs2NsxGZghPtW2AaG8K4Now6K/zTfBecE9i1aegJL5KTT+cxrK6j9JK2ix9MoT8+qWljBqcj0i1PEU+RTVvgGjdlkPqamJydND6fQSkBidmHrZVhBXKSiqByC1knHzxmFJl0d+FYKOIK5Yue9P/KBLU2Q3FIF1YsrkslH8qYriHAg5MNXzRTlrU4gdLir4fXP++E+JwOUR2jkA8Mv4vTuUOG7crcKk2W+omKed78e4P6vi6j3qgdk6Pc4jel6aItYE8iTIAZCAcJNP4raFja3p6Eb/KAqre",
        "Zx-Open-Url": "https://aiqicha.baidu.com/advancesearch/list",
    }
    url = "https://aiqicha.baidu.com/s/advanceSearchAjax"
    response = requests.get(url, headers=headers, params=params)
    result = response.json()["data"]["resultList"]
    # Append one line per company; the with block closes the file for us
    with open("./a.txt", "a", encoding="utf-8") as f:
        for item in result:
            a = (
                "Company name: " + item['entName'] + ' '
                + "Registered capital: " + item['regCap'] + ' '
                + "Company type: " + item['entType'] + ' '
                + "Founded: " + item['validityFrom'] + ' '
                + "Address: " + item["domicile"] + ' '
                + "Status: " + item["openStatus"] + '\n'
            )
            f.write(a)


if __name__ == '__main__':
    main()
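A plain text file works, but the registered capital contains commas and the addresses can contain spaces, so the lines are hard to parse later. As an alternative (not the approach used in this article), here is a sketch that writes the same six fields as CSV with the standard `csv` module. `write_rows` and `FIELDS` are hypothetical names; the sample values are copied from the response shown earlier, and an in-memory buffer stands in for the file.

```python
import csv
import io

# The six fields the article saves, in the same order.
FIELDS = ["entName", "regCap", "entType", "validityFrom", "domicile", "openStatus"]


def write_rows(items, out):
    """Write the selected fields of each item to `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow({key: item.get(key, "") for key in FIELDS})


sample = [
    {
        "pid": "91770835364651",
        "entName": "\u6d59\u6c5f\u4e49\u94ed\u5efa\u7b51\u52b3\u52a1\u6709\u9650\u516c\u53f8",
        "regCap": "1,011,000,000.0\u4e07",
        "validityFrom": "2018-08-22",
        "openStatus": "\u5f00\u4e1a",
    }
]

buf = io.StringIO()
write_rows(sample, buf)
print(buf.getvalue())
```

To write a real file, pass `open("./a.csv", "a", newline="", encoding="utf-8")` instead of the buffer.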
6. Scrape multiple pages
import requests


def main(n):
    # n is the page number to fetch
    m = str(n)
    params = {
        "p": m,
        "s": "10",
        "f": '{"regCap":[{"start":5000,"end":0}],"regCapType":["1"],"openStatus":["开业"]}',
        "o": "0",
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Cookie": "<your captured cookie>",
        "Host": "aiqicha.baidu.com",
        "Referer": "https://aiqicha.baidu.com/advancesearch/list",
        "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": '"Windows"',
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Ymg_ssr": "1686132225976_1686209020456_Krd2ws59qA1TALmBzJb0wdzDayRK14xrDbs2NsxGZghPtW2AaG8K4Now6K/zTfBecE9i1aegJL5KTT+cxrK6j9JK2ix9MoT8+qWljBqcj0i1PEU+RTVvgGjdlkPqamJydND6fQSkBidmHrZVhBXKSiqByC1knHzxmFJl0d+FYKOIK5Yue9P/KBLU2Q3FIF1YsrkslH8qYriHAg5MNXzRTlrU4gdLir4fXP++E+JwOUR2jkA8Mv4vTuUOG7crcKk2W+omKed78e4P6vi6j3qgdk6Pc4jel6aItYE8iTIAZCAcJNP4raFja3p6Eb/KAqre",
        "Zx-Open-Url": "https://aiqicha.baidu.com/advancesearch/list",
    }
    url = "https://aiqicha.baidu.com/s/advanceSearchAjax"
    response = requests.get(url, headers=headers, params=params)
    result = response.json()["data"]["resultList"]
    with open("./a.txt", "a", encoding="utf-8") as f:
        for item in result:
            a = (
                "Company name: " + item['entName'] + ' '
                + "Registered capital: " + item['regCap'] + ' '
                + "Company type: " + item['entType'] + ' '
                + "Founded: " + item['validityFrom'] + ' '
                + "Address: " + item["domicile"] + ' '
                + "Status: " + item["openStatus"] + '\n'
            )
            f.write(a)


if __name__ == '__main__':
    main(1)
    main(2)
The crawl succeeds
Use a for loop to scrape multiple pages
import requests


def main(n):
    # n is the page number to fetch
    m = str(n)
    params = {
        "p": m,
        "s": "10",
        "f": '{"regCap":[{"start":5000,"end":0}],"regCapType":["1"],"openStatus":["开业"]}',
        "o": "0",
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Cookie": "<your captured cookie>",
        "Host": "aiqicha.baidu.com",
        "Referer": "https://aiqicha.baidu.com/advancesearch/list",
        "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": '"Windows"',
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Ymg_ssr": "1686132225976_1686209020456_Krd2ws59qA1TALmBzJb0wdzDayRK14xrDbs2NsxGZghPtW2AaG8K4Now6K/zTfBecE9i1aegJL5KTT+cxrK6j9JK2ix9MoT8+qWljBqcj0i1PEU+RTVvgGjdlkPqamJydND6fQSkBidmHrZVhBXKSiqByC1knHzxmFJl0d+FYKOIK5Yue9P/KBLU2Q3FIF1YsrkslH8qYriHAg5MNXzRTlrU4gdLir4fXP++E+JwOUR2jkA8Mv4vTuUOG7crcKk2W+omKed78e4P6vi6j3qgdk6Pc4jel6aItYE8iTIAZCAcJNP4raFja3p6Eb/KAqre",
        "Zx-Open-Url": "https://aiqicha.baidu.com/advancesearch/list",
    }
    url = "https://aiqicha.baidu.com/s/advanceSearchAjax"
    response = requests.get(url, headers=headers, params=params)
    result = response.json()["data"]["resultList"]
    with open("./a.txt", "a", encoding="utf-8") as f:
        for item in result:
            a = (
                "Company name: " + item['entName'] + ' '
                + "Registered capital: " + item['regCap'] + ' '
                + "Company type: " + item['entType'] + ' '
                + "Founded: " + item['validityFrom'] + ' '
                + "Address: " + item["domicile"] + ' '
                + "Status: " + item["openStatus"] + '\n'
            )
            f.write(a)


if __name__ == '__main__':
    # Pages 1 through 4
    for i in range(1, 5):
        main(i)
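The loop above fetches a fixed range of pages. A slightly more defensive variant, sketched below, stops at the first empty page and can pause between requests. Everything here is illustrative: `crawl_pages` is a hypothetical helper, `fake_fetch` stands in for the real request so the sketch runs offline, and whether the site actually returns an empty list past the last page is an assumption.

```python
import time


def crawl_pages(fetch_page, first=1, last=4, delay=0.0):
    """Fetch pages first..last in order, stopping early on an empty page."""
    rows = []
    for page in range(first, last + 1):
        items = fetch_page(page)
        if not items:
            break  # assume an empty page means there is nothing further
        rows.extend(items)
        time.sleep(delay)  # optional pause between requests
    return rows


def fake_fetch(page):
    """Offline stand-in for the real request: 3 pages of 10 rows each."""
    if page > 3:
        return []
    return [{"entName": f"company-{page}-{i}"} for i in range(10)]


rows = crawl_pages(fake_fetch, first=1, last=5)
print(len(rows))
```

With the real crawler, `fetch_page` would be a function that performs the request for one page and returns its `resultList`.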
4. Multithreaded crawler
1. Plain crawler
Scrape 500 pages of Aiqicha data
import time

import requests


def main(n):
    # n is the page number to fetch
    m = str(n)
    params = {
        "p": m,
        "s": "10",
        "f": '{"regCap":[{"start":5000,"end":0}],"regCapType":["1"],"openStatus":["开业"]}',
        "o": "0",
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Cookie": "<your captured cookie>",
        "Host": "aiqicha.baidu.com",
        "Referer": "https://aiqicha.baidu.com/advancesearch/list",
        "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": '"Windows"',
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Ymg_ssr": "1686132225976_1686209020456_Krd2ws59qA1TALmBzJb0wdzDayRK14xrDbs2NsxGZghPtW2AaG8K4Now6K/zTfBecE9i1aegJL5KTT+cxrK6j9JK2ix9MoT8+qWljBqcj0i1PEU+RTVvgGjdlkPqamJydND6fQSkBidmHrZVhBXKSiqByC1knHzxmFJl0d+FYKOIK5Yue9P/KBLU2Q3FIF1YsrkslH8qYriHAg5MNXzRTlrU4gdLir4fXP++E+JwOUR2jkA8Mv4vTuUOG7crcKk2W+omKed78e4P6vi6j3qgdk6Pc4jel6aItYE8iTIAZCAcJNP4raFja3p6Eb/KAqre",
        "Zx-Open-Url": "https://aiqicha.baidu.com/advancesearch/list",
    }
    url = "https://aiqicha.baidu.com/s/advanceSearchAjax"
    response = requests.get(url, headers=headers, params=params)
    result = response.json()["data"]["resultList"]
    with open("./a.txt", "a", encoding="utf-8") as f:
        for item in result:
            try:
                a = (
                    "Company name: " + item['entName'] + ' '
                    + "Registered capital: " + item['regCap'] + ' '
                    + "Company type: " + item['entType'] + ' '
                    + "Founded: " + item['validityFrom'] + ' '
                    + "Address: " + item["domicile"] + ' '
                    + "Status: " + item["openStatus"] + '\n'
                )
                f.write(a)
            except Exception:
                # Skip records with missing or non-string fields
                continue


if __name__ == '__main__':
    start = time.time()
    for i in range(1, 501):
        main(i)
    end = time.time()
    print("Sequential crawl time (s):", (end - start))
Successfully scraped 5,000 rows of data
Time taken by the program
2. Multithreaded crawler
Run it with 10 threads
import queue
import time
from threading import Thread

import requests


def main(n):
    # n is the page number to fetch
    m = str(n)
    params = {
        "p": m,
        "s": "10",
        "f": '{"regCap":[{"start":5000,"end":0}],"regCapType":["1"],"openStatus":["开业"]}',
        "o": "0",
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Cookie": "<your captured cookie>",
        "Host": "aiqicha.baidu.com",
        "Referer": "https://aiqicha.baidu.com/advancesearch/list",
        "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": '"Windows"',
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Ymg_ssr": "1686218663386_1686228429829_WB0FB0MHEc9Cck35D45asRQbQY/PoeVjFk/N0r6KlgJOBaP2AQeCctHp1+UF5PJ0r0oQ5oILGzmfycrw5JXeQkWKFeoFANP7yZEm/24Zn7bQjBOkalyujWkvRU8gYBnRfui6CEnMI6owfohQkqi47feLPDyQ304QODUtg1jr3lP0Yn+4K10FKN3G210hXv9FpwJz3ze/f6japRpezUodE+/Ac2i8kX4il0MtJjc6SRj1Smi+H0bM1xAH5/LthB1gHm0akZCD0pNskPl3oBpmNvRLcQEqf8D7heK+Krw+A1lkjfywECbMAzcktjLm9XLQHaSG+8O2W5p+F7LF0qTVHDcxw7nEkE8/Ix0zG5NnnPttJF1pvM4H4aOCoXnuARy8MuWogjpUrxI2JqjPd3Fjoz13c4usSIy1rQ5OO8BAe5syq+XIuX6+X2i9hmv4C7NwfkFPzUA0UF483F8KAgBFyVA+Pa2V8o95TjRbeCrwgLmp1q4h2tVtBtoplvnr8WKO3MG16+pHq7vcVaZzYSkxL0yIV9e0SKzgeCxmeIaFNZx2oCCtb4xND6+MRKe2VYOePXFuKYA6mG1MQ7/ZkvXz2IsG8t0dXDdZtmU0M8szb7HGxDYzXCjBXkbvmWuidBoI0xLbFSt8fBeQs2NWT0BUHwP0mRJ+52oDDTBobqYTJdZQ81bBVytZpVqeNU72/0rMnWhf2nbu0zBb4fuu5TdnClbECDrfzkzP5WQ94E5XeJSfEKw1HrmxjOKQhECNflwn8WhnkP32FDquj8e+0yLBADAVT5/dPyyeakElNGd4ZdTI10tszotkziWMyKg+qm2ST/NOpM2apFTWLxtaDePALLbwucfQ0E/aMdhYhztSbd7b28zaL0DYQEAcBwUuA4sGC3I/w63nTR00hi2n8awwTKNtpyJvvMjA1NmuwKWZvBrRLjrVwsYyFjTWVDX2cQa7u3/WHubLJo4uSuqNE3a1+FO3aGFwMZupfH7pCKA3LSRjgO829MQnzX5teielCpcywP933QFbMHbeqkn+zGXlDQ==",
        "Zx-Open-Url": "https://aiqicha.baidu.com/advancesearch/list",
    }
    url = "https://aiqicha.baidu.com/s/advanceSearchAjax"
    try:
        response = requests.get(url, headers=headers, params=params)
        result = response.json()["data"]["resultList"]
        with open("./a.txt", "a", encoding="utf-8") as f:
            for item in result:
                a = (
                    "Company name: " + item['entName'] + ' '
                    + "Registered capital: " + item['regCap'] + ' '
                    + "Company type: " + item['entType'] + ' '
                    + "Founded: " + item['validityFrom'] + ' '
                    + "Address: " + item["domicile"] + ' '
                    + "Status: " + item["openStatus"] + '\n'
                )
                f.write(a)
    except Exception as exc:
        # Report the failed page instead of silently printing a blank line
        print("page", n, "failed:", exc)


def run_reptile():
    # Each worker thread pulls page numbers until the queue is empty.
    # get_nowait avoids the race between empty() and get().
    while True:
        try:
            i = qq.get_nowait()
        except queue.Empty:
            break
        main(i)


if __name__ == '__main__':
    # queue.Queue is the thread-safe queue meant for threads;
    # multiprocessing.Queue is intended for separate processes.
    qq = queue.Queue()
    for i in range(1, 501):
        qq.put(i)
    thread_number = 10
    start = time.time()
    Threads = []
    for i in range(thread_number):
        t = Thread(target=run_reptile)
        t.start()
        Threads.append(t)
    for t in Threads:
        t.join()
    end = time.time()
    print("Multithreaded crawl time (s):", (end - start))
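The same fan-out can be written with the standard library's `concurrent.futures.ThreadPoolExecutor`, which manages the worker threads and the work queue itself. This is an alternative sketch, not the approach used in this article: `crawl_concurrently` is a hypothetical name and `fake_fetch` replaces the real HTTP request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(page):
    """Offline stand-in for fetching one page: returns 10 rows."""
    return [f"page {page} row {i}" for i in range(10)]


def crawl_concurrently(fetch_page, pages, workers=10):
    """Fetch all pages with a thread pool and return the rows in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs the calls concurrently but yields results in input order.
        results = pool.map(fetch_page, pages)
        return [row for page_rows in results for row in page_rows]


rows = crawl_concurrently(fake_fetch, range(1, 501))
print(len(rows))
```

Because the rows come back to the main thread, they can be written to the file in one place, so threads never write to the same file at the same time.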
5. References
python爬虫 爬取爱企查公司信息 (CSDN blog post by 代码永不报错)
Disclaimer: this tutorial is for learning purposes only. Do not use it for anything illegal!