Python environment setup and using the requests module: enabling TCP keepalive to fix TCP connections that drop when a request takes a long time to return

Error: Can't connect to HTTPS URL because the SSL module is not available

This is a problem with the Anaconda installation environment.

Copy
libcrypto-1_1-x64.dll
libssl-1_1-x64.dll
from
D:\Anaconda\Library\bin
to
D:\Anaconda\DLLs

and the error goes away.
References
https://blog.csdn.net/Sky_Tree_Delivery/article/details/109078288
https://github.com/conda/conda/issues/8273
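
One quick way to confirm the fix took effect is to import the ssl module directly, since requests relies on it for HTTPS. This is only a minimal sketch; ssl_module_status is a hypothetical helper name, not part of Anaconda or requests.

```python
import importlib


def ssl_module_status():
    """Return (available, detail) describing whether the ssl module imports."""
    try:
        ssl = importlib.import_module("ssl")
    except ImportError as exc:
        # This is the failure behind "the SSL module is not available".
        return False, str(exc)
    # OPENSSL_VERSION names the OpenSSL build the interpreter actually loaded.
    return True, ssl.OPENSSL_VERSION


if __name__ == "__main__":
    available, detail = ssl_module_status()
    print("ssl available:", available, "-", detail)
```

If the first value is False, the DLLs are probably still not where the interpreter looks for them.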

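Installing the requests module manually

Find the requests package archive under Anaconda3\pkgs.
From lib\site-packages\ inside the archive, copy the
requests
requests-2.21.0.dist-info
directories into anaconda3\lib\site-packages.

If import still fails, add the directory to the module search path.
To view the current search path: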

>>> import sys
>>> sys.path
['', '/usr/local/lib/python35.zip', '/usr/local/lib/python3.5', '/usr/local/lib/python3.5/plat-linux', '/usr/local/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/site-packages']

To add a module search path:
set PYTHONPATH=c:\programdata\anaconda3\lib\site-packages

Reference: Python module search path
https://blog.csdn.net/liang19890820/article/details/76219560
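
Besides setting PYTHONPATH, a directory can also be added to sys.path at runtime. A minimal sketch, assuming the directory passed in is the one the package was copied to; add_search_path is a hypothetical helper name.

```python
import os
import sys


def add_search_path(directory):
    """Append directory to sys.path unless already listed; return its absolute path."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.append(path)
    return path
```

For example, calling add_search_path(r"c:\programdata\anaconda3\lib\site-packages") before import requests. Unlike PYTHONPATH, this only affects the current process.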

Emulating a curl request with the requests module

import requests

headers = {
    'Content-type': 'application/json',
}

data = '{"text":"Hello, World!"}'

response = requests.post('https://hooks.slack.com/services/asdfasdfasdf', headers=headers, data=data)

References
https://stackoverflow.com/questions/25491090/how-to-use-python-to-execute-a-curl-command
The requests module in Python 3
https://www.cnblogs.com/wang-yc/p/5623711.html
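
requests can also serialize the payload itself through the json= parameter, which sets the Content-Type header too. The sketch below only prepares the request without sending it, so the result can be inspected offline; the URL is the placeholder from the example above, and build_slack_request is a hypothetical helper name.

```python
import requests


def build_slack_request(url, text):
    """Prepare (but do not send) a JSON POST equivalent to the curl command."""
    # json= serializes the dict and sets Content-Type: application/json.
    request = requests.Request("POST", url, json={"text": text})
    return request.prepare()


if __name__ == "__main__":
    prepared = build_slack_request(
        "https://hooks.slack.com/services/asdfasdfasdf", "Hello, World!"
    )
    print(prepared.method, prepared.url)
    print(prepared.headers["Content-Type"])
    print(prepared.body)
    # To actually send it: requests.Session().send(prepared, timeout=30)
```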

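requests error: https certificate verify failed

Either disable HTTPS certificate verification or add the certificate.
To skip verification (only for hosts you trust, since this removes protection against man-in-the-middle attacks):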
requests.get('https://example.com', verify=False)

To suppress the warning that is printed once verification is disabled:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

References
How to fix SSLError in Requests?
https://www.jianshu.com/p/8deb13738d2c
Fixing InsecureRequestWarning output on the Python 3 console
https://www.cnblogs.com/helloworldcc/p/11107920.html

Handling JSON data with the json module

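import json
js = json.loads(jsontext)  # jsontext: a str containing JSON
field = js['field']

# Optional: ensure_ascii=True (the default) escapes non-ASCII text; pass False to keep it as-is.
json.dump(dict1, out_file, indent=6)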
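
A small round trip shows the two calls working together. This is a sketch with made-up sample data and a hypothetical output file name; ensure_ascii=False keeps non-ASCII characters readable in the file instead of escaping them as \uXXXX.

```python
import json
import os
import tempfile

json_text = '{"field": "value", "city": "北京"}'  # sample data, made up for illustration

record = json.loads(json_text)  # str -> dict
field = record["field"]

out_path = os.path.join(tempfile.mkdtemp(), "out.json")  # hypothetical output location
with open(out_path, "w", encoding="utf-8") as out_file:
    # indent pretty-prints; ensure_ascii=False writes non-ASCII text unescaped.
    json.dump(record, out_file, indent=2, ensure_ascii=False)

with open(out_path, encoding="utf-8") as in_file:
    reloaded = json.load(in_file)  # file -> dict
```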

Enabling TCP keepalive for Python requests calls

Setting TCP keepalive in Python

The setup differs between Windows and Linux.
Linux

    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, after_idle_sec)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, max_fails)

On Windows this raises AttributeError: 'module' object has no attribute 'TCP_KEEPIDLE'
On Windows, keepalive has to be set through ioctl instead:

sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, 10000, 3000))

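Here the tuple is (enable flag, idle time before the first probe in ms, interval between probes in ms).
The difference comes from the operating systems implementing the socket interface differently.
Reference: [Roundup] How to keep a TCP heartbeat in Python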
http://www.jyguagua.com/?p=3066
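
The Linux and Windows snippets above could be combined into one helper that picks whichever interface the platform exposes. This is a minimal sketch, not a tested cross-platform library: enable_tcp_keepalive is a hypothetical name, the 10 s and 3 s defaults mirror the values used above, and max_fails=5 is an assumed default.

```python
import socket


def enable_tcp_keepalive(sock, after_idle_sec=10, interval_sec=3, max_fails=5):
    """Turn on TCP keepalive for sock using whatever the platform exposes."""
    if hasattr(socket, "SIO_KEEPALIVE_VALS") and hasattr(sock, "ioctl"):
        # Windows: (on/off, idle time in ms, probe interval in ms).
        # The probe count is not part of this call.
        sock.ioctl(
            socket.SIO_KEEPALIVE_VALS,
            (1, after_idle_sec * 1000, interval_sec * 1000),
        )
        return
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux defines all three tuning constants; other systems may lack some,
    # so each one is applied only if the socket module defines it.
    for name, value in (
        ("TCP_KEEPIDLE", after_idle_sec),
        ("TCP_KEEPINTVL", interval_sec),
        ("TCP_KEEPCNT", max_fails),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Checking for the attributes at runtime avoids the AttributeError mentioned above without hard-coding a platform name.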

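Setting TCP keepalive in requests

Why set keepalive at all? Because some requests calls that take a long time to return fail with
Python: [WinError 10054] An existing connection was forcibly closed by the remote host
This says the remote end closed the connection. Yet the same long-running call made from Chrome returns without error, which suggests something on the Python side can be changed.
After some searching, it looks related to TCP keepalive: Chrome enables TCP keepalive by default and sends a probe every 45 seconds, while Python leaves TCP keepalive off by default.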

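Ways to set it

1 Looking inside the requests implementation

requests lets you initialize the PoolManager yourself through a custom adapter,
and socket options can be set by passing socket_options when the PoolManager is created, as shown below.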
https://requests.readthedocs.io/en/master/user/advanced/#transport-adapters
https://urllib3.readthedocs.io/en/latest/reference/urllib3.connection.html

# Reference: https://stackoverflow.com/questions/24569428/how-to-specify-socket-options-in-python-requests-lib-since-urllib3-v1-8-3-has
# How to specify “socket_options” in python-requests lib since urllib3 v1.8.3 has been added the “socket_options” feature?
# The complete code is there; below it is ported to Python 3.
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection
from urllib3.poolmanager import PoolManager


class SockOpsAdapter(HTTPAdapter):
    def __init__(self, options, **kwargs):
        # Must be set before super().__init__(), which calls init_poolmanager().
        self.options = options
        super().__init__(**kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        print("init_poolmanager")
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       socket_options=self.options,
                                       **pool_kwargs)


options = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_REUSEADDR, 1),
]

print("build session")
s = requests.Session()
s.mount('http://', SockOpsAdapter(options))
s.mount('https://', SockOpsAdapter(options))

for i in range(10):
    print("sending request %i" % i)
    url = 'http://host:port'  # put in a host/port here
    headers = {'Content-Type': 'text/plain', 'Accept': 'text/plain'}
    post_status = s.get(url, headers=headers)
    print("Post Status Code = %s" % post_status.status_code)
    print(post_status.content[0:50])
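
The example above passes SO_REUSEADDR, which is unrelated to keepalive. To turn keepalive on through the same adapter, the options list would instead carry SO_KEEPALIVE plus, where the platform defines them, the TCP_KEEP* tuning options. A sketch under those assumptions: keepalive_socket_options is a hypothetical helper, the timings are illustrative, and TCP_NODELAY is listed first because it is urllib3's default socket option.

```python
import socket


def keepalive_socket_options(after_idle_sec=10, interval_sec=3, max_fails=5):
    """Build a socket_options list that turns on TCP keepalive."""
    options = [
        (socket.IPPROTO_TCP, socket.TCP_NODELAY, 1),  # urllib3's default option
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
    ]
    # Tuning constants vary by platform, so add only the ones that exist.
    for name, value in (
        ("TCP_KEEPIDLE", after_idle_sec),
        ("TCP_KEEPINTVL", interval_sec),
        ("TCP_KEEPCNT", max_fails),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            options.append((socket.IPPROTO_TCP, option, value))
    return options
```

It would be passed to the adapter as SockOpsAdapter(keepalive_socket_options()). Every entry is applied with setsockopt, so this covers only platforms where keepalive can be configured that way.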
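
2 Patching the urllib3 source directly

Because Windows needs ioctl, the approach above does not work there; I could not find anywhere that urllib3 exposes an ioctl setting.
Reading the source shows that socket options passed in this way are only applied with sock.setsockopt(*opt); there is no ioctl call.
In the end I edited urllib3's util/connection.py and crudely added the call, as follows.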
https://github.com/urllib3/urllib3/blob/d0b20763f55536aec43caae9d180aa16c7b77d09/src/urllib3/util/connection.py

def _set_socket_options(sock, options):
    if options is None:
        return

    for opt in options:
        sock.setsockopt(*opt)
    # Added here: turn on TCP keepalive (Windows-only ioctl)
    sock.ioctl(socket.SIO_KEEPALIVE_VALS,(1,10000,3000))

With the default Anaconda3 environment, the file is at

c:\programdata\anaconda3\lib\site-packages\urllib3\util\connection.py
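
3 Getting the socket object through the fileno of a requests response

https://stackoverflow.com/questions/32310951/how-to-get-the-underlying-socket-when-using-python-requests
For streaming connections (opened with the stream=True parameter), you can call .raw.fileno() on the response object to get the open file descriptor. You can then use socket.fromfd(...) to create a Python socket object from that descriptor.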

>>> import requests
>>> import socket
>>> r = requests.get('http://google.com/', stream=True)
>>> s = socket.fromfd(r.raw.fileno(), socket.AF_INET, socket.SOCK_STREAM)
>>> s.getpeername()
('74.125.226.49', 80)
>>> s.getsockname()
('192.168.1.60', 41323)
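
Once a socket object has been rebuilt from the descriptor, options can be set on it. A minimal sketch, assuming an IPv4 TCP connection and Unix-like descriptor semantics; enable_keepalive_on_fd is a hypothetical helper name.

```python
import socket


def enable_keepalive_on_fd(fd):
    """Set SO_KEEPALIVE on the socket behind file descriptor fd."""
    # fromfd() duplicates the descriptor; both descriptors refer to the same
    # underlying socket, so the option applies to the original connection too.
    dup = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
    try:
        dup.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    finally:
        dup.close()  # closes only the duplicate descriptor
```

With a streamed response r as above, the call would be enable_keepalive_on_fd(r.raw.fileno()). Note that requests.get returns only after the response headers have arrived, so this comes too late when the long wait happens before the server starts responding.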

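A comment there also says the socket can be obtained through a response hook callback, which avoids the stream-only restriction.
But after searching the requests documentation, I found that a hook can only be registered for the response event, so this does not help in my case.
For how to use hooks, see the article below; the official documentation is thin, and you still end up reading the source when using them.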
Using hooks for custom behaviour in requests
https://alexwlchan.net/2017/10/requests-hooks/

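Note: SO_KEEPALIVE is not the same thing as HTTP Keep-Alive. SO_KEEPALIVE sends TCP-level probes on an idle connection, while HTTP Keep-Alive means reusing one connection for several HTTP requests.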

Reference
I wrongly blamed urllib3: don't judge by the name alone
https://steemit.com/python/@oflyhigh/urllib3

Other

A roundup of Python Requests tips
https://blog.csdn.net/xie_0723/article/details/52790786

References

Exploring TCP keepalive (2): the browser's keepalive mechanism
https://blog.chionlab.moe/2016/11/07/tcp-keepalive-on-chrome/

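Chrome's keepalive mechanism for TCP connections
The packet capture in that article shows that 45 seconds after the server returned the complete HTTP 200 response (Time=72), the local side sent its first TCP keepalive probe and received an ACK from the server.
This shows that for reusable TCP connections Chrome relies on the keepalive mechanism built into the TCP (transport) layer, implemented with TCP keepalive probes, rather than defining its own layer-7 protocol to send extra data.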

TCP keepalive
https://zhuanlan.zhihu.com/p/82035839

© 2022, 新之助meow. Original article; if you repost it, please credit the source: http://www.xinmeow.com
