Masquerading as a Browser

  1. Set the User-Agent in the request headers, then pass them to requests.get through the extra headers parameter.
    [dywang@dywmac zzz]$ cat crawler3.py
    #!/usr/bin/env python
    # coding: utf-8
    
    # Python 2 only: reload sys to restore setdefaultencoding, then default to UTF-8.
    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")
    
    import requests
    
    url = 'http://dywang.csie.cyut.edu.tw/dywang/rhce7/'
    
    # Present a browser-like User-Agent instead of the library default.
    headers = {
    	'User-Agent': 'Mozilla/5.0',
    	'From': 'dywang@csie.cyut.edu.tw'
    }
    try:
    	# Send the request inside the try block so connection errors are caught too.
    	htmlfile = requests.get(url, headers=headers)
    	htmlfile.raise_for_status()
    	print("Connected successfully")
    except Exception as err:
    	print("Failed: %s" % err)
    
  2. Run the program; the connection succeeds.
    [dywang@dywmac zzz]$ ./crawler3.py 
    Connected successfully
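
To see what the headers parameter changes, the sketch below builds the same GET request twice, once with the library defaults and once with the headers from crawler3.py, and prints the headers that would be sent. It is only an illustration under stated assumptions: it assumes Python 3 with the requests package installed, and prepare_get is a hypothetical helper that is not part of the original script. The requests are prepared but never sent, so no network access is needed.

```python
import requests

# Same URL and header values as crawler3.py above.
URL = 'http://dywang.csie.cyut.edu.tw/dywang/rhce7/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0',
    'From': 'dywang@csie.cyut.edu.tw',
}


def prepare_get(url, headers=None):
    """Hypothetical helper: build a GET request without sending it."""
    session = requests.Session()
    request = requests.Request('GET', url, headers=headers)
    # prepare_request merges the session's default headers with the
    # per-request headers; per-request values take precedence.
    return session.prepare_request(request)


default_req = prepare_get(URL)           # library defaults only
custom_req = prepare_get(URL, HEADERS)   # defaults overridden by HEADERS

# Without custom headers, the User-Agent identifies the requests library.
print(default_req.headers['User-Agent'])
# With custom headers, the server sees the browser-like value instead.
print(custom_req.headers['User-Agent'])
print(custom_req.headers['From'])
```

The first printed value has the form python-requests/<version>, which is the string some servers use to reject scripted clients; the second is the Mozilla/5.0 value supplied in the headers dictionary.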