The requests module

  1. requests.get retrieves the raw source of a web page; the return value is a Response object.
    [dywang@dywmac zzz]$ vim crawler1.py
    [dywang@dywmac zzz]$ cat crawler1.py
    #!/usr/bin/env python
    # coding: utf-8
    
    import requests
    url = 'http://dywang.csie.cyut.edu.tw/dywang/rhce7/'
    htmlfile = requests.get(url)
    print(type(htmlfile))
    [dywang@dywmac zzz]$ ./crawler1.py 
    <class 'requests.models.Response'>
    
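  2. The Response attribute status_code indicates whether the page was retrieved successfully, and text holds the page content. Because the example page contains Chinese text, this Python 2 script has to call sys.setdefaultencoding to set the default encoding to utf-8; otherwise printing the page source fails once the output is redirected to a file.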
    [dywang@dywmac zzz]$ vim crawler1.py
    [dywang@dywmac zzz]$ cat crawler1.py
    #!/usr/bin/env python
    # coding: utf-8
    
    import sys
    reload(sys)
    sys.setdefaultencoding( "utf-8" )
    
    import requests
    url = 'http://dywang.csie.cyut.edu.tw/dywang/rhce7/'
    htmlfile = requests.get(url)
    if htmlfile.status_code == requests.codes.ok:
        print("Obtained web text successfully")
    else:
        print("Failed to get web text")
    print(htmlfile.text)
    
  3. Run the program and redirect the page source to crawler1.html. Opening crawler1.html in firefox then shows the page content.
    [dywang@dywmac zzz]$ ./crawler1.py > crawler1.html
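
The steps above rely on reload(sys) and sys.setdefaultencoding, which exist only in Python 2. As a rough sketch of how the same fetch-and-save flow might look under Python 3, the file can be written with an explicit utf-8 encoding instead of changing the interpreter default. The helper names fetch_html and save_html and the timeout value are assumptions made for illustration; they are not part of the original script.

```python
#!/usr/bin/env python3
# coding: utf-8
# Sketch only: fetch_html, save_html and the 10-second timeout are
# hypothetical choices, not taken from the original crawler1.py.


def fetch_html(url, timeout=10):
    """Return the page source of url as text, or raise on an HTTP error."""
    # requests is a third-party package; it is imported here so that
    # save_html can be used on its own without it being installed.
    import requests

    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes, which replaces
    # the manual status_code comparison in the steps above.
    response.raise_for_status()
    return response.text


def save_html(text, path):
    """Write text to path as utf-8, regardless of the default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Example call (needs network access):
# save_html(fetch_html('http://dywang.csie.cyut.edu.tw/dywang/rhce7/'),
#           'crawler1.html')
```

Writing the file from inside the script, rather than redirecting stdout, keeps the encoding choice in one place, so the result does not depend on the shell's locale.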