Saving a Web Page

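  1. Previously we saved the page content to a file with output redirection. Now the program saves it directly, using the Response object's iter_content() method to write up to 5120 bytes at a time.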
    [dywang@dywmac zzz]$ vim crawler4.py
    [dywang@dywmac zzz]$ cat crawler4.py
    #!/usr/bin/env python
    # coding: utf-8
    
    import requests
    
    url = 'http://dywang.csie.cyut.edu.tw/dywang/rhce7/'
    
    headers = {
    	'User-Agent': 'Mozilla/5.0',
    	'From': 'dywang@csie.cyut.edu.tw'
    }
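    try:
    	# Keep the request inside the try block so connection errors are caught too
    	htmlfile = requests.get(url, headers=headers)
    	htmlfile.raise_for_status()
    	print("Connected successfully")
    except requests.exceptions.RequestException as err:
    	# Stop here: there is no valid response to save
    	raise SystemExit("Failed: %s" % err)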
    
    of = 'crawler4.txt'
    with open(of, 'wb') as outfile:
    	for ram in htmlfile.iter_content(5120):
    		outfile.write(ram)
    		print(len(ram))
    	print("save to %s" % of)
    
  2. Run the program. The save loop writes 5120 bytes on the first iteration and 4889 bytes on the second.
    [dywang@dywmac zzz]$ ./crawler4.py 
    Connected successfully
    5120
    4889
    save to crawler4.txt
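
The chunk sizes printed above can be reproduced without a network connection. The following is a minimal sketch, not the tutorial's own code: iter_chunks(), save_chunks(), the 10009-byte filler body and the demo.txt path are all hypothetical names and values chosen for illustration. It assumes the page body is 10009 bytes long (5120 + 4889) and slices it the same way the save loop does.

```python
import os
import tempfile


def iter_chunks(data, chunk_size):
    """Yield successive chunk_size-byte slices of data (the last may be shorter)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def save_chunks(chunks, path):
    """Write each chunk to path in binary mode and return the chunk sizes."""
    sizes = []
    with open(path, 'wb') as outfile:
        for chunk in chunks:
            outfile.write(chunk)
            sizes.append(len(chunk))
    return sizes


# Hypothetical stand-in for the downloaded page: 10009 bytes of filler.
body = b'x' * 10009
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
sizes = save_chunks(iter_chunks(body, 5120), path)
print(sizes)
```

With a real response, save_chunks(htmlfile.iter_content(5120), 'crawler4.txt') would play the same role as the loop in crawler4.py. Only the last chunk is shorter than the requested size, which is why the second number printed is 4889.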