[dywang@dywmac zzz]$ vim crawler1.py
[dywang@dywmac zzz]$ cat crawler1.py
#!/usr/bin/env python
# coding: utf-8
import requests
url = 'http://dywang.csie.cyut.edu.tw/dywang/rhce7/'
htmlfile = requests.get(url)
print(type(htmlfile))
[dywang@dywmac zzz]$ ./crawler1.py
<class 'requests.models.Response'>
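status_code indicates whether the page was fetched successfully, and text holds the page content. Because the example page contains Chinese text, the default encoding must be set to utf-8 with sys.setdefaultencoding before the page source can be redirected to a file. This applies to Python 2 only: the reload builtin and sys.setdefaultencoding do not exist in Python 3.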
[dywang@dywmac zzz]$ vim crawler1.py
[dywang@dywmac zzz]$ cat crawler1.py
#!/usr/bin/env python
# coding: utf-8
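import sys
# Python 2 only: site.py removes setdefaultencoding at startup; reload(sys) restores it
reload(sys)
sys.setdefaultencoding( "utf-8" )
import requests
url = 'http://dywang.csie.cyut.edu.tw/dywang/rhce7/'
htmlfile = requests.get(url)
# status_code tells whether the request succeeded
if htmlfile.status_code == requests.codes.ok:
    print("Obtained web text successfully")
else:
    print("Failed to get web text")
# text holds the page content as a Unicode string
print(htmlfile.text)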
[dywang@dywmac zzz]$ ./crawler1.py > crawler1.html
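
The redirect above depends on the interpreter's default encoding. As a rough alternative sketch, not the method used in this tutorial, the script could write the page to a file itself with an explicit encoding, which should also work on Python 3 where sys.setdefaultencoding is unavailable. The helper names save_text and fetch_and_save and the 10-second timeout are assumptions made for illustration.

```python
import io


def save_text(text, path, encoding="utf-8"):
    """Write a Unicode string to path using an explicit encoding."""
    # io.open encodes on write, so the interpreter's default
    # encoding is never consulted (same call on Python 2 and 3).
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)


def fetch_and_save(url, path):
    """Fetch url and save its body to path; return True on success."""
    # Third-party package, imported here so save_text can be used without it.
    import requests

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    if response.status_code != requests.codes.ok:
        return False
    # response.text is already a Unicode string decoded by requests.
    save_text(response.text, path)
    return True
```

Calling fetch_and_save('http://dywang.csie.cyut.edu.tw/dywang/rhce7/', 'crawler1.html') would then take the place of the shell redirect, and the status message no longer ends up inside the saved HTML file.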