- select() selects tags with a CSS selector, for example all img tags.
[dywang@dywmac zzz]$ cat crawler10.py
#!/usr/bin/env python
# coding: utf-8
import bs4
htmlfile = open('node2.html')
soup = bs4.BeautifulSoup(htmlfile, 'lxml')
tag = soup.select('img')
print("tyep: ", type(tag))
print("tag: ", tag)
print("tag text: ")
for i in range(len(tag)):
    print(tag[i].getText())
- Run the program. Calling getText() on each element of the list prints only empty strings, because an img tag has no text content.
[dywang@dywmac zzz]$ ./crawler10.py
('tyep: ', <type 'list'>)
('tag: ', [<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>, <img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>, <img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>, <img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>])
tag text:
('', 0)
('', 1)
('', 2)
('', 3)
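The empty output above can be reproduced in isolation. The following is a minimal sketch, not the author's script: it assumes Python 3, uses a hypothetical inline HTML string in place of node2.html, and swaps in the standard-library 'html.parser' for 'lxml' so no extra parser is needed. It calls get_text(), the documented name that getText() aliases.

```python
import bs4

# Hypothetical inline HTML standing in for node2.html (an assumption of this sketch).
html = '<p>Navigation <img alt="next" src="next.png"/> <img alt="up" src="up.png"/></p>'

# 'html.parser' is the standard-library parser; the original script uses 'lxml'.
soup = bs4.BeautifulSoup(html, 'html.parser')

# img is a void element: it carries attributes but has no text content.
img_texts = [img.get_text() for img in soup.select('img')]

# p wraps real text, so get_text() returns it.
p_texts = [p.get_text().strip() for p in soup.select('p')]

print(img_texts)
print(p_texts)
```

In this sketch the img tags yield empty strings while the p tag yields its text, which suggests the blank output comes from the img element itself rather than from select().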
- Change the last line of the program to print each element of the list itself, converted with str(); now the tags are printed.
    print(str(tag[i]))
[dywang@dywmac zzz]$ ./crawler10.py
('tyep: ', <type 'list'>)
('tag: ', [<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>, <img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>, <img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>, <img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>])
tag text:
('<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>', 0)
('<img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>', 1)
('<img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>', 2)
('<img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>', 3)
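As a small illustration of what str() does with a tag, the sketch below serializes one img tag back to markup and also reads its attributes through the attrs dictionary. It assumes Python 3, an inline HTML string copied from the output above, and the standard-library 'html.parser' in place of 'lxml'.

```python
import bs4

# One img tag taken from the output above, used as inline sample input.
html = '<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# select_one() returns the first match instead of a list.
img = soup.select_one('img')

markup = str(img)   # the tag serialized back to HTML text
attrs = img.attrs   # the tag's attributes as a dictionary

print(markup)
print(attrs['alt'], attrs['width'])
```

The str() form is convenient for inspection, while attrs is usually the more practical route when only individual attribute values are needed.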
- Change the last line of the program to print get('src') for each element of the list; this finds all the image files.
    print(tag[i].get('src'))
[dywang@dywmac zzz]$ ./crawler10.py
('tyep: ', <type 'list'>)
('tag: ', [<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>, <img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>, <img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>, <img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>])
tag text:
next.png
up.png
prev.png
contents.png
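The src lookup above can also be written by iterating over the tags directly. This is a hedged sketch under the same assumptions as before (Python 3, hypothetical inline HTML instead of node2.html, 'html.parser' instead of 'lxml'); the third img without a src attribute is invented here to show how get() behaves when the attribute is missing.

```python
import bs4

# Hypothetical sample input; the last img deliberately has no src attribute.
html = """
<img alt="next" src="next.png"/>
<img alt="up" src="up.png"/>
<img alt="placeholder"/>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')

# get('src') returns None when the attribute is absent instead of raising KeyError.
all_src = [img.get('src') for img in soup.select('img')]

# The CSS attribute selector img[src] keeps only tags that actually have a src.
with_src = [img['src'] for img in soup.select('img[src]')]

print(all_src)
print(with_src)
```

Filtering with img[src] in the selector avoids None entries in the result, which may be preferable when the file names are passed on to a download step.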