bs4 find

find() locates the first h1 tag.

[dywang@dywmac zzz]$ cat crawler7.py 
#!/usr/bin/env python
# coding: utf-8
import bs4

htmlfile = open('node2.html')
soup = bs4.BeautifulSoup(htmlfile, 'lxml')
tag = soup.find('h1')
print("type: ", type(tag))
print("tag: ", tag)
print("tag text: %s" % tag.text)

Run the program. It finds the h1 tag, whose text is "認識 Python" (roughly, "Getting to Know Python").

[dywang@dywmac zzz]$ ./crawler7.py 
('type: ', <class 'bs4.element.Tag'>)
('tag: ', <h1><a name="SECTION00200000000000000000">
認識 Python</a>
</h1>)
tag text: 
認識 Python
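The empty line after "tag text:" appears because tag.text keeps the newlines inside the h1. The Python 3 sketch below reproduces that h1 in an inline string instead of reading node2.html, and uses the built-in "html.parser" instead of lxml; both choices are assumptions made so the example is self-contained. (The transcripts above were produced by Python 2, which is why print shows tuples.) It shows get_text(strip=True) removing the surrounding whitespace, and find() returning None when nothing matches.

```python
import bs4

# Inline HTML modeled on the h1 printed above (assumption: node2.html is not reproduced here).
html = """<h1><a name="SECTION00200000000000000000">
認識 Python</a>
</h1>"""

soup = bs4.BeautifulSoup(html, "html.parser")

tag = soup.find("h1")            # first matching Tag
print(repr(tag.text))            # raw text, including the newlines inside the h1
print(tag.get_text(strip=True))  # each text fragment stripped, empty ones dropped

missing = soup.find("h2")        # there is no h2 in this document
print(missing is None)           # find() returns None rather than raising
```

Because find() can return None, check the result before calling .text on it when the tag may be absent.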

find_all() finds every li tag, and the program then prints the text of each one.

[dywang@dywmac zzz]$ cat crawler8.py 
#!/usr/bin/env python
# coding: utf-8
import bs4

htmlfile = open('node2.html')
soup = bs4.BeautifulSoup(htmlfile, 'lxml')
tags = soup.find_all('li')
print("type: ", type(tags))
print("tag: ", tags)
print("tag text: ")
for data in tags:
	print(data.text)

Run the program. It finds all the li tags and returns them as a list.

[dywang@dywmac zzz]$ ./crawler8.py 
('type: ', <type 'list'>)
('tag: ', [<li><a href="node3.html" name="tex2html184">簡介</a>
</li>, <li><a href="node4.html" name="tex2html185">安裝與執行</a>
</li>, <li><a href="node5.html" name="tex2html186">內縮語法</a>
</li>, <li><a href="node6.html" name="tex2html187">括號、引號、換行、註解</a>
</li>])
tag text: 
簡介

安裝與執行

內縮語法

括號、引號、換行、註解
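Each title above is followed by an empty line because data.text keeps the newline before </li>. The Python 3 sketch below rebuilds those li elements in an inline string (the enclosing ul is an assumption) and parses them with the built-in "html.parser". It strips the text with get_text(strip=True), reads each link's href attribute with Tag.get(), and uses the limit parameter of find_all() to stop after a given number of matches.

```python
import bs4

# Inline HTML modeled on the li elements printed above (the <ul> wrapper is assumed).
html = """<ul>
<li><a href="node3.html" name="tex2html184">簡介</a>
</li>
<li><a href="node4.html" name="tex2html185">安裝與執行</a>
</li>
<li><a href="node5.html" name="tex2html186">內縮語法</a>
</li>
<li><a href="node6.html" name="tex2html187">括號、引號、換行、註解</a>
</li>
</ul>"""

soup = bs4.BeautifulSoup(html, "html.parser")

items = soup.find_all("li")                         # every li, in document order
titles = [li.get_text(strip=True) for li in items]  # text without trailing newlines
print(len(items), titles)

# Each li holds one link; Tag.get() returns the attribute value (or None if missing).
links = [li.find("a").get("href") for li in items]
print(links)

# limit= stops the search after the given number of matches.
first_two = soup.find_all("li", limit=2)
print([li.get_text(strip=True) for li in first_two])
```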