[dywang@dywmac zzz]$ cat crawler7.py
#!/usr/bin/env python
# coding: utf-8
import bs4
htmlfile = open('node2.html')
soup = bs4.BeautifulSoup(htmlfile, 'lxml')
tag = soup.find('h1')
print("type: ", type(tag))
print("tag: ", tag)
print("tag text: %s" % tag.text)
[dywang@dywmac zzz]$ ./crawler7.py
('type: ', <class 'bs4.element.Tag'>)
('tag: ', <h1><a name="SECTION00200000000000000000">
認識 Python</a>
</h1>)
tag text:
認識 Python
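
The output above shows that tag.text keeps the newlines surrounding the heading text. As a minimal sketch of how this might be tidied, the example below parses a small inline HTML string (a hypothetical stand-in for node2.html) with the built-in 'html.parser' instead of 'lxml', which is an assumption made only so the snippet runs without extra dependencies. It also shows that find() returns None when no matching tag exists, so the result is worth checking before reading .text.

```python
import bs4

# Hypothetical inline HTML standing in for node2.html (not the real file).
html = ('<html><body><h1><a name="SECTION00200000000000000000">\n'
        '認識 Python</a>\n</h1></body></html>')

# 'html.parser' ships with Python; the original scripts use 'lxml'.
soup = bs4.BeautifulSoup(html, 'html.parser')

# find() returns only the first matching tag.
tag = soup.find('h1')

# get_text(strip=True) drops the leading and trailing whitespace
# that tag.text would keep.
title = tag.get_text(strip=True)
print(title)

# find() returns None when nothing matches, so guard before using .text.
missing = soup.find('h2')
print(missing is None)
```
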
find_all() finds every li tag; the script below prints their contents.
[dywang@dywmac zzz]$ cat crawler8.py
#!/usr/bin/env python
# coding: utf-8
import bs4
htmlfile = open('node2.html')
soup = bs4.BeautifulSoup(htmlfile, 'lxml')
tag = soup.find_all('li')
print("type: ", type(tag))
print("tag: ", tag)
print("tag text: ")
for data in tag:
    print(data.text)
[dywang@dywmac zzz]$ ./crawler8.py
('type: ', <type 'list'>)
('tag: ', [<li><a href="node3.html" name="tex2html184">簡介</a>
</li>, <li><a href="node4.html" name="tex2html185">安裝與執行</a>
</li>, <li><a href="node5.html" name="tex2html186">內縮語法</a>
</li>, <li><a href="node6.html" name="tex2html187">括號、引號、換行、註解</a>
</li>])
tag text:
簡介
安裝與執行
內縮語法
括號、引號、換行、註解
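
Each li found above wraps an a tag, so the same loop can collect the link target along with the text. The sketch below is illustrative only: the inline HTML is a shortened, hypothetical copy of the list in node2.html, the variable names items and links are invented for this example, and 'html.parser' is used in place of 'lxml' to keep the snippet self-contained.

```python
import bs4

# Hypothetical excerpt of the list in node2.html (first two items only).
html = """
<ul>
<li><a href="node3.html" name="tex2html184">簡介</a>
</li>
<li><a href="node4.html" name="tex2html185">安裝與執行</a>
</li>
</ul>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

# find_all() returns every matching tag, in document order.
items = soup.find_all('li')

links = []
for li in items:
    a = li.find('a')          # first <a> inside this <li>, or None
    if a is not None:
        # get_text(strip=True) removes surrounding whitespace;
        # get('href') returns None instead of raising if the attribute is absent.
        links.append((a.get_text(strip=True), a.get('href')))

print(links)
```
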