bs4 find

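  1. Use find() to locate the h1 tag. find() returns the first matching tag, or None if the document contains no such tag.
    [dywang@dywmac zzz]$ cat crawler7.py 
    #!/usr/bin/env python
    # coding: utf-8
    import bs4
    
    # Parse the local file with the lxml parser (requires the lxml package).
    with open('node2.html') as htmlfile:
        soup = bs4.BeautifulSoup(htmlfile, 'lxml')
    tag = soup.find('h1')   # first <h1> in the document, or None
    print("type: ", type(tag))
    print("tag: ", tag)
    print("tag text: %s" % tag.text)
    
  2. Run the program. It finds the h1 tag, whose content is 「認識 Python」 ("Getting to know Python"). tag.text keeps the newline that precedes the link text in the HTML source, so the text appears on the line after "tag text:". The parenthesised, tuple-style lines appear because print(...) with several arguments prints a tuple under Python 2.
    [dywang@dywmac zzz]$ ./crawler7.py 
    ('type: ', <class 'bs4.element.Tag'>)
    ('tag: ', <h1><a name="SECTION00200000000000000000">
    認識 Python</a>
    </h1>)
    tag text: 
    認識 Python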
    
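  3. Use find_all() to find every li tag and print the text of each one.
    [dywang@dywmac zzz]$ cat crawler8.py 
    #!/usr/bin/env python
    # coding: utf-8
    import bs4
    
    # Parse the local file with the lxml parser (requires the lxml package).
    with open('node2.html') as htmlfile:
        soup = bs4.BeautifulSoup(htmlfile, 'lxml')
    tags = soup.find_all('li')   # every <li> in the document, in order
    print("type: ", type(tags))
    print("tag: ", tags)
    print("tag text: ")
    for data in tags:
        print(data.text)
    
  4. Run the program. It finds all four li tags, and the loop prints the text of each item: the menu entries 簡介, 安裝與執行, 內縮語法 and 括號、引號、換行、註解 ("Introduction", "Installation and execution", "Indentation syntax", "Brackets, quotes, line breaks, comments"). The bs4 version used here returns a plain list; recent versions return a ResultSet, which is a subclass of list. Each item's text ends with the newline before </li>, which produces the blank lines between entries.
    [dywang@dywmac zzz]$ ./crawler8.py 
    ('type: ', <type 'list'>)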
    ('tag: ', [<li><a href="node3.html" name="tex2html184">簡介</a>
    </li>, <li><a href="node4.html" name="tex2html185">安裝與執行</a>
    </li>, <li><a href="node5.html" name="tex2html186">內縮語法</a>
    </li>, <li><a href="node6.html" name="tex2html187">括號、引號、換行、註解</a>
    </li>])
    tag text: 
    簡介
    
    安裝與執行
    
    內縮語法
    
    括號、引號、換行、註解
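A minimal sketch of the find() behaviour from steps 1 and 2, written so it runs without node2.html. Assumptions: the HTML string is a hypothetical stand-in with the same <h1><a name=...> shape, the standard-library "html.parser" replaces lxml so no extra parser is needed, and the helper name first_tag_text is made up for illustration. It shows two points the scripts above rely on implicitly: find() returns None when nothing matches, and get_text(strip=True) drops the surrounding newlines that .text keeps.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for node2.html (same <h1><a name=...> structure).
HTML = """<html><body>
<h1><a name="SECTION00200000000000000000">
Getting to know Python</a>
</h1>
</body></html>"""


def first_tag_text(html, tag_name, attrs=None):
    """Return the stripped text of the first matching tag, or None (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")   # stdlib parser instead of lxml
    # An HTML attribute called "name" must go in attrs, because find()
    # already uses its own `name` argument for the tag name.
    tag = soup.find(tag_name, attrs=attrs or {})  # first match, or None
    if tag is None:
        return None
    return tag.get_text(strip=True)             # trims the newlines .text keeps


if __name__ == "__main__":
    soup = BeautifulSoup(HTML, "html.parser")
    print(repr(soup.find("h1").text))           # raw text, newlines included
    print(first_tag_text(HTML, "h1"))
    print(first_tag_text(HTML, "a", {"name": "SECTION00200000000000000000"}))
    print(first_tag_text(HTML, "h3"))           # there is no <h3> in the page
```

Checking for None before touching .text avoids the AttributeError that crawler7.py would raise on a page without an h1.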
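A hedged sketch of the find_all() loop from steps 3 and 4. Assumptions: the HTML list is a hypothetical, shortened stand-in for the menu in node2.html (English labels, same href/name pattern), "html.parser" again replaces lxml, and collect_links is an illustrative name. Besides the item text, it reads each link's href with Tag.get(), and it passes find_all()'s limit parameter to stop after the first N matches.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the <li> menu in node2.html.
HTML = """<ul>
<li><a href="node3.html" name="tex2html184">Introduction</a>
</li>
<li><a href="node4.html" name="tex2html185">Installation and execution</a>
</li>
<li><a href="node5.html" name="tex2html186">Indentation syntax</a>
</li>
</ul>"""


def collect_links(html, limit=None):
    """Return (text, href) pairs for each <li>; hypothetical helper."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for li in soup.find_all("li", limit=limit):  # limit=None means no cap
        link = li.find("a")                       # first <a> inside this item
        href = link.get("href") if link is not None else None
        pairs.append((li.get_text(strip=True), href))
    return pairs


if __name__ == "__main__":
    for text, href in collect_links(HTML):
        print(text, "->", href)
    print(len(collect_links(HTML, limit=2)))
```

Using get_text(strip=True) instead of .text removes the trailing newline of each item, so the blank lines seen in step 4's output do not appear.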