bs4 select

  1. select() takes a CSS selector and returns the matching tags; for example, 'img' selects every img tag.
    [dywang@dywmac zzz]$ cat crawler10.py
    #!/usr/bin/env python
    # coding: utf-8
    import bs4
    
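    # Parse the saved page with the lxml parser
    htmlfile = open('node2.html')
    soup = bs4.BeautifulSoup(htmlfile, 'lxml')
    # select() returns a list of every tag that matches the CSS selector
    tag = soup.select('img')
    print("tyep: ", type(tag))
    print("tag: ", tag)
    print("tag text: ")
    # Print the text content of each matched tag
    for i in range(len(tag)):
        print(tag[i].getText())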
    
  2. Run the program. Calling getText() on each element of the list prints only empty strings, because an img tag contains no text.
    [dywang@dywmac zzz]$ ./crawler10.py
    ('tyep: ', <type 'list'>)
    ('tag: ', [<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>, <img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>, <img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>, <img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>])
    tag text: 
    ('', 0)
    ('', 1)
    ('', 2)
    ('', 3)
    
  3. Change the last line of the program to print each list element by index, converted to a string; only then is there output.
        print(str(tag[i]))
    
    [dywang@dywmac zzz]$ ./crawler10.py
    ('tyep: ', <type 'list'>)
    ('tag: ', [<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>, <img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>, <img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>, <img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>])
    tag text: 
    ('<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>', 0)
    ('<img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>', 1)
    ('<img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>', 2)
    ('<img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>', 3)
    
  4. Change the last line of the program to call get('src') on each list element; this finds all the image files.
        print(tag[i].get('src'))
    
    [dywang@dywmac zzz]$ ./crawler10.py
    ('tyep: ', <type 'list'>)
    ('tag: ', [<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>, <img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>, <img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>, <img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>])
    tag text: 
    next.png
    up.png
    prev.png
    contents.png
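
The four steps above can be condensed into one self-contained sketch. It makes several assumptions that are not in the original: it targets Python 3, it parses an inline string (HTML, a hypothetical stand-in for node2.html that holds only the four img tags shown in the output above), and it uses the built-in 'html.parser' instead of 'lxml' so that no extra parser has to be installed. The names HTML, tags, texts, markup and sources are illustrative only.

```python
import bs4

# Hypothetical stand-in for node2.html: only the four img tags from the output above.
HTML = """
<html><body>
<img align="BOTTOM" alt="next" border="0" height="24" src="next.png" width="37"/>
<img align="BOTTOM" alt="up" border="0" height="24" src="up.png" width="26"/>
<img align="BOTTOM" alt="previous" border="0" height="24" src="prev.png" width="63"/>
<img align="BOTTOM" alt="contents" border="0" height="24" src="contents.png" width="65"/>
</body></html>
"""

# Assumption: 'html.parser' replaces 'lxml' to avoid an extra dependency.
soup = bs4.BeautifulSoup(HTML, "html.parser")

# Step 1: select() takes a CSS selector and returns a list-like result.
tags = soup.select("img")

# Step 2: an img tag has no text content, so getText() gives an empty string.
texts = [t.getText() for t in tags]

# Step 3: str() renders the tag itself as markup.
markup = [str(t) for t in tags]

# Step 4: get('src') reads the src attribute, or returns None if it is absent.
sources = [t.get("src") for t in tags]

print("texts:", texts)
for m in markup:
    print(m)
for s in sources:
    print(s)
```

Iterating over the result directly, rather than through range(len(tag)), is only a stylistic choice; indexing as in crawler10.py gives the same elements.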