Monday, January 08, 2007

Parsing HTML with Python

BeautifulSoup

Usage
>>> import urllib2
>>> from BeautifulSoup import BeautifulSoup as BF
>>> html = urllib2.urlopen('http://www.XXXXXXX.jp')
>>> s = BF(html)
>>> links = s.body.findAll('a')  # every <a> tag under <body>
>>> links[3].string  # text of the fourth link
u'hogehogehogehogehoge'
>>> links[3]['href']  # its href attribute
u'http://XXXXXX/hoge/hoge'
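
The session above needs Python 2's urllib2 and the BeautifulSoup module. For comparison, here is a minimal sketch of the same idea (collect the text and href of every <a> tag) using only the Python 3 standard library's html.parser, not BeautifulSoup. The LinkCollector class, the SAMPLE_HTML string, and the example.com URLs are made up for illustration and do not come from the original page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (text, href) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._in_a = False   # True while inside an <a> ... </a>
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            self.links.append(("".join(self._text), self._href))
            self._in_a = False


# Made-up sample markup, used instead of fetching a live URL.
SAMPLE_HTML = (
    '<body><a href="http://example.com/hoge/1">first</a>'
    '<p>not a link</p>'
    '<a href="http://example.com/hoge/2">second</a></body>'
)

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
collector.close()
links = collector.links

for text, href in links:
    print(text, href)
```

This is event-based rather than tree-based, so there is no equivalent of indexing into s.body; it only fits cases where a single pass over the tags is enough.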


Notes
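>>> # Walk down the tree by index with .contents
>>> s.contents[2].contents[3].contents[1].contents[1].contents[1].name  # tag name
u'a'
>>> s.contents[2].contents[3].contents[1].contents[1].contents[1].string  # text
u'hogehoge'
>>> # Attribute access returns the first tag with that name
>>> s.h1
>>> s.a
>>> s.head
>>> # .next moves to the next node in document order
>>> s.body.next
>>> s.body.next.next.next.next.findAll('ul')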
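
The navigation calls in these notes have close equivalents in the newer bs4 package on Python 3. The following is a rough sketch under some assumptions: it requires the third-party beautifulsoup4 package, and the HTML string is a made-up sample rather than the page used above. In bs4, findAll is spelled find_all and .next is spelled next_element.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up sample page, used here instead of a live URL.
HTML = """<html><head><title>Sample</title></head>
<body><h1>Heading</h1>
<ul><li><a href="/hoge/1">first</a></li><li><a href="/hoge/2">second</a></li></ul>
</body></html>"""

soup = BeautifulSoup(HTML, "html.parser")

# Attribute-style access returns the first tag with that name.
first_link = soup.a
print(first_link.name)     # tag name
print(first_link.string)   # text inside the tag
print(first_link["href"])  # attribute value

# .contents lists a tag's direct children (tags and text nodes alike).
body_children = soup.body.contents

# .next_element is the next node in document order (the old .next).
after_body = soup.body.next_element

# find_all collects every matching descendant (the old findAll).
hrefs = [a["href"] for a in soup.body.find_all("a")]
print(hrefs)
```

Searching by tag name with find_all tends to be less brittle than long .contents[...] index chains, which break as soon as whitespace or markup in the page changes.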
