Beautiful Soup
A Python HTML/XML parser designed for quick-turnaround projects such as screen-scraping.
Install
pip install beautifulsoup
Note: For earlier versions of Python, it may be best to install:
pip install beautifulsoup==3.0.8
For more information, see "Having problems with Beautiful Soup 3.1.0?"
Sample
For more details, see the documentation.
>>> from BeautifulSoup import BeautifulSoup
>>> html = '...'  # the page's HTML (elided)
>>> soup = BeautifulSoup(html)
>>> soup.find('label')  # first matching tag
<label for="id_name">Place name</label>
>>> soup.findAll('label')  # list of every matching tag
[<label for="id_name">Place name</label>, <label>Place name</label>]
>>> soup.findAll(id='id_name')  # match on an attribute value
[<input name="name" value="East Anstey" class="textInput" maxlength="45" type="text" id="id_name" />]
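To make the matching in the sample concrete, here is a minimal sketch that uses only Python's standard-library html.parser, not Beautiful Soup itself. The names TagCollector, find_all and find and the sample string are hypothetical; the markup is rebuilt from the outputs above, and the sketch only approximates the idea (exact tag-name and attribute-value comparison). Beautiful Soup's real matching rules are richer.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag as (tag name, attribute dict), in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; keep a dict copy.
        # Self-closing tags such as <input ... /> also end up here, because the
        # default handle_startendtag calls handle_starttag.
        self.tags.append((tag, dict(attrs)))


def find_all(html, name=None, attrs=None, **kwargs):
    """Return every (tag, attrs) pair whose name and attribute values match exactly.

    Attribute filters come from the `attrs` dict and/or keyword arguments; use
    the dict for attribute names that clash with parameters, such as 'name'.
    """
    wanted = dict(attrs or {}, **kwargs)
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return [
        (tag, tag_attrs)
        for tag, tag_attrs in parser.tags
        if (name is None or tag == name)
        and all(tag_attrs.get(key) == value for key, value in wanted.items())
    ]


def find(html, name=None, attrs=None, **kwargs):
    """Return the first match, or None when nothing matches."""
    matches = find_all(html, name, attrs, **kwargs)
    return matches[0] if matches else None


# Hypothetical markup, rebuilt from the outputs shown in the sample above.
sample = (
    '<label for="id_name">Place name</label>'
    '<input name="name" value="East Anstey" class="textInput" '
    'maxlength="45" type="text" id="id_name" />'
    '<label>Place name</label>'
)

print(find(sample, 'label'))                   # ('label', {'for': 'id_name'})
print(len(find_all(sample, 'label')))          # 2
print(find(sample, id='id_name')[1]['value'])  # East Anstey
```

Returning plain (name, dict) tuples keeps the sketch short; a real parse tree also tracks nesting and text, which this deliberately ignores.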
Attributes
soup = BeautifulSoup(html)
# get the first tag (soup.contents[0] may be a text node, which has no attrs)
element = soup.find(True)
# copy the tag's (name, value) attribute pairs into a dict
dict(element.attrs)
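Beautiful Soup 3 keeps a tag's attributes as a list of (name, value) pairs, which is why the snippet wraps them in dict(). Python's standard-library HTMLParser passes attributes to handle_starttag in the same list-of-pairs shape, so the conversion can be tried without Beautiful Soup installed. This is a minimal sketch; FirstTagAttrs, first_tag_attrs and the markup are hypothetical. The markup deliberately starts with whitespace, which is a text node rather than a tag, and the parser only records the first start tag.

```python
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Capture the attribute pairs of the first start tag only."""

    def __init__(self):
        super().__init__()
        self.attrs = None  # stays None until a start tag is seen

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            # attrs is a list of (name, value) tuples; a bare attribute such
            # as <input disabled> would appear as ('disabled', None)
            self.attrs = attrs


def first_tag_attrs(html):
    """Return the first tag's attribute pairs, or None if there is no tag."""
    parser = FirstTagAttrs()
    parser.feed(html)
    parser.close()
    return parser.attrs


# Hypothetical markup: leading whitespace, then the first real tag.
pairs = first_tag_attrs('\n  <label for="id_name" class="big">Place name</label>')
attr_dict = dict(pairs)  # same conversion as dict(element.attrs)
print(pairs)      # [('for', 'id_name'), ('class', 'big')]
print(attr_dict)  # {'for': 'id_name', 'class': 'big'}
```

If an attribute name is repeated, dict() keeps the last value, so the list form is the only one that preserves every pair.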
Text
# all text nodes equal to 'ABC'
soup.findAll(text='ABC')
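As far as we know, passing a plain string as text compares it against each whole text node, so 'ABC' does not match 'ABCD' or ' ABC '; Beautiful Soup also accepts a compiled regular expression when a partial match is wanted. The following sketch reproduces only the exact-match case, using Python's standard-library html.parser instead of Beautiful Soup. TextCollector, find_all_text and the markup are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect each run of character data between tags as one text node."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buffer = []  # pieces of the current text node

    def _flush(self):
        # A tag boundary ends the current text node.
        if self._buffer:
            self.texts.append(''.join(self._buffer))
            self._buffer = []

    def handle_data(self, data):
        # handle_data may be called more than once per node, so buffer it.
        self._buffer.append(data)

    def handle_starttag(self, tag, attrs):
        self._flush()

    def handle_endtag(self, tag):
        self._flush()

    def close(self):
        super().close()  # may deliver trailing data
        self._flush()


def find_all_text(html, text):
    """Return the text nodes equal to `text` (whole-node comparison)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return [node for node in parser.texts if node == text]


# Hypothetical markup: only the two nodes that are exactly 'ABC' match.
markup = '<p>ABC</p><p>ABCD</p><b>ABC</b><i> ABC </i>'
print(find_all_text(markup, 'ABC'))  # ['ABC', 'ABC']
```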