BeautifulSoup findAll CSS files

BeautifulSoup is a Python class that takes HTML and returns a tree of objects. You can then search that tree to find HTML tags, which is pretty mega easy.

Here is a little example that downloads a web page, parses it using BS and grabs all the LINK tags that link to a CSS file:

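# Python 2, with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib2

url = ""  # address of the page to fetch

# build the request and open it
request = urllib2.Request(url)
opener = urllib2.build_opener()
f = opener.open(request)

print '-'*5, 'URL Info:', '-'*5
# assumption: the header dump mentioned in the text comes from f.info()
print f.info()
print '-'*15

# read the page body and parse it into a tree
html = f.read()
soup = BeautifulSoup(html)

# every <link> tag whose rel attribute is "stylesheet"
css_files = soup.findAll('link', {'rel': 'stylesheet'})

for css in css_files:
    print str(css)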

This outputs a lot of page header stuff, which I have removed here. (Page header stuff is interesting: it shows all the POST vars, cookies, cache info etc., so I thought I'd keep it in my little example.) Anyway, BS finds one CSS file:

<link type="text/css" rel="stylesheet" id="csslink" href="" />

The magic method in BS that does the work is findAll(). It takes a tag name as the first parameter and a dictionary of attribute name and value pairs as the second. So in the example above, we are looking for link tags that contain rel="stylesheet".
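If you are on Python 3, the same search can be sketched with the newer bs4 package, where the method is spelled find_all(). This is only a minimal sketch, not the script from this post: it parses a made-up HTML string instead of downloading a page, and the name find_css_links and the sample markup are my own inventions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page source, made up for this sketch.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" type="text/css" href="/css/main.css" />
<link rel="icon" href="/favicon.ico" />
<link rel="stylesheet" type="text/css" href="/css/print.css" media="print" />
</head><body><p>Hello</p></body></html>
"""


def find_css_links(html):
    """Return every <link> tag whose rel attribute includes 'stylesheet'."""
    # "html.parser" is the parser from the standard library, so nothing
    # besides bs4 itself has to be installed.
    soup = BeautifulSoup(html, "html.parser")
    # First argument: tag name. Second argument: dict of attribute filters.
    return soup.find_all("link", {"rel": "stylesheet"})


css_links = find_css_links(SAMPLE_HTML)
for css in css_links:
    print(css)
```

The favicon link is skipped because its rel value is "icon", so only the two stylesheet tags come back.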

Running this script on another page returns:

<link rel="stylesheet" type="text/css" href="" />
<link rel="stylesheet" type="text/css" href="" />
<link rel="stylesheet" type="text/css" href="" />
<link rel="stylesheet" type="text/css" href="" />
<link rel="stylesheet" type="text/css" href="" />
<link rel="stylesheet" type="text/css" href="" />
<link rel="stylesheet" type="text/css" href="" />
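The output above is the whole tag. If what you really want is the address of each CSS file, you can read the href attribute off each tag and resolve it against the page address. Here is a rough sketch of that idea, assuming Python 3 and the bs4 package; the function name css_urls, the sample markup and the example.com address are all made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def css_urls(html, base_url):
    """Return absolute URLs for every stylesheet <link> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("link", {"rel": "stylesheet"}):
        href = link.get("href")  # None if the attribute is missing
        if href:  # skip tags with a missing or empty href
            urls.append(urljoin(base_url, href))
    return urls


# Hypothetical markup and page address, used only for this sketch.
sample = (
    '<link rel="stylesheet" type="text/css" href="/css/site.css" />'
    '<link rel="stylesheet" type="text/css" href="theme.css" />'
    '<link rel="stylesheet" type="text/css" href="" />'
)
found = css_urls(sample, "http://example.com/blog/post.html")
for css_url in found:
    print(css_url)
```

Using urljoin means relative paths such as theme.css are resolved against the page they came from, while tags with an empty href are left out.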

Hope you like 🙂