Normalize URL path python

I had a problem with some URLs that we found on someone's website. They looked like this: <a href="http://example.com/../../../path/page.html">. Here is the same link: test. Notice that when you mouse over it, Firefox normalizes the URL so it looks correct. Using urlparse in Python:

    print urlparse.urljoin('http://site.com/', '/path/../path/.././path/./')
    'http://site.com/path/../path/.././path/./'

How poopy. So, we need to do better than that. The os module […]
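The excerpt stops before the author's own fix, so what follows is only a minimal sketch of one way to collapse the `.` and `..` segments, not the original solution. It assumes Python 3 (where `urlparse` lives in `urllib.parse`) and runs `posixpath.normpath` over the path component only. The function name `normalize_url` is made up for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_url(url):
    """Return url with '.' and '..' segments removed from its path.

    Hypothetical helper, not taken from the original post.
    """
    parts = urlsplit(url)
    path = parts.path
    if path:
        # normpath resolves '.' and '..' and drops '..' above the root.
        path = posixpath.normpath(parts.path)
        # normpath strips a trailing slash; put it back so that
        # directory-style URLs keep pointing at a directory.
        if parts.path.endswith("/") and not path.endswith("/"):
            path += "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


joined = urljoin("http://site.com/", "/path/../path/.././path/./")
print(normalize_url(joined))
# http://site.com/path/
print(normalize_url("http://example.com/../../../path/page.html"))
# http://example.com/path/page.html
```

One thing to keep in mind: `normpath` also collapses repeated slashes inside the path, which is usually fine for links scraped from a page but may not be what every server expects.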


Beautiful soup findall CSS files

BeautifulSoup is a Python class that takes HTML and returns a tree of objects. You can then search that tree to find HTML tags (which is pretty mega easy). Here is a little example that downloads the msn.com web page, parses it using BS, and grabs all the LINK tags that link to a CSS file:

    from BeautifulSoup import BeautifulSoup
    import urllib2
    url = "http://www.msn.com"
    request = urllib2.Request(url)
    […]
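The example above is cut off before the search step, and it is not reconstructed here. To show what "grab all the LINK tags that link to a CSS file" amounts to, here is a rough sketch that swaps in Python 3's standard-library `html.parser` instead of BeautifulSoup and parses an inline HTML string instead of downloading msn.com. The names `CssLinkCollector`, `find_css_links`, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class CssLinkCollector(HTMLParser):
    """Collect the href of every <link> tag that points at a stylesheet."""

    def __init__(self):
        super().__init__()
        self.css_hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        rel = (attr.get("rel") or "").lower().split()
        # Treat a link as CSS if rel mentions "stylesheet"
        # or the href ends in ".css".
        if href and ("stylesheet" in rel or href.lower().endswith(".css")):
            self.css_hrefs.append(href)


def find_css_links(html):
    """Return the hrefs of all stylesheet <link> tags found in html."""
    parser = CssLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.css_hrefs


# Hypothetical sample page, standing in for a downloaded document.
sample = """
<html><head>
<LINK REL="stylesheet" HREF="/css/main.css">
<link rel="icon" href="/favicon.ico">
<link rel="alternate stylesheet" type="text/css" href="print.css" />
</head><body></body></html>
"""

print(find_css_links(sample))
# ['/css/main.css', 'print.css']
```

The filter is the part that carries over to BeautifulSoup: look at every LINK tag, then keep the ones whose `rel` or `href` says stylesheet.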
