Normalize URL path python

I had a problem with some URLs that we found on someone's website. They looked like this:

<a href="">, and here is the same link: test. Notice that when you mouse over it, Firefox normalizes the URL so it looks correct.

Using urlparse in Python:

print urlparse.urljoin('', '/path/../path/.././path/./')

How poopy. urljoin hands the ../ and ./ segments back exactly as they went in.

So we need to do better than that. The os module has a path normalizer in it (os.path.normpath), but it would go squiffy on a Windows box, because there it translates every / to \. But there is a module we can use instead, posixpath, which always works with /:

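import urlparse
import posixpath

def join(base, url):
    # Resolve url against base, then collapse ./ and ../ in the path only
    joined = urlparse.urljoin(base, url)
    parts = urlparse.urlparse(joined)
    path = posixpath.normpath(parts[2])
    # parts is (scheme, netloc, path, params, query, fragment)
    return urlparse.urlunparse(
        (parts[0], parts[1], path, parts[3], parts[4], parts[5])
    )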

# strange paths
print join('', '/path/../path/.././path/./')
print join('', '/path/../path/.././path/./y.html')

# paths that .. up too far
print join('', '../../../../path/')
print join('', '../../../../path/moo.html')

# how path and base path combine
print join('', '1/2/3/moo.html')
print join('', '../1/2/3/moo.html')

This prints:

Cool, huh?
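
If you are on Python 3, where urlparse lives in urllib.parse, the same idea might look roughly like the sketch below. It is only a sketch under a few assumptions: example.com is a placeholder base and not the site from this post, the empty-path guard is my own addition, and the ntpath lines are there purely to show what os.path.normpath would do on Windows without needing a Windows box.

```python
import ntpath
import posixpath
from urllib.parse import urljoin, urlparse, urlunparse


def join(base, url):
    """Join url onto base, then collapse ./ and ../ segments in the path."""
    parts = urlparse(urljoin(base, url))
    # Guard (an addition, not in the original): normpath('') returns '.',
    # which would tack a stray '.' onto a URL that has no path at all.
    path = posixpath.normpath(parts.path) if parts.path else ''
    return urlunparse(parts._replace(path=path))


# Placeholder base URL, chosen only for illustration.
print(join('http://example.com/a/b/', '1/2/../3/./moo.html'))
# http://example.com/a/b/1/3/moo.html

# ntpath is the module os.path points at on Windows: note the backslash.
print(ntpath.normpath('/path/../path/./'))     # \path
print(posixpath.normpath('/path/../path/./'))  # /path
```

One thing to watch: posixpath.normpath drops a trailing slash, so a directory-style URL ending in /path/ comes back as /path. If that matters for the site you are crawling, you would need to put the slash back yourself.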