parsed.org
Python URL regular expression by trick on Sep 19, 2008 10:57 PM

The following Python code builds a regex that should find any link that uses one of the listed schemes. It does not, however, include punctuation at the end of the URL:

SCHEMES = ('http', 'https', 'ftp', 'mailto', 'news', 'gopher',
           'nntp', 'telnet', 'wais', 'prospero', 'aim', 'webcal')
# Note: fragment id is uchar | reserved, see RFC 1738 page 19
# %% for % because of string formatting
# punctuation = ? . : ( ) , ; ! '
# if punctuation is at the end, then don't include it
# every fragment is a raw string so the regex backslashes reach re unchanged
URL_FORMAT = (r'(?<!\w)((?:%s):'  # protocol + :
    r'/*(?!/)(?:'  # get any starting /'s
    r'[\w$\+\*@&=\-/]'  # reserved | unreserved
    r'|%%[a-fA-F0-9]{2}'  # escape
    r'|[\?\.:\(\),;!\'](?!(?:\s|$))'  # punctuation
    r'|(?:(?<=[^/:]{2})#)'  # fragment id
    r'){2,}'  # at least two characters in the main url part
    r')') % ('|'.join(SCHEMES),)

Code taken from the remark markup library.
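For anyone who wants to try the pattern, here is a minimal usage sketch. It repeats the pattern so it can run on its own. The names `URL_RE` and `find_urls` and the sample sentence are my own assumptions for illustration; they are not part of the remark library.

```python
import re

SCHEMES = ('http', 'https', 'ftp', 'mailto', 'news', 'gopher',
           'nntp', 'telnet', 'wais', 'prospero', 'aim', 'webcal')

# Same pattern as in the post, written with raw strings throughout.
URL_FORMAT = (r'(?<!\w)((?:%s):'                 # protocol + :
              r'/*(?!/)(?:'                      # any starting /'s
              r'[\w$\+\*@&=\-/]'                 # reserved | unreserved
              r'|%%[a-fA-F0-9]{2}'               # percent escape
              r'|[\?\.:\(\),;!\'](?!(?:\s|$))'   # punctuation, unless trailing
              r'|(?:(?<=[^/:]{2})#)'             # fragment id
              r'){2,}'                           # at least two characters
              r')') % ('|'.join(SCHEMES),)

# Hypothetical helper names, not taken from the remark library.
URL_RE = re.compile(URL_FORMAT)


def find_urls(text):
    """Return every URL found in text, without trailing punctuation."""
    # The pattern has a single capturing group, so findall returns strings.
    return URL_RE.findall(text)


sample = "See http://example.com/a?b=1, or ftp://host/file.txt."
print(find_urls(sample))
# prints ['http://example.com/a?b=1', 'ftp://host/file.txt']
```

The comma after the first URL and the period after the second are left out because the punctuation branch only matches when the next character is neither whitespace nor the end of the string.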
