This is a particularly tasty utility that will tell you what a site is (and has been) running along with hosting and DNS lookup information. Just replace www.google.com with the URL of your choosing.
http://toolbar.netcraft.com/site_report?url=http://www.google.com/
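If you look up sites often, the substitution can be scripted. This is only a sketch: it assumes the report URL format shown above is still what the service accepts, and site_report_url is a made-up helper name, not part of any Netcraft API.

```python
from urllib.parse import quote

# Base of the report URL, copied from the example above.
NETCRAFT_REPORT = "http://toolbar.netcraft.com/site_report?url="


def site_report_url(site):
    """Build a site report link for `site` (hypothetical helper)."""
    # Leave ':' and '/' readable, as in the example URL; escape anything else.
    return NETCRAFT_REPORT + quote(site, safe=":/")


print(site_report_url("http://www.example.com/"))
```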
diagnostic dns online tool url utilities
The following Python code builds a regex that should find any valid link. It deliberately leaves out punctuation at the end of the URL:
SCHEMES = ('http', 'https', 'ftp', 'mailto', 'news', 'gopher',
           'nntp', 'telnet', 'wais', 'prospero', 'aim', 'webcal')
# Note: fragment id is uchar | reserved, see RFC 1738 page 19
# %% for % because of string formatting
# punctuation = ? , ; . : !
# if punctuation is at the end, then don't include it
# every piece is a raw string so the regex escapes reach the re module untouched
URL_FORMAT = (r'(?<!\w)((?:%s):'                # protocol + :
              r'/*(?!/)(?:'                      # get any starting /'s
              r'[\w$\+\*@&=\-/]'                 # reserved | unreserved
              r'|%%[a-fA-F0-9]{2}'               # escape
              r'|[\?\.:\(\),;!\'](?!(?:\s|$))'   # punctuation
              r'|(?:(?<=[^/:]{2})#)'             # fragment id
              r'){2,}'                           # at least two characters in the main url part
              r')') % ('|'.join(SCHEMES),)
Code taken from the remark markup library.
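To see the trailing-punctuation behaviour in action, here is a small usage sketch. The pattern is repeated so the block runs on its own; find_urls and the sample text are made up for illustration and are not part of the remark library.

```python
import re

SCHEMES = ('http', 'https', 'ftp', 'mailto', 'news', 'gopher',
           'nntp', 'telnet', 'wais', 'prospero', 'aim', 'webcal')

# Same pattern as above, written with raw strings.
URL_FORMAT = (r'(?<!\w)((?:%s):'               # protocol + :
              r'/*(?!/)(?:'                     # any starting /'s
              r'[\w$\+\*@&=\-/]'                # reserved | unreserved
              r'|%%[a-fA-F0-9]{2}'              # escape
              r"|[\?\.:\(\),;!'](?!(?:\s|$))"   # punctuation, unless it ends the URL
              r'|(?:(?<=[^/:]{2})#)'            # fragment id
              r'){2,}'                          # at least two characters
              r')') % ('|'.join(SCHEMES),)

URL_RE = re.compile(URL_FORMAT)


def find_urls(text):
    """Return every URL found in `text` (hypothetical helper)."""
    # The pattern has a single capturing group, so findall returns plain strings.
    return URL_RE.findall(text)


sample = "See http://www.example.com/page. Also try https://example.org/a?b=1!"
print(find_urls(sample))
# ['http://www.example.com/page', 'https://example.org/a?b=1']
```

The period after "page" and the final exclamation mark are both left out, because the punctuation branch refuses to match when whitespace or the end of the string follows.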
python regex url
If you have a URL that you need to crawl and you know the range of numbers in the image filenames, you can do something like this (the URL is quoted so the shell leaves the brackets alone):
$ curl -O "http://www.example.com/img/samples[00-99].jpg"
That should (at least attempt to) fetch the images samples00.jpg through samples99.jpg. Enjoy!
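If you're using more than one range, you'll want to build the output filename or path yourself: -o accepts #1, #2, and so on as placeholders for the current value of each range, and --create-dirs creates any missing directories. For example:
$ curl "http://www.example.com/imgs[00-99]/samples[00-27].jpg" --create-dirs -o "#1/#2.jpg"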
Alternatively, you can skip the directories and just name the files like dirname_filename.jpg:
$ curl "http://www.example.com/images[00-99]/samples[00-27].jpg" -o "#1_#2.jpg"
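If you want to check what a pair of ranges will produce before hitting the server, the expansion is easy to mimic. The following Python sketch is not how curl is implemented; it just lists the URL and "#1/#2.jpg" output path for each combination, and expand_range and planned_downloads are made-up names. The order curl actually requests them in may differ.

```python
def expand_range(start, end):
    """Expand a curl-style numeric range such as [00-99], keeping zero padding."""
    width = len(start)
    return [str(n).zfill(width) for n in range(int(start), int(end) + 1)]


def planned_downloads(dirs=("00", "99"), files=("00", "27")):
    """Yield (url, output_path) pairs for the two-range example above."""
    for d in expand_range(*dirs):        # plays the role of #1
        for f in expand_range(*files):   # plays the role of #2
            url = "http://www.example.com/imgs%s/samples%s.jpg" % (d, f)
            yield url, "%s/%s.jpg" % (d, f)


pairs = list(planned_downloads())
print(len(pairs))   # 100 directories x 28 files = 2800
print(pairs[0])     # ('http://www.example.com/imgs00/samples00.jpg', '00/00.jpg')
```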
commands curl extras image shell url