Cyberspace Law and Policy Centre, University of New South Wales

Thursday, May 01, 2008

 

Get a list of all (indexable) URLs on a site from the Wayback Machine

Earlier this year I complained about the problem with search engines. Today, Alexander Osborne (from the National Library of Australia) corrected me, at least a little bit.

I said I'd like to see an interface that (among other nice-to-haves) answers questions like "give me everything you've got from cyberlawcentre.org/unlocking-ip", and it turns out that's actually possible with the Wayback Machine. Not in a single request, as far as I know, but with this (plain old HTTP) request: http://web.archive.org/web/*xm_/www.cyberlawcentre.org/unlocking-ip/*, you can get a list of all URLs under cyberlawcentre.org/unlocking-ip, and then, if you wanted to, you could make another HTTP request for each URL.
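For anyone who wants to script this, here is a minimal Python sketch of the first step. It only assumes the URL pattern quoted above; the names wayback_listing_url and fetch_listing are my own, not part of any Wayback Machine API. I haven't pinned down what format the listing comes back in, so the sketch returns the raw response body rather than trying to parse it, and it takes for granted that the endpoint responds as described.

```python
from urllib.request import urlopen

# Wildcard prefix copied from the request quoted in the post.
WAYBACK_BASE = "http://web.archive.org/web/*xm_/"


def wayback_listing_url(prefix: str) -> str:
    """Build the wildcard listing URL for everything archived under `prefix`."""
    # The pattern in the post has no scheme on the target, so drop one if given.
    for scheme in ("http://", "https://"):
        if prefix.startswith(scheme):
            prefix = prefix[len(scheme):]
            break
    # Normalise to exactly one trailing "/*".
    return WAYBACK_BASE + prefix.rstrip("/") + "/*"


def fetch_listing(prefix: str, timeout: float = 30.0) -> str:
    """Fetch the raw listing body (hypothetical helper; response format not assumed)."""
    with urlopen(wayback_listing_url(prefix), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling wayback_listing_url("www.cyberlawcentre.org/unlocking-ip") gives back the same URL as in the paragraph above. fetch_listing does the actual request, and pulling the individual URLs out of whatever it returns is left open.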

Pretty cool actually, thanks Alex.

Now I wonder how far you can scale those requests up... I wonder what happens if you ask for www.*? Or (what the heck, someone has to say it) just '*'. I guess you'd probably break the Internet...
