Cyberspace Law and Policy Centre, University of New South Wales
Unlocking IP

Monday, May 19, 2008

 

What's an Australian web page?

I said recently that defining the Australian web is an issue in itself. I thought I'd say a little more about how the National Library's crawls handled the issue.

First, the National Library's crawls were outsourced to the Internet Archive (IA), which is a good thing: the crawling has been done well, the data is in a well-defined format (a few sharp edges, but pretty good), and there's already a decent knowledge base out there for accessing this data.

Now, there are two ways that IA chooses to include a page as Australian:
  1. domain name ends in '.au' (e.g. all web pages on the unsw.edu.au domain)
  2. IP address is registered as Australian in a geolocation database
Number 1 is simple; number 2 is more complicated. Basically, IA is using another company's geolocation database, which draws on things such as the path through the Internet to the server, who the Internet service provider is, and possibly who the domain name is registered to.
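
To make the two rules concrete, here is a minimal sketch in Python. It is not IA's actual code: the function names are made up for illustration, and the geolocation database is stood in for by a hypothetical `country_of_ip` callable that maps an IP address to a country code, since I don't know the interface of the database IA uses.

```python
from urllib.parse import urlsplit


def has_au_domain(url):
    """Rule 1: the host name ends in '.au'.

    Expects a full URL with a scheme; anything without a parseable
    host is treated as not matching.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    return host == "au" or host.endswith(".au")


def is_australian(url, ip, country_of_ip):
    """Apply rule 1, then fall back to rule 2.

    country_of_ip is a hypothetical stand-in for a geolocation
    database: a callable taking an IP string and returning a
    country code such as 'AU', or None if the address is unknown.
    """
    if has_au_domain(url):
        return True
    return country_of_ip(ip) == "AU"


# Toy lookup table standing in for a real geolocation database.
# The addresses are reserved documentation addresses, not real servers.
toy_geo = {"192.0.2.1": "AU", "198.51.100.7": "US"}

print(is_australian("http://www.unsw.edu.au/", "198.51.100.7", toy_geo.get))
print(is_australian("http://example.com/", "192.0.2.1", toy_geo.get))
print(is_australian("http://example.com/", "198.51.100.7", toy_geo.get))
```

The domain check runs first because it is cheap and unambiguous; the lookup is only consulted for hosts outside '.au', which is where all the uncertainty of rule 2 comes in.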

Actually, there is a third kind of page in the crawls. The crawls were done with a setting that included some pages linked directly from Australian pages (example: slashdot.org), though not sub-pages of these. I'll have to address this, and I can think of a few ways:

(Thanks to Alex Osborne and Paul Koerbin from the National Library for detailing the specifics for me.)
