Wednesday, April 30, 2008
Google Code Search (GCS) lets you search for source code files by licence type, so of course I was interested in whether it could be used to quantify indexable source code on the web. Luckily, GCS lets you search for all works with a given licence. (If you don't understand why that's a big deal, try searching for all Creative Commons-licensed works using Google web search.) Even better, using the regex facility you can search for all works! You sure as heck can't do that with a regular Google web search.
Okay, so here are the latest results, including hyperlinks to searches so you can try them yourself:
- all (by regex: .*) : 36,700,000
- gpl : 8,960,000
- lgpl : 4,640,000
- bsd : 3,110,000
- mit : 903,000
- cpl : 136,000
- artistic : 192
- apache : 156
- disclaimer : 130
- python : 108
- zope : 103
- mozilla : 94
- qpl : 86
- ibm : 67
- sleepycat : 51
- apple : 47
- lucent : 19
- nasa : 15
- aladdin : 9
And here's a spreadsheet with a graph included. Note, however, the discontinuity (in both absolute and trend terms) in that (logarithmic) graph between the approximate results (the large, rounded counts down to cpl) and the specific ones (the small, exact counts from artistic onwards), which suggests Google's approximations are not very good.
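The discontinuity is easy to check numerically. Below is a minimal Python sketch that uses only the counts listed above: it computes each licence's share of the `.*` total and finds the largest step between neighbouring counts on a log scale, which is the jump the logarithmic graph shows. The function names are my own, and the sketch says nothing about how Google actually computes its estimates.

```python
import math
from typing import Dict, Tuple

# Counts copied from the Google Code Search results listed above.
# "all" is the .* regex search, i.e. every indexed file.
COUNTS: Dict[str, int] = {
    "all": 36_700_000,
    "gpl": 8_960_000,
    "lgpl": 4_640_000,
    "bsd": 3_110_000,
    "mit": 903_000,
    "cpl": 136_000,
    "artistic": 192,
    "apache": 156,
    "disclaimer": 130,
    "python": 108,
    "zope": 103,
    "mozilla": 94,
    "qpl": 86,
    "ibm": 67,
    "sleepycat": 51,
    "apple": 47,
    "lucent": 19,
    "nasa": 15,
    "aladdin": 9,
}


def share_of_all(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all indexed files reported for each licence."""
    total = counts["all"]
    return {name: n / total for name, n in counts.items() if name != "all"}


def largest_log_gap(counts: Dict[str, int]) -> Tuple[str, str, float]:
    """Biggest drop in log10(count) between neighbouring licences once
    they are sorted by count, i.e. the largest step on a log-scale graph."""
    ranked = sorted(
        ((name, n) for name, n in counts.items() if name != "all"),
        key=lambda item: item[1],
        reverse=True,
    )
    best = ("", "", 0.0)
    for (a, na), (b, nb) in zip(ranked, ranked[1:]):
        gap = math.log10(na) - math.log10(nb)
        if gap > best[2]:
            best = (a, b, gap)
    return best


if __name__ == "__main__":
    for name, frac in share_of_all(COUNTS).items():
        print(f"{name:>10}: {frac:.4%}")
    hi, lo, gap = largest_log_gap(COUNTS)
    print(f"largest step: {hi} -> {lo}, {gap:.2f} orders of magnitude")
```

On these numbers the largest step is between cpl (136,000) and artistic (192), about 2.85 orders of magnitude, which is exactly where the results switch from approximate to specific.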
Labels: ben, free software, licensing, quantification, search
Trawling the web in search of copyright knowledge, as one does from time to time, I came across an interesting post on the Copyright Agency Limited (CAL) site. CAL collects funds under the statutory licence scheme that is provided in the Australian Copyright Act and, according to its website, there are a number of corporations and individuals who are missing out on their moolah. Could you be one of them? Head here to find out.
Sadly this blogger is not in the money, but there are some interesting names on the lists. Some make perfect sense, for example Australian artist Pro Hart. Then there are the numerous estates that are owed money, including those of Ernest Hemingway, AA Milne and Australian architect Harry Seidler. Then there are the more unusual 'publishers', for example eBay Australia & New Zealand and Air Caledonie International. I'm not sure what Air Caledonie has published or who has reproduced it, but visiting its website made me want to go to New Caledonia.
Monday, April 28, 2008
Regular readers will know of my interest in all-things-Crown-copyright, so I have come out of my blogging hiatus* to let you all know that last week five members of the High Court (Gleeson CJ, Gummow, Heydon, Crennan and Kiefel JJ) heard argument in the appeal of Copyright Agency Limited v State of New South Wales. As you may recall, this case considers whether the Copyright Agency Limited (CAL) can collect money from the NSW Government for the use of certain copyright-protected surveyor plans. The Full Court of the Federal Court of Australia found that CAL could not collect on these plans because an implied licence exists permitting the NSW Government to do everything it needs to in relation to the plans, as dictated by statute.
A transcript for the hearing can be found on AustLII here. I will get some comments up within the next week.
* Self-imposed in a desperate attempt to actually write my thesis, and I am pleased to report that it's going well, in case my supervisors are reading this.
Labels: catherine, Crown copyright
Tuesday, April 08, 2008
Hi commons researchers,
I just did this analysis of Google's and Yahoo's capacity to search for commons (mostly Creative Commons, because that's what their advanced search interfaces support), and thought I'd share. Basically it's an update of my research from Finding and Quantifying Australia's Online Commons. I hope it's all pretty self-explanatory. Please ask questions, and of course point out flaws in my methods or examples.
Also, I just have to emphasise the "No" in Yahoo's column in row 1: yes, I am in fact saying that the only licences Yahoo recognises are the US/unported ones, and that it is in fact ignoring the vast majority of Creative Commons licences. (That leads on to a whole other conversation about quantification, but I'll leave that for now.)
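To make the jurisdiction point concrete: a ported Creative Commons licence carries its jurisdiction as the last path segment of the licence URL (e.g. .../licenses/by-nc-sa/2.5/au/ for Australia), while unported licences have no such segment. Here is a rough Python sketch that classifies licence URLs that way and flags which ones fall in the US/unported set. The helper names are hypothetical, and the recognised set encodes only my observation above, not anything Yahoo has documented about its implementation.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Assumption based on the row 1 observation: Yahoo's CC search only seemed
# to recognise US and unported licences.
YAHOO_RECOGNISED = {"unported", "us"}


def parse_cc_licence_url(url: str) -> Optional[Dict[str, str]]:
    """Split a creativecommons.org licence URL into type, version and
    jurisdiction, assuming the layout /licenses/<type>/<version>/[<juris>/].
    Returns None for URLs outside that layout."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("creativecommons.org", "www.creativecommons.org"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] != "licenses":
        return None
    return {
        "type": parts[1],
        "version": parts[2] if len(parts) > 2 else "",
        # No jurisdiction segment means an unported licence.
        "jurisdiction": parts[3] if len(parts) > 3 else "unported",
    }


def yahoo_would_recognise(url: str) -> bool:
    """True if the licence falls in the US/unported set observed above."""
    info = parse_cc_licence_url(url)
    return info is not None and info["jurisdiction"] in YAHOO_RECOGNISED
```

For example, an Australian-ported licence URL parses to jurisdiction "au" and so, on my reading of the results, would be invisible to Yahoo's CC search.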
(I've formatted this table in Courier New so it should come out well-aligned, but who knows).
Feature                       | Google | Yahoo |
------------------------------+--------+-------+
1. Multiple CC jurisdictions  | Yes    | No    | (e.g.)
2. 'link:' query element      | No     | Yes   | (e.g. G, Y)
3. RDF-based CC search        | Yes    | No    | (e.g.)
4. meta name="dc:rights" *    | Yes    | ? **  | (e.g.)
5. link-based CC search       | No     | Yes   | (e.g.)
6. Media-specific search      | No     | No    | (G, Y)
7. Shows licence elements     | No     | No    | ****
8. CC public domain stamp *** | Yes    | Yes   | (e.g.)
9. CC-(L)GPL stamp            | No     | No    | (e.g.)
* I can't rule out Google's result here actually coming from <a rel="license"> in the links to the licence (as described here: http://microformats.org/wiki/rel-license).
** I don't know of any pages that have <meta name="dc:rights"> metadata (or <a rel="license"> metadata?) but no links to licences.
*** Insofar as the appropriate metadata is present.
**** i.e. neither shows which licence each result uses.
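Footnote ** describes the test page I haven't found: one with licence metadata but no licence links. Candidate pages can be checked with a small scanner; below is a sketch using only Python's standard html.parser that records rel="license" links, any links to creativecommons.org licence URLs, and <meta name="dc:rights"> content. Which of these signals Google or Yahoo actually index is exactly what's unknown here, so this only reports what is on the page. It ignores embedded RDF (row 3), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

CC_LICENCE_PREFIX = "creativecommons.org/licenses/"


class LicenceSignalScanner(HTMLParser):
    """Collects licence signals from one HTML page (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.rel_license: List[str] = []  # hrefs of <a>/<link> with rel="license"
        self.cc_links: List[str] = []     # any href pointing at a CC licence
        self.dc_rights: List[str] = []    # content of <meta name="dc:rights">

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; valueless attributes give None.
        attr = {name: (value or "") for name, value in attrs}
        if tag in ("a", "link"):
            href = attr.get("href", "")
            # rel may hold several space-separated tokens, e.g. "license nofollow".
            if href and "license" in attr.get("rel", "").lower().split():
                self.rel_license.append(href)
            if CC_LICENCE_PREFIX in href:
                self.cc_links.append(href)
        elif tag == "meta" and attr.get("name", "").lower() == "dc:rights":
            self.dc_rights.append(attr.get("content", ""))

    def metadata_without_links(self) -> bool:
        """True for the kind of page footnote ** is looking for."""
        return bool(self.dc_rights) and not self.cc_links


def scan_page(html: str) -> LicenceSignalScanner:
    scanner = LicenceSignalScanner()
    scanner.feed(html)
    scanner.close()
    return scanner
```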
Notes about example pages (from rows 1, 3-5, 8-9):
- To determine whether a search engine can find a given page, first look at the page and pick out enough snippets of its content to build a query that definitely returns that page, and test that query to make sure the search engine finds it (e.g. '"clinton lies again" digg' for row 8). Then run the same query as an advanced search with Creative Commons search turned on and see whether the page is still found.
- The example pages should all be specific to the feature they exemplify. E.g. the Phylocom example from row 9 has all the right links, logos and metadata for the CC-GPL, and in particular has no other Creative Commons licence present, yet it does not show up in Creative Commons search results.
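The two-step test in the first note can be written down as a small function. This is a sketch only: no real search engine interface is modelled, so `search` is a hypothetical callable you would supply (taking a query and a flag for whether the Creative Commons filter is on, and returning result URLs), and URL matching is exact-string for simplicity.

```python
from typing import Callable, List

# Hypothetical search interface: (query, cc_filter_on) -> list of result URLs.
SearchFn = Callable[[str, bool], List[str]]


def check_cc_visibility(search: SearchFn, query: str, page_url: str) -> str:
    """Apply the two-step test described above to one example page."""
    # Step 1: the query must find the page with an ordinary search;
    # otherwise a miss in step 2 tells us nothing.
    if page_url not in search(query, False):
        return "inconclusive: query does not find the page"
    # Step 2: the same query with Creative Commons search turned on.
    if page_url in search(query, True):
        return "found"
    return "not found"
```

A "not found" result for a page like the Phylocom example is what the "No" entries in row 9 record.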
Labels: ben, Creative Commons, quantification, search