The House of Commons: Semantic Web Search for Commons: An Introduction

Tuesday, October 10, 2006

Semantic Web Search for Commons: An Introduction

As part of my research, I’ve been thinking about the problem of searching the Internet for commons content (for more on the meaning of ‘commons’, see Catherine’s post on the subject). This is what I call the problem of ‘semantic web search for commons’, and in this post I will talk a little about what that means, and why I’m using ‘semantic’ with a small ‘s’.

When I say 'semantic web search', I mean using the explicit or implied metadata about web documents to let you search for specific classes of documents. More specifically, I am talking about documents with some degree of public rights (i.e. reusability). But before I go any further, I should point out that I’m not talking about the Semantic Web in the formal sense. The Semantic Web usually refers to the use of Resource Description Framework (RDF), Web Ontology Language (OWL) and Extensible Markup Language (XML) to express semantics (i.e. meaning) in web pages. I am using the term (small-s ‘semantic’) more broadly, to refer to all statements that can meaningfully be made about web pages and other online documents, whether or not they are expressed in RDF, OWL or XML.

Commons content search engine

Now let me go a bit deeper into the problem. First, for a document to be considered 'commons' content (i.e. with enhanced public rights), there must be some indication of those rights, usually on the web and in many cases in the document itself. Second, there is no single standard for indicating that a document is commons content. Third, there is a broad spectrum of kinds of commons, which can be divided by the kind of work (multimedia, software, etc.) and by the legal mechanism (public domain, FSF licences, Creative Commons licences, etc.).

So let us consider the simplest case now, and in future posts I will expand on this. The case I shall consider first is that of a web page that is licensed with a Creative Commons (CC) licence, labelled as recommended by Creative Commons. This web page will contain embedded (hidden) RDF/XML metadata explaining its rights. A search engine could use that metadata as the basis for a preliminary semantic web search, restricting results to only those pages that carry the appropriate metadata for the search. The engine could then be extended to cover all the other CC licences, and the search interface could offer categories such as 'modifiable' and, in the case of CC, even jurisdiction (i.e. licence country). It is worth noting at this point that the details of each individual licence, specifically its URL and jurisdiction, are essentially domain knowledge, so they will have to be entered by an administrator of the search engine.
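To make the simplest case concrete, here is a minimal sketch in Python of the filtering step. It is an illustration under stated assumptions, not a description of any existing search engine. It assumes the RDF/XML block sits in the page source as an `<rdf:RDF>` element (typically inside an HTML comment) and that the licence is given by a `license` element with an `rdf:resource` attribute. The function names, the `KNOWN_LICENCES` table, its two entries and the 'generic' jurisdiction label are all hypothetical; the table stands in for the domain knowledge an administrator would enter.

```python
import re
import xml.etree.ElementTree as ET

# Attribute name for rdf:resource once ElementTree has expanded the namespace.
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Assumption: the metadata appears as a literal <rdf:RDF>...</rdf:RDF> block.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)

# Hypothetical administrator-entered table: licence URL -> licence details.
KNOWN_LICENCES = {
    "http://creativecommons.org/licenses/by/2.5/":
        {"modifiable": True, "jurisdiction": "generic"},
    "http://creativecommons.org/licenses/by-nd/2.5/au/":
        {"modifiable": False, "jurisdiction": "au"},
}


def licence_urls(html):
    """Return every licence URL declared in RDF blocks embedded in the page."""
    urls = []
    for block in RDF_BLOCK.findall(html):
        try:
            root = ET.fromstring(block)
        except ET.ParseError:
            continue  # skip malformed metadata rather than fail the whole page
        for element in root.iter():
            # Compare on the local name so the namespace URI is not hard-coded.
            local_name = element.tag.rsplit("}", 1)[-1]
            resource = element.get(RDF_RESOURCE)
            if local_name == "license" and resource:
                urls.append(resource)
    return urls


def matches(html, modifiable=None, jurisdiction=None):
    """True if the page carries a known licence satisfying the given filters."""
    for url in licence_urls(html):
        info = KNOWN_LICENCES.get(url)
        if info is None:
            continue  # unknown licence: no domain knowledge to judge it by
        if modifiable is not None and info["modifiable"] != modifiable:
            continue
        if jurisdiction is not None and info["jurisdiction"] != jurisdiction:
            continue
        return True
    return False


SAMPLE_PAGE = """<html><body><p>Some post.</p>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
</rdf:RDF>
-->
</body></html>"""

print(licence_urls(SAMPLE_PAGE))
print(matches(SAMPLE_PAGE, modifiable=True))
```

A search engine built this way would run `matches` over its candidate results and keep only the pages that pass. Pages whose licence URL is missing from the table are excluded, which is one reason the administrator-entered licence details matter.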

But wait, there’s more

That’s it for now, but stay tuned for at least three more posts on this topic. In the next post, I will talk about the (many) other mechanisms that can be used to give something public rights. Then, I will consider non-web content, and the tricky issue of public domain works. Finally, I will look at where Google is at with these ideas, and consider the possible shortcomings of a semantic web search engine as described in this series.

Labels: ben, search

(permalink) posted by Ben Bildstein @ Tuesday, October 10, 2006
