Help:Semantic search

Revision as of 16:31, 8 January 2007 by Centiare (Protected "Help:Semantic search" [edit=sysop:move=sysop])

This page has been copied from the Semantic MediaWiki web site at Ontoworld. Please refer to that site for more detailed explanations and examples of Centiare's powerful semantic search facilities.

Semantic MediaWiki includes an easy-to-use query language which enables users to access the wiki's knowledge. The syntax of this query language is very similar to the syntax of annotations in Semantic MediaWiki. This query language can be used on the special page Special:Ask or in inline queries.

Naturally, answering queries requires additional resources, and the administrators of some sites can decide to switch off or restrict most of the features given below in order to ensure that even high-traffic sites can handle the additional load.

The documentation below applies to Semantic MediaWiki version 0.6 and above, though basic features have been available since 0.4.3.

Introduction

Queries can be written into the text field of Special:Ask. Among other things, all queries must state some conditions that describe what is asked for. For example, the query:

[[Category:Actor]]

is a query for all pages within the category "Actor", i.e. for all actors. Pressing "Find results" executes the query and returns the results as a simple list of all requested pages. If there are many results, they can be browsed via the navigation links at the top and bottom of the query results, as e.g. in the query for all persons on this wiki.

Much more complex queries are possible, but let us first explain some important things about queries. In general, a query is a request to find a number of pages that satisfy certain requirements. The query must answer two questions:

  1. Which pages are requested?
  2. What information should be displayed about those pages?

The first point is obvious. The second point matters whenever more than a plain list of page names is wanted. In the example above, one might be interested in all actors and their date of birth. This requires two steps: first find all actors; second print out their names and dates of birth. Both points are now explained independently in the sections below.

Page selection

In the example above, we gave the single condition [[Category:Actor]] to describe which pages we were interested in. The condition here is exactly what one would otherwise write to assert that some page is in the category Actor. The enclosing ask-tags invert its meaning to return all such pages (actually some more; but read on). This is a general scheme: The syntax for asking for pages that satisfy some condition is exactly the syntax for explicitly asserting that this condition holds.

The following queries show what this means:

  1. [[Category:Actor]] gives all pages directly or indirectly (through a sub-, subsub-, etc. category) in the category.
  2. [[born in::Boston]] gives all pages annotated as being about someone born in Boston.
  3. [[height:=180cm]] gives all pages annotated as being about someone having a height of 180cm.

By using other categories, relations, or attributes than above, we can already ask for pages which have certain annotations. Next let us combine those requirements:

<ask>[[Category:Actor]] [[born in::Boston]] [[height:=180cm]]</ask> 

asks for everybody who is an actor and was born in Boston and is 180cm tall. In other words: when many conditions are written into one query, the result is narrowed down to those pages that meet all the requirements. Thus we have a logical AND. By the way: queries can also include line-breaks in order to make them more readable. So we could as well write:

  [[Category:Actor]] 
  [[born in::Boston]] 
  [[height:=180cm]]

to get the same result as above. Note that queries only return the articles that are positively known to satisfy the required properties: if there is no attribute for the height of an actor, the actor will not be selected.
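The AND semantics and the treatment of missing annotations can be illustrated with a small Python sketch. It is only a toy model, not the way Semantic MediaWiki evaluates queries: the page names, the dictionary layout, and the helper `select` are assumptions made up for this illustration.

```python
# Toy model: each page maps to its annotations (hypothetical data).
pages = {
    "Alice": {"categories": {"Actor"}, "born in": {"Boston"}, "height_cm": 180},
    "Bob": {"categories": {"Actor"}, "born in": {"Boston"}},  # no height annotated
    "Carol": {"categories": {"Actor"}, "born in": {"Chicago"}, "height_cm": 180},
}


def select(pages, conditions):
    """Return the names of pages for which every condition holds (logical AND)."""
    return sorted(
        name for name, ann in pages.items() if all(cond(ann) for cond in conditions)
    )


# Counterparts of [[Category:Actor]] [[born in::Boston]] [[height:=180cm]]
conditions = [
    lambda ann: "Actor" in ann.get("categories", set()),
    lambda ann: "Boston" in ann.get("born in", set()),
    lambda ann: ann.get("height_cm") == 180,
]

result = select(pages, conditions)
print(result)
```

In this model "Bob" is left out because no height is annotated for him, even though nothing states that he is not 180cm tall.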

Wildcards and disjunctions

In the examples above, we gave very concrete conditions, using "Actor", "Boston", and "180cm" as fillers. It is possible to weaken these conditions in several ways.

Wildcards are written as "+" and allow any filler for a given condition. For example, [[born in::+]] returns all pages that have annotations for the relation "born in", and [[height:=+]] returns all pages that have been assigned some height. For categories, this feature makes little sense: [[Category:+]] just returns everything that has some category.

Disjunctions are written as "||" and allow queries to require (at least) one out of several possible fillers. For example, [[Category:Musical actor||Theatre actor]] retrieves everything that is a musical actor or a theatre actor. This also includes everything that is both, i.e. we really have a logical OR here. In the same way, a list of pages can be given as the target of a relation, e.g. [[born in::Boston||New York]], and a list of values can be given for an attribute.
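As a rough illustration of how "+" and "||" weaken a condition, the following Python sketch checks a single condition against the values annotated on one page. The function name `matches` and the representation of annotations as sets of strings are assumptions for this example only; they do not describe Semantic MediaWiki's internals.

```python
WILDCARD = "+"


def matches(values, filler):
    """Check one condition against the set of values annotated on a page.

    `filler` is the text after "::" or ":=" in the query, e.g. "Boston||New York".
    """
    if filler == WILDCARD:
        # "+" accepts any value, but at least one value must exist.
        return len(values) > 0
    alternatives = {alt.strip() for alt in filler.split("||")}
    # Logical OR: one shared value is enough.
    return bool(values & alternatives)


print(matches({"New York"}, "Boston||New York"))
print(matches(set(), "+"))
```

The first call succeeds because one of the listed alternatives is present; the second fails because the wildcard still requires some annotation to exist.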

Subqueries

To ask for pages having a particular relation to any page in a more complex set, the latter can be written in the form of a query. In this case, instead of a concrete page name (or list of page names), one enters a new query enclosed in <q> and </q>. For instance, one can ask for all actors that were born in an Italian city by writing

[[Category:Actor]] [[born in::<q>[[Category:City]] [[located in::Italy]]</q>]]

Arbitrary levels of nesting are possible, though nesting might be restricted for a particular site to improve performance.
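One way to picture a subquery is as a query whose result set is used as the list of allowed fillers for the outer condition. The Python sketch below follows that reading for the Italian-city example; the data and the helper `in_category` are hypothetical, and the real evaluation strategy of Semantic MediaWiki may differ.

```python
# Hypothetical annotations for four pages.
pages = {
    "Rome": {"categories": {"City"}, "located in": {"Italy"}},
    "Paris": {"categories": {"City"}, "located in": {"France"}},
    "Anna": {"categories": {"Actor"}, "born in": {"Rome"}},
    "Ben": {"categories": {"Actor"}, "born in": {"Paris"}},
}


def in_category(ann, category):
    return category in ann.get("categories", set())


# Inner query: <q>[[Category:City]] [[located in::Italy]]</q>
italian_cities = {
    name
    for name, ann in pages.items()
    if in_category(ann, "City") and "Italy" in ann.get("located in", set())
}

# Outer query: [[Category:Actor]] [[born in::<q>...</q>]]
actors = sorted(
    name
    for name, ann in pages.items()
    if in_category(ann, "Actor") and ann.get("born in", set()) & italian_cities
)
print(actors)
```

Deeper nesting would repeat the same step: each inner result set becomes the filler list of the condition that encloses it.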

For another example, assume that we are interested in all cities of the European Union (as far as specified within this wiki). This is done by the following query:

  [[Category:Cities]]
  [[located in::<q>[[Category:Country]] [[member of::European Union]]</q>]]

(<ask limit="0" searchlabel="view results" default="no results within this wiki">
 [[Category:Cities]]
 [[located in::<q>[[Category:Country]] [[member of::European Union]]</q>]]
</ask>)

Asking for categories

Conditions with categories are generally simple, but they are more powerful than they might at first appear:

When searching for pages within a category, the result also involves all pages that are contained in subcategories of this category.

For example, assume that we have a category "Theatre actor" which is a subcategory of "Actor". Then the query [[Category:Actor]] will also return those "special" actors that are in the category "Theatre actor" only. This makes sense in many situations, but you can still view the pages that were directly put into the category actor by just going to the page of that category (by following the link [[:Category:Actor]]).
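The inclusion of subcategories amounts to collecting the category together with everything below it before checking page membership. A minimal Python sketch of that idea follows; the category tree (including "Shakespearean actor" and "Director"), the page names, and the function `with_subcategories` are invented for illustration.

```python
# Hypothetical category tree: parent category -> direct subcategories.
subcategories = {
    "Actor": {"Theatre actor", "Musical actor"},
    "Theatre actor": {"Shakespearean actor"},
}


def with_subcategories(category):
    """Return the category plus all of its sub-, subsub-, etc. categories."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subcategories.get(current, ()))
    return seen


# Categories that each page was directly put into (hypothetical).
page_categories = {
    "Anna": {"Theatre actor"},
    "Ben": {"Actor"},
    "Cleo": {"Director"},
}

wanted = with_subcategories("Actor")
actors = sorted(name for name, cats in page_categories.items() if cats & wanted)
print(actors)
```

"Anna" is returned although she is only in "Theatre actor", which mirrors the behaviour described above.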

Conditions with attributes

With attribute values, we are usually not looking for exact results, but rather for entities that are included within a certain range. For example

[[Category:Actor]] [[height:=>6 ft]] [[height:=<7 ft]]

asks for all actors that are at least 6 feet and at most 7 feet tall. Here we take advantage of the automatic unit conversion: even if the height of the actor was set with [[height:=195cm]] it would be recognized as a correct answer (provided that a suitable datatype was chosen for height, see Help:custom units).
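The effect of unit conversion on a range condition can be checked by hand with a few lines of Python. This is only an arithmetic sketch under the assumption that both bounds are inclusive ("at least" and "at most"); the function name `height_matches` is made up, and the conversion factor is the standard 30.48 cm per foot.

```python
CM_PER_FOOT = 30.48  # one international foot in centimetres


def height_matches(height_cm, at_least_ft, at_most_ft):
    """Check a height stored in cm against a range given in feet."""
    lower_cm = at_least_ft * CM_PER_FOOT
    upper_cm = at_most_ft * CM_PER_FOOT
    return lower_cm <= height_cm <= upper_cm


# [[height:=>6 ft]] [[height:=<7 ft]] applied to two stored values
print(height_matches(195, 6, 7))
print(height_matches(180, 6, 7))
```

A height of 195cm lies between 182.88cm (6 ft) and 213.36cm (7 ft) and is accepted, while 180cm falls just below the lower bound.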

Such range conditions on attributes are mostly relevant for attributes with values that can be ordered in a natural way. For example, it makes sense to ask [[start date:=>May 6 2006]] but it is not really helpful to say [[homepage URL:=>http://www.somewhere.org]].

If a datatype has no natural linear ordering, Semantic MediaWiki will just apply the alphabetical order to the normalized datavalues as they are used in the RDF export.

Direct conditions on pages

So far, all conditions depended on some annotation given within a page. But there are also conditions that directly select some pages, or pages from a given namespace.

By directly giving some pagename (possibly including a namespace prefix), or a list of such pagenames separated by ||, the existing pages with those names are selected. Note that namespace prefixes are not displayed in the results; to see them, check the hover box or the status bar of the browser, or follow the links. By additionally restricting the set with an attribute value, one could ask, e.g., "Which of Bill Murray, Dan Aykroyd, Harold Ramis and Ernie Hudson is taller than 7ft?". But direct selection of articles is most useful if further properties of those articles are asked for, as is described below.

In the case of categories, it is necessary (as known from normal pages) to put a ":" before the page name to prevent confusion of the conditions [[Category:Actor]] (return all actors) and [[:Category:Actor]] (return the category "Actor").

A less strict way of selecting given pages is via namespaces. This is done by using wildcards in the selected pages, e.g. by writing

[[Help:+]]

to return every page in the "Help" namespace. Since the main namespace usually has no prefix, one must write [[:+]] to select all of its pages. In the case of categories, an additional ":" is again needed in front of the namespace label to prevent confusion.

Data to be displayed

Simple queries using conditions as above will merely return a list of pages. To display properties of these pages, one adds statements such as [[height:=*]] to show the height (if any) of the selected pages. Using "*" as a filler indicates that this code does not specify a condition for the selection of pages, but specifies what should be displayed about the selected pages. Thus, we can also write

  • [[born in::*]] to show all pages that the result page is related to by "born in",
  • [[Category:*]] to show all categories that the result page has directly been stated to be in.

Even if there are no "born in" relations for a page, the page is still in the selection, and an empty field will be printed. Likewise, if some article has been assigned many different values for one property, all of them will be displayed.
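The difference between selecting pages and printing their properties can be sketched in Python as follows. The sketch assumes the pages have already been selected and only builds the output rows; the data, the property names, and the function `build_rows` are hypothetical.

```python
# Hypothetical annotations; each property holds a list of values.
pages = {
    "Anna": {"born in": ["Rome"], "height_cm": [170]},
    "Ben": {"height_cm": [180, 181]},  # no "born in"; two heights annotated
}

selected = ["Anna", "Ben"]           # result of the selection conditions
printouts = ["born in", "height_cm"]  # counterparts of [[born in::*]] [[height:=*]]


def build_rows(pages, selected, printouts):
    """One row per selected page: page name, then one cell per printout."""
    rows = []
    for name in selected:
        cells = [
            ", ".join(str(value) for value in pages[name].get(prop, []))
            for prop in printouts
        ]
        rows.append([name] + cells)
    return rows


rows = build_rows(pages, selected, printouts)
for row in rows:
    print(row)
```

"Ben" keeps his row with an empty "born in" cell, and both of his height values appear in one cell.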

For attributes that support units, queries can also determine which unit should be used for the output. For example

[[height:=*cm]]

returns the values of the attribute height converted into cm.

Currently, every attribute is only displayed once, even if several different units are asked for. This is a bug that will be fixed in the future.

Sorting results

Special:Ask has a special field for ordering results according to some attribute. This requires all selected pages to have a value for this attribute, and thus the query must impose this additional restriction. For example, in order to sort the countries in this wiki by population, the following is needed:

  • the query should be [[Category:Country]] [[population:=+]]
  • the input field "sort by column" should contain "population"

(<ask order="population" limit="0" searchlabel="view results" default="no results in this wiki">[[Category:Country]] [[population:=+]]</ask>)

Ascending or descending order can be chosen.

At the moment, the condition [[population:=+]] is crucial for this to work. This might change in future versions.
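Why the extra condition matters can be seen in a small Python model: only pages that carry a value for the sort attribute can take part in the ordering. The country names, the numbers, and the function `countries_by_population` are invented for this sketch and do not reflect how Semantic MediaWiki sorts internally.

```python
# Hypothetical pages; "Erewhon" has no population annotated.
pages = {
    "Atlantis": {"categories": {"Country"}, "population": 500},
    "Utopia": {"categories": {"Country"}, "population": 1200},
    "Erewhon": {"categories": {"Country"}},
}


def countries_by_population(pages, descending=False):
    """Model of [[Category:Country]] [[population:=+]] sorted by population."""
    selected = [
        (ann["population"], name)
        for name, ann in pages.items()
        if "Country" in ann.get("categories", set()) and "population" in ann
    ]
    return [name for _, name in sorted(selected, reverse=descending)]


print(countries_by_population(pages))
print(countries_by_population(pages, descending=True))
```

"Erewhon" does not appear in either ordering, because the wildcard condition already excluded it from the selection.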

Using templates and variables

Within a query, arbitrary templates and variables can be used. For example, they can be used to create a standard query that displays all future events (where "future" gets its meaning from the current date):

 [[Category:Event]]
 [[end date:=>{{CURRENTYEAR}}-{{CURRENTMONTH}}-{{CURRENTDAY}}]]
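To see what the condition looks like once the variables are replaced, the Python sketch below imitates the expansion for a given date. It assumes that {{CURRENTMONTH}} yields a zero-padded month and {{CURRENTDAY}} an unpadded day, as in standard MediaWiki; the function `end_date_condition` is purely illustrative and not part of any wiki software.

```python
from datetime import date


def end_date_condition(today):
    """Imitate the expansion of the date variables in the query above."""
    # Assumption: month is zero-padded, day is not.
    expanded = f"{today.year}-{today.month:02d}-{today.day}"
    return f"[[end date:=>{expanded}]]"


condition = end_date_condition(date(2007, 1, 8))
print(condition)
```

Because the variables are expanded before the query is evaluated, the query sees an ordinary date value and needs no special handling for "today".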

Many other uses are possible, especially when using queries inline. However, it is in no case possible to use template parameters (the things in {{{ }}}) within a query. Sorry.

Another very useful variable for inline queries is {{PAGENAME}} which allows you to customise a query that is used unchanged on many pages. Read about inline queries for more information.