Help:Annotation

Revision as of 01:10, 20 December 2006 by Centiare

This page has been copied from the Semantic MediaWiki web site at Ontoworld. Please refer to that site for more detailed explanations and examples of Centiare's powerful semantic tagging facilities.

This page explains the basic annotation features of Semantic MediaWiki. Annotations are special markup elements that let users make parts of the directory's content explicit, so that software tools can use them to assist other users. In particular, semantic annotations provide the basis for more powerful search functions within the directory. They also allow changes to data on one page to propagate automatically to other pages that contain the same data (somewhat comparable to what can be done with templates). Users who are not familiar with the basics of editing MediaWiki should read Cheatsheet first.

Annotations in Semantic MediaWiki can be viewed as an extension of the existing system of categories in MediaWiki. Categories are a means to classify articles according to certain criteria. For example, by adding [[Category:Cities]] to an article, the page is tagged as describing a city. Software tools can use this information to generate an ordered list of all cities in a wiki, and thus help users to browse the information.

Semantic MediaWiki uses the category information, but provides further means of structuring the wiki:

  • Relations basically are "categories for links." They can be used to describe the meaning of a certain hyperlink between articles. For example, the link from the article Berlin to the article Germany might actually describe the relationship of being the capital of some country.
  • Attributes allow users to assign further information to an article by specifying data values for certain characteristic features. For example, Berlin could be associated with a population of 3,396,990.

Although these additions let users go beyond mere categorisation of articles, their usage, and the problems that can arise with them, are very similar to those of the existing category system. Since categories, relations, and attributes merely emphasize a particular part of an article's content, they are often called (semantic) annotations. Information that an article provided anyway, e.g. that Berlin is the capital of Germany, is now also provided in a formal way that software tools can access.

Categories

The main reference for the use of categories is the MediaWiki documentation on categories. Categories are used as universal "tags" for articles, describing that the article belongs to a certain group of articles. To add an article to a category "Example category", just write

[[Category:Example category]]

to the end of the article. The name of the category (here: "Example category") is arbitrary, but you should try to use categories that already exist instead of creating new ones. Every category has its own article, which can be linked to by writing [[:Category:Example category]]. The category's article can be empty, but it is strongly recommended to add a description that explains which articles belong in the category.
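The following Python sketch illustrates how a software tool could turn category tags into the ordered list of cities mentioned above. It is a simplified, hypothetical parser, not MediaWiki's own: it assumes the namespace is spelled exactly "Category:", ignores sort keys, and treats the leading-colon form [[:Category:...]] as an ordinary link rather than a tag. The page titles and texts are made-up examples.

```python
import re

# Hypothetical, simplified pattern: [[Category:Name]] tags the page, while
# [[:Category:Name]] (leading colon) is only a link and is not matched.
CATEGORY_TAG = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def categories(wikitext: str) -> list[str]:
    """Return the names of all categories a page is tagged with."""
    return [name.strip() for name in CATEGORY_TAG.findall(wikitext)]


def pages_in_category(pages: dict[str, str], category: str) -> list[str]:
    """Return the titles of all pages tagged with `category`, sorted."""
    return sorted(title for title, text in pages.items()
                  if category in categories(text))


# Made-up example pages.
pages = {
    "Berlin": "Berlin is a city. [[Category:Cities]]",
    "Amsterdam": "Amsterdam is a city. [[Category:Cities]]",
    "City list": "See [[:Category:Cities]] for all cities.",
}
print(pages_in_category(pages, "Cities"))  # ['Amsterdam', 'Berlin']
```

The "City list" page only links to the category, so it does not appear in the result.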

Categories can have many different interpretations. For example, the category "City" might comprise all articles about cities, i.e. it describes that something is a city. Other categories, such as "Mathematics," might instead describe the topic area of an article. Many other interpretations exist. Semantic MediaWiki endorses this practical usage of categories: categories should describe collections of articles that are considered useful or interesting for users. However, the advanced search functions of Semantic MediaWiki may make some categories superfluous, so that an SMW-enabled wiki can achieve a high degree of organization with far fewer categories.

Relations

Relations can be viewed as "categories for links." To understand the idea, consider the Wikipedia article on Berlin. This article contains many links to other articles, such as "Germany," "European Union," and "United States." However, the link to "Germany" has a special meaning: it was put there since Berlin is the capital of Germany. To make this knowledge available to computer programs, one would like to "tag" the link

[[Germany]]

that is given in the article text, saying that this is a link that describes a "capital-relationship." With Semantic MediaWiki, this is done by writing

[[capital of::Germany]].

In the article, this text is still displayed as a simple hyperlink to "Germany." The additional text "capital of" is the name of the relation that we use to classify the link to Germany. As with categories, you are free to use any label you like to describe a link, but it is useful to re-use relations that already appear elsewhere.

To simplify this re-use, every relation has its own article, where its proper usage can be described. You can search these articles with the Special:Search page to find existing relations. The titles of relation articles are prefixed with "Relation:" to distinguish them from other articles. Creating these articles is optional, but it greatly helps others to find and apply your relation.

There are various ways of adding relations between two pages:

  • What it does: Classify a link with the relation "example relation."
    What you type: Classify a [[example relation::link]] with the relation "example relation."
  • What it does: Use an alternative text for a classified link.
    What you type: Use an [[example relation::link|alternative text]] for a classified link.
  • What it does: To make an ordinary link with two colons without creating a relation to another article, escape the markup with a colon in front, e.g. std::out.
    What you type: To make an ordinary link with two colons without creating a relation to another article, escape the markup with a colon in front, e.g. [[:std::out]].
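To make the three markup forms above concrete, here is a minimal Python sketch that extracts relation annotations from wikitext. It is an illustrative approximation, not Semantic MediaWiki's parser: the regular expression is an assumption that covers only plain [[relation::target]] and [[relation::target|alternative text]], and it refuses names starting with a colon so that the escaped link [[:std::out]] is skipped.

```python
import re

# Hypothetical, simplified pattern. The relation name must contain at least
# one character and no ":" at all, so "[[:std::out]]" cannot match.
RELATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def relations(wikitext):
    """Return (relation, target page, displayed text) for each annotation."""
    found = []
    for name, target, alt in RELATION.findall(wikitext):
        # Without alternative text, the link shows the target page name.
        found.append((name.strip(), target.strip(), alt or target))
    return found


text = ("Berlin is the capital of [[capital of::Germany]]. "
        "Use an [[example relation::link|alternative text]]. "
        "In C++, write [[:std::out]].")
print(relations(text))
```

The escaped [[:std::out]] produces no triple, mirroring the third row of the list above.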

Attributes

There are many statements that one cannot easily annotate with relations and categories alone. For example, to say that Berlin has a population of 3,396,990, one would not give a typed link [[has population::3,396,990]] simply because an article "3,396,990" does not make much sense. Yet, one would like Semantic MediaWiki to create a list of all German cities, ordered by number of inhabitants. This "ordering by number" is different from the lexicographic order that one would expect for article names. For example, in the lexicographic order, "1,000,000" is smaller than "345" (in the same way that "Alphabet" is earlier than "Order" in a dictionary).
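The difference between the two orderings can be checked directly in Python. This small sketch assumes, as in the examples on this page, that "," is used only as a thousands separator.

```python
values = ["3,396,990", "345", "1,000,000"]


def lexicographic(items):
    """Dictionary order: compare the strings character by character."""
    return sorted(items)


def numeric(items):
    """Numeric order: drop thousands separators and compare as integers."""
    return sorted(items, key=lambda s: int(s.replace(",", "")))


print(lexicographic(values))  # ['1,000,000', '3,396,990', '345']
print(numeric(values))        # ['345', '1,000,000', '3,396,990']
```

In the lexicographic order "3,396,990" even comes before "345", because the comma sorts before the digit "4".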

So we have two requirements:

  1. state that Berlin has a population of 3,396,990 without creating a link to "3,396,990" and
  2. tell the wiki software that population should be treated as a number, not as a text label or anything else.

The first requirement is met by writing the following text in the article on Berlin:

[[population:=3,396,990]]

The only difference from a relation is that we write ":=" instead of "::". The number 3,396,990 now appears as normal text, and no link is created. The label "population" is again our free choice; any other text would have worked as well. As with relations, our attribute "population" gets its own article, where we can add descriptions for other users. The article name starts with "Attribute:", i.e. the article is called "Attribute:Population" in our case.
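A minimal Python sketch of the ":=" form follows. It is a hypothetical illustration, not SMW code: it collects attribute values and renders each annotation as plain text (the value, or the display text after "|"), so no link is created. Names starting with a colon are rejected, so escaped text such as [[:operator :=]] is left alone, and relation markup with "::" does not match.

```python
import re

# Hypothetical, simplified pattern for [[attribute:=value]] and
# [[attribute:=value|display text]]. The name may not contain ":",
# which excludes relations ("::") and escaped forms ("[[:...").
ATTRIBUTE = re.compile(r"\[\[([^\[\]:|]+):=([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def attributes(wikitext):
    """Return {attribute name: raw value text} for all annotations."""
    return {name.strip(): value.strip()
            for name, value, _display in ATTRIBUTE.findall(wikitext)}


def render(wikitext):
    """Replace each attribute annotation by plain text; no link is made."""
    return ATTRIBUTE.sub(lambda m: m.group(3) or m.group(2), wikitext)


text = ("Berlin has [[population:=3,396,990]] inhabitants, "
        "[[example:=999,331|about a million]].")
print(attributes(text))  # {'population': '3,396,990', 'example': '999,331'}
print(render(text))      # Berlin has 3,396,990 inhabitants, about a million.
```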

We still have to say that "population" is a number. Semantic MediaWiki knows a number of different datatypes that we can choose for attributes. In our case, the type is called Type:Integer. The prefix "Type:" again denotes a separate namespace that distinguishes descriptive articles about types from normal pages. What we want to state is that the attribute population has the type integer, i.e. that the two are connected by a special relation. As with all relations, this is stated in the attribute's own article, Attribute:population. There, we write

[[has type::Type:integer]]

to say that the special relation "has type" holds between Attribute:population and Type:integer. Semantic MediaWiki knows a number of special relations like Relation:has type. While these relations can also be documented in their own articles, they have a special built-in meaning and are not evaluated like other relations.

Datatypes are very important for evaluating attributes. First, the datatype determines how tools should handle the given values, e.g. for sorting search results. Second, the datatype is required to understand which values have the same meaning, e.g. the values "1532", "1,532", and "1.532e3" all encode the same number. Finally, some datatypes offer special functions, as described below. For these reasons, every attribute must have a datatype. If no datatype is defined, an annotated article is still displayed correctly, but the semantic annotation cannot be exploited until a datatype is given and the annotated article is saved again. Likewise, changing the type of an attribute later does not affect the annotations of existing articles until they are modified and saved the next time.
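The role of the datatype can be sketched in Python as a table of parsers keyed by type name. The names mirror Type:Integer, Type:Float and Type:String, but the parsers themselves are assumptions for illustration, not Semantic MediaWiki's implementation; in particular they treat "," as a thousands separator. An attribute without a known type yields no evaluated value, just as an untyped annotation cannot be exploited.

```python
def parse_float(text):
    # Assumption: "," is a thousands separator, so "1,532" means 1532.
    return float(text.replace(",", ""))


def parse_integer(text):
    value = parse_float(text)
    if not value.is_integer():
        raise ValueError(f"not an integer: {text!r}")
    return int(value)


# Hypothetical datatype registry.
PARSERS = {
    "Integer": parse_integer,
    "Float": parse_float,
    "String": str,  # e.g. phone numbers, which may contain non-digits
}


def evaluate(value, datatype):
    """Return the normalized value, or None if the datatype is unknown."""
    parser = PARSERS.get(datatype)
    return None if parser is None else parser(value)


for raw in ["1532", "1,532", "1.532e3"]:
    print(raw, "->", evaluate(raw, "Integer"))  # each prints 1532
print(evaluate("1532", None))                   # None: no datatype given
```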

The most important mark-up elements for attributes are

  • What it does: Assign the value 1,234,567 to the attribute "example."
    What you type: Assign the value [[example:=1,234,567]] to the attribute "example."
  • What it does: Assign a value of about a million, but show different text in the article.
    What you type: Assign a value of [[example:=999,331|about a million]], but show different text in the article.
  • What it does: Escaping annotations: in Pascal, variable assignments use the operator :=.
    What you type: In Pascal, variable assignments use the [[:operator :=]].
  • What it does: Giving the type in an attribute's article: This attribute is an integer number.
    What you type: This attribute is an [[has type::Type:Integer|integer number]].
  • What it does: Combining MediaWiki markup with attribute values: John's email address is john@mailinator.com. (Hint: use a template for this.)
    What you type: John's email address is [[email:=john@mailinator.com|[mailto:john@mailinator.com john@mailinator.com]]].

Datatypes and units of measurement

Using different types, attributes can describe very different properties. A complete list of available types is available from Special:Types. Basic types include Type:String, Type:Integer, and Type:Float.

These can be used creatively for very different purposes. For instance, attributes of type string can be used for encoding phone numbers (which in fact can contain non-numeric symbols).

Type:Float accepts units, but only to distinguish values: it does not convert between them. In addition, the message "this attribute supports no unit conversion" clutters query results.

To allow automatic conversion, use custom units instead.
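Conversion between custom units of this kind amounts to a single multiplication per unit. The Python sketch below assumes a hypothetical length type whose units are defined by factors relative to one base unit (metres); it shows the arithmetic only and is not how custom units are declared in the wiki.

```python
# Hypothetical unit table: factor of each unit relative to the base unit (m).
LENGTH_UNITS = {"m": 1.0, "km": 1000.0, "mi": 1609.344}


def convert(value, from_unit, to_unit, units=LENGTH_UNITS):
    """Convert via the base unit: multiply by one factor, divide by the other."""
    return value * units[from_unit] / units[to_unit]


print(convert(5, "km", "m"))             # 5000.0
print(round(convert(1, "mi", "km"), 3))  # 1.609
```

Because every unit is tied to the same base unit, values entered in any supported unit can be compared and sorted together.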

Special types

There are some special built-in types which support more complicated formats.

  • Type:Temperature cannot be user-defined, since converting between temperature units involves more than multiplying by a conversion factor.
  • Type:Geographic coordinate describes geographic locations. It includes functions for recognizing different forms of geographic coordinates, and it dynamically provides links to online map services.
  • Type:Date specifies particular points in time. This type is still somewhat experimental, but may feature complex conversions between (historic) calendar models in the future.
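Why temperature does not fit the factor-based scheme can be seen from the standard conversion formulas, here in a short Python sketch: Fahrenheit and Kelvin need an offset in addition to (or instead of) a factor. The function names are illustrative only.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32  # factor 9/5 plus an offset of 32


def celsius_to_kelvin(c):
    return c + 273.15      # offset only


# A factor-only conversion fitted so that 100 °C maps to 212 °F ...
FACTOR = 212 / 100


def factor_only_fahrenheit(c):
    return c * FACTOR


print(celsius_to_fahrenheit(100))  # 212.0
print(celsius_to_fahrenheit(0))    # 32.0
print(factor_only_fahrenheit(0))   # 0.0 ... instead of the correct 32.0
```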

For specifying URLs and emails, there are some special variations of the string type:

  • Type:URL and Type:URI currently seem to behave just like Type:String: when a value of either type appears in query results, it is not rendered as a link.
  • Type:Annotation URI: attributes of this type are interpreted as relations to external objects, denoted by the URI. They are special because they are interpreted as annotation properties on export. See the type page for documentation. As with Type:URL, values of this type are not rendered as links in query results.
  • Type:Email stores emails as a string datavalue, but automatically links them (with mailto:) within the page.
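The linking that Type:Email performs corresponds to MediaWiki's external link syntax with a mailto: URL. As a rough illustration, this Python sketch produces the same wikitext as the hand-written email example earlier on this page; the attribute name "email" and the helper names are hypothetical, and Type:Email itself does this without any extra markup.

```python
def mailto_link(address):
    """MediaWiki external link: [URL label], here showing the address."""
    return f"[mailto:{address} {address}]"


def email_annotation(address, attribute="email"):
    """Hand-written equivalent: store the address, display it as a link."""
    return f"[[{attribute}:={address}|{mailto_link(address)}]]"


print(mailto_link("john@mailinator.com"))
# [mailto:john@mailinator.com john@mailinator.com]
print(email_annotation("john@mailinator.com"))
# [[email:=john@mailinator.com|[mailto:john@mailinator.com john@mailinator.com]]]
```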

Semantic templates

It is possible to embed semantic annotations into MediaWiki templates. This can help to simplify syntax for the users, to support the consistent usage of annotations, and to quickly obtain a great amount of semantic data by annotating existing templates. Read Help:Semantic templates for details.

Using a query to produce wikitext for annotations

If multiple pages P have an annotation P R Q for the same Q, the corresponding inverse annotations Q Rinv P can conveniently be produced with a query: [[Rinv::<ask sep="| ]][[Rinv::">[[R::Q]]</ask>| ]]. The result can then be copied from the rendered page into the edit box of Q. If applicable, namespace prefixes have to be added.

For example, [ [location of::<ask sep="| ]][[location of::">[[located in::California]]</ask>| ]] renders as a line of the form [ [location of::A| ]][[location of::B| ]], where A, B, and so on stand for the pages annotated with [[located in::California]].

This result is copied from the rendered page into the edit box of California, and the space between the first two brackets (inserted as a convenient substitute for "nowiki") is removed. User pages in the result are either removed or given the prefix "User:".
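Outside the wiki, the same inversion can be expressed in a few lines of Python. This sketch assumes the annotations are already available as (page, relation, target) triples, which is a hypothetical input format; the page names are made-up examples. It emits one hidden [[Rinv::P| ]] annotation per matching page, which is the text the query above is meant to produce.

```python
def inverse_annotations(triples, relation, target, inverse):
    """For every P with an annotation "P relation target", emit the
    invisible inverse annotation [[inverse::P| ]] for the target page."""
    subjects = sorted(p for p, r, q in triples
                      if r == relation and q == target)
    return "".join(f"[[{inverse}::{p}| ]]" for p in subjects)


# Made-up annotations.
triples = [
    ("San Francisco", "located in", "California"),
    ("Los Angeles", "located in", "California"),
    ("Berlin", "located in", "Germany"),
]
print(inverse_annotations(triples, "located in", "California", "location of"))
# [[location of::Los Angeles| ]][[location of::San Francisco| ]]
```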