What is Metadata?

Metadata can be defined as data about data. The goal of incorporating metadata into data sources is to enable the end user to find items and contextually relevant information. Data sources are generally heterogeneous and can be unstructured, semi-structured, or structured. In the Semantic Web, a data source is typically a document, such as a Web page, containing textual content or data. Other types of resources, such as records from a digital library, may also include metadata. Metadata was the first form of semantic data on the Web.

Metadata typically records:

  • Means of creation of the data
  • Purpose of the data
  • Time and date of creation
  • Creator or author of data
  • Location on the computer network where the data was created
  • Standards used

Example of Metadata:

A meta element specifies name and associated content attributes that describe aspects of an HTML page:

<meta name="keywords" content="wikipedia,encyclopedia">

The default character set of a page can likewise be declared with a meta element:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
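As a rough illustration of how a program might read such meta elements, the sketch below uses Python's standard html.parser module. The names MetaCollector, extract_meta, and SAMPLE_PAGE are assumptions made for this example, and the sample page is hypothetical, modeled on the meta elements above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect key/content pairs from <meta> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # A meta element is keyed either by "name" or by "http-equiv".
        key = attributes.get("name") or attributes.get("http-equiv")
        if key and "content" in attributes:
            self.meta[key.lower()] = attributes["content"]


# Hypothetical sample page, modeled on the meta elements discussed above.
SAMPLE_PAGE = """
<html><head>
<meta name="keywords" content="wikipedia,encyclopedia">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body></body></html>
"""


def extract_meta(html_text):
    """Parse the HTML text and return a dict of the meta entries found."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta
```

Calling extract_meta(SAMPLE_PAGE) returns a dictionary with one entry per meta element, keyed by the lowercased name or http-equiv value.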

Types of Metadata

  1. Syntactic Metadata
  2. Structural Metadata
  3. Semantic Metadata
  4. Creating and Extracting Semantic Metadata

1. Syntactic Metadata

The simplest form of metadata is syntactic metadata. It describes non-contextual information about content and provides very general information, such as the document's size, location, or date of creation. Syntactic metadata attaches labels or tags to data. The following example shows syntactic metadata describing a document:

<name> = report.pdf

<creation> = 30-09-2005

<modified> = 15-10-2005

<size> = 2048
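Labels of this kind can usually be read straight from the file system. The following is a minimal sketch, assuming a local file and Python's standard os module; the function name syntactic_metadata and the temporary demo file are hypothetical. The creation date is left out because the standard library does not expose it portably across operating systems.

```python
import os
import tempfile
from datetime import datetime, timezone


def syntactic_metadata(path):
    """Return general, non-contextual facts about a file."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "size": info.st_size,  # size in bytes
        "modified": modified.strftime("%d-%m-%Y"),  # day-month-year
    }


# Usage: describe a temporary 2048-byte file created only for this demo.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "report.pdf")
    with open(demo_path, "wb") as handle:
        handle.write(b"\0" * 2048)
    demo = syntactic_metadata(demo_path)
```

None of these values says anything about what the document is about, which is exactly what makes them syntactic.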

2. Structural Metadata

Structural metadata provides information regarding the structure of content. It describes how items are put together or arranged. The amount and type of such metadata vary widely with the type of document. For example, an HTML document may have a set of predefined tags, but these exist primarily for rendering purposes, so they are not very helpful in providing contextual information about content. Nevertheless, the positional or structural placement of information within a document can be used to enrich metadata (e.g., terms or concepts that appear in a title may be given higher weight than those appearing in the body). The following DTD fragment is an example of structural metadata for a list of contacts:

<!ELEMENT contacts (contact*)>

<!ELEMENT contact (name, birthdate)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT birthdate (#PCDATA)>
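The point about positional placement can be sketched in a few lines: terms found in a title count for more than terms found in the body. The weights below are assumptions chosen for illustration (the text does not prescribe values), and weighted_terms is a hypothetical helper rather than an established algorithm.

```python
import re
from collections import Counter

TITLE_WEIGHT = 3.0  # assumed value; chosen only for illustration
BODY_WEIGHT = 1.0


def tokenize(text):
    """Split text into lowercase alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def weighted_terms(title, body):
    """Score terms, counting title occurrences more heavily than body ones."""
    scores = Counter()
    for term in tokenize(title):
        scores[term] += TITLE_WEIGHT
    for term in tokenize(body):
        scores[term] += BODY_WEIGHT
    return scores


scores = weighted_terms(
    title="Semantic Metadata",
    body="Metadata describes data. Semantic models add meaning to data.",
)
```

Here "data" occurs twice in the body yet still scores lower than "metadata", which occurs once in the title and once in the body.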

3. Semantic Metadata

Semantic metadata adds relationships, rules, and constraints to syntactic and structural metadata. This metadata describes contextually relevant or domain-specific information about content based on a domain-specific metadata model or ontology, providing a context for interpretation. In a sense, it captures a meaning associated with the content. If a formal ontology is used for describing and interpreting this type of metadata, then it lends itself to machine processability and hence to higher degrees of automation.
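To make the idea of relationships and constraints concrete, here is a minimal sketch in which statements are subject-property-object triples and a toy ontology restricts which classes each property may connect. The ontology, the resource names, and the violations function are all hypothetical; a real system would more likely use a standard such as RDF with an ontology language.

```python
# Hypothetical ontology: each property has a domain class and a range class.
ONTOLOGY = {
    "authorOf": {"domain": "Person", "range": "Document"},
    "worksFor": {"domain": "Person", "range": "Organization"},
}

# Class membership of each resource (illustrative data).
TYPES = {
    "alice": "Person",
    "report.pdf": "Document",
    "acme": "Organization",
}


def violations(triples):
    """Return the triples that break a domain or range constraint."""
    bad = []
    for subject, prop, obj in triples:
        rule = ONTOLOGY.get(prop)
        if rule is None:
            bad.append((subject, prop, obj))  # property not in the ontology
        elif (TYPES.get(subject) != rule["domain"]
              or TYPES.get(obj) != rule["range"]):
            bad.append((subject, prop, obj))
    return bad


triples = [
    ("alice", "authorOf", "report.pdf"),  # valid
    ("alice", "worksFor", "acme"),        # valid
    ("report.pdf", "authorOf", "alice"),  # subject and object swapped
]
invalid = violations(triples)
```

Because the constraints are explicit data rather than prose, a program can reject the third statement without human review, which is the kind of automation a formal ontology makes possible.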

4. Creating and Extracting Semantic Metadata

To get the most value from a document and make it usable, the document must be tagged effectively, which means analyzing it and extracting the information of semantic interest. Many techniques can be used to extract syntactic and semantic metadata from documents (Sheth 2003). These include:

  • Semantic lexicons, nomenclatures, reference sets, and thesauri
  • Document analysis
  • Ontologies
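The first of these techniques can be sketched as a simple lookup: a lexicon maps the surface forms found in text to a preferred concept and a category, and a document is tagged with every concept whose surface form it contains. The lexicon entries and the function tag_concepts are hypothetical and far smaller than any real thesaurus.

```python
import re

# Hypothetical lexicon: surface form -> (preferred concept, category).
LEXICON = {
    "big apple": ("New York City", "Place"),
    "new york city": ("New York City", "Place"),
    "ibm": ("IBM", "Organization"),
}


def tag_concepts(text):
    """Return the sorted (concept, category) pairs found in the text."""
    lowered = text.lower()
    found = set()
    for phrase, concept in LEXICON.items():
        # Word boundaries avoid matching a phrase inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(concept)
    return sorted(found)


tags = tag_concepts("IBM opened an office in the Big Apple.")
```

Mapping synonyms such as "Big Apple" and "New York City" to one preferred concept is what lets documents that use different wording receive the same semantic tag.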