
Elsevier, a world-leading provider of scientific, technical and medical information products and services, announced the winners of the 2012 Semantic Web Challenge (SWC) at the International Semantic Web Conference held in Boston, MA, in November 2012. The winners were determined by a jury of leading experts from both academia and industry, and the challenge and its prizes were sponsored by Elsevier.

Of this year’s 24 submissions, the panel of experts selected 4 Open Track Challenge winners and 1 Billion Triples Track winner.

Open Track challenge:

  • 1st prize: “Event Media” by Houda Khrouf, Vuk Milicic and Raphaël Troncy from EURECOM, Sophia Antipolis, France – This application demonstrates how semantic web technology can be used to integrate, more efficiently and easily, multiple online and social media content sources that evolve over time (watch this video and have a look at this paper).
  • 2nd prize: “Semantic Processing of Urban Data” by S. Kotoulas, V. Lopez, R. Lloyd, M. Sbodio, F. Lecue, M. Stephenson, E. Daly, V. Bicer, A. Gkoulalas-Divanis, G. Di Lorenzo, A. Schumann and P. Aonghusa from IBM Research’s Smart Cities Team – The mayor of Dublin wanted to know why his ambulances were perennially late, so this team knowledge-mined hundreds of information sources emanating from the city, ranging from Twitter feeds to garbage collection tags and many more, to help the mayor improve city services.
  • 3rd prize, awarded jointly to: “Open Self Medication” by Olivier Curé of Université Paris-Est, LIGM, CNRS – This application advises on self-medication, using the Linked Open Data cloud to mine contraindications for various over-the-counter medications and adding them where they were missing. A mobile geolocation price-comparison tool enables users to find nearby pharmacies that sell the cheapest drugs, helping French health care insurance companies to reduce their costs; and “Wildfire Monitoring” by K. Kyzirakos, M. Karpathiotakis, G. Garbis, C. Nikolaou, K. Bereta, I. Papoutsis, T. Herekakis, D. Michail, M. Koubarakis and C. Kontoes from the National and Kapodistrian University of Athens, the National Observatory of Athens and Harokopio University of Athens – This application combines satellite images with ontologies and Linked Geospatial Data to improve the wildfire monitoring service used by the Greek civil protection agencies, military, and firefighters.

Billion Triples Track challenge:

  • “Exploring the Linked Data Cloud” by X. Zhang, D. Song, S. Priya, Z. Daniels, K. Reynolds and J. Heflin of Lehigh University, USA – This system allows users to understand how massive data sets are populated and reveals patterns within these data sets.

The availability of inference services in the Semantic Web context is fundamental for performing several tasks, such as checking the consistency of an ontology, constructing a concept taxonomy, and retrieving concepts.

Currently, the main approach used for performing inferences is deductive reasoning. In traditional Aristotelian logic, deductive reasoning is defined as inference in which the (logically derived) conclusion is of no greater generality than the premises. Other logic theories define deductive reasoning as inference in which the conclusion is just as certain as the premises. The conclusion of a deductive inference is necessitated by the premises: the premises cannot be true while the conclusion is false. These characteristics explain why deductive reasoning is used in the Semantic Web: computing a class hierarchy and checking ontology consistency require certain and correct results, and do not need conclusions that are more general than the premises.
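To make this concrete, here is a minimal sketch of both deductive tasks using the Jena inference API (the tiny class hierarchy and the example.org namespace are invented for illustration; package names follow current Apache Jena releases, while older releases used the com.hp.hpl.jena prefix):

    import org.apache.jena.rdf.model.*;
    import org.apache.jena.reasoner.Reasoner;
    import org.apache.jena.reasoner.ReasonerRegistry;
    import org.apache.jena.reasoner.ValidityReport;
    import org.apache.jena.vocabulary.RDFS;

    public class DeductiveTasks {
        public static void main(String[] args) {
            String ns = "http://example.org/onto#";  // hypothetical namespace
            Model schema = ModelFactory.createDefaultModel();
            Resource ambulance = schema.createResource(ns + "Ambulance");
            Resource vehicle = schema.createResource(ns + "Vehicle");
            Resource movable = schema.createResource(ns + "MovableObject");
            schema.add(ambulance, RDFS.subClassOf, vehicle);
            schema.add(vehicle, RDFS.subClassOf, movable);

            Reasoner reasoner = ReasonerRegistry.getRDFSReasoner();
            InfModel inf = ModelFactory.createInfModel(reasoner, schema);

            // Deduction: the conclusion is necessitated by the premises, so
            // Ambulance rdfs:subClassOf MovableObject is entailed with certainty.
            System.out.println(inf.contains(ambulance, RDFS.subClassOf, movable));

            // Consistency check: a certain, correct yes/no answer.
            ValidityReport report = inf.validate();
            System.out.println("Consistent: " + report.isValid());
        }
    }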

Conversely, tasks such as ontology learning, ontology population with assertions, ontology evaluation, and ontology mapping and alignment require inferences that return conclusions more general than the premises. To this end, inductive learning methods, based on inductive reasoning, could be used effectively. Inductive reasoning generates conclusions that are of greater generality than the premises, although, unlike in deductive reasoning, such conclusions are less certain than the premises. Specifically, in contrast to deduction, the starting premises of induction are specific (typically facts or examples) rather than general axioms. The goal of the inference is to formulate plausible general assertions that explain the given facts and are able to predict new facts. In other words, inductive reasoning attempts to derive a complete and correct description of a given phenomenon, or part of it.

It is important to mention that, of the two aspects of inductive inference, the generation of plausible hypotheses and their validation (the establishment of their truth status), only the first is of primary interest to inductive learning research, because it is assumed that the generated hypotheses are judged by human experts and tested by known methods of deductive inference and statistics.

Elsevier announced the winners of the 2010 Semantic Web Challenge. The Elsevier-sponsored Challenge took place at the International Semantic Web Conference held in Shanghai, China, from 7-11 November 2010. A jury consisting of seven leading experts from both academia and industry awarded the four best applications cash prizes exceeding 3000 Euro in total.

Over the last eight years, the Challenge has attracted more than 140 entries. All submissions are evaluated rigorously by a jury composed of leading scientists and experts from industry in a three-round knockout competition consisting of a poster session, oral presentations and live demonstrations.

Organized this year by Christian Bizer from the Freie Universität Berlin, Germany, and Diana Maynard from the University of Sheffield, UK, the Semantic Web Challenge consists of two categories: “Open Track” and “Billion Triples Track.”

The Open Track requires that the applications can be used by ordinary people or scientists and that they make use of the meaning of information on the web. The Billion Triples Track requires applications to scale up to deal with huge amounts of information gathered from the open web.

The winners of the 2010 Open Track challenge were the team from Stanford University comprising Clement Jonquet, Paea LePendu, Sean M. Falconer, Adrien Coulet, Natalya F. Noy, Mark A. Musen, and Nigam H. Shah for “NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources”. Their entry provides very clear benefits to the biomedical community, bringing together knowledge from many different entities on the web with a large corpus of scientific literature through the clever application of semantic web technologies and principles.

The second prize in the Open Track was awarded to the team from Rensselaer Polytechnic Institute comprising Dominic DiFranzo, Li Ding, John S. Erickson, Xian Li, Tim Lebo, James Michaelis, Alvaro Graves, Gregory Todd Williams, Jin Guang Zheng, Johanna Flores, Zhenning Shangguan, Gino Gervasio, Deborah L. McGuinness and Jim Hendler, for the development of “TWC LOGD: A Portal for Linking Open Government Data” – a massive semantic effort in opening up and linking all the public US government data, and providing the ecosystem and education for its re-use.

The third prize in the 2010 Open Track was won by a combined team from the Karlsruhe Institute of Technology, Oxford University and the University of Southern California comprising Denny Vrandecic, Varun Ratnakar, Markus Krötzsch, and Yolanda Gil for their entry “Shortipedia” – a Web-based knowledge repository and collaborative curating system, pulling together a growing number of sources in order to provide a comprehensive, multilingual and diversified view on entities of interest – a Wikipedia on steroids.

The Billion Triples Track was won by “Creating voiD Descriptions for Web-scale Data” by Christoph Böhm, Johannes Lorey, Dandy Fenz, Eyk Kny, Matthias Pohl and Felix Naumann from Potsdam University, Germany. This entry uses state-of-the-art parallelisation techniques, and some serious cloud computing power, to dissect the enormous Billion Triples dataset into topic-specific views.

Further Information

Further information about the Semantic Web Challenge 2010, the runners-up, all submissions and the evaluation committee can be found on the Former Challenges page, as well as in the Elsevier press release about the Semantic Web Challenge 2010.

 

Blank Node

A blank node is an unnamed node whose identifier is assigned by the underlying RDF software and is not guaranteed to be the same across sessions. Within a graph, a blank node is guaranteed to resolve to the same thing (not a resource with a URI, but a separate way to represent a node), while between graphs, “it would be incorrect to assume that blank nodes from different graphs having the same blank node identifiers are the same” (see the RDF Primer). If you want multiple independent graphs to refer to the same resource, you have to give it an explicit URI.

Serialization syntaxes, such as RDF/XML or Turtle, allow you to assign an explicit label (a “blank node identifier”) to a blank node, but this serves only to distinguish between different blank nodes, or to refer to the same blank node from different triples, within a single graph. If you give the same blank node identifier to blank nodes in different graphs, those blank nodes are still different from each other; in fact, there will be no relationship or interaction between them at all.
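A short Jena sketch makes the behaviour visible; the vCard vocabulary and the data are just illustrative:

    import org.apache.jena.rdf.model.*;
    import org.apache.jena.vocabulary.VCARD;

    public class BlankNodeDemo {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();

            // A blank node: a resource with no URI of its own.
            Resource address = model.createResource();
            address.addProperty(VCARD.Locality, "Boston");

            // Other triples in the same graph can still refer to it.
            model.createResource("http://example.org/people/alice")
                 .addProperty(VCARD.ADR, address);

            System.out.println(address.isAnon());  // true
            System.out.println(address.getId());   // internal label, not stable across sessions
        }
    }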

 

Named Graph

Named Graphs is the idea that keeping multiple RDF graphs in a single document or repository, and naming each graph with a URI, provides useful additional functionality built on top of the RDF Recommendations. The most authoritative source for named graphs (being a W3C Recommendation) is SPARQL.

Named Graphs turn the RDF triple model into a quad model by extending each triple with an additional item of information. This extra piece of information takes the form of a URI, which provides some additional context to the triple with which it is associated, giving an extra degree of freedom when it comes to managing RDF data. The ability to group triples around a URI underlies features such as tracking the provenance of RDF data, access control, and versioning.

There’s some useful background available on Named Graph Provenance and Trust, on Named Graphs in general in a paper about NG4J, and specifically on their use in OpenAnzo.

Named Graphs are an important part of the overall technical framework for managing, publishing and querying RDF and Linked Data, and it is important to understand the trade-offs between the different approaches to using them.
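As a rough sketch of the quad model in practice, Jena exposes named graphs through its Dataset API, and SPARQL addresses them with the GRAPH keyword (the graph name and data below are made up):

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.*;
    import org.apache.jena.vocabulary.DCTerms;

    public class NamedGraphDemo {
        public static void main(String[] args) {
            Dataset dataset = DatasetFactory.create();

            // Group triples under a URI, e.g. to track their provenance.
            Model crawl = ModelFactory.createDefaultModel();
            crawl.createResource("http://example.org/doc/1")
                 .addProperty(DCTerms.source, "http://example.org/feeds/news");
            dataset.addNamedModel("http://example.org/graphs/crawl-2010-08-28", crawl);

            // SPARQL treats the graph name as a fourth component of each triple.
            String q = "SELECT ?g ?s ?p ?o WHERE { GRAPH ?g { ?s ?p ?o } }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
                ResultSetFormatter.out(qe.execSelect());
            }
        }
    }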

On 28 August 2010 the Jena project celebrated its 10th year of providing us with a Semantic Web framework; Jena is now probably one of the most popular Java RDF APIs in the community. It started as an idea by a developer at HP Labs in Bristol, and Andy Seaborne mentioned Brian McBride’s email (http://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html) as the official starting point for the project.

TDB is a persistent graph storage layer for Jena. TDB works with the Jena SPARQL query engine (ARQ) to provide complete SPARQL support together with a number of extensions (e.g. property functions, aggregates, arbitrary-length property paths). It is a pure-Java engine, employing memory-mapped I/O, a custom implementation of B+Trees, and optimized range filters for XSD value spaces (integers, decimals, dates, dateTimes).
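A minimal usage sketch follows; the storage directory and input file are placeholders, and package names again follow current Apache Jena:

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.tdb.TDBFactory;

    public class TdbDemo {
        public static void main(String[] args) {
            // Attach a persistent, disk-backed dataset (created if absent).
            Dataset dataset = TDBFactory.createDataset("/var/data/tdb-store");

            Model model = dataset.getDefaultModel();
            model.read("file:data.ttl");  // placeholder input file

            // Queries run through the ARQ SPARQL engine, including
            // extensions such as aggregates.
            String q = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
                ResultSetFormatter.out(qe.execSelect());
            }
            dataset.close();
        }
    }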

TDB has been used to load UniProt v13.4 (1.7B triples, 1.5B unique) on a single machine with 64-bit hardware in 36 hours (about 12k triples/s). TDB 0.5 results for the Berlin SPARQL Benchmark were published in August 2008.

Elsevier announced the winners of the 2009 Semantic Web Challenge, which took place at the International Semantic Web Conference held in Washington, D.C., from October 25-29, 2009. A jury consisting of eleven leading experts from both academia and industry awarded the four best applications with cash prizes of 2750 Euro in total, sponsored by Elsevier.

The 2009 Semantic Web Challenge was organized by Peter Mika of Yahoo! Research and Chris Bizer of Freie Universität Berlin and consisted of two categories: “Open Track” and “Billion Triples Track.” The Open Track requires that applications utilize the semantics (meaning) of data and that they have been designed to operate in an open web environment, whilst the Billion Triples Track focuses on dealing with the very large data sets of low quality commonly found on the web.

The Billion Triples Track was won by “Scalable Reduction” by Gregory Todd Williams, Jesse Weaver, Medha Atre, and James A. Hendler (Rensselaer Polytechnic Institute, USA). The entry showed how massive parallelization can be applied to quickly clean and filter large amounts of RDF data.

The winners of the 2009 Open Track were Chintan Patel, Sharib Khan, and Karthik Gomadam from Applied Informatics, Inc. for “TrialX” (http://trialx.com). TrialX enables patients to find new treatments by intelligently matching them to clinical trials, using advanced medical ontologies to combine several electronic health records with user-generated information.

The second prize of the 2009 Open Track was awarded to Andreas Harth from the Institute of Applied Informatics and Formal Description Methods, Universität Karlsruhe, Germany for “VisiNav” (http://visinav.deri.org/). The third prize in the 2009 Open Track was awarded to Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon Danielczyk, and Stefan Decker from the Digital Enterprise Research Institute, Ireland for the development of “Sig.ma” (http://sig.ma/).

“This year’s winner of the Open Track is an application that we can hold up as an example to those outside of our community. In comparison, the Billion Triples Track has attracted fewer submissions this year, but it has been noticeable that all submissions have dealt with increasing amounts of information. Altogether we see clear progress toward implementing the vision of the Semantic Web,” said Chris Bizer and Peter Mika, co-chairs of the Semantic Web Challenge.

Open Track
1st place:
TrialX
Chintan Patel, Sharib Khan, and Karthik Gomadam from Applied Informatics, Inc
http://www.cs.vu.nl/~pmika/swc/documents/TrialX-healthx-iswc09-challenge.pdf

2nd place:
VisiNav
Andreas Harth from the Institute of Applied Informatics and Formal Description Methods, Universität Karlsruhe, Germany
http://www.cs.vu.nl/~pmika/swc/documents/VisiNav-paper.pdf

3rd place:
Sig.ma
Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon Danielczyk, and Stefan Decker from the Digital Enterprise Research Institute, Ireland
http://www.cs.vu.nl/~pmika/swc/documents/Sig.ma:%20Live%20views%20on%20the%20web%20of%20data-sigma.pdf

Billion Triples Track:
Winner:
Scalable Reduction
Gregory Todd Williams, Jesse Weaver, Medha Atre, and James A. Hendler from the Rensselaer Polytechnic Institute, USA
http://www.cs.vu.nl/~pmika/swc/documents/Scalable%20Reduction%20of%20Large%20Datasets%20to%20Interesting%20Subsets-btc2009.pdf

More information on the 2009 Semantic Web Challenge Awards, as well as a demo and links to all the competing applications, can be found at http://challenge.semanticweb.org

The Web is increasingly understood as a global information space consisting not just of linked documents, but also of linked data. The term Linked Data was coined by Tim Berners-Lee in his Linked Data Web architecture note. The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today. More specifically, Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.”

More than just a vision, the Web of Data has been brought into being by the maturing of the Semantic Web technology stack and by the publication of an increasing number of datasets according to the principles of Linked Data. Today, this emerging Web of Data includes data sets as extensive and diverse as DBpedia, Geonames, US Census, EuroStat, MusicBrainz, BBC Programmes, Flickr, DBLP, PubMed, UniProt, FOAF, SIOC, OpenCyc, UMBEL and Yago. The availability of these and many other data sets has paved the way for an increasing number of applications that build on Linked Data, for support services designed to reduce the complexity of integrating heterogeneous data from distributed sources, and for new business opportunities for start-up companies in this space.

The basic tenets of Linked Data are to:

  • use the RDF data model to publish structured data on the Web
  • use RDF links to interlink data from different data sources

Applying both principles leads to the creation of a data commons on the Web, a space where people and organizations can post and consume data about anything. This data commons is often called the Web of Data or Semantic Web.
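To make the two principles concrete, here is a small Jena sketch in which structured data is published as RDF and an RDF link (owl:sameAs) points into another data source; every URI except the DBpedia one is invented:

    import org.apache.jena.rdf.model.*;
    import org.apache.jena.vocabulary.OWL;
    import org.apache.jena.vocabulary.RDFS;

    public class LinkedDataDemo {
        public static void main(String[] args) {
            Model m = ModelFactory.createDefaultModel();

            // Principle 1: publish structured data using the RDF data model.
            Resource berlin = m.createResource("http://example.org/places/berlin")
                               .addProperty(RDFS.label, "Berlin");

            // Principle 2: interlink data from different sources via RDF links.
            berlin.addProperty(OWL.sameAs,
                    m.createResource("http://dbpedia.org/resource/Berlin"));

            m.write(System.out, "TURTLE");
        }
    }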

In summary, Linked Data is simply about using the Web to create typed links between data from different sources. It is important to understand that Linked Data is not the Semantic Web itself; rather, it is the foundation on which the Semantic Web is built.


The Tabulator Extension is an extension for Firefox that provides a human-readable interface for linked data. It is based on the Tabulator, a web-based interface for browsing RDF. Using Tabulator’s outline mode, query views, and back-end code, the Tabulator Extension integrates the browsing of linked data directly into the Firefox browser, making for a more natural and seamless experience when browsing linked data on the Web.

A primary goal of the Tabulator Extension is to explore how linked data could be displayed in the next generation of Web browsers. The Tabulator aims to make linked data human-readable by taking a document and picking out the actual things that the document describes. The properties of these things are then displayed in a table, and the links in that table can be followed to load more data about other things from other documents.

A link to the latest version of the extension can be found on the Tabulator Extension site. Moreover, Tabulator is now hosted on addons.mozilla.org; if you download and install it from there, it will receive automatic updates through the Firefox Add-on Manager.

 

Once the extension file is downloaded, it should install automatically. After restarting Firefox, all documents served as application/rdf+xml or text/n3 (and, for a while, legacy documents served as text/rdf+n3) will automatically be loaded in the Tabulator’s outline view. It may be necessary to disable other RDF-related extensions that could override the Tabulator’s capture of these content types.

 

For more information, read this article: Tabulator: Exploring and Analyzing Linked Data on the Semantic Web.

This topic will be discussed in a Webinar on 5 March 2009. There are two well-known technical issues when reasoning with ontologies that contain hundreds of thousands of classes and subclasses and where change happens frequently.

 

The first problem is that materializing type information takes far too much time. In some triple stores, materialization takes almost as long as loading the data, and once the ontology changes, the entire materialization process has to start over.


The second problem is that optimizing a SPARQL engine for a reasoning triple store is more challenging than optimizing SPARQL as a plain retrieval language. In a non-reasoning SPARQL engine, optimization is relatively straightforward: given the statistics of the database, the engine reorders clauses appropriately and applies the right hash and sort joins. When SPARQL is used on top of a reasoner, however, additional considerations come into play: in practice, you only know the statistics of each clause after you have done the reasoning.
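A small Jena-based sketch illustrates why (this only demonstrates the problem, not the webinar’s solution, and the ontology URIs are invented): in the raw data the type clause below may match almost nothing, but once RDFS subclass reasoning is applied it can match a large fraction of the store, so selectivity estimates taken from base statistics mislead the join order.

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.*;
    import org.apache.jena.reasoner.ReasonerRegistry;

    public class ReasoningSelectivity {
        public static void main(String[] args) {
            Model schema = ModelFactory.createDefaultModel();  // class hierarchy
            Model data = ModelFactory.createDefaultModel();    // instance data
            // ... load hundreds of thousands of classes and instances here ...

            InfModel inf = ModelFactory.createInfModel(
                    ReasonerRegistry.getRDFSReasoner(), schema, data);

            // Before reasoning, "?x a ex:Vehicle" matches only explicit rdf:type
            // triples; after reasoning it also matches the instances of every
            // subclass - a cardinality the optimizer cannot know up front.
            String q = "PREFIX ex: <http://example.org/onto#> "
                     + "SELECT ?x WHERE { ?x a ex:Vehicle . ?x ex:locatedIn ex:Dublin }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, inf)) {
                ResultSetFormatter.out(qe.execSelect());
            }
        }
    }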


This Webinar will discuss a new solution that mitigates or nearly solves both problems. We will discuss some indexing techniques that do not require materialization and we will cover how an ordinary backtracking technique can be very fast with the right reordering.


Register for this webinar at:

https://www2.gotomeeting.com/register/494147427

Oracle 10g Release 2 and Oracle 11g introduce the industry’s first open, scalable, secure platform for storing RDF and OWL data. Based on a graph data model, RDF triples are persisted, indexed and queried in a similar way to other object-relational data types.

 

The Oracle Jena adaptor software implements the well-known Jena Graph and Model APIs. (Jena is an open source framework developed by Hewlett-Packard and is available under a BSD-style license; see jena.sf.net for details.) It extends the capabilities of Oracle semantic data management (Oracle 10gR2 RDF and Oracle 11gR1 RDF/OWL) with a set of easy-to-use Java APIs. Enhancements have been made on the server side to accommodate these APIs.
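Based on the pattern in Oracle’s documentation, usage looks roughly like the sketch below; the connection details and model name are placeholders, and the adaptor class names (Oracle, ModelOracleSem) are taken from Oracle’s examples and may vary between releases, so treat them as assumptions:

    import com.hp.hpl.jena.rdf.model.*;               // Jena of that era (pre-Apache)
    import oracle.spatial.rdf.client.jena.ModelOracleSem;
    import oracle.spatial.rdf.client.jena.Oracle;

    public class OracleJenaDemo {
        public static void main(String[] args) throws Exception {
            // Placeholder JDBC connection details.
            Oracle oracle = new Oracle("jdbc:oracle:thin:@localhost:1521:orcl",
                                       "scott", "tiger");

            // Triples added through the standard Jena Model API are persisted,
            // indexed and queried inside the database.
            Model model = ModelOracleSem.createOracleSemModel(oracle, "articles");
            Resource paper = model.createResource("http://example.org/paper/1");
            model.add(paper,
                      model.createProperty("http://example.org/cites"),
                      model.createResource("http://example.org/paper/2"));

            model.close();
            oracle.dispose();
        }
    }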

 

Application developers can now use the power of the Oracle 11g Database to design and develop a wide range of semantic-enhanced business applications. Application areas include:

  • Life Sciences: Biological pathway analysis, discovery and enhanced search.
  • Defence & Intelligence: Data and content integration, reasoning and inference.
  • Enterprise Application Integration: Data and systems integration, semantic enterprise integration and semantic web services.
  • CRM/ERP: Supply chain integration, sourcing optimization and customer service automation.

 

Oracle Semantic Technologies Software Documentation

[1] Oracle Semantic Technologies Technical Presentation (PDF)

[2] Oracle RDF Overview

[3] Oracle Database 11g Inference Best Practices with RDFS/OWL

[4] Semantic data management on Windows XP and configuring semantic web technology support in Oracle 11g.