Could Big Data Spur Semantic Web Development?

The semantic web, or web 3.0, is often cited as the next phase of the Internet. Led by the World Wide Web Consortium (W3C), the effort aims to convert the current web of unstructured and semi-structured data into a “web of data”. According to the W3C, the semantic web will make it possible to easily share and re-use data across application, community and enterprise boundaries.

The inventor of the web, Tim Berners-Lee, already described the semantic web in 1998 as “a web of data, in some ways like a global database”. This database will consist of all the unstructured, semi-structured and structured data that is already online but still resides in silos across the web. In the same paper he gives the rationale for developing the semantic web: “the Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well-defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web”.

So the semantic web will enable all humans, as well as all internet-connected devices (think: the Internet of Things), to communicate with each other and to share and re-use data in different forms across different applications and organizations in real time. It is clear that this has everything to do with big data. The question, however, is whether big data and its technologies can spur the development of the semantic web.

Big data is often described in terms of the 3V’s: volume, velocity and variety. Personally I think this is too short-sighted, as it leaves out a few other very important aspects of big data: veracity, value, visualization and variability. Veracity is the correctness of the data. Value is the economic benefit for companies, organizations and societies. Visualization is the art of making the data easy to read and understand. Variability is the changing meaning of data over time.

Together these 7V’s define big data, and they immediately show the challenges of the semantic web: how to connect, link and make available all the data on the web, which is created in large volumes, at high speed, and in great variety and variability, while ensuring the correctness and quality of that data and making it understandable for humans and machines. They also show how big data can help create the semantic web. The technologies currently being developed for big data, such as Hadoop, open source tools and the technology developed by big data startups, will enable the development of the semantic web as processing, linking and analysing all that data becomes better and cheaper.

Ramani Pandurangan, in a blog post, describes the semantic web as “essentially a framework to link metadata (data about data) of data stored in disparate databases on the Web so that it will allow machines to query these databases and yield enriched results.” Once all the databases that currently still sit in silos on the web are connected, machines will be able to find information that is currently difficult or impossible to find, and to connect and communicate with that information.
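
To make the idea of linking metadata across silos a little more concrete, here is a minimal sketch in Python. It is not how any particular semantic web stack is implemented; it simply assumes that two separate databases describe their data as (subject, predicate, object) statements and share identifiers. The datasets, the identifiers such as book:1 and person:tbl, and the helpers match and author_names are all hypothetical and exist only for illustration.

```python
from typing import List, Optional, Tuple

# One statement of metadata: (subject, predicate, object).
Triple = Tuple[str, str, str]

# Two hypothetical silos (illustrative data, not from a real service).
# They only become useful together because both use the id "person:tbl".
LIBRARY_DB: List[Triple] = [
    ("book:1", "title", "Weaving the Web"),
    ("book:1", "author", "person:tbl"),
]
PEOPLE_DB: List[Triple] = [
    ("person:tbl", "name", "Tim Berners-Lee"),
    ("person:tbl", "memberOf", "org:w3c"),
]


def match(
    graph: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def author_names(title: str, graph: List[Triple]) -> List[str]:
    """Follow title -> book -> author -> name across the linked data."""
    names: List[str] = []
    for book, _, _ in match(graph, p="title", o=title):
        for _, _, person in match(graph, s=book, p="author"):
            for _, _, name in match(graph, s=person, p="name"):
                names.append(name)
    return names


# Linking the silos is just merging their statements into one graph.
linked = LIBRARY_DB + PEOPLE_DB

if __name__ == "__main__":
    print(author_names("Weaving the Web", LIBRARY_DB))  # silo alone: no name
    print(author_names("Weaving the Web", linked))      # linked: name found
```

Queried on its own, the library silo cannot answer the question, because the author's name lives in the other database. Once the statements are merged, a machine can follow the shared identifier from one silo to the other, which is the "enriched results" idea in miniature.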

A great example of this is Google’s Knowledge Graph, which was introduced in May 2012. Google calls it the future of search: indexing things, not strings. Knowledge Graph is very promising but, as Larry Page mentioned, they “are still at 1% of where Google wants to be”. Currently the semantic network created by Google contains 570 million objects and over 18 billion facts about the relationships between them, which are used to understand the meaning of the keywords entered in a search. The objective is to develop a Star Trek experience, where users can simply ask computers questions in natural language.
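
As a rough illustration of what "things, not strings" can mean, the sketch below resolves an ambiguous keyword to an entity by comparing the other words in the query with facts stored about each candidate. This is a toy example under stated assumptions and says nothing about how Google actually does it; the entities, the ENTITY_FACTS and LABEL_INDEX tables, and the resolve function are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical facts: entity -> terms it is related to in the stored graph.
ENTITY_FACTS: Dict[str, Set[str]] = {
    "Mercury (planet)": {"planet", "orbit", "sun", "solar"},
    "Mercury (element)": {"element", "metal", "liquid", "thermometer"},
}

# Hypothetical index: keyword string -> entities that string may refer to.
LABEL_INDEX: Dict[str, List[str]] = {
    "mercury": ["Mercury (planet)", "Mercury (element)"],
}


def resolve(query: str) -> List[str]:
    """Map each known keyword in the query to its best-matching entity.

    The best match is the candidate whose related terms overlap most with
    the remaining words of the query; ties go to the first candidate.
    """
    words = query.lower().split()
    resolved: List[str] = []
    for word in words:
        candidates = LABEL_INDEX.get(word, [])
        if not candidates:
            continue  # not a known entity label, treat as context only
        context = set(words) - {word}
        best = max(candidates, key=lambda e: len(ENTITY_FACTS[e] & context))
        resolved.append(best)
    return resolved


if __name__ == "__main__":
    print(resolve("mercury orbit sun"))
    print(resolve("liquid metal mercury"))
```

The same string, "mercury", ends up pointing at two different things depending on the surrounding words, which is only possible because the system stores relationships between objects rather than just the text itself.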

Google is not the only one working on semantic search; big data startups like Ontology also aim to connect things instead of strings. Ontology focuses on enterprise applications rather than the web, and so does for the enterprise what Google does for the web.

The semantic web as discussed by Tim Berners-Lee back in 1998 also focuses strongly on machines connecting to and communicating with the web. Nowadays we call this the Internet of Things. We have discussed the Internet of Things a few times already on BigData-Startups, and it is evident that it requires a semantic backbone to work. Connecting 25 or 50 billion interoperable devices can only happen when those devices can browse, connect to and communicate with the web like humans do. This becomes even more important when, in about a decade or so, we reach a trillion sensors connected to the web.

Therefore, the technologies involved with big data not only stimulate the development of the semantic web, they also require a semantic web to work correctly and to deliver fully on the promises of big data. This might seem like a “chicken and egg” story, but most of the big data technologies currently being developed will be able to help build the semantic web before the Internet of Things has arrived in full force.