
Planet RDF

It's triples all the way down

Published: 2016-12-09T15:07:13.02Z


SANSA 0.1 (Semantic Analytics Stack) Released


Dear all, The Smart Data Analytics group / AKSW are very happy to announce SANSA 0.1 – the initial release of the Scalable Semantic Analytics Stack. SANSA combines distributed computing and semantic technologies to provide powerful machine learning, inference and querying capabilities for large knowledge graphs. Website: GitHub: Download: ChangeLog: You can find the FAQ and usage examples at . The following features are currently supported by SANSA: support for reading and writing RDF files in N-Triples format; support for reading OWL files in various standard formats; querying and partitioning based on Sparqlify; support ...
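SANSA itself reads N-Triples into Spark structures through its Scala API, but the format it consumes is simple enough to illustrate directly. Here is a minimal Python sketch (not SANSA's API) that handles only the plain `<s> <p> <o> .` form:

```python
# Minimal N-Triples line parser (illustration only; SANSA itself reads
# N-Triples into Spark structures via its Scala API).
def parse_ntriple(line):
    """Return (subject, predicate, object) for one simple N-Triples line.

    Comments and blank lines yield None; only the basic
    '<s> <p> <o-or-literal> .' form is handled.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if not line.endswith(" ."):
        raise ValueError("not a simple N-Triples statement")
    body = line[:-2]              # drop the trailing " ."
    s, p, o = body.split(" ", 2)  # the object may itself contain spaces
    return (s, p, o)

triple = parse_ntriple('<http://example.org/s> <http://example.org/p> "hello world" .')
```

A real reader would also handle blank nodes, datatyped literals and escapes, which is exactly what dedicated parsers exist for.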

AKSW wins award for Best Resources Paper at ISWC 2016 in Japan


Our paper, “LODStats: The Data Web Census Dataset”, won the award for Best Resources Paper at the recent ISWC 2016 conference in Kobe, Japan, the premier international forum for the Semantic Web and Linked Data community. The paper presents the LODStats dataset, which provides a comprehensive picture of the current state of a significant part of the Data Web. Congrats to Ivan Ermilov, Jens Lehmann, Michael Martin and Sören Auer. Please find the complete list of winners here.

PhD Proposal: Ankur Padia, Dealing with Dubious Facts in Knowledge Graphs


Dissertation Proposal: Dealing with Dubious Facts in Knowledge Graphs, Ankur Padia, 1:00-3:00pm Wednesday, 30 November 2016, ITE 325b, UMBC. Knowledge graphs are structured representations of facts in which nodes are real-world entities or events and edges are the associations between pairs of entities. Knowledge graphs can be constructed using automatic or manual techniques. Manual techniques produce high-quality knowledge graphs but are expensive, time-consuming and not scalable. Hence, automatic information extraction techniques are used to create scalable knowledge graphs, but the extracted information can be of poor quality due to the presence of dubious facts. An extracted fact ...

AKSW Colloquium, 28.11.2016, NED using PBOH + Large-Scale Learning of Relation-Extraction Rules.


In the upcoming Colloquium, November the 28th at 3 PM, two papers will be presented: Probabilistic Bag-Of-Hyperlinks Model for Entity Linking. Diego Moussallem will discuss the paper “Probabilistic Bag-Of-Hyperlinks Model for Entity Linking” by Octavian-Eugen Ganea et al., which was accepted at WWW 2016. Abstract: Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referred to as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to ...

Leveraging KBpedia Aspects To Generate Training Sets Automatically


In previous articles I have covered multiple ways to create training corpuses for unsupervised learning and positive and negative training sets for supervised learning (1, 2, 3) using Cognonto and KBpedia. Different structures inherent to a knowledge graph like KBpedia can lead to quite different corpuses and sets. Each of these corpuses or sets may yield different predictive powers depending on the task at hand. So far we have covered two ways to leverage the KBpedia Knowledge Graph to automatically create positive and negative training corpuses: using the links that exist between each KBpedia reference concept and their ...
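One structure that lends itself to this is the concept hierarchy itself: the sub-concepts of a target concept can serve as positive examples, with everything else as candidate negatives. A minimal sketch over a hypothetical hierarchy (a plain dict, not Cognonto's actual API or data):

```python
# Hypothetical concept hierarchy (parent -> children); the real KBpedia
# graph is far larger and is queried through Cognonto, not a dict.
hierarchy = {"Animal": ["Mammal", "Bird"], "Mammal": ["Dog", "Cat"]}

def descendants(hierarchy, root):
    """Collect every sub-concept reachable below `root`."""
    found = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in hierarchy.get(node, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Positives: everything under the target concept; negatives: the rest.
all_concepts = {"Animal", "Mammal", "Bird", "Dog", "Cat"}
positives = descendants(hierarchy, "Mammal")
negatives = all_concepts - positives - {"Mammal"}
```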

Dynamic Machine Learning Using the KBpedia Knowledge Graph – Part 2


In the first part of this series we found good hyperparameters for a single linear SVM classifier. In part 2, we will try another technique to improve the performance of the system: ensemble learning. So far we have reached 95% accuracy by tweaking the hyperparameters and the training corpuses, but the F1 score is still around ~70% on the full gold standard, which can be improved. There are also situations where precision should be nearly perfect (because false positives are really not acceptable) or where recall should be optimized. Here we will try to improve this ...
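Ensemble learning combines the predictions of several classifiers, most simply by majority vote. A toy sketch of that voting step (the stand-in "classifiers" below are hypothetical threshold functions; a real ensemble would combine trained SVMs with different hyperparameters or corpuses):

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label most of the classifiers agree on."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical stand-in classifiers.
clfs = [lambda x: x > 0, lambda x: x > 1, lambda x: x > -1]
label = majority_vote(clfs, 0.5)  # two of three vote True
```

Voting tends to help precision when the individual classifiers make uncorrelated errors, which is one reason to train them on different corpuses.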

Dynamic Machine Learning Using the KBpedia Knowledge Graph – Part 1


In my previous blog post, Create a Domain Text Classifier Using Cognonto, I explained how one can use the KBpedia Knowledge Graph to automatically create positive and negative training corpuses for different machine learning tasks. I explained how SVM classifiers could be trained and used to check whether an input text belongs to the defined domain or not. This article is the first of two. In this first part I will extend on this idea to explain how the KBpedia Knowledge Graph can be used, along with other machine learning techniques, to cope with different situations and use cases. I ...

Triplifying a real dictionary


The Linked Data Lexicography for High-End Language Technology (LDL4HELTA) project was started in cooperation between Semantic Web Company (SWC) and K Dictionaries. LDL4HELTA combines lexicography and language technology with semantic technologies and Linked (Open) Data mechanisms and technologies. One of the implementation steps of the project is to create a language graph from the dictionary data. The input data, described further below, is a Spanish dictionary core translated into multiple languages and available in XML format. This data should be triplified (that is, converted to RDF, the Resource Description Framework) for several purposes, including to enrich it with ...
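To make "triplification" concrete, here is a hedged Python sketch that turns one dictionary entry into N-Triples. The XML shape and the example URIs are invented for illustration; the real LDL4HELTA schema and vocabulary differ:

```python
import xml.etree.ElementTree as ET

# Invented entry shape; the real dictionary XML schema differs.
xml_entry = ('<entry id="casa"><headword>casa</headword>'
             '<translation lang="en">house</translation></entry>')

def triplify(xml_str, base="http://example.org/dict/"):
    """Emit N-Triples lines for one dictionary entry."""
    entry = ET.fromstring(xml_str)
    subject = f"<{base}{entry.get('id')}>"
    triples = [f'{subject} <http://www.w3.org/2000/01/rdf-schema#label> '
               f'"{entry.findtext("headword")}" .']
    for tr in entry.findall("translation"):
        # Language-tagged literal for each translation.
        triples.append(f'{subject} <{base}translation> "{tr.text}"@{tr.get("lang")} .')
    return triples

triples = triplify(xml_entry)
```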

Accepted paper in AAAI 2017


Hello Community! We are very pleased to announce that our paper “Radon – Rapid Discovery of Topological Relations” was accepted for presentation at the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), which will be held February 4–9 at the Hilton San Francisco, San Francisco, California, USA. In more detail, we will present the following paper: “Radon – Rapid Discovery of Topological Relations” by Mohamed Ahmed Sherif, Kevin Dreßler, Panayiotis Smeros, and Axel-Cyrille Ngonga Ngomo. Abstract. Datasets containing geo-spatial resources are increasingly being represented according to the Linked Data principles. Several time-efficient approaches for discovering links between RDF resources ...
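Exact topological checks on complex geometries are expensive, so time-efficient approaches in this space typically apply a cheap bounding-box filter before any exact geometry test. A generic sketch of that filter (an illustration of the general idea, not RADON's actual algorithm):

```python
def bbox_intersects(a, b):
    """Cheap pre-filter: do two axis-aligned bounding boxes overlap?

    Boxes are (min_x, min_y, max_x, max_y). Only pairs that pass this
    test would go on to an exact (expensive) geometry comparison.
    """
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])
```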

Pulling RDF out of MySQL


With a command line option and a very short stylesheet.
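The post's details are behind the link, but the general recipe — dump a table as XML, then transform the rows into triples — can be sketched as follows. The row/field XML shape below is an assumption modeled on `mysqldump --xml` output, and the URIs are invented:

```python
import xml.etree.ElementTree as ET

# Assumed dump shape, modeled on `mysqldump --xml` output.
dump = """<table_data name="people">
  <row><field name="id">1</field><field name="name">Ada</field></row>
</table_data>"""

def rows_to_ntriples(xml_str, base="http://example.org/"):
    """Turn each row into triples, using the id column as the subject."""
    table = ET.fromstring(xml_str)
    tname = table.get("name")
    triples = []
    for row in table.findall("row"):
        fields = {f.get("name"): f.text for f in row.findall("field")}
        subject = f"<{base}{tname}/{fields['id']}>"
        for column, value in fields.items():
            if column != "id":
                triples.append(f'{subject} <{base}{column}> "{value}" .')
    return triples
```

The original post does this step with an XSLT stylesheet rather than a script; the mapping itself is the same either way.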

SUB Göttingen joins DCMI as Institutional Member


2016-11-11, DCMI is pleased to announce that Göttingen State and University Library (SUB Göttingen) has joined DCMI as an Institutional Member. SUB Göttingen is one of the most important research libraries in Germany and plays a leading role in a large number of national and international projects involving the optimization of literature and information provision and the establishment and development of digital research and information infrastructures. Its scope of activities includes the cooperative development of a Germany-wide service infrastructure for the acquisition, licensing and provision of electronic resources; the coordination of large-scale joint research projects for developing research infrastructures in the humanities ...

Donate to the commons this holiday season


Holiday season is nearly upon us. Donating to a charity is an alternative form of gift giving that shows you care, whilst directing your money towards helping those that need it. There are a lot of great and deserving causes you can support, and I’m certainly not going to tell you where you should donate your money. But I’ve been thinking about the various ways in which I can support projects that I care about. There are a lot of them as it turns out. And it occurred to me that I could ask friends and family who might want to buy me a gift to ...

The practice of open data


Open data is data that anyone can access, use and share. Open data is the result of several processes. The most obvious one is the release process that results in data being made available for reuse and sharing. But there are other processes that may take place before that open data is made available: collecting and curating a dataset; running it through quality checks; or ensuring that data has been properly anonymised. There are also processes that happen after data has been published. Providing support to users, for example. Or dealing with error reports or service issues with an API ...

Building and Maintaining the KBpedia Knowledge Graph


The Cognonto demo is powered by an extensive knowledge graph called the KBpedia Knowledge Graph, as organized according to the KBpedia Knowledge Ontology (KKO). KBpedia is used for all kinds of tasks, some of which are demonstrated by the Cognonto use cases. KBpedia powers dataset linkage and mapping tools, machine learning training workflows, entity and concept extractions, category and topic tagging, etc. The KBpedia Knowledge Graph is a structure of more than 39,000 reference concepts linked to 6 major knowledge bases and 20 popular ontologies in use across the Web. Unlike other knowledge graphs that analyze big corpuses of ...

Discogs: a business based on public domain data


When I’m discussing business models around open data I regularly refer to a few different examples. Not all of these have well developed case studies, so I thought I’d start trying to capture them here. In this first write-up I’m going to look at

Machine learning links


[work in progress – I’m updating it gradually] Machine Learning

Checking Fact Checkers


As of last month

Elinor Ostrom and data infrastructure


One of the topics that most interests me at the moment is how we design systems and organisations that contribute to the creation and maintenance of the open data commons. This is more than a purely academic interest. If we can understand the characteristics of successful open data projects like Open Street Map or Musicbrainz then we could replicate them in other areas. My hope is that we may be able to define a useful tool-kit of organisational and technical design patterns that make it more likely for other similar projects to proceed. These patterns might also give us a way to evaluate and ...

My SQL quick reference


Pun intended.

Create a Domain Text Classifier Using Cognonto


A common task required by systems that automatically analyze text is to classify an input text into one or multiple classes. A model needs to be created to scope the class (what belongs to it and what does not) and then a classification algorithm uses this model to classify an input text. Multiple classification algorithms exist to perform such a task: Support Vector Machine (SVM), K-Nearest Neighbours (KNN), C4.5 and others. What is hard with any such text classification task is not so much how to use these algorithms: they are generally easy to configure and use once implemented ...
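Before reaching for SVM or KNN, the shape of the task can be illustrated with a deliberately naive scorer: represent the text as a bag of words and measure its overlap with a domain vocabulary. A hypothetical sketch (the vocabulary is invented, and a real classifier would use a trained model rather than a hand-picked word list):

```python
# Deliberately naive domain scorer; a real system trains a classifier on
# positive/negative corpuses instead of using a hand-picked vocabulary.
def bag_of_words(text):
    return set(text.lower().split())

def domain_score(text, domain_vocab):
    """Fraction of the input's distinct words found in the domain vocabulary."""
    words = bag_of_words(text)
    return len(words & domain_vocab) / max(len(words), 1)

domain_vocab = {"rdf", "ontology", "triple", "sparql"}  # hypothetical
score = domain_score("SPARQL queries over RDF data", domain_vocab)
```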

A presence robot with Chromium, WebRTC, Raspberry Pi 3 and EasyRTC


Here’s how to make a presence robot with Chromium 51, WebRTC, Raspberry Pi 3 and EasyRTC. It’s actually very easy, especially now that Chromium 51 comes with Raspbian Jessie, although it’s taken me a long time to find the exact incantation. If you’re going to use it for real, I’d suggest using the



For the purposes of having something to point to in future, here’s a list of different meanings of “open” that I’ve encountered. XYZ is “open” because: It’s on the web It’s free to use It’s published under an open licence It’s published under a custom licence, which limits some types of use (usually commercial, often everything except personal) It’s published under an open licence, but we’ve not checked too deeply into whether we can do that It’s free to use, so long as you do so within our app or application There’s a restricted/limited access free version There’s documentation on how it works ...

Current gaps in the open data standards framework


In this post I want to highlight what I think are some fairly large gaps in the standards we have for publishing and consuming data on the web. My purpose for writing these down is to try and fill in gaps in my own knowledge, so leave a comment if you think I’m missing something (there’s probably loads!) To define the scope of those standards, let’s try and answer two questions. Question 1: What are the various activities that we might want to carry out around an open dataset? A. Discover the metadata and documentation about a dataset B. Download ...

AKSW Colloquium, 17.10.2016, Version Control for RDF Triple Stores + NEED4Tweet


In the upcoming Colloquium, October the 17th at 3 PM, two papers will be presented: Version Control for RDF Triple Stores. Marvin Frommhold will discuss the paper “Version Control for RDF Triple Stores” by Steve Cassidy and James Ballantine, which forms the foundation of his own work regarding versioning for RDF. Abstract: RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources and via human authoring either directly or through tools that generate RDF output. As individuals build up large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, ...
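If an RDF graph is treated as a set of triples, the core of a versioning changeset is just two set differences. A minimal sketch (not the paper's actual model, which also has to deal with storage, blank nodes and provenance):

```python
def changeset(old, new):
    """Naive diff between two RDF graphs held as sets of (s, p, o) triples."""
    return {"added": new - old, "removed": old - new}

# Two hypothetical graph versions, using made-up CURIE-style terms.
v1 = {("ex:doc", "ex:status", "draft")}
v2 = {("ex:doc", "ex:status", "published")}
delta = changeset(v1, v2)
```

Applying the changeset in reverse (add the removed triples, remove the added ones) restores the previous version, which is the basic operation any triple-store versioning scheme needs.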

LIMES 1.0.0 Released


Dear all, the LIMES Dev team is happy to announce LIMES 1.0.0. LIMES, the Link Discovery Framework for Metric Spaces, is a link discovery framework for the Web of Data. It implements time-efficient approaches for large-scale link discovery based on the characteristics of metric spaces. Our approaches use different approximation techniques to compute estimates of the similarity between instances. These estimates are then used to filter out a large number of those instance pairs that do not satisfy the mapping conditions. By these means, LIMES can reduce the number of comparisons needed during the mapping process by ...
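The metric-space idea can be sketched in a few lines: by the triangle inequality, d(s, t) >= |d(s, e) - d(e, t)| for any exemplar e, so distances precomputed against e give a lower bound that lets many pairs be skipped without a full comparison. A toy illustration on numbers (the general technique, not LIMES's actual implementation):

```python
def link_candidates(sources, targets, exemplar, threshold, d):
    """Metric-space filtering sketch.

    The triangle inequality gives d(s, t) >= |d(s, e) - d(e, t)| for any
    exemplar e, so precomputed distances to e let us discard pairs whose
    lower bound already exceeds the threshold.
    """
    d_se = {s: d(s, exemplar) for s in sources}
    d_et = {t: d(exemplar, t) for t in targets}
    links = []
    for s in sources:
        for t in targets:
            if abs(d_se[s] - d_et[t]) > threshold:
                continue  # filtered out without a full comparison
            if d(s, t) <= threshold:
                links.append((s, t))
    return links

distance = lambda a, b: abs(a - b)  # a simple metric on numbers
links = link_candidates([1, 5, 9], [2, 20], exemplar=0, threshold=2, d=distance)
```

Because the bound comes from a true metric, the filter never discards a correct link; it only avoids comparisons that could not have produced one.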

DL-Learner 1.3 (Supervised Structured Machine Learning Framework) Released


Dear all, the Smart Data Analytics group at AKSW is happy to announce DL-Learner 1.3. DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis. Website: GitHub page: Download: ChangeLog: DL-Learner is used for data analysis tasks within other tools such as ...

Mapping Datasets, Schema and Ontologies Using the Cognonto Mapper


There are many situations where we want to link named entities from two different datasets or to find duplicate entities to remove from a single dataset. The same is true for vocabulary terms or ontology classes that we want to integrate and map together. Sometimes we want to use such a linkage system to help save time when creating gold standards for named entity recognition tasks. There exist multiple data linkage & deduplication frameworks developed in several different programming languages. At Cognonto, we have our own system called the Cognonto Mapper. Most mapping frameworks work more or less the same ...
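Most such frameworks boil down to computing similarity between entity labels and attributes. A small sketch using character-trigram Jaccard similarity (an illustrative measure; not necessarily what the Cognonto Mapper itself uses):

```python
def trigrams(s):
    """Character trigrams of a padded, lower-cased string."""
    s = f"  {s.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    """Jaccard similarity of the two strings' trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# A high score suggests two labels may refer to the same entity.
similarity = jaccard("Barack Obama", "Barak Obama")
```

In practice a mapper combines several such measures over several attributes and applies a tuned threshold, rather than relying on one score.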

OntoWiki 1.0.0 released


Dear Semantic Web and Linked Data Community, we are proud to finally announce the releases of OntoWiki 1.0.0 and the underlying Erfurt Framework in version 1.8.0. After 10 years of development we’ve decided to release the teenager OntoWiki from the cozy home of 0.x versions. Since the last release, 0.9.11 in January 2014, we did a lot of testing to stabilize OntoWiki’s behavior and accordingly made a lot of bug fixes. We are now also using PHP Composer for dependency management, have improved the testing workflow, and have given a new structure and home to the documentation, and we have ...

Improving Machine Learning Tasks By Integrating Private Datasets


In the last decade, we have seen the emergence of two big families of datasets: public and private ones. Invaluable public datasets like Wikipedia, Wikidata, Open Corporates and others have been created and leveraged by organizations worldwide. However, as great as they are, most organizations still rely on private datasets of their own curated data. In this article, I want to demonstrate how high-value private datasets may be integrated into Cognonto’s KBpedia knowledge base to produce a significant impact on the quality of the results of some machine learning tasks. To demonstrate this impact, I ...