In Proceedings of the 12th Extended Semantic Web Conference: Semantic Publishing Challenge (2016)
In this paper, we present our solution for the first task of the second Semantic Publishing Challenge. The task requires extracting and semantically annotating information about CEUR-WS workshops, their chairs and conference affiliations, as well as their papers and authors, from a set of HTML-encoded workshop proceedings volumes. Our solution builds on last year's submission: we address a number of its shortcomings, assess the quality of the generated dataset, and publish the queries as SPARQL query templates. To this end, we use the RDF Mapping Language (RML) to define the mappings, the RMLProcessor to execute them, RDFUnit to both validate the mapping documents and assess the quality of the generated dataset, and The DataTank to publish the SPARQL query templates. The result is a generated dataset of overall improved quality, which is reflected in the query results.