Data Analysis of Hierarchical Data for RDF Term Identification
Pieter Heyvaert, Anastasia Dimou, Ruben Verborgh, Erik Mannens
In Proceedings of the 6th Joint International Semantic Technology Conference (2016)
Generating Linked Data based on existing data sources requires the modeling of their information structure. This modeling needs the identification of potential entities, their attributes and the relationships between them and among entities. For databases this identification is not required, because a data schema is always available. However, for other data formats, such as hierarchical data, this is not always the case. Therefore, analysis of the data is required to support RDF term and data type identification. We introduce a tool that performs such an analysis on hierarchical data. It implements the algorithms, Daro and S-Daro, proposed in this paper. Based on our evaluation, we conclude that S-Daro offers a more scalable solution regarding run time, with respect to the dataset size, and provides more complete results.
@inproceedings{heyvaert_jist_2016,
author = {Heyvaert, Pieter and Dimou, Anastasia and Verborgh, Ruben and Mannens, Erik},
title = {Data Analysis of Hierarchical Data for {RDF} Term Identification},
booktitle = {Proceedings of the 6th Joint International Semantic Technology Conference},
year = 2016,
month = nov,
abstract = {
Generating Linked Data based on existing data sources requires the modeling of their information structure. This modeling needs the identification of potential entities, their attributes and the relationships between them and among entities. For databases this identification is not required, because a data schema is always available. However, for other data formats, such as hierarchical data, this is not always the case. Therefore, analysis of the data is required to support RDF term and data type identification. We introduce a tool that performs such an analysis on hierarchical data. It implements the algorithms, Daro and S-Daro, proposed in this paper. Based on our evaluation, we conclude that S-Daro offers a more scalable solution regarding run time, with respect to the dataset size, and provides more complete results.
},
pdf = {https://biblio.ugent.be/publication/8504071/file/8504072}
}