Thierry Declerck: Linguistically analyzed labels of knowledge objects: How can they support OBIE? Lessons learned from the Monnet and TrendMiner projects.
We are investigating the use of natural language expressions included in Knowledge Organization Systems (KOS) to support Ontology-Based Information Extraction (OBIE) in a multi- and cross-lingual context. Knowledge Organization Systems very often include so-called annotation properties, in the form of labels, comments, definitions, etc., whose purpose is to introduce human-readable information into the formal description of the domain modelled in the KOS.
An approach developed in the Monnet project, and continued in the TrendMiner project, consists in transforming the content of annotation properties into linguistically analysed data. Natural language processing of such language expressions, sometimes also called lexicalisation of Knowledge Organization Systems, thus transforms the unstructured content of annotation properties into linguistically structured data, which can be used to compare the language data included in a KOS with linguistically annotated texts. If a match of linguistic features can be established between those two types of documents, the corresponding segments of the textual documents can be semantically annotated with the elements of the KOS with which the content of the annotation property is associated. This semantic annotation procedure can be of great help for OBIE, since it relates text segments to the relevant parts of thesauri, taxonomies or ontologies.
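As a rough illustration of the matching step described above, the following Python sketch compares lemmatized KOS labels with a lemmatized text and annotates the matching token spans with the associated concept. It is a minimal sketch under stated assumptions, not the Monnet or TrendMiner implementation: the concept identifiers, the labels, the function names and the toy lemmatizer (lowercasing and stripping a final "s") are all hypothetical stand-ins for a real KOS and a real linguistic analysis pipeline.

```python
import re

# Hypothetical KOS fragment: concept identifier -> label
# (the content of an annotation property). Not taken from a real KOS.
KOS_LABELS = {
    "ex:NetIncome": "Net incomes",
    "ex:IntangibleAsset": "Intangible assets",
}


def lemmatize(text):
    """Toy stand-in for linguistic analysis: lowercase the alphabetic
    tokens and strip a final 's' from tokens longer than three letters."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def annotate(text, labels):
    """Return (concept, start_token, end_token) for every label whose
    lemma sequence occurs as a contiguous span in the lemmatized text."""
    text_lemmas = lemmatize(text)
    annotations = []
    for concept, label in labels.items():
        label_lemmas = lemmatize(label)
        n = len(label_lemmas)
        for i in range(len(text_lemmas) - n + 1):
            if text_lemmas[i:i + n] == label_lemmas:
                annotations.append((concept, i, i + n))
    return annotations


result = annotate(
    "The company reported a higher net income and fewer intangible assets.",
    KOS_LABELS,
)
print(result)
```

Matching on lemmas rather than surface strings is what lets the plural label "Net incomes" annotate the singular "net income" in the text; a realistic system would match on richer linguistic features than this toy normalization provides.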
Looking in more detail at the language data contained in annotation properties, however, we can see that this data very often has to be modified before it can be used effectively in the context of OBIE. There is also a need for a formal representation of such linguistically annotated language data, in order to ensure interoperability with semantic data available in the Linked Data Framework. The talk will expand on those issues.