4/9/2023
Oxygen XML Developer 19.1

The first task aims to find an optimal alignment given the sense definitions of a headword in two different monolingual dictionaries. To tackle some of the challenges in this field, two main tasks are addressed: word sense alignment and translation inference. The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries.

This paper is part of ongoing work and a contribution to the efforts of the DARIAH-ERIC Lexical Resources working group. We are currently focusing on the macrostructural level, more precisely on the types of lexical units [...], providing a set of modelling principles and representation forms for every type of entry in the DACL. In the paper, we discuss the TEI Lex-0 encoding of the DACL, as well as the conversion methodology and the tools used for the automatic conversion from the original encoding. Our experiments show that even though TEI Lex-0 is stricter than TEI itself (allowing fewer elements and imposing certain constraints that are not present in plain TEI), it is fully capable of representing the complexities of the entry structure of the DACL. Even though the original encoding of the DACL was based on TEI, we decided to switch to TEI Lex-0 because it allowed us to streamline our encoding. This paper describes some experiments made while encoding the first complete dictionary of the Academia das Ciências de Lisboa (DACL) in TEI Lex-0, a community-based interchange format for lexical data aimed at facilitating the interoperability and reusability of lexical resources.

Finally, a couple of programs were created to prepare regular reports on the dictionary revision process and to back the dictionary up in a Git repository. To guarantee incremental backups, a mechanism was defined to import the XML database into a Git repository.
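The text does not say how the import into Git was implemented. As a rough illustration only, the idea of an incremental backup can be sketched in Python under the assumption that the entries have already been exported from the database as a mapping from file name to XML text. The names `export_entries` and `commit_snapshot` are hypothetical, and the sketch does not handle entries deleted from the database.

```python
import subprocess
from pathlib import Path


def export_entries(entries: dict[str, str], repo_dir: Path) -> list[Path]:
    """Write each entry to its own file and return only the files that changed.

    `entries` maps a file name to the serialized XML of one entry
    (hypothetical input; fetching it from the database is out of scope).
    """
    repo_dir.mkdir(parents=True, exist_ok=True)
    changed = []
    for name in sorted(entries):  # stable order keeps runs reproducible
        target = repo_dir / name
        content = entries[name].rstrip("\n") + "\n"
        # Skip files whose content is identical, so each run only touches
        # the entries that were actually edited since the last backup.
        if target.exists() and target.read_text(encoding="utf-8") == content:
            continue
        target.write_text(content, encoding="utf-8")
        changed.append(target)
    return changed


def commit_snapshot(repo_dir: Path, message: str) -> bool:
    """Commit pending changes; return False when there is nothing to commit.

    Assumes `repo_dir` is already an initialized Git working tree.
    """
    subprocess.run(["git", "add", "--all"], cwd=repo_dir, check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    if not status.stdout.strip():
        return False
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```

Because one file per entry is kept and unchanged files are left alone, each commit contains only the entries revised since the previous run, which is what makes the backup incremental.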
The lexicographers can edit entries using the oXygen XML editor, reading and storing them directly in the database. The database can be queried through a web interface developed in XQuery. To store the dictionary we decided to use an XML-aware database (eXist-DB) that stores each dictionary entry as a separate resource. An iterative filtering approach was used for the conversion. The conversion process was challenging given the format of the PDF file and the fine-grained detail of the XML schema that was used. In this article we describe the workflow implemented to convert a dictionary saved as a PDF file into an XML document and subsequently import it into an XML-aware database, as well as the process to edit, add and delete entries.
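To make "each dictionary entry as a separate resource" concrete, here is a minimal Python sketch that splits one converted XML document into per-entry documents. It assumes TEI-namespaced `entry` elements carrying an `xml:id`. The sample data and the function name `split_entries` are illustrative and not taken from the project, and uploading the resulting resources to the database is not shown.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialize TEI elements with a default namespace instead of ns0: prefixes.
ET.register_namespace("", TEI_NS)


def split_entries(xml_text: str) -> dict[str, str]:
    """Return a mapping from resource name to the XML of one top-level entry."""
    root = ET.fromstring(xml_text)
    resources = {}
    # Only direct children of <body> are split out; nested entries stay
    # inside their parent entry.
    for entry in root.findall(f".//{{{TEI_NS}}}body/{{{TEI_NS}}}entry"):
        entry_id = entry.get(XML_ID)
        if entry_id is None:
            raise ValueError("entry without xml:id cannot be named")
        resources[f"{entry_id}.xml"] = ET.tostring(entry, encoding="unicode").strip()
    return resources


# Hypothetical two-entry dictionary used only for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <entry xml:id="casa"><form type="lemma"><orth>casa</orth></form></entry>
  <entry xml:id="livro"><form type="lemma"><orth>livro</orth></form></entry>
</body></text></TEI>"""

if __name__ == "__main__":
    for name, xml in split_entries(SAMPLE).items():
        print(name, len(xml))
```

Keeping one resource per entry means an editor can open, save, or delete a single entry without touching the rest of the dictionary.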