The Inductive Inference of Structure in Text Streams
Access status:
Open Access
Type
Conference paper
Abstract
Text can be thought of as a data stream in which a variety of embedded structural elements indicate semantic changes in content. One of the simpler examples of this is a dictionary. We work from the general principles of inductive inference and use automata theory to model a text data stream. On this basis, it is feasible to build an Intelligent Self-Learning Parser-editor that infers the structure in the text, verifies it for accuracy and automates error correction. A good case study to test the quality of the solution is the conversion of dictionaries in text format into a database format. This is not necessarily a straightforward task, as the data is noisy to some extent owing to typographic errors and inconsistent structure across dictionary entries. In addition, information for attribute demarcation is most often implied by changes in text format rather than by explicit symbols. The aim of this project has been to build a parser-editor that can be trained to identify the structure of dictionary entries and then learn from examples to parse unseen entries. The software has to cope with erroneous, missing and irregularly formatted data, intelligently prompt a user to intervene in the parsing process, and allow and record irregular structures. The technique has been used to convert a Basque-English bilingual dictionary from word-processing files into XML files.
Date
2001-01-01
Publisher
Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney.
Licence
Copyright the University of Sydney
Citation
Computing Arts 2001 : digital resources for research in the humanities : 26th-28th September 2001, Veterinary Science Conference Centre, the University of Sydney / hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney
Subjects
Humanities Computing