Show simple item record

Field | Value | Language
dc.contributor.author | Patrick, Jon |
dc.contributor.author | Palko, Dusan |
dc.contributor.author | Khan, Asiz |
dc.date.accessioned | 2010-06-08 |
dc.date.available | 2010-06-08 |
dc.date.issued | 2001-01-01 |
dc.identifier.citation | Computing Arts 2001 : digital resources for research in the humanities : 26th-28th September 2001, Veterinary Science Conference Centre, the University of Sydney / hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney | en_AU
dc.identifier.uri | http://hdl.handle.net/2123/6229 |
dc.description.abstract | Text can be thought of as a data stream in which a variety of embedded structural elements indicate semantic changes in content. One of the simpler examples of this is a dictionary. We work from the general principles of inductive inference and use automata theory to model a text data stream. From these principles, an Intelligent Self-Learning Parser-editor that infers the structure in the text, verifies it for accuracy and automates error correction is feasible. A good case study to test the quality of the solution is the conversion of dictionaries in text format into a database format. This is not necessarily a straightforward task, as the data is somewhat noisy owing to typographic errors and inconsistent structure across dictionary entries. In addition, information for attribute demarcation is most often implied by changes in text format rather than by explicit symbols. In this project the aim has been to build a parser-editor that can be trained to identify the structure of dictionary entries and then learn from examples to parse unseen entries. The software has to cope with erroneous, missing and irregularly formatted data, intelligently prompt a user to intervene in the parsing process, and allow and record irregular structures. The technique has been used to convert a Basque-English bilingual dictionary from Word processing files into XML files. | en_AU
dc.description.sponsorship | Hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney. | en_AU
dc.language.iso | en | en_AU
dc.publisher | Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney. | en_AU
dc.rights | Copyright the University of Sydney | en
dc.subject | Humanities Computing | en_AU
dc.title | The Inductive Inference of Structure in Text Streams | en_AU
dc.type | Conference paper | en_AU

