Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora
Field | Value | Language |
dc.contributor.author | Hansen, Silvia | |
dc.contributor.author | Teich, Elke | |
dc.date.accessioned | 2010-06-08 | |
dc.date.available | 2010-06-08 | |
dc.date.issued | 2001-01-01 | |
dc.identifier.citation | Computing Arts 2001 : digital resources for research in the humanities : 26th-28th September 2001, Veterinary Science Conference Centre, the University of Sydney / hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney | en_AU |
dc.identifier.uri | http://hdl.handle.net/2123/6217 | |
dc.description.abstract | In the proposed talk we discuss the application of a set of computational text analysis techniques to the analysis of the linguistic features of translations. The goal of this analysis is to test two hypotheses about the specific properties of translations: Baker's hypothesis of normalization (Baker, 1995) and Toury's law of interference (Toury, 1995). The corpus we analyze consists of English and German original texts and translations of those texts into German and English, respectively. The analysis task is complex in a number of respects. First, a multi-level analysis (clauses, phrases, words) has to be carried out; second, the linguistic features selected for analysis include some rather abstract ones, ranging from functional-grammatical features (e.g., Subject, Adverbial of Time) to semantic features (e.g., semantic roles such as Agent, Goal, Locative); third, both monolingual and contrastive analyses are involved. This places certain requirements on the computational techniques to be employed for corpus encoding, linguistic annotation and information extraction. We show how a combination of commonly available techniques can fulfill these requirements to a large degree and point out their limitations for application to the research questions raised. These techniques range from document encoding (TEI, XML) through automatic corpus annotation (notably part-of-speech tagging; Brants, 2000) and semi-automatic annotation (O'Donnell, 1995) to query systems as implemented in, e.g., the IMS Corpus Workbench (Christ, 1994), the MATE system (Mengel & Lezius, 2000) and the Gsearch system (Keller et al., 1999). | en_AU |
dc.description.sponsorship | Hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney. | en_AU |
dc.language.iso | en | en_AU |
dc.publisher | Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney. | en_AU |
dc.rights | Copyright the University of Sydney | en |
dc.subject | Humanities Computing | en_AU |
dc.subject | Computational text analysis | en_AU |
dc.title | Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora | en_AU |
dc.type | Conference paper | en_AU |