Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora

Hansen, Silvia; Teich, Elke

Access status: Open Access

Field	Value	Language
dc.contributor.author	Hansen, Silvia
dc.contributor.author	Teich, Elke
dc.date.accessioned	2010-06-08
dc.date.available	2010-06-08
dc.date.issued	2001-01-01
dc.identifier.citation	Computing Arts 2001 : digital resources for research in the humanities : 26th-28th September 2001, Veterinary Science Conference Centre, the University of Sydney / hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney	en
dc.identifier.uri	http://hdl.handle.net/2123/6217
dc.description.abstract	In the proposed talk we discuss the application of a set of computational text analysis techniques to the analysis of the linguistic features of translations. The goal of this analysis is to test two hypotheses about the specific properties of translations: Baker's hypothesis of normalization (Baker, 1995) and Toury's law of interference (Toury, 1995). The corpus we analyze consists of English and German original texts and translations of those texts into German and English, respectively. The analysis task is complex in a number of respects. First, a multi-level analysis (clauses, phrases, words) has to be carried out; second, among the linguistic features selected for analysis are some rather abstract ones, ranging from functional-grammatical features (e.g., Subject, Adverbial of Time) to semantic features (e.g., semantic roles such as Agent, Goal, Locative); third, both monolingual and contrastive analyses are involved. This places certain requirements on the computational techniques to be employed for corpus encoding, linguistic annotation and information extraction. We show how a combination of commonly available techniques can fulfill these requirements to a large degree and point out their limitations for application to the research questions raised. These techniques range from document encoding (TEI, XML) through automatic corpus annotation (notably part-of-speech tagging; Brants, 2000) and semi-automatic annotation (O'Donnell, 1995) to query systems as implemented in, e.g., the IMS Corpus Workbench (Christ, 1994), the MATE system (Mengel & Lezius, 2000) and the Gsearch system (Keller et al., 1999).	en
dc.description.sponsorship	Hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney.	en
dc.language.iso	en	en
dc.publisher	Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney.	en
dc.rights	Copyright the University of Sydney	en
dc.subject	Humanities Computing	en
dc.subject	Computational text analysis	en
dc.title	Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora	en
dc.type	Conference paper	en
usyd.faculty	University hosted conferences

Associated file/s

Name: hansen_teich.pdf
Size: 122.3KB
Format: PDF

Associated collections

Computing Arts 2001: Digital Resources for Research in the Humanities
