Show simple item record

Field | Value | Language
dc.contributor.author | Johnson, Rebecca Lynn |
dc.date.accessioned | 2026-04-07T23:25:39Z |
dc.date.available | 2026-04-07T23:25:39Z |
dc.date.issued | 2026 | en
dc.identifier.uri | https://hdl.handle.net/2123/35079 |
dc.description.abstract | In measurement theory, instruments do not simply record reality; they help constitute what is observed. This thesis argues that generative AI evaluation works the same way: benchmarks do not just measure models, they help shape what models appear to be. Functionalist benchmarks, rooted in computationalist assumptions, treat models as isolated predictors, while prescriptive benchmarks assess what systems ought to be. Both approaches can obscure the sociotechnical conditions in which meaning and values are enacted, and in a pluralist world they risk reifying narrow cultural epistemologies. In response, the thesis develops a descriptive alternative for evaluating generative AI as a pluralist sociotechnical system. It introduces MaSH Loops (Machine-Society-Human-in-the-loop) as an enactivist framework for tracing how models, people, and institutions recursively co-construct meaning and values. It also presents the World Values Benchmark, a distributional method grounded in World Values Survey data, prompt sets, and anchor-aware scoring. Together, these contributions shift evaluation away from one-score rankings and toward methods that reveal what models are doing and whose values they enact. The argument is developed across five chapters and applied through two case studies: value drift in early GPT-3 and sociotechnical evaluation in real estate. The final chapter extends the account through participatory realism, arguing that prompting and evaluation are constitutive interventions rather than neutral observations. Overall, the thesis shows that responsible evaluation requires pluralist, recursive frameworks that make value assumptions visible and support more culturally responsive AI governance, research practice, and policy design. | en
dc.language.iso | en | en
dc.subject | Artificial Intelligence | en
dc.subject | Evaluations of AI | en
dc.subject | Generative AI | en
dc.subject | Participatory Realism | en
dc.subject | Enactivism | en
dc.subject | Measurement Theory | en
dc.title | Measuring the machine: Evaluating Generative AI as Pluralist Sociotechnical Systems | en
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Science::School of History and Philosophy of Science | en
usyd.degree | Doctor of Philosophy Ph.D. | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Rickles, Dean |
usyd.include.pub | No | en


There are no previous versions of the item available.