Measuring the machine: Evaluating Generative AI as Pluralist Sociotechnical Systems

Johnson, Rebecca Lynn

Access status: Open Access

Field	Value	Language
dc.contributor.author	Johnson, Rebecca Lynn
dc.date.accessioned	2026-04-07T23:25:39Z
dc.date.available	2026-04-07T23:25:39Z
dc.date.issued	2026	en
dc.identifier.uri	https://hdl.handle.net/2123/35079
dc.description.abstract	In measurement theory, instruments do not simply record reality; they help constitute what is observed. This thesis argues that generative AI evaluation works the same way: benchmarks do not just measure models; they help shape what models appear to be. Functionalist benchmarks, rooted in computationalist assumptions, treat models as isolated predictors, while prescriptive benchmarks assess what systems ought to be. Both approaches can obscure the sociotechnical conditions in which meaning and values are enacted, and in a pluralist world they risk reifying narrow cultural epistemologies. In response, the thesis develops a descriptive alternative for evaluating generative AI as a pluralist sociotechnical system. It introduces MaSH Loops (Machine-Society-Human-in-the-loop), an enactivist framework for tracing how models, people, and institutions recursively co-construct meaning and values. It also presents the World Values Benchmark, a distributional method grounded in World Values Survey data, prompt sets, and anchor-aware scoring. Together, these contributions shift evaluation away from single-score rankings and toward methods that reveal what models are doing and whose values they enact. The argument is developed across five chapters and applied through two case studies: value drift in early GPT-3 and sociotechnical evaluation in real estate. The final chapter extends the account through participatory realism, arguing that prompting and evaluation are constitutive interventions rather than neutral observations. Overall, the thesis shows that responsible evaluation requires pluralist, recursive frameworks that make value assumptions visible and support more culturally responsive AI governance, research practice, and policy design.	en
dc.language.iso	en	en
dc.subject	Artificial Intelligence	en
dc.subject	Evaluations of AI	en
dc.subject	Generative AI	en
dc.subject	Participatory Realism	en
dc.subject	Enactivism	en
dc.subject	Measurement Theory	en
dc.title	Measuring the machine: Evaluating Generative AI as Pluralist Sociotechnical Systems	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Science::School of History and Philosophy of Science	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Rickles, Dean
usyd.include.pub	No	en