Measuring the machine: Evaluating Generative AI as Pluralist Sociotechnical Systems
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Johnson, Rebecca Lynn | |
| dc.date.accessioned | 2026-04-07T23:25:39Z | |
| dc.date.available | 2026-04-07T23:25:39Z | |
| dc.date.issued | 2026 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/35079 | |
| dc.description.abstract | In measurement theory, instruments do not simply record reality; they help constitute what is observed. This thesis argues that generative AI evaluation works the same way: benchmarks do not just measure models, they help shape what models appear to be. Functionalist benchmarks, rooted in computationalist assumptions, treat models as isolated predictors, while prescriptive benchmarks assess what systems ought to be. Both approaches can obscure the sociotechnical conditions in which meaning and values are enacted, and in a pluralist world they risk reifying narrow cultural epistemologies. In response, the thesis develops a descriptive alternative for evaluating generative AI as a pluralist sociotechnical system. It introduces MaSH Loops (Machine-Society-Human-in-the-loop) as an enactivist framework for tracing how models, people, and institutions recursively co-construct meaning and values. It also presents the World Values Benchmark, a distributional method grounded in World Values Survey data, prompt sets, and anchor-aware scoring. Together, these contributions shift evaluation away from one-score rankings and toward methods that reveal what models are doing and whose values they enact. The argument is developed across five chapters and applied through two case studies: value drift in early GPT-3 and sociotechnical evaluation in real estate. The final chapter extends the account through participatory realism, arguing that prompting and evaluation are constitutive interventions rather than neutral observations. Overall, the thesis shows that responsible evaluation requires pluralist, recursive frameworks that make value assumptions visible and support more culturally responsive AI governance, research practice, and policy design. | en |
| dc.language.iso | en | en |
| dc.subject | Artificial Intelligence | en |
| dc.subject | Evaluations of AI | en |
| dc.subject | Generative AI | en |
| dc.subject | Participatory Realism | en |
| dc.subject | Enactivism | en |
| dc.subject | Measurement Theory | en |
| dc.title | Measuring the machine: Evaluating Generative AI as Pluralist Sociotechnical Systems | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Science::School of History and Philosophy of Science | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Rickles, Dean | |
| usyd.include.pub | No | en |