Efficient Training Data Attribution on Diffusion Models

Lin, Jinxu

Access status: Open Access

Field	Value	Language
dc.contributor.author	Lin, Jinxu
dc.date.accessioned	2025-10-20T00:35:41Z
dc.date.available	2025-10-20T00:35:41Z
dc.date.issued	2025	en
dc.identifier.uri	https://hdl.handle.net/2123/34416
dc.description	Includes publication
dc.description.abstract	As diffusion models gain widespread adoption, concerns over the misuse of copyrighted and private images have become increasingly prominent. A promising approach to mitigating these issues is to identify the contribution of individual training samples to the generative process, a task referred to as data attribution. Existing data attribution methods for diffusion models typically assess the contribution of a training sample by examining the change in diffusion loss when the sample is included in or excluded from training. However, we contend that directly using the diffusion loss fails to capture this contribution accurately because of how the loss is calculated. Specifically, these methods rely on the KL divergence between the predicted and ground-truth distributions. This indirect comparison of predicted distributions inadequately reflects the variations in model behavior caused by different training samples. To address these limitations, we propose the Diffusion Attribution Score (DAS), a novel attribution score that directly compares predicted distributions to evaluate the importance of individual training samples. DAS is grounded in a rigorous theoretical analysis, which we detail to substantiate its efficacy in attributing data influence in diffusion models. Moreover, we present optimization strategies that accelerate DAS computation, making it efficient to apply to large-scale diffusion models. Extensive experiments across diverse datasets and diffusion models show that DAS significantly outperforms existing baselines in terms of the linear data-modeling score, establishing a new state of the art in data attribution performance.	en
dc.language.iso	en	en
dc.subject	Artificial Intelligence	en
dc.subject	Deep Learning	en
dc.subject	Image Generation Models	en
dc.subject	Diffusion Models	en
dc.subject	Explainable AI	en
dc.subject	Training Data Attribution	en
dc.title	Efficient Training Data Attribution on Diffusion Models	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en
usyd.degree	Master of Philosophy M.Phil	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Xu, Chang
usyd.include.pub	Yes	en
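
The abstract reports results in terms of the linear data-modeling score (LDS). As a rough illustration of how that metric is commonly computed, the following is a minimal, hypothetical sketch, not code from the thesis: for each test sample, attribution scores are summed over randomly drawn training subsets, and the Spearman rank correlation between these additive predictions and the outputs of models actually retrained on those subsets is averaged over test samples. The helper names (`rank`, `pearson`, `spearman`, `lds`), the nested-list input layout, and the use of the diffusion loss as the measured model output are assumptions for illustration; retraining the subset models is outside the sketch, and the toy numbers are made up.

```python
from statistics import mean


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(rank(a), rank(b))


def lds(scores, subsets, actual_outputs):
    """Hypothetical LDS helper (assumed input layout).

    scores[t][i]         attribution score of training sample i for test sample t
    subsets[j]           indices of the training samples in the j-th random subset
    actual_outputs[t][j] measured output (assumed: diffusion loss) on test sample t
                         of a model retrained on subsets[j]
    """
    per_test = []
    for t, tau in enumerate(scores):
        # Linear (additive) prediction of each subset's effect on test sample t.
        predicted = [sum(tau[i] for i in subset) for subset in subsets]
        per_test.append(spearman(predicted, actual_outputs[t]))
    return mean(per_test)


# Toy usage with made-up numbers: one test sample, four training samples, four subsets.
scores = [[1.0, 2.0, 3.0, 4.0]]
subsets = [[0, 1], [0, 2], [1, 3], [2, 3]]
actual_outputs = [[0.1, 0.2, 0.5, 0.9]]
print(lds(scores, subsets, actual_outputs))  # 1.0
```

Because the score is rank-based, only the ordering of predicted and measured values matters, so attribution scores need not be calibrated; they only need a sign convention under which a larger predicted sum corresponds to a larger measured output.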