Efficient Training Data Attribution on Diffusion Models
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Lin, Jinxu | |
| dc.date.accessioned | 2025-10-20T00:35:41Z | |
| dc.date.available | 2025-10-20T00:35:41Z | |
| dc.date.issued | 2025 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/34416 | |
| dc.description | Includes publication | |
| dc.description.abstract | As diffusion models gain widespread adoption, concerns over the misuse of copyrighted and private images have become increasingly prominent. A promising approach to mitigating these issues is to identify the contribution of individual training samples to the generative process, a task referred to as data attribution. Existing data attribution methods for diffusion models typically assess the contribution of a training sample by examining the change in diffusion loss when the sample is included or excluded during training. However, we contend that the direct use of diffusion loss fails to accurately capture this contribution due to the nature of its calculation. Specifically, these methods rely on computing KL-divergence, measuring the divergence between predicted and ground-truth distributions. This indirect comparison of predicted distributions inadequately reflects the variations in model behavior caused by different training samples. To address these limitations, we propose the Diffusion Attribution Score (DAS), a novel attribution score that enables direct comparisons between predicted distributions to evaluate the importance of individual training samples. DAS is grounded in rigorous theoretical analysis, which we detail to substantiate its efficacy in attributing data influence in diffusion models. Moreover, we present optimization strategies to accelerate DAS computations, making it efficient to apply to large-scale diffusion models. Extensive experiments conducted across diverse datasets and diffusion models show that DAS significantly outperforms existing benchmarks, achieving superior results in terms of the linear data-modeling score and establishing a new state of the art in data attribution performance. | en |
| dc.language.iso | en | en |
| dc.subject | Artificial Intelligence | en |
| dc.subject | Deep Learning | en |
| dc.subject | Image Generation Models | en |
| dc.subject | Diffusion Models | en |
| dc.subject | Explainable AI | en |
| dc.subject | Training Data Attribution | en |
| dc.title | Efficient Training Data Attribution on Diffusion Models | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en |
| usyd.degree | Master of Philosophy M.Phil | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Xu, Chang | |
| usyd.include.pub | Yes | en |