Automatic Privacy Compliance Checks for Mobile Apps Using Natural Language Processing

Pinchahewage, Bhanuka Malith Silva

Access status: Open Access

Field	Value	Language
dc.contributor.author	Pinchahewage, Bhanuka Malith Silva
dc.date.accessioned	2026-06-01T01:22:58Z
dc.date.available	2026-06-01T01:22:58Z
dc.date.issued	2026	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/35375
dc.description.abstract	The rapid growth of the mobile app ecosystem has intensified concerns about how user data is collected, shared, and communicated through privacy disclosures. Privacy compliance in app marketplaces relies heavily on developer self-reporting and user awareness. As a result, privacy information, whether in detailed policy documents or in summarised forms, often fails to accurately reflect intended data practices. This thesis explores how recent advances in natural language processing (NLP) can enable automated and scalable privacy compliance checks in the Google Play Store. It identifies key factors that limit the transparency and usability of privacy policies and proposes enhanced parsing and structuring techniques to improve comprehension and support more effective regulatory oversight. Existing encoder-based models provide accurate predictions but lack interpretability, while decoder-based large language models (LLMs) provide meaningful explanations but lack verifiability. To address this gap, this thesis first introduces an entailment-driven LLM framework that couples generative reasoning and re-evaluation strategies with embedding-based verification, improving both the interpretability and factual consistency of privacy policy classification. It then presents PrivPRISM, a novel language-modelling framework that leverages both encoder and decoder architectures for large-scale compliance analysis. PrivPRISM cross-examines privacy policies, Play Store disclosures, and installation artefacts to detect inconsistencies. Findings reveal that 53% of analysed apps exhibit discrepancies, highlighting the need for evidence-driven auditing. Finally, this thesis details PrivSTRUCT, a structured modelling approach that leverages developer-defined structural cues to disentangle complex privacy disclosures by linking data items to their stated or implied purposes. The findings reveal a persistent transparency gap in which broadly defined purpose disclosures obscure sensitive first- and third-party data practices in mobile apps.	en_AU
dc.language.iso	en	en_AU
dc.subject	privacy policies	en_AU
dc.subject	mobile app privacy	en_AU
dc.subject	natural language processing	en_AU
dc.subject	large language models	en_AU
dc.subject	privacy compliance	en_AU
dc.title	Automatic Privacy Compliance Checks for Mobile Apps Using Natural Language Processing	en_AU
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en_AU
usyd.degree	Doctor of Philosophy Ph.D.	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Seneviratne, Suranga
usyd.include.pub	No	en_AU