Hark: A Deep Learning System for Navigating Privacy Feedback at Scale

Hamza Harkous, Sai Teja Peddinti, Rishabh Khandelwal, Animesh Srivastava, Nina Taft

May, 2022

Abstract

Integrating user feedback is one of the pillars of building successful products. However, this feedback is generally collected as unstructured free text, which is challenging to understand at scale. This is particularly demanding in the privacy domain, given the nuances of the concept and the limited existing solutions. In this work, we present Hark, a system for discovering and summarizing privacy-related feedback at scale. Hark automates the entire process of summarizing privacy feedback. It starts from unstructured text and produces a hierarchy of high-level privacy themes and fine-grained issues within each theme, along with representative reviews for each issue. At the core of Hark is a set of new deep learning models trained on different tasks, such as privacy feedback classification, privacy issue generation, and high-level theme creation. We illustrate Hark's efficacy on a corpus of 626M Google Play reviews. From this corpus, our privacy feedback classifier extracts 6M privacy-related reviews (with an AUC-ROC of 0.92). Through three annotation studies, we show that Hark's generated issues have high accuracy and coverage and that the theme titles are of high quality. We further demonstrate Hark's capabilities by presenting high-level insights from 1.3M Android apps.

Type

Conference paper

Publication

IEEE Symposium on Security and Privacy (SP)