The team used artificial intelligence (AI) algorithms to crawl the privacy policies of 7,000 of the most popular websites and identify those that contain language about data collection and use, third-party sharing, data retention, and user choice, among other privacy issues. The project website lets people navigate these machine-annotated privacy policies and jump directly to the statements that interest them, including those often buried deep in the policy text.
"We found that the text of the policies is often vague and ambiguous, and people tend to struggle to interpret and determine what personal information is collected, how it's used, and what other entities it's shared with," Sadeh says. "From a legal standpoint, this is problematic."
To "train" their AI, the team asked a group of law students to manually annotate 115 privacy policies. The AI learned from those annotations and then crawled the policies of more than 7,000 of the most popular sites on the web.
"While not perfect, our techniques are capable of automatically extracting a large number of privacy statements from the text of privacy policies," says Sadeh. "Eventually, the goal is to make this information available to users via a simple and intuitive browser plug-in that would provide users with personalized summaries highlighting those issues they are most likely to care about."