Court Documents: Facebook Doesn’t Know What Data It Collects
WASHINGTON — Facebook’s user data is scattered throughout its systems in such a way that no one person could detail exactly what information the social media company collects on an individual, according to recently unsealed court documents from the California lawsuit against Facebook over the Cambridge Analytica scandal.
The 833-page document, which contains transcripts from pretrial hearings and other materials offering insight into how the company stores users’ information, was released after the company agreed to settle the multi-year privacy class action lawsuit in the Northern District of California.
The lawsuit stems from the company allegedly giving the personal information of millions of users to Cambridge Analytica, a political consulting firm that supported Donald Trump’s presidential campaign in 2016.
The documents reveal that the company dodged requests for information showing how much data it holds on any particular user, and that the 55 Facebook subsystems storing user data create a structure in which no one knows precisely how much information is kept.
Leaked internal documents released in April showed that this disorganized approach to cataloging users’ data could hinder the company’s compliance with regulations aimed at protecting people’s privacy.
“Our systems are sophisticated and it shouldn’t be a surprise that no single company engineer can answer every question about where each piece of user information is stored. We’ve built one of the most comprehensive privacy programs to oversee data use across our operations and to carefully manage and protect people’s data,” said a spokesperson for Meta, the company formerly known as Facebook, in an email statement.
Ahead of the trial, Daniel Garrie, a subject-matter expert, was appointed special master to oversee the impasse between Facebook and the plaintiffs over disclosing what data the company keeps on its users.
During a pretrial hearing intended to resolve the impasse, Garrie asked Facebook engineers to pinpoint how data was collected and stored for a single user. However, the two engineers being questioned, engineering director Eugene Zarashaw and software engineering manager Steven Elia, struggled to answer.
“The court said data collected from a user’s on-platform activity, data obtained from third parties regarding a user’s off-platform activities, and data inferred from a user’s on-or-off platform activity,” Garrie said, reminding Zarashaw of the scope of the question.
Zarashaw explained that a single user’s profile is built from multiple files stored across the system, and that some of the information could be duplicated.
“If it’s not you or Steven [Elia] that can answer this, who at Facebook can — ’cause it’s pretty simple; right? I mean, this question is simple but the answer is deceptively complicated. And I’m just trying to understand at the most basic level from this list what we’re looking at,” Garrie said, referring to a list of the different systems where data is stored.
Zarashaw, who got his start in software engineering at Microsoft in 2001, according to his LinkedIn page, said, “I don’t believe there’s a single person that exists who could answer that question. It would take a significant team effort to even be able to answer that question.”
Facebook has faced scrutiny over its handling of data before. According to leaked internal documents reported by Vice’s Motherboard in April, the way data is distributed throughout Facebook’s systems has created problems for the company’s ability to comply with data privacy regulations.
“We do not have an adequate level of control and explainability over how our systems use data, and thus we can’t confidently make controlled policy changes or external commitments such as ‘we will not use X data for Y purpose.’ And yet, this is exactly what regulators expect us to do, increasing our risk of mistakes and misrepresentation,” according to the document’s executive summary.
“Addressing these challenges will require additional multi-year investment in ads and our infrastructure teams to gain control over how our systems ingest, process and egest data. This new investment is needed in addition to the ongoing Purpose Policy Framework investments,” according to the document.
A spokesperson from the company said in a statement, “We have made — and continue making — significant investments to meet our privacy commitments and obligations, including extensive data controls.”
The lawsuit was settled just weeks before the Sept. 20 deadline for several top executives, including Meta CEO Mark Zuckerberg and his longtime chief operating officer, Sheryl Sandberg, to sit for depositions during the final phase of pretrial evidence gathering, according to court documents.
The terms of the settlement have not been disclosed. A 60-day stay of the action was filed while lawyers finalize the settlement, putting its likely completion date in late October.
The Associated Press contributed to this report.