Israeli Company Sues Meta Over Public Data Issue
NETANYA, Israel — Bright Data, one of the world’s leading web data platforms, is suing Meta, stating the owner of Facebook and Instagram is wrongly claiming public user data as its own, violating the spirit of openness on which the internet was founded.
Meta, meanwhile, has countersued, revealing both that Bright Data is one of its former contractors and that together they had engaged in so-called “data scraping” of other websites.
The revelation in Meta’s lawsuit, which was filed in San Francisco, California, is noteworthy because for years the company has battled with data scrapers.
Meta has since maintained that it only used data collected by Bright Data to build brand profiles and to identify harmful sites and phishing campaigns.
It claims it terminated its relationship with Bright Data after the contractor allegedly violated terms when gathering and selling data from Facebook and Instagram.
But in its lawsuit, filed on Jan. 6 in Delaware Superior Court, Bright Data said it only collected publicly available information when scraping websites and that it respects U.S. and all other relevant regulations.
Further, it argues that what’s really at stake is whether public data belongs in the hands of the public or in the cold hard grasp of private companies.
On Monday, The Well News spoke with Or Lenchner, CEO of Bright Data, to learn more about the litigation.
To understand the litigation, one has to at least have a grasp of what people mean when they refer to “web data scraping” or “web data collection.”
To get there one has to realize, as Lenchner explained, that the internet is the largest database ever created in the history of humanity, its size measured in zettabytes (one zettabyte being a “1” followed by 21 zeros).
“Today, we’re talking about something that measures about 200 zettabytes, and it is growing rapidly,” Lenchner said.
The data contained in all these zettabytes falls roughly into two categories — public data and private data.
“Public information, at least the definition used by most of the industry today, is anything you can see with your own eyes without doing anything special, such as having to login or pay for a paywall or something like that,” Lenchner said.
“Most of the information that falls into this category are products and product pages, news, ads, things like that,” he continued. “It’s the result of HTML code, which is turning binary code into a visual … that humans can see.”
But this information isn’t static. News stories and their placement change. Prices of products ebb and flow.
The scraping of web data, then, is the automated process of collecting a page’s data as HTML or other code and rapidly analyzing how it changes.
So for instance, an e-commerce platform might use web scraping on a daily basis on a set number of competitor websites to make sure it’s offering consumers the lowest price, best selection or fastest shipping.
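For readers curious what that looks like in practice, here is a minimal Python sketch of the price-comparison idea, not a description of Bright Data's software. It assumes two saved copies of the same public product page and assumes, purely for illustration, that each product is tagged with made-up `data-product` and `data-price` attributes; real pages are marked up differently, and a real scraper would also have to fetch the pages over the network.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects {product: price} from tags carrying the hypothetical
    data-product and data-price attributes (an assumption for this sketch)."""

    def __init__(self):
        super().__init__()
        self.prices = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-product" in attrs and "data-price" in attrs:
            self.prices[attrs["data-product"]] = float(attrs["data-price"])


def extract_prices(html):
    """Parse one HTML snapshot and return its {product: price} mapping."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


def price_changes(old_html, new_html):
    """Return {product: (old_price, new_price)} for prices that differ."""
    old, new = extract_prices(old_html), extract_prices(new_html)
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


# Two invented snapshots of the same page, taken a day apart.
YESTERDAY = (
    '<ul><li data-product="kettle" data-price="24.99">Kettle</li>'
    '<li data-product="toaster" data-price="39.00">Toaster</li></ul>'
)
TODAY = (
    '<ul><li data-product="kettle" data-price="22.49">Kettle</li>'
    '<li data-product="toaster" data-price="39.00">Toaster</li></ul>'
)

changes = price_changes(YESTERDAY, TODAY)
print(changes)  # {'kettle': (24.99, 22.49)}
```

Only the kettle's price moved between the two snapshots, so it is the only item reported. Running the same comparison every day, hour or second is what turns a static page into a stream of changes.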
Firms like Bright Data create and operate software that allows their client companies to collect data on a daily, hourly and, in some cases, even second-to-second basis and process it.
“So what you have now is, say, the largest hedge fund no longer reading last quarter’s profit and loss statement when considering an investment in a company … they’re looking at the scraped data to gain an understanding of what is going on at that company right now,” Lenchner said.
How big is data scraping?
Lenchner responded by saying “everyone needs data.”
“In 2022, our customers sent 5.5 trillion requests, a request being their way of asking us to go to one web page to retrieve data. That’s more than double the amount of all of the search queries on all of the search engines combined that year,” he said.
When it came to the specifics of the litigation, Lenchner apologetically explained there were some aspects of the cases he couldn’t speak to, citing the ongoing nature of the litigation.
However, he said, public information included in the complaints was fair game.
According to Lenchner, an email included as an exhibit in Meta’s lawsuit reveals the social media giant was “using both our proxy and our data scraping services — in other words, practically every product that we have — for more than six years.”
Then, just over two months ago, Lenchner said, he was contacted by an office at Meta he’d never dealt with before.
“They asked for an urgent call. I said, ‘Of course.’ And basically, it was a cease or desist call,” he said.
Lenchner recalled the person at the other end of the call telling him Meta had no problem with Bright Data scraping other sites, but that it must stop scraping Meta domains, including Facebook and Instagram.
“It didn’t make any sense to me,” Lenchner said. “It’s all public information.”
Discussions continued for weeks, though mostly through lawyers on both sides.
“While I can’t get into the details of the discussions, generally speaking, as written in the complaint, we told them no. It’s public data. No one in the universe should be able to block access to public information — to do so would be practically trying to be a dictatorship,” Lenchner said.
“It was also no secret who was using this specific data from Instagram and from Facebook. We had published case studies on the usage,” he said. “For instance, we had leading American NGOs using the data to fight human- and sex trafficking — you’d be amazed at the horrible, terrible things you can find online, on public internet domains.”
A second example cited by Lenchner involved an Israeli NGO that was using data specifically from social media networks to locate youths in distress, so they could try to help them before the young people committed suicide.
“So it’s not like we don’t have solid, real-life examples of what was going on here,” he said. “And we give this data away for free to anyone who can prove they need the data to stop people from being abused or even killed.”
Still not satisfied, Meta pointed to a provision of the parties’ licensing agreement which specifically forbade the contractors from scraping Facebook or Instagram’s data.
“It’s one long paragraph which says scraping is not allowed and as a result, they said, they intended to sue us for breach of contract. We said, ‘Okay. Fine.’ And before suing them we deleted all of Bright Data’s company pages from Facebook and Instagram.
“We said, ‘Okay, we have no contract anymore,’ and the next day we filed the declaratory judgment lawsuit,” he said.
Lenchner said he believes it will be months, at least, before the merits of the case are heard, but at the same time, he believes the implications for the internet — not to mention the economy at large — could be grave.
According to a press release issued by Bright Data last week, because markets and society function best when public data is accessible to the public, these systems would start breaking down with a Meta victory.
At the same time, the information transparency that helps drive market competition, advance research and assist life-saving organizations would dry up, the release said.
Lenchner sought to expand on it.
“If e-commerce platforms don’t have the data that allows them to know how their competition is doing, they won’t know how to respond to be more attractive to their potential customers. Without transparency, there will be no competition and prices will go up.
“That’s just on the hardcore commercial side; you’ll also be causing a detrimental effect on the NGOs I spoke of earlier, and the over 220 universities who contract with us to collect data for research. I can go on and on and on.
“This is nothing less than a tech war. Most companies can’t fight Meta. This is not okay. The world won’t operate without access to data. It’s not okay to try and use their money and thousands of lawyers to try and shut down access to data,” Lenchner said.
“If we lose, then the tech giants will just gain more power, we will lose transparency, and the only ones who will win are those who are making a lot of money on our backs. And by ‘our,’ I mean, you and I and everyone else who is, you know, an internet citizen.”
Despite his dire warning, Bright Data’s CEO said he’s confident that “eventually, we will get a ruling that says the public information is public, and that it’s not public information when a tech giant has the ability to decide what’s public information and what’s not.”
“And that will be a win not just for me, that’ll be a win for everyone,” Lenchner said.
The Well News reached out to Meta for comment. This article will be updated with their response.
Dan can be reached at firstname.lastname@example.org and at https://twitter.com/DanMcCue
This story has been updated to clarify a quote attributed to Or Lenchner on Bright Data's conversation with Meta over the scraping of Meta's websites. Meta had no objections to Bright Data scraping sites generally. It requested that Bright Data stop scraping its websites.