These aren't slip-ups. This is calculation. — Big Tech Files

In March 2018, The Observer and The New York Times published a story that held front pages around the world for weeks: Cambridge Analytica had used data from up to 87 million Facebook users to profile voters in Donald Trump's presidential campaign and in the Brexit referendum campaign. In July 2019, Facebook agreed to pay a $5 billion penalty imposed by the Federal Trade Commission, the largest in FTC history. A week after the penalty was announced, Facebook's stock rose 1.8 percent. The market took it as a relief: the amount turned out to be smaller than had been feared.

This is not an anecdote. It is the thesis this database is built around.

When the Cambridge Analytica scandal broke, the dominant interpretive frame was moral: someone behaved badly, someone would be punished, the system would correct itself. Nine years later we know it was not an exception, not an error, not a matter of one company's organizational culture. It was the logical consequence of a business model in which user attention is raw material, data is the product, and regulatory fines are a line-item cost of doing business. In 2024, Meta posted $164 billion in revenue. The largest GDPR fine in European Union history, imposed on Meta in 2023, was 1.2 billion euros. That is roughly 0.7 percent of annual sales. For comparison: an ordinary speeding ticket hits the median-wage earner proportionally harder.
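
The proportion above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the fine is denominated in euros and the revenue in dollars, so the exchange rates used here are assumptions rather than sourced figures, and the function name is hypothetical.

```python
# Fine as a share of annual revenue, using the two figures quoted in the text.
# ASSUMPTION: the USD-per-EUR rates below are illustrative, not sourced values.

def fine_share_percent(fine_eur: float, revenue_usd: float, usd_per_eur: float) -> float:
    """Return the fine as a percentage of revenue after converting it to dollars."""
    fine_usd = fine_eur * usd_per_eur
    return 100.0 * fine_usd / revenue_usd

GDPR_FINE_EUR = 1.2e9      # largest GDPR fine, as cited in the text
META_REVENUE_USD = 164e9   # Meta's 2024 revenue, as cited in the text

# Try two assumed exchange rates to see how sensitive the share is.
for rate in (1.00, 1.10):
    share = fine_share_percent(GDPR_FINE_EUR, META_REVENUE_USD, rate)
    print(f"USD per EUR = {rate:.2f}: {share:.2f}% of annual revenue")
```

Under either assumed rate the result stays below one percent of a single year's revenue, which is the point the paragraph makes.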

Why 33 cases

Each of the 33 cases in this database was covered in the media at the time. A handful (Cambridge Analytica, Facebook's emotion-manipulation experiment disclosed in 2014, the Pegasus disclosures, Clearview AI) became global events and made front pages. The rest were treated as daily-cycle news: they surfaced, held attention for half a day, and vanished under the next wave. To the average reader, TikTok's algorithm addicting teenagers, footage from Amazon Ring cameras shared with U.S. police, and Greyball, Uber's internal tool for evading regulators, are loose, unconnected facts whose common logic is easy to miss.

Our thesis is that this connection cannot be seen from the perspective of a single story. It becomes visible only when the stories are placed alongside each other — as in a museum vitrine, as in a catalog of precedents, as in an archive. Thirty-three cases grouped by company, jurisdiction, mechanism, and legal basis reveal not thirty-three occurrences, but one pattern, repeating with the regularity of industrial production.

The pattern is this: a company introduces a feature that generates business value and at the same time breaches the law or ethical norms. The feature runs for a year, two, five. A whistleblower, a journalist, or a regulator exposes it. The company responds in sequence: denial, apology, promise of reform, PR pivot, financial penalty booked as an operating expense. Meanwhile, a new feature is introduced in a similar gray zone. The cycle repeats. We call this the matrix, hence the name of the portal: Matryca.

Why in Polish

Polish-language material on this subject exists, but it is scattered: individual pieces in OKO.press, analyses by the Panoptykon Foundation, translations of Zuboff, sporadic academic reports. There is no single place a journalist, a lawyer, or a teacher can turn to for a reference-grade set of facts, dates, amounts, and sources. Anyone who has tried to write a GDPR piece for the Polish press that required cross-checking three cases at once knows how many hours it takes to assemble what should be one click away.

The Polish reader is systematically treated as periphery in this story. European Union regulatory documents apply to them directly but are rarely translated in full. Class-action settlements in the United States do not cover their claims. Hearings before the Court of Justice of the European Union, where the privacy of hundreds of millions of Europeans is at stake, are reported in the Polish press superficially and often with significant delay. The result is that a Polish citizen, whose data is exported to Silicon Valley on the same terms as a German's or a Frenchman's, has markedly worse access to information about what happens to that data.

This database will not repair that asymmetry. But it aims to be a point at which the asymmetry becomes visible and tangible.

What this database is not

It is not a news service. We do not update it daily, we do not chase the news cycle, we do not pass along unverified rumors. Every card is a reference document — written so that it can be cited in a news article, in a court filing, in a master's thesis. Sources are primary: regulators' decisions, court documents, testimony before parliamentary committees, original investigative reporting. Every figure and every date is verified against at least two independent sources. When facts are uncertain, we say so plainly.

It is also not a conspiracy theory. We do not claim there is a single decision-making center steering Big Tech. We claim something far less dramatic and far harder to refute: that the market, regulatory, and technological structures within which the major platform companies operate create strong and coherent incentives for certain behaviors, and that those behaviors — documented, cataloged, measured — arrange themselves into a repeatable and predictable matrix.

Nor is this a call to Luddism. We are not arguing for a return to a pre-internet world. We are not urging readers to delete their accounts. Each of us on the Matryca editorial team uses several of the services described in this database, because the alternatives are often impractical or confined to niches. The aim is not moral but epistemic: to show how the thing works, so that decisions can be made with open eyes. Awareness does not guarantee freedom, but the absence of awareness guarantees the opposite.

Who it is for

The database is written with four circles of readers in mind. Journalists get a ready-made bibliography with links to primary documents, a tabular accounting of fines, a chronology of events, and the names of whistleblowers who can be tracked down. Lawyers and mediators get a compilation of precedents, legal bases, and jurisdictions arranged so that they can be used in argument or in the preparation of class-action suits. Teachers and lecturers get material for workshops on technology ethics, privacy law, and investigative journalism — every card follows the same structure, which makes comparisons easier. Citizens get a "Citizen takeaways" section with concrete steps that can be taken in one afternoon.

We do not expect each of these circles to read the database the same way. A teacher is looking for an example for a lesson; an attorney for a precedent to cite; a parent for information about TikTok; a student for a quotation to use in a paper. All have the right to find what they came for without having to wade through an ideological layer this database deliberately does not carry. The arguments are in the introductory essays. The cards hold the facts.

What we expect

The most interesting moment in this work is not yet behind us — it lies ahead: the moment the database is put to use. When a journalist takes three facts from here and writes a piece we would have written differently. When an attorney uses the compilation of fines in a court appearance. When a high-school teacher shows her class the Cambridge Analytica card, and her students notice the same mechanism in TikTok. When a staffer at Poland's data protection authority finds the argument she needed. When a Facebook user closes their account — or does not, but changes privacy settings they did not know existed.

This database is not a finished argument. It is a set of building blocks from which an argument can be assembled. Our introductory essays — four voices, each with its own stake — demonstrate four ways of assembling one. We do not claim these are the only ways. We claim that placing them side by side is itself a position: the problem we are examining is neither purely technological, nor purely legal, nor purely civic. It is all of these at once. And for that reason a single perspective — even the best one — cannot contain it.

Thirty-three cases. Four voices. One database. The rest depends on what you do with it.