Background Discussion
Background talk with Byron Tau: How intelligence agencies are buying data
Wednesday, 26 June 2024
15:00 - 16:00 (CEST)
Intelligence services, the military and law enforcement bodies can gain access to enormous amounts of data by buying commercial data from data brokers and internet firms. This growing but often still unnoticed method of state surveillance is a profound challenge to both civil liberties and national security. In the United States, after several media reports, this became subject of controversial debates and triggered legislative reforms. In most European states however, this method still draws very little attention.
What is known about the scale and specific cases of cooperation between commercial data brokers and intelligence agencies? We are delighted that Byron Tau, a former investigative reporter for the Wall Street Journal and author of the new book Means of Control: How the Hidden Alliance of Tech and Government Is Creating a New American Surveillance State went into a conversation with SNV's Thorsten Wetzling to explore the subject further.
The background talk took place on 26 June at 3:00 pm CEST.
On Byron Tau:
Byron is an investigative and enterprise journalist who specializes in law, courts and national security. He’s the author of the book “Means of Control” about how consumer data is increasingly being repurposed for government surveillance. He currently works for NOTUS, a nonprofit newsroom in Washington D.C. Previously, he worked at the Wall Street Journal, where he served as a White House reporter during the Obama administration; covered Congress with a focus on the intelligence, oversight and judiciary committees; and was a legal affairs and national security correspondent. Before that, he worked at Politico, where he covered the White House, lobbying, campaign finance and politics.
Dr. Thorsten Wetzling, Lead Digital Rights, Surveillance and Democracy at interface: Okay. Well, I will gently start because you're waiting so patiently, Byron, and it's great to have you with us. I welcome all of you to the first public event of interface, a European think tank specializing in information technology and public policy. My name is Thorsten Wetzling, and I direct interface's work stream on digital rights, surveillance, and democracy. I'm thrilled that Byron has accepted my invitation to discuss his contributions to shedding light on what may be called public-private co-production of surveillance.
Before I introduce Byron, allow me to briefly set the stage. We'll focus today on government access to personal data. It's an evergreen topic, yet one with lots of loose ends. It has also become an increasingly dynamic field of public policy due to rapid technological evolution and changes in geopolitics. Governments have an obligation to provide security. This is a daunting task given the many genuine threats and geopolitical disruptions we currently face. Much of the data governments need to draw inferences on threats and trends, and to track all sorts of malign actors within and outside their jurisdictions, is held by private actors. This means that the tools our governments use for data acquisition and processing, and the implications for fundamental rights and freedoms, are often less clear. Some have grown in the shadows of our democracies, to put it mildly. Today, we home in on what may rightly be called a frontier territory of national security law and practice: the purchase of data or datasets from private-sector brokers and vendors by national security agencies.
With us today is Byron Tau, an investigative and enterprise journalist who specializes in law, courts, and national security. He is the author of the recently published book, Means of Control: How the Hidden Alliance of Tech and Government Is Creating a New American Surveillance State. It is available in European bookstores too, and I would advise you to get a copy. The book tells a remarkable story about "how tiny experimental programs, data vendors, and obscure contractors have brought us to the precipice of a digital panopticon, one built by corporate America and blessed by government lawyers." Welcome, Byron. It's a pleasure to have you with us today.
Byron Tau, notus.org: Thanks so much for having me.
Thorsten: You're welcome. Before we begin, just a quick word on the setup for our audience. Over the next roughly 30 minutes, we'll have a conversation with Byron on his book and his findings. The second half of the event is dedicated to your questions. Please submit your questions in writing using the Q&A tab. You can pose your questions anonymously or provide your name and organization; it's up to you. If you provide your name, it will be in the recording. You can also vote on the questions so we get a better idea of what everyone is most interested in.
So Byron, with this, let's start. The first question is more of a grand tour question from me on private-sector actors and government interest, to get us started. In your book, you describe four generations of data brokers or data providers. Could you briefly give us an example for each generation and tell us what evolution you see over the past two decades — in particular with regard to the services that these private-sector actors provide as well as their interaction with national security agencies?
Byron: Sure. So this is sort of my taxonomy. I kind of made this up myself, but it made sense as I was doing the research: as I was grouping data brokers into different capabilities and different types of data sets, this organization emerged. At a high level, there are what I call the traditional data brokers. There are what I call social media data brokers or social data brokers. Then there are companies that focus on location, so you can call them location brokers or, because so much of the information is derived from the advertising market, you could arguably call them ad tech brokers. The last category of data I call gray data, a term I borrowed from gray literature, which applies to material that's not traditionally published. These are data brokers collecting data that is not traditionally brokered but is nevertheless available for the taking. And I'll get to examples of what that is.
Your traditional data brokers, they're not quite household names, but they're probably the largest and most visible companies in the space. So think Thomson Reuters, think the credit bureaus who have entered the data brokerage business, like TransUnion. Think of Acxiom, which was a giant and an early player in the field. All of those are data brokers, and traditionally they've collected information from public records and from early marketing information, so stuff like magazine subscription lists and other mailing lists. By and large, the information that those kinds of companies have, I think today we would consider kind of innocuous, but at the time they were digitizing it and compiling it and starting to work with government in the late 1990s and early 2000s, it was controversial. So these are the kind of traditional, mainstream, original data brokers. Then, as social media became a larger part of our social mix and of the technology platforms people use, there sprang up data brokers that specialized in the acquisition of social media data or the monitoring of social media data. Probably the best known in that space is Dataminr, but there are a million other small companies that have come and gone.
In my book, I focus on one company called Babel Street, which early on specialized in social media and later branched into other things. Then you come along to location and advertising data brokers. These are data brokers that are collecting information either from the advertising system, the system of bidding that we'll talk about, I think, that results in the serving of targeted ads, or sometimes app publishers themselves directly sell this information to data brokers. And again, these are small companies; very few of them are household names. In my book, I focus on one called Gravy Analytics, or Venntel, but there are dozens of others out there. There's one called Outlogic, which used to be called X-Mode, which I also talk about in my book. Foursquare is one major company; they don't work with the government, but they collect a lot of data and were early in this space. And there are many, many more of them.
Finally, when it comes to gray data I talk about two categories of it in my book. One is radio frequency data that comes off of all these devices that we're now carrying around. I'm wearing a fitness ring right here that's emitting radio frequency signals. My Apple AirPods emit radio frequency signals. Even your car tire pressure sensor emits a radio frequency signal. And because that information is rarely encrypted and rarely do these devices rotate their unique identifiers, that data can be captured. It can be geotagged so you can see where you saw various radio frequency identifiers. And if you're clever and smart and you distribute enough of these sensors around the world, you can collect that information. You can do that with devices running an SDK. You can do that with physical sensors. You can merge it with something like a license plate system but there are many ways to do that. And the other example in my book I have of gray data is internet data.
So like the kind of cybersecurity data, DNS lookups and NetFlow data, that's useful for cyber threat hunters. It's not traditionally brokered data, but if you know how a DNS server operates, you know where to get it. If you can route traffic one way or another through these internet gateways or through DNS servers, you can capture some of that information, and it is traded and sometimes sold among data brokers. Each of these generations of data brokers has some relationship with the government. The early data brokers that I talked about, your Acxioms and your TransUnions and your LexisNexises and your Thomson Reuterses, all got into the government business in earnest after 9/11. Some of them had dipped their toes in the water a little bit before that. But 9/11 created a different kind of mission for governments, and so these governments wanted a lot more information about the average person in the population than historically they had needed. And these data brokers had a lot of information. So this relationship blossomed after 9/11.
With social data brokers, the Arab Spring was a big turning point; the killing of Osama bin Laden by US special forces, and then the rise of all sorts of protest movements, both domestically here in the United States and internationally, contributed to the notion that law enforcement and intelligence agencies and others needed to be on social media or monitoring social media. Ad tech brokers came along in the 2010s as mobile phones became a bigger part of our social lives. We started putting everything on them, they started tracking our locations 24/7, and many of these apps had access to that data. There are some examples of maybe some early Israeli companies that were collecting it in the early 2010s. But as a technology, it came into government use, at least in the United States and probably in other Western democracies, in the mid-to-late 2010s. And gray data is the real true frontier. I don't even think that a lot of government agencies know quite what to do with this kind of data, but they are collecting it. There are some companies that are integrating the collection of wireless data into their license plate reader systems. There was a military experiment in the United States to use a distributed network of phones to try to get radio frequency data. So governments are experimenting with this stuff. They know about it, and they're trying to figure out what it's useful for and where to get it. That's a big, broad overview, I think.
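The gray-data collection Byron describes, capturing persistent radio-frequency identifiers and geotagging where each one was seen, can be sketched in a few lines. This is only an illustration of the technique; all identifiers, timestamps, and coordinates below are hypothetical:

```python
from collections import defaultdict

# Hypothetical geotagged sightings logged by a distributed sensor network:
# (rf_identifier, unix_timestamp, latitude, longitude)
sightings = [
    ("AA:BB:CC:01", 1700000000, 52.5200, 13.4050),  # seen in Berlin
    ("AA:BB:CC:02", 1700000100, 48.8566, 2.3522),   # seen in Paris
    ("AA:BB:CC:01", 1700086400, 50.1109, 8.6821),   # seen in Frankfurt a day later
]

def movement_tracks(rows):
    """Group sightings by identifier and sort each group by time.

    Because most of these devices never rotate their identifiers,
    each group amounts to a per-device movement history."""
    tracks = defaultdict(list)
    for ident, ts, lat, lon in rows:
        tracks[ident].append((ts, lat, lon))
    return {ident: sorted(points) for ident, points in tracks.items()}

tracks = movement_tracks(sightings)
# tracks["AA:BB:CC:01"] now shows the same device in Berlin, then Frankfurt.
```

The point is the one Byron makes: nothing here requires decrypting anything. An unrotated identifier plus a sensor location is already a movement record.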
Thorsten: I called it a grand tour question, Byron, for a reason. Because I think we wouldn't be able to do justice to your 22 chapters in the book. And it's a fascinating read, very well written, I think. It also tells the funny story of how you went to your local car dealer and tried to get rid of some of those tire pressure sensors that can be used for monitoring your very movements. I really encourage people to read your book. I would now like to turn the focus to another illuminating case. In your book, you discuss a tool called Locate X, and that got my attention. Could you please tell us more about this specific tool and the typical scenarios for its use?
Byron: Sure. So Locate X was a tool made by a company called Babel Street. They were relying on data from one of their partners, a company called Venntel. Essentially, the partner was originally a commercial company that was trying to do advertising. They were originally trying to make an events app. Then they got more into the location-based advertising space, and eventually they realized there was a sideline of business they could have: selling to government contractors, to other companies making tools for surveillance purposes, or directly to the government itself. The data from Venntel comes from your typical apps: your weather apps, your games, anything you've opted in to give your location permissions. The apps that are most valuable are the ones that have 24/7 location permissions. Back when this company launched, that was a larger percentage of the population than it is today, because I think Apple and Google have made some changes that try to remind people, "Hey, you know, these app publishers are getting all this information." But at the time they launched, I think a lot of people were very sloppy and careless with their privacy settings on their devices and hadn't thought through the implications. And so they had a pretty rich set of data from around the globe. I think they characterized themselves as having several billion location points; I forget whether that's per day or per month. A pretty large chunk of the mobile devices in the world were being seen at least a couple of times in this dataset.
Locate X was built by Babel Street using this technology and this dataset. The aim was to sell it as a tracking tool to the military, law enforcement, US special forces, and the intelligence community. Because at the time, again, a lot of people were not opting out. A lot of people were at least ignorant, or maybe willfully ignorant, of the privacy consequences of sharing their location 24/7. And so you had a particularly rich data set, and you had a useful tool for intelligence agencies and public safety agencies to try to solve crimes, to try to identify the patterns of life of persons of interest, to try to identify known associates, right? Where your device spends time and who it spends time with, that's a lot of interesting clues to who you are and who you hang out with. This was fascinating to public sector entities because historically that kind of surveillance has been done through other means. It can be done through the cell phone companies. It can be done through computer network intrusion, through hacking. The US military and other advanced militaries invest a lot of money in specialized SIGINT equipment, and they can do that kind of thing.
This was a different sort of scale because, A, it's a commercial vendor, and B, at least domestically in the United States, all these lawyers said, "Well, this is a commercial tool, this data is for sale, we don't need a court order, we don't need a warrant." And so it opened up different avenues, especially for law enforcement, to try to solve crimes without meeting the barrier that applies in the United States: showing probable cause of a crime in order to track someone's cell phone. It was a very attractive tool. It was a controversial tool when its existence was revealed, I believe by a publication called Protocol, and it has been controversial ever since. The one thing I thought was fascinating about it is that if you get your hands on the contracts that Locate X has, especially with police agencies (and I think this is a standard provision in all of them), they say you're not to reveal the existence of Locate X in any sort of court proceeding. In the United States, that's quite a problem because, in theory, if the police are going to arrest you, they're supposed to tell you the chain of events that led to your arrest.
And in this instance, you have a private company building a pretty powerful surveillance tool that's explicitly telling these law enforcement entities that they're not allowed to do that. And so I think, especially on the public safety law enforcement side, in the United States, it raised these profound questions about why are you trying to conceal the nature of this tool and the use of this tool? And is it even constitutional or legal for police agencies to conceal the fact that they've acquired this data and they're using this tool?
Thorsten: Thank you, Byron. It sure sounds attractive and very convenient — if you think about the pesky paperwork that normally has to be submitted prior to obtaining a warrant for this kind of information. Here, law enforcement and national security agencies have found a means to circumvent this. What is more, it comes with an NDA.
I now want to bring our focus to the role of state actors on ad exchange platforms. In your book, especially towards the end, you're concerned about the way governments contribute to, penetrate, and exploit these platforms through what you call cutouts, intermediaries, research agreements, shell companies, and other means. Plus, these platforms can be used not just to surveil billions of app users, but also as a vector to serve malware on individual targets and to engage in disinformation campaigns. Can you please elaborate on this and provide examples of how state actors have penetrated or exploited these ad exchanges?
Byron: Sure. Let me try to bring up a very specific example. I'm going to share my screen, and I think it will be illustrative to your audience. Okay, so this is from a story I did with the Wall Street Journal; I think you guys can put the link in the chat. This is a look at how data flows directly from phone apps, through this mind-numbingly complicated advertising ecosystem, to state actors. Just to back up a little bit: the way the digital advertising system works is an instantaneous ad auction process. Anytime you fire up a website or open an app that has a banner ad in it, you're participating in this gigantic, enormous system of data collection that links pretty much every computer, device, and phone on earth to this backend network of advertisers that are trying to serve you digital ads. And it's a two-way exchange of information.
So your device provides information about itself, and in some cases about you, to these ad exchanges, and those ad exchanges have thousands and thousands of parties sitting there waiting to place bids. They're using the data that is sent to the ad exchange from the phone user or the computer user to figure out if you're in their audience segment. A lot of technical information comes along with this. Apple and Google have given every phone a unique identifier. Your IP address goes along with it in many cases, the app that you're in sometimes goes to these ad exchanges, your GPS coordinates if you've enabled the app to know your GPS, and many other pieces of technical information, including the configuration of your screen.
Here's how data flows from these things to state entities. You have the phone apps up here at the top. Those are the things we load our phones up with every day, often without thought. Many of them, if not most of them, have some sort of advertising system plugged in. This explains what technical information gets passed back. You see a unique user ID. It's not a name, it's not a phone number, it's a string of letters and numbers, but that string of letters and numbers usually sticks with you. It's usually persistent unless you take some effort to reset it. Then you get some device-type information, your IP, and often your geolocation. So this was an example of one app: Life360 was sending a lot of data to these ad exchanges and was sometimes even partnering with them.
So these data brokers down here, these advertisers, they're basically ad exchanges, but sometimes they also have relationships with data brokers, sometimes they are data brokers themselves, and sometimes companies are merely sitting on their networks and just siphoning the information that crosses the network. A lot of them you've probably never heard of. They're not household names, but they're the names that you see in those weird redirects when your computer gets stuck, or when you look at the Chrome developer tools. Some of them are, as I said, data brokers themselves, like SafeGraph, which buys information directly from some apps. But these over here on this side, these are essentially ad exchanges. And so that's how information flows into this data system.
States have, of course, become interested in this information because it's essentially cyber information on every phone, every tablet, and every computer on the planet. I'm not exaggerating when I say this. Everybody sees targeted ads as they move around the web. It's not all deeply rich information. If you take a lot of privacy precautions, there won't be a ton of metadata in there. But if you don't, they're going to get the full haul.
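What a party sitting on an exchange receives with each request can be sketched roughly as follows. The field names are loosely modeled on the OpenRTB bid-request format, and the app bundle, ad identifier, and all values are hypothetical:

```python
# A simplified, hypothetical bid request of the kind described above.
bid_request = {
    "app": {"bundle": "com.example.weatherapp"},  # which app you're using
    "device": {
        "ifa": "38400000-8cf0-11bd-b23e-10b96e40000d",  # persistent ad identifier
        "ip": "203.0.113.7",
        "os": "iOS",
        "geo": {"lat": 52.5200, "lon": 13.4050},  # only if GPS is enabled
        "h": 2556,
        "w": 1179,  # screen configuration
    },
}

def harvest(req):
    """What a bidder, or anyone else sitting on the exchange,
    can keep from each request before any ad is ever served."""
    dev = req.get("device", {})
    return {
        "ad_id": dev.get("ifa"),
        "ip": dev.get("ip"),
        "location": dev.get("geo"),
        "app": req.get("app", {}).get("bundle"),
    }

record = harvest(bid_request)
```

The design point Byron describes follows from the protocol itself: the data arrives with the request, so retaining it requires no winning bid and no ad actually being shown.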
So in this instance, I found several government contractors who had a relationship with one of these advertising companies. Near Intelligence was plugged into this advertising system, and they were saving billions of bid requests a day or a month. And they were selling those bid requests to pretty much anyone that wanted them. There were a lot of commercial entities; they were legitimately a commercial data broker. But they also would knowingly contract with government contractors. There was one called Aelias that was working with US Special Forces. Another one was called nContext; that was a marketing company that was set up by a defense contractor. And the data was ultimately flowing to all sorts of US government entities: you had Air Force Cyber Operations, the National Geospatial-Intelligence Agency, the NSA, the Defense Counterintelligence Agency, the Joint Special Operations Command. And this existed in other channels too, so a lot of other brokers were giving it to the Department of Defense.
So that's essentially how this data flows from phone apps to government entities. There are other pathways. There was an Israeli company that set up something very similar: they set up, or partnered with, a marketing company, and then they were sharing the data with a defense contractor, and then it was going to states around the world that were buying this finished surveillance product. So that's just one data flow that I identified there. But there are other ways. I've been told that sometimes governments just set up straight-up shell companies. There's a full list of Google ad partners; it's several thousand entries long. And if somebody scrubbed that, I'm sure they would find some very interesting things.
Thorsten: Thank you for a very good illustration. And thanks for sharing the link to the article with our audience. I saw a tab open on your computer — sorry for spying on your computer here — it was a Freedom of Information Act template, I think. This brings me to a question about your research methods, Byron. You devoted five years to this research. And I think, to gain as much insight as you did, you conducted interviews, especially with mid-level intelligence practitioners. You reviewed thousands of pages of government contracts. You studied secondary information and the findings, obviously, of other fellow investigative journalists. But you also sued Special Operations Command. You mentioned this in your discussion of the Berber Hunter toolkit, another tool that we don't need to get into in detail here. What specific information were you after when you pursued litigation, and how did litigation serve your interest?
Byron: Sure. So yeah, I sued. It was a combined lawsuit against, I think, five or six government agencies, all for outstanding and unanswered Freedom of Information Act requests. In the US, you make a request to the government, and in theory they have 20 days to comply. In practice, they pretty much never manage to respond within 20 days. So often it takes years, and often the only way to get them to take you seriously is to litigate.
I retained lawyers at my own expense, because this was a book project, and we went to court. I would not say that was particularly fruitful. I did get a very helpful and revealing trove of DHS documents that I probably wouldn't have gotten otherwise. But in terms of the military documents, I did not get a lot back, and the ones I got were heavily redacted. And this lawsuit, even though I filed it in December of 2021, is still pending; it's now June of 2024, and one of these government agencies has not turned over its final production. So even when you sue, they can run out the clock. They can make it difficult, knowing that you have a deadline and you have to turn your book in to a publisher. So I got some records from that.
The best information, I think, was actually in the unclassified contracting records. The US government's contracting system is a mess, but ironically, there are data aggregators and data brokers that digitize it and make it easily searchable. There's one called GovTribe, to which I paid an annual subscription to be able to search the US government's contracting database. In the US, a lot of these programs are unclassified, or they weren't classified when I started writing about them; maybe they've since been classified. So there's a lot of information in the contracting records if you know how to read them and know what you're looking for. The US government is fairly transparent, relatively speaking, so there are a lot of other documents too; DHS, for instance, writes these privacy impact assessments, and I was able to get a lot of clues about what the government was doing there. So procurement stuff, a little bit of FOIA. I did a fair amount of FOIA at the state level, weirdly. Sometimes these contractors would partner with state universities, and state Freedom of Information Act laws apply to universities and are a lot quicker than the federal ones. So I was able to get hundreds of pages of documentation that way.
And then, honestly, LinkedIn was a tremendous resource. I know there have been reports of Chinese and Russian spies recruiting on there, but journalists use it too. Lots of people talk about the programs that they were on, and if you know the code words, you know the right thing to search. You can find really interesting people and try to get them to talk to you. Most of the time they won't, but sometimes they will. So it was about using human networks: once you get a few people to talk to you, they'll introduce you to others. That was my primary research methodology, going after the documents and working the human networks. I forgot court records: there were a number of actually interesting employment disputes that spilled into the court records, and I got a lot of interesting information that way. So never underestimate the fact that companies and people will sue one another, and when they do, they dump a ton of information into the public record. So that was basically how I reported this.
Thorsten: That's very insightful. Thanks. I have a follow-up question related to this, because we know that there are comprehensive oversight mechanisms in place in the US. And the US, I think, rightly prides itself on a lot more transparency on national security matters than its Western European allies. But oversight bodies, be it the oversight committees in Congress, PCLOB, or internal review bodies such as the civil liberties offices at the Office of the Director of National Intelligence … presumably, they should have access to government contracts with private entities, some even to classified contracting records, but certainly to the unclassified contractual records, and draw their conclusions about the implications for American privacy interests. Yet you felt the need to investigate too. Do you think that oversight in that sense failed the American people at the time when you wrote the book? Did you feel that they didn't look into this enough? Or that they were perhaps not interested enough or active enough?
Byron: Yeah, it's a good question. On these sorts of frontier technologies, these oversight bodies are often not the fastest moving. So sometimes it takes them a little while. Sometimes people at the operational level get something started, lawyers say it's okay at a low level, and they're not thinking through the big-picture implications. I think part of that is what happened here. None of these programs were run off the books or illegally. But I don't think the government had fully thought through the implications of the relationships between these commercial entities and commercial data brokers, and the surveillance programs that they were putting together.
The second thing I'd say is that while I truly believe the US intelligence community is fairly well overseen, that is not true across the board of every public safety agency in the United States. The FBI and DHS have a lot of eyeballs on them; they tend to come under a lot of journalistic, congressional, and civil society scrutiny. But there are 19,000 police departments in the United States, not all of which are overseen well. And truthfully, there is no prohibition or law on using any of this data at the subnational level. It is considered commercial, it's considered open source, and there's nothing in our law that would stop a police agency or a private investigator from obtaining large amounts of it and doing sort of an end run around what we have traditionally thought of as the limits on policing. Not to say they're doing it illegally, but it's a de facto run around a barrier that we have put in place. And that barrier was there historically for good reasons. So I think the big-picture implications were not being thought through.
And then just at the pure consumer level, consumers were being told when this data was collected that it was anonymized or de-identified and that it was being collected for commercial purposes. Nobody was putting in their privacy policy that we may sell it to the Department of Homeland Security or the US intelligence community. There was an element of misleading the consumer about exactly what was happening with their data when it left their devices. And then the government agencies were also playing this game a little bit, right? They were saying, "Well, people have opted in, they've consented to this kind of tracking." Well, they've consented to it for commercial purposes. You haven't told them the full story. And so, I mean, my reporting was generally aimed at telling the public the full story. Like, "Look, this is actually where your data goes when it leaves your phone. It's not just de-identified data to an advertiser. It goes to firms that are private investigators. There are instances where journalists have gotten this data and tracked individual people. It goes to think tanks and civil society groups that use it for whatever their agenda is. And then it goes to government." And if you don't like that as a consumer, you should know that when you put the app on and you accept the terms of service.
Thorsten: I want to give a reminder to our audience that you can start posing questions right now in the Q&A section.
Byron, yes, this is another interesting element of your book. You recount the internal deliberations at the Department of Homeland Security on the implications of the Carpenter ruling by the US Supreme Court. The final government position at the time boiled down to the mere assertion that people had opted in, and that people had been notified about their data being transferred to third parties, including government agencies. In your book you call this a lie.
This brings me to my last question, on the future governance of data purchases by national security agencies. Looking from the outside in, I know that the US has recently begun to adopt a new policy framework on commercially available data. This includes President Biden's Executive Order on this, but there's also an ODNI framework on sensitive commercially available data. The Department of Justice has sought comments on forthcoming regulation, too.
My question to you is this: Are we approaching the end of the golden era of ad exchange exploitation against this backdrop of recent regulatory developments? Are you confident that the new generations of data brokers will have a harder time going forward selling data to what you call three-letter agencies? And will agencies be more restrained?
Byron: It's a good question. First of all, I think those rules promulgated by various agencies, including by the ODNI, are a belated acceptance that this is sensitive data. It's not just commercially available. It's not to be treated lightly. People are putting a lot of information in the hands of corporations, and when they do so, they're doing so on some level of trust that that information won't be abused. It is good to see these agencies recognize that, no, this isn't just basic commercial information. This reveals a lot about people.
As to the policy changes, the biggest difference from when I started reporting on this to now is that, honestly, Apple and Google have made a lot of changes to their operating systems. Those changes have resulted in less location data being available. In addition, I think these ad exchanges are beginning to understand exactly how valuable their data is and are potentially also moving on this.
Of course, they're not trying to go away, they're not giving up their business line, but they are trying to do a better job of vetting who's on them and what the uses are. So it's kind of an interesting revelation, if you think about it: as much as various states have put in place privacy regimes, whether state governments in the United States or states at the national level, meaning European states, the entities with the most influence on policy are still the tech giants. And so there simply is less location data in the commercial market, because consumers have been given real, meaningful ways to opt out of some of it, and have been given persistent reminders that an app in the background is tracking their location 1,500 times a day or whatever. And consumers have responded. They don't necessarily want to be tracked across apps. They don't necessarily want some shady weather app collecting their location points that many times. And so many have turned a lot of this off, and that has made it a less valuable tool.
But of course, technology continually changes. We're now moving into a completely different generation of interesting commercial capabilities, including AI systems and large language models. I don't think we're even beginning to understand, A, how data can be collected off of those and how it could be used or re-identified, as much as these large language models claim that they're either not saving the information or that it's never traceable back to an individual. The thing we've learned over and over again is that all technology systems are breakable, all of them have flaws, and most of these commercial companies don't have a tremendous incentive to fix those flaws. They have more of an incentive to put a shiny new thing in front of the public. And so I think it'll just be different data. The pipeline of location data and other data broker data may have changed, may have decreased, but there will be other data sets, and it will be up to journalists like me, or the next generation of journalists, to figure this stuff out and tell the public what's going on.
Thorsten: Thank you, Byron, for those insights already. I think we go right to the questions from the audience now in the interest of time.
Have you looked at what is happening in Europe in relation to ad agencies selling data to law enforcement agencies? Do you know if this is happening in the EU and does the GDPR limit law enforcement use of data in Europe?
Byron: I am not an expert on the GDPR. Here's what I can say: I know there are Israeli companies that do business with countries around the world. I'm assuming they do some business with European countries, and they are selling ad exchange data. I know that, well, I don't know this for a hundred percent sure, but I believe that several of the contractors that I wrote about do work in other Five Eyes countries and possibly other Nine Eyes countries or other US allies. So there's probably a flow of ad tech data in that direction. The one thing I can tell you explicitly from my research is that the biggest and most responsible companies are genuinely trying to comply with GDPR and trying to do right, especially if they're in the European market and have a presence there.
I will say there are a number of small companies that you have never heard of, which are either solely in the United States, or often not even in the United States but in Singapore or India or somewhere else, that do not see themselves as having any obligation under GDPR because they have no presence in the European Union, but they are collecting European data. There are gaps in the regulatory framework. I do think GDPR has probably had some effect on this market.
And again, I think many companies want to comply and want to be good corporate citizens, but some don't. Some are bad actors. And the systems that these companies, primarily the advertising entities, have set up are not able to audit who gets European data at this point, or at least they weren't when I was doing this reporting. Maybe something's changed in the last few months. And they're not able to enforce the terms of GDPR. It's kind of a wink and a nod and a promise that they're doing so. So when you have entities that are outside the jurisdiction of the European Union, that don't bank there, that don't have employees there, many of them don't fully comply, and they are brokering data on European citizens.
Thorsten: In your book, just as a footnote to that question, you have the story of a Norwegian broadcaster who found out through his investigations that his European apps had, via intermediaries, sent data to Venntel. So that is happening.
Next question: Do you think the development of these technologies, and the work of investigating processes relying on them, has changed the stance that national security bodies will take toward large data sets? Now that the appetite has been whetted, will intelligence and national security entities be able to build systems to collect this information without relying on commercial partners in the same way?
Byron: That's a good question. I think there's always been an appetite for big data. At least, I've mostly studied the US intelligence agencies; I know less about the European ones. But the NSA has been in the big data business for a long time; it was getting lots and lots of data through the telecom system and from the traditional big tech companies under legal process. I think some of that went away with the rise of encryption, and partly in response to that, they turned to the commercial market. It's a good question whether, now that these commercial tools and this big data are out there, they'll find other ways to get it.
I think the answer is yes. I also think about the rise of machine learning and AI and all these new tools that we're talking about, all of which post-dated my book; I turned in the first draft a month before ChatGPT launched. I think the rise of these large language models and AI tools is only going to increase the hunger for data on the commercial side, and will further solidify data as a commodity in the minds of these companies. And a commodity can be sold and bought without real, true thought about its implications. The ability to use those systems to sort through large piles of data will also make acquiring huge amounts of data more attractive to public safety and national security entities. So the better these systems get at sorting through piles of data and spitting out outliers or suspicious patterns or whatever they're scanning for, the more the hunger for more and more data will follow.
Thorsten: Are there vendors who are more or less reputable? Are there times that you think that governments should not be doing business with them?
Byron: So I guess my thought on the vendors is that almost none of them are doing anything outright illegal. Some of the commercial vendors were pretty much violating the terms of the ad exchanges. So yes, there are some disreputable ones, and a lot of the location data is sourced from disreputable or partially disreputable sources. On the government tool side, I think most of those vendors were building systems to the best of their ability to comply with the law, and they saw an opportunity to bring a new capability to the government. They saw a new data set that was out on the commercial market, and they responded. I don't think they're necessarily evil companies. But that kind of data is now available so easily, so frictionlessly, without the paperwork that governments have traditionally had to go through to do intrusive surveillance.
I think that raises a lot of privacy and civil liberties questions, and at least raises the question: is this the kind of world we want to live in, in rule-of-law democracies? Again, I don't think the vendors are inherently doing anything wrong. I think it's up to policymakers and oversight bodies and civil society and the press, and average citizens in their roles as voters and consumers, to ask these tough questions. Historically, all that paperwork is not just a bureaucratic exercise. It's there as a privacy mechanism, right? We put those barriers in place because we don't necessarily trust that the state or its bureaucrats are always, 100% of the time, going to act as perfect angels; no human is. And so, as a method of protecting privacy, we put a number of layers and checks on what kind of information the state can obtain. And the commercial data market kind of upends that historic privacy protection. Maybe we want to live in a world where we give law enforcement unfettered access to things they bought commercially. I don't have a firm answer on that, but I do think it needs to be debated very openly and very publicly. It can't just be done in the shadows, or by excluding the public and civil society from what's going on in these agencies.
Corbinian Ruckerbauer, Policy Researcher Digital Rights, Surveillance and Democracy at interface: I'm going to ask the next question, because we are having technical difficulties. There was the question: what is, in your opinion, the most important and possibly most accessible step for users to take in order to protect their data? And maybe combined with this: there's a discussion about a TikTok ban for protecting citizens' data from collection by the Chinese government. But how effective would this be if this data can easily be sourced from a variety of vendors, if the Chinese government just uses the same sources that you describe for getting exactly this data? And the combined question: is there any control on the export of data to foreign and especially authoritarian governments?
Byron: Great questions. Let me take them one at a time. What can users do? I think the most simple thing they can do is just watch their permissions and watch what they put on their devices. I try to choose American apps whenever possible, not out of rah-rah USA patriotism, but because you get the maximal legal protections as an American citizen in the United States using an American app. If not an American app, something in a rule-of-law country with a functioning court system. That helps. Also, just simply watch what permissions you give apps, right?
A lot of apps want your location, they want your contacts and your photo roll; well, they'll work fine without them. Maybe you have to type where you are to get an Uber or to get DoorDash delivered or whatever, but that's helpful for privacy. It depends on what your threat model is, depends on what you care about, but there are encrypted chat providers, there are encrypted email providers, there are encrypted drive providers. You can harden Macs and also Google and Android devices pretty well. It's not perfect, there's no perfect security, but there are things you can do, including a special lockdown mode on Apple devices. There is Advanced Data Protection, which will encrypt your entire iCloud drive. There are lots of cool tools built in. I know the Mac world better, but you can also do very cool things with Android phones if you're a little bit technically sophisticated.
To the TikTok question, it's a great one. I think US policymakers' concern on TikTok is twofold. One is the data collection aspect of the app. The other is the propaganda, or information operations, whatever you want to call it: the concern that the algorithm could be tweaked to show certain content, or could be used to directly inject certain content into the global public conversation. On that side, the propaganda or information influence operations side, the bill that Congress passed would directly address it by requiring TikTok to be sold to an entity that's not in the People's Republic of China. On the data collection side, I agree. There's all this other information floating around. It's trivially easy for the Chinese government to set up a cutout in the ad exchanges, to even have a cutout in a third-party country that sends this data to the Chinese government, to the People's Republic. Russia can do this. It's not just China. Any country can set up a shell company, say they're in a marketing vertical, and collect data from these ad exchanges. And that data is fundamentally very similar to what TikTok is getting on the technical side.
So I do think whatever sale of TikTok happens doesn't address that potential flow of data. The US government did pass a first set of rules, with an executive order earlier this year that Congress then codified into legislation, and then a law that would ban the flow of bulk data to certain adversary countries. We are still sorting out what the definitions are and what that will look like. But that is a step towards cutting off countries that we think are concerning from accessing data from things like ad exchanges or data brokers.
So it will be interesting to see how that's implemented and what kind of teeth the US puts behind enforcement, because I've discovered a lot of this stuff is very difficult to police. Some of it happens beyond the borders of either Europe or the United States, and if you don't have jurisdiction, it's really hard to enforce. On top of that, sometimes these countries use cutouts, and it's hard to know who's behind a cutout. And by and large, contractual provisions have not adequately safeguarded the global public's privacy in this space. Maybe regulators will start to put some teeth behind enforcing those contractual provisions, or accusing companies that violate them of fraud. But I have not seen evidence that merely requiring some sort of contractual provision that data won't go to China or won't go to Russia works. We'll see if that's effective; historically it has not been.
Thorsten: That's a great segue to our next question. And many thanks, Corbinian, for stepping in here. Your book, Byron, expands our vocabulary, as we learn about intermediaries, cutouts, and shell companies, for example. Sometimes, when we compare ourselves in the West with authoritarian regimes, we want to say with confidence that our governments do not engage in unconstrained, disproportionate, or arbitrary access to personal data. But in your book, you speak of the hidden alliance in the West, and you juxtapose this with China, where surveillance is ubiquitous. Yet it's also less hidden and more open there, in the sense that everybody knows the state engages in surveillance. By contrast, in the West, you often need to go through a lot of FOIA requests and rely on investigative journalism to get a full picture of what is happening.
So the question from Fabian Geier relates to this: Could you describe what could be a totalitarian infrastructure that is not yet used in a totalitarian way? Is this not a supreme threat to democracy, because once an authoritarian government got elected and got its hands on such means of population control, its power would be impossible to challenge?
Byron: Yeah. I mean, I'm not an expert in Chinese surveillance. But I did try to read a couple of books and a couple of civil society reports about it, to try to understand: what is the Chinese state collecting? What are they doing? My understanding, as a kind of educated layperson, is that the data that's collected in China is not all that different from the data that's collected in Western countries. I think the biggest difference is that the Chinese state enforces data fusion, enforces that private companies turn over large data sets to Chinese state entities, and then the state pushes that data down to the very lowest levels of its public safety infrastructure to make sure that local police in every province and every city have a 360-degree view of what citizens are doing.
In the West, the type of data collected is not all that different. The difference is that the data fusion has not occurred yet, right? At least in the United States, our driver's license information is in the hands of the individual states, but that information never goes to the federal government. Our medical records are in the hands of private entities, and there are strong privacy protections that ban the state from collecting all of that in bulk. So I think it's less a question of the infrastructure and more a question of the will to fuse all this stuff together.
Yes, if the governments of Europe, or the government in the United States or Canada or Australia, were to make lots of private sector companies turn over lots of information on their citizens, you would have that totalitarian infrastructure already. There's nothing you need to build. It's more that these walls still exist between lots of kinds of data and the central government of the state. My book talks a lot about the danger of some of these walls falling: some of the data that was traditionally privately held, that traditionally required some sort of court order, data that corporations used to see their role as protecting rather than giving away or selling in bulk to the government. Of course, the state has always been able to get information to solve crimes with a warrant, with a court order. But when bulk information is given to the state, that's when this totalitarian architecture gets created.
And so think about today: Ring doorbells are on every other house in my neighborhood. There are traffic cameras on every street that do red-light enforcement. All of those are there for good reason; I'm not saying we should not have them. But this impulse to weave it all together, to put it in a data set, to give the government access to it, that's what's dangerous, and that's what needs to be challenged. Ring doorbells are perfectly useful for solving crimes, but should the police be able to just look them up whenever, or should they need to ask permission from the homeowner, or go get a court order? Those are the really important questions that I think divide totalitarian societies from free ones.
Thorsten: Yes, and free societies provide data subject rights to their people, too.
There's a question on OSINT tools: how did they help you get the information to write your book?
Byron: Yeah, that's a good question. I did not use a ton of OSINT tools because, frankly, the book was about OSINT tool vendors. At least, that's what they call themselves. And so a lot of them were not super inclined to sell me a license. They are useful for journalists; they are useful for private investigators. But I just didn't have access to a ton of them. I did use a lot of Michael Bazzell's free resources; he has a kind of OSINT toolkit, and those are quite excellent, and I have his book. There are a few others. Generally speaking, I mostly just did it manually, right? I combed through LinkedIn and looked for certain keywords. I did a lot of deep Google searches, just stuff that a tool probably could have made easier. I could have automated it. But for me, doing it manually was part of the discovery process. There are probably more efficient ways to do it, but yeah, I didn't rely on a ton of tools.
Thorsten: So in Europe, we have the GDPR, and that has made our data more expensive perhaps, but it's not a perfect solution. Plus, we have some grave problems with its enforcement. In your book, you have a section that is entirely devoted to recommendations on how to protect your privacy as an individual citizen. Perhaps you can tie this to the next question:
What in your opinion, is the most important and possibly the most accessible step for users to do in order to protect their data?
Byron: Yeah. So I think I talked a little bit about this already, but it depends. I think the real question you need to answer is what's important to you to protect. In this 21st-century world we live in, you're not going to be able to protect everything. The experts call this threat modeling. I think this is a pretty expert crowd, so I can use that term, but for average people, sometimes that turns them off. The question you should ask yourself is: what's important to you to keep private? For example, I'm very interested in keeping the content of my emails private. I'm trying to maintain the confidentiality of my communications, and I'm trying to protect my files, my notes. So I only use -- sorry, my robot vacuum started in the middle of this. I only use things that are end-to-end encrypted, at least for those services, so chat, email, and drive. I'm very careful with my permissions on my device. I used to rely pretty heavily on a VPN; I've kind of come around more to just using Apple's built-in Private Relay, which covers private browsing in Safari. That's a pretty darn good technology if you look at the specs on it, assuming it works as advertised. But yeah, you can go down that rabbit hole. The simplest thing, I think, is to, A, try to take control of your data, and, B, just watch who you take candy from, as one of my cyber researcher friends likes to say. Watch what you put on your device. Watch what permissions you grant.
In the bigger picture, I think a lot of this world was created in part by the consumer desire to have things for free, and I think that's an unhealthy dynamic. If you don't like the way the internet is being exploited, if you don't like the business model, I would generally encourage you to try to find paid-for services, because it's never free to make an app. Apps need coders; they need legal help or privacy people; they need HR if they have a small team. None of that's free. And then, of course, they need the bandwidth, the server space, and all that. So, when they can't count on the public to pay 99 cents for the app, that's when they turn to monetizing data. So for people who have any sort of reluctance about paying for things on the web: I think it's a healthier web when you're willing to give a few bucks, usually less than your coffee at your local coffee shop, to these services. It's not perfect. Some of them might still sell your data, but you're at least creating an ecosystem where they're not motivated to do that.
Thorsten: We have reached the end of our session. I'd like to thank you, Byron, for fielding so many of our questions. I also thank Corbinian and my colleague Justus for their hard work preparing and running this event in the background, and for stepping in when needed.
Many thanks to you, the audience, for being here today and engaging with us. If you'd like to stay in touch with our work on digital rights, surveillance, and democracy at interface, please sign up for our newsletter. I would love to have your feedback on this event; there were so many more questions, and we were luckily able to get to some of them.
Byron, it's been a pleasure. I wish you all a very good rest of the day.
Thank you so much.
Byron: Thanks so much for having me.
Meet the speakers
Dr. Thorsten Wetzling
Lead Digital Rights, Surveillance and Democracy
Byron Tau
Publications for this Event
In the Media
Berliner Denkfabrik warnt vor unkontrolliertem Datenshopping der Geheimdienste (Berlin think tank warns against intelligence services' uncontrolled data shopping)
Corbinian Ruckerbauer, Dr. Thorsten Wetzling
May 24, 2024
Study
Disproportionate use of commercially and publicly available data: Europe’s next frontier for intelligence reform?
Charlotte Dietrich, Dr. Thorsten Wetzling
November 17, 2022
Perspective
Informationsbeschaffung mit der Kreditkarte: Wie nachrichtendienstliche Datenkäufe verfassungsrechtliche Mindeststandards unterlaufen (Intelligence gathering by credit card: How intelligence agencies' data purchases undermine minimum constitutional standards)
Corbinian Ruckerbauer, Dr. Thorsten Wetzling
May 28, 2024