That's why it sucks up information on everyone.
- By Shane Harris
Shane Harris is a senior staff writer at Foreign Policy, covering intelligence and cyber security. He is the author of The Watchers: The Rise of America's Surveillance State, which chronicles the creation of a vast national security apparatus and the rise of surveillance in America. The Watchers won the New York Public Library’s Helen Bernstein Book Award for Excellence in Journalism, and the Economist named it one of the best books of 2010. Shane is the winner of the Gerald R. Ford Prize for Distinguished Reporting on National Defense. He has four times been named a finalist for the Livingston Awards for Young Journalists, which honor the best journalists in America under the age of 35. Prior to joining Foreign Policy, he was the senior writer for The Washingtonian and a staff correspondent at National Journal.
The National Security Agency has said for years that its global surveillance apparatus is only aimed at foreigners, and that ordinary Americans are only captured by accident. There’s only one problem with this long-standing contention, people who’ve worked within the system say: it’s more-or-less technically impossible to keep average Americans out of the surveillance driftnet.
"There is physically no way to ensure that you’re only gathering U.S. person e-mails," said a telecommunications executive who has implemented U.S. government orders to collect data on foreign targets. "The system doesn’t make any distinction about the nationality" of the individual who sent the message.
While it’s technically true that the NSA is not "targeting" the communications of Americans without a warrant, this is a narrow and legalistic statement. It belies the vast and indiscriminate scooping up of records on Americans’ phone calls, e-mails, and Internet communications that has occurred for more than a decade under the cover of "foreign intelligence" gathering.
The NSA is routinely capturing and storing vast amounts of the electronic communications of American citizens and legal residents, even though they were never individually the subject of a terrorism or criminal investigation, according to interviews with current and former intelligence officials, technology experts, and newly released government documents.
A significant portion of this secret information-gathering is the result of so-called "incidental collection" of U.S. persons’ information; Americans’ communications just happen to be in the way when foreigners’ data is scooped up.
This incidental collection is partly the result of the way the global communications network is constructed. When the agency receives authorization from the Foreign Intelligence Surveillance Court to collect a broad range of e-mails or electronic communications that it believes are coming out of a foreign country, it’s inevitable that it will collect some U.S. persons’ information, too.
"There are U.S. persons in every country," said a former intelligence official. "The NSA knows that when it collects great gobs [of communications] there are going to be U.S. persons in that country. They know that happens."
But new documents reveal that the NSA has also deliberately gathered communications metadata that it had reason to believe was associated with Americans.
On Thursday, the Guardian reported that NSA had been collecting vast amounts of e-mail data in bulk, stemming from a secret program that was first authorized by President George W. Bush soon after the 9/11 attacks.
The Guardian also disclosed a November 2007 memorandum prepared for then-Attorney General Michael Mukasey by Kenneth Wainstein, who was in charge of the Justice Department’s National Security Division. On behalf of the NSA, Wainstein requested that the attorney general approve a powerful form of computer-assisted analysis of U.S. persons’ metadata, including their phone and e-mail records, as well as Internet Protocol addresses of individual computers. This information was obtained "by various methods, including pursuant to the Foreign Intelligence Surveillance Act," the memo states.
"NSA has in its databases a large amount of communications metadata associated with persons in the United States," the memo states.
NSA wanted to subject this large store of metadata to a form of link analysis known as contact chaining, in which an analyst starts with a particular phone number, e-mail address, or Internet Protocol address, and then uses algorithms to find the corresponding communications to which the "seed" target is linked. Contact chaining also finds the communications to which that first layer of communication is linked. Each one of these steps outward in the original target’s network is sometimes called a "hop." In just a few hops, the number of individuals swept up in the analysis multiplies exponentially.
The memo states that the NSA had already been conducting contact-chaining, but that based on the "informal advice" of the Justice Department office that represents the government before the FISA court, "NSA’s present practice is to ‘stop’ when a chain hits a telephone number or address believed to be a United States person." The agency wanted to keep going, however, even when it encountered communications believed to belong to Americans and legal residents. The hope, the memo states, was that by chaining through "all telephone numbers and addresses," the NSA would "yield valuable foreign intelligence information primarily concerning non-United States persons outside the United States."
In effect, the NSA was arguing that it needed to see everyone’s metadata in order to find meaningful information about foreigners. Mukasey approved the new contact chaining procedures.
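The mechanics of contact chaining described in the memo can be illustrated with a breadth-first walk over a contact graph. What follows is a minimal sketch, not the NSA's actual method: the function name `contact_chain`, the `stop_at` parameter, and the toy `contacts` data are all hypothetical. Passing identifiers in `stop_at` mimics the earlier practice of halting a chain at an address believed to belong to a U.S. person; leaving it empty mimics chaining through everything.

```python
from collections import deque

def contact_chain(contacts, seed, max_hops, stop_at=frozenset()):
    """Return {identifier: hop_count} for everything within max_hops of seed.

    Identifiers listed in stop_at are recorded when reached but are not
    expanded further (a hypothetical stand-in for the "stop" rule).
    """
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # hop limit reached; do not look at this node's contacts
        if current in stop_at and current != seed:
            continue  # chain halts here under the "stop" rule
        for neighbor in contacts.get(current, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return hops

# Hypothetical toy data: who has communicated with whom.
contacts = {
    "seed": ["a", "b"],
    "a": ["seed", "c", "d"],
    "b": ["seed", "e"],
    "c": ["a", "f"],
    "d": ["a"],
    "e": ["b", "g"],
}

full = contact_chain(contacts, "seed", max_hops=3)
stopped = contact_chain(contacts, "seed", max_hops=3, stop_at={"a"})
print(len(full), len(stopped))
```

In this toy graph, stopping at a single identifier removes everyone reachable only through it, which is the trade-off the memo was weighing.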
In the memo, Wainstein argued, as other government officials have over the years and continue to do today, that metadata is not content, and therefore is not subject to protections under the Fourth Amendment. Nevertheless, technology experts say that metadata can reveal deep and meaningful information about who a person knows, where they go, and what they are doing, both online and off. (It’s worth remembering that the U.S. government authorizes lethal U.S. drone strikes based on a target’s associates and movement, an analog version of metadata, and that information about those foreign terrorists and their associates is gathered using FISA.)
The memo also asked Mukasey’s permission to give metadata on U.S. persons directly to the Central Intelligence Agency and other Defense Department "entities." It doesn’t elaborate on what those organizations were doing with the data or why they wanted it.
The Guardian, citing a senior Obama administration official, reported that the intentional collection of Internet metadata was stopped in 2011. However, the paper found that "it is clear that the [NSA] collects and analyzes significant amounts of data from U.S. communications systems in the course of monitoring foreign targets."
When the agency collects the communications of Americans, it is supposed to follow a set of minimization procedures designed to protect individual privacy and keep innocent Americans from being implicated in terrorism investigations. But first, the agency has to determine if, in fact, the sender of a particular message is a U.S. person.
That’s hard to do. The breadth of global communications, and the digital mixing of messages from all corners of the world, can make it difficult to know with precision who is being targeted, and where that person is located, without subjecting a particular e-mail to closer inspection.
According to former intelligence officials, the NSA routinely opens e-mails and reads their contents to determine if the sender was a U.S. person. Reading that message doesn’t require the agency to obtain a warrant, and if an analyst discovers that the communication belongs to a U.S. person, he is supposed to destroy it if it has no intelligence value and does not contain information about a crime. But the NSA’s guidelines allow the agency to hang onto this information for up to five years before trying to determine its origin.
"I think it’s important to understand that there are certain things that the government is doing that by their very nature are going to involve vast amounts of information about Americans, even if that’s not their intent," said Chris Soghoian, an expert on privacy and technology at the American Civil Liberties Union.
In congressional testimony earlier this month, Gen. Keith Alexander, the NSA Director, discussed two programs that had recently been disclosed in press reports: The NSA’s collection of telephone metadata in the United States, and the system known as PRISM that gives the agency access to information from Internet companies including Google and Facebook.
"These programs are limited, focused, and subject to rigorous oversight," Alexander said. "They have distinct purposes and oversight mechanisms. We have rigorous training programs for our analysts and their supervisors to understand their responsibilities regarding compliance."
Alexander did not address the collection of Internet metadata that began under the Bush administration, nor did he discuss the 2007 memo, which had not yet been disclosed. Current and former intelligence officials stressed in interviews that agency employees are trained to follow specific rules and procedures when handling U.S. person data, and that in light of recent revelations they have become more cautious.
Precisely how much U.S. person data is being collected in the course of spying on foreigners has been a subject of considerable debate, but the amount has clearly been large. In 2009, the New York Times reported a "significant and systematic" collection of Americans’ e-mails and phone calls during the course of searches authorized by the Foreign Intelligence Surveillance Act.
The NSA has avoided saying how much data on U.S. persons it is collecting, even though it appears to have a way to find out. Last year, the NSA told a pair of senators looking into the issue that the agency could not estimate how many Americans’ communications had been collected, in part because it would "violate the privacy of U.S. persons" to try to answer the question. That implied that those communications were stored somewhere and accessible, but that reading them to see who the sender was would effectively constitute a search under the law.
Former officials contacted for this story were also reluctant to say how many Americans’ communications were incidentally collected during broad FISA searches. But they suggested that the number was large and knowable.
Among the U.S. person communications that the agency may retain, even though they weren’t directly targeted, are those "acquired because of limitations on NSA’s ability to filter communications," according to a set of procedures that the agency uses to minimize the intrusion into Americans’ privacy. The document was disclosed last week by the Guardian.
"They do know that U.S. person data will get through. They admit that," the former intelligence official said with respect to this provision in the rules. Sometimes a communication may slip through the filters because it’s encrypted and the system cannot scan it for keywords that might help determine the nationality of the sender. Or the NSA could be collecting information at such a high volume that it’s practically impossible to filter every message. "They don’t listen to everything and process everything," the former official said. "Sometimes they may keep it and look at it later."
When there’s a question about the sender’s nationality or location, a human analyst steps in and examines the content of the communication, former officials said. One former analyst said this only happens if there’s some indication that the communication is suspect: for instance, when a known terrorist is communicating repeatedly with someone who is not yet on the agency’s radar.
There appear to be some high-level controls on how much U.S. person data the NSA gathers inadvertently, but they are relatively crude. The former intelligence official said that when the government asks the FISA court for the authority to collect communications from a particular cable, it estimates based on historical information and geography how likely it is that most of the data moving on that cable will be coming from foreigners. The court is not likely to approve broad surveillance on a cable that contains a "significant" amount of U.S. person data, the former official said.
How can the NSA know? A fiber optic cable routing traffic out of Saudi Arabia, for example, is likely to contain mostly foreigners’ communications. However, network routing is dynamic and can change from day to day. If, for instance, that same line suddenly began carrying traffic from Malta, where there’s likely to be a larger number of U.S. persons, the NSA can block the Maltese traffic, the former official said. If that happens, the agency is required to inform the FISA court and describe the steps it took to filter out those communications.
Soghoian, the ACLU technology expert, said that if the NSA were tapping into undersea cables emanating from foreign countries, the likelihood of their containing U.S. persons’ data would be low. The likelihood increases, however, if those cables are located in the United States, where foreign data would be mingling with Americans’ communications. Using the PRISM system, the NSA collects electronic communications from service providers such as Google and Facebook that are based in the United States and use equipment here.
A U.S. person is also more likely to have his communications intercepted if he’s communicating with someone overseas, Soghoian said. But Americans who only talk with other U.S. persons can be caught in the driftnet, too — in part because of the NSA’s push into so-called cloud computing.
The NSA’s impulse to collect more information has been encouraged by the agency’s investments in big data and distributed databases. The agency bet big on Hadoop, a piece of open source software that allows massive amounts of data to be both stored and processed across a seemingly unlimited array of computers. It also lets that data sit on servers uncharacterized until the nanosecond an analyst needs the information. In other words, NSA doesn’t have to drop its information into discrete compartments like "foreigner" or "American." The data can be stored, and those characterizations can be made later. This is a great advantage for the agency: It’s slurping up billions of records but doesn’t have to make sense of them all at once.
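The idea of storing data first and characterizing it later can be shown in a few lines. This is a toy sketch in plain Python, not Hadoop's API and not a description of any real NSA system: the in-memory `raw_store`, the `ingest` and `query` functions, and the `country` field used by the classifier are all assumptions made for illustration.

```python
raw_store = []  # stands in for distributed storage; purely illustrative

def ingest(record):
    """Store a record as-is, with no 'foreigner' or 'American' label attached."""
    raw_store.append(dict(record))

def classify_by_country(record):
    """Hypothetical rule: label a record by its 'country' field, if it has one."""
    country = record.get("country")
    if country is None:
        return "unknown"
    return "American" if country == "US" else "foreigner"

def query(label, classifier=classify_by_country):
    """Apply the classifier only at read time, then filter on its result."""
    return [r for r in raw_store if classifier(r) == label]

# Records go in without being sorted into compartments.
ingest({"id": 1, "country": "US"})
ingest({"id": 2, "country": "SA"})
ingest({"id": 3})  # nothing here indicates nationality

americans = query("American")
unknown = query("unknown")
print(americans, unknown)
```

Nothing is decided when a record is written; the label exists only when someone asks a question, and a different classifier could be swapped in later without touching the stored data.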
The NSA also reverse-engineered Google’s most important database, layered it on top of its Hadoop-based system, and added inventive security controls. Older databases can be divided like spreadsheets into rows and columns; analysts can be authorized to access the data from a given column or a given row. The NSA’s database, called Accumulo, allows for much more fine-grained permissions; a single cell (the intersection of a row and column) can be hidden from an analyst. And even when a cell is hidden, an analyst can still use that data, without seeing it, to help him spot trends and build models.
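A rough sense of how cell-level permissions differ from row or column permissions can be had from the sketch below. It is plain Python and does not use Accumulo's actual API; the `Cell` class, the `label` field, and the `"public"` and `"restricted"` labels are hypothetical. It also assumes, following the description above, that an aggregate may be computed over cells the analyst is not allowed to read individually.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: int
    label: str  # hypothetical label an analyst must hold to view this cell

def visible_cells(cells, authorizations):
    """Return only the cells whose label the analyst is authorized to see."""
    return [c for c in cells if c.label in authorizations]

def column_total(cells, column):
    """Aggregate over every cell in a column, whether visible or hidden."""
    return sum(c.value for c in cells if c.column == column)

cells = [
    Cell("r1", "calls", 4, "public"),
    Cell("r2", "calls", 7, "restricted"),
    Cell("r3", "calls", 2, "public"),
]

seen = visible_cells(cells, {"public"})
total = column_total(cells, "calls")
print([c.value for c in seen], total)
```

The analyst holding only the `"public"` label never reads the restricted cell's value directly, yet the total still reflects it.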
In his recent testimony, Gen. Alexander said that individual NSA analysts don’t have the authority to read someone’s e-mails or listen to his phone calls. But with Accumulo and Hadoop, it doesn’t matter. Americans’ information can be used anyway.