July 16, 2019

The Myth of Privacy

In December 2018, the New York Times published an article titled “Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret”. It’s a very well-researched piece with a lot of great interactive elements that demonstrate the issue, and I’m convinced it’s an eye-opener for many people. The authors used smartphone location data to give some compelling examples of the premise: they mapped out the travels of a number of users whose data they obtained, showing precise times and locations at places such as schools, doctors’ offices, and places of worship. They even mapped out Trump’s presidential inauguration, showing precisely where the President walked by the Capitol, and when. Curiously, when I reread the article on the New York Times website just a few hours later, that specific graphic seemed to have been deleted; perhaps the Times got a call from the Secret Service. More alarmingly, the article also includes an example plotting the whereabouts of workers at the Indian Point nuclear power plant, with the implication that a bad actor could figure out where they live and interact with or influence them for nefarious purposes. Having just watched the original 1962 version of The Manchurian Candidate, I find that quite unnerving.

Most requests to access your location inform you that the information will not be accessible to others or, if it is accessible, that it will be anonymized or aggregated to prevent revealing an individual’s identity. This gives users a false sense of security and lulls them into accepting the “data for free services” exchange that has become commonplace in our modern world. The New York Times article covers this in great detail, and I’d like to expand on it.

When an app says your data will be anonymized, it usually means that no “Personally Identifiable Information” (PII) will be collected or recorded. However, each time a smartphone communicates with an app or a server, a vast amount of information is visible and can be recorded. There are about 20 parameters on smartphones that app developers can access and record, including phone model, OS version, installed languages, keyboard settings, WiFi settings, and many others. No single setting is unique, but the combination is distinctive enough for app developers to build an identity from it. So, for example, by reading what phone model I have, who my cellphone carrier is, what OS version is installed, which language keyboards are installed, whether I’ve enabled Bluetooth, and many other such parameters, an app developer can define a profile for that phone.

Using that profile, it’s possible to determine “this is device 1234567” and track it. Once the parameters of that same handset are recorded across multiple apps, the data can be aggregated and a behavioral profile established: “device 1234567 appeared in this app, this app, and this app”. When that profile is combined with GPS data, it can be used to track movement: “device 1234567 was here at this time, here at this time, and here at this time”. From that data, a map can be developed. The data does in fact get aggregated, because selling user data is one of the big revenue sources in technology, particularly in mobile technology, which includes location.

So far the application can be relatively innocuous: device 1234567 appears near a store in Connecticut and the user is shown that store’s offer in Facebook or in the New York Times app. Or perhaps the user is shown a competitor’s offer. At this point it’s still anonymous, but here’s the catch: there are many points where real-world data and virtual data intersect.

One, of course, is a physical address. If a smartphone regularly goes to a particular location and “spends the night there” most of the time, then that location can reasonably be determined to be the smartphone user’s home. Address data is easily available through companies like Acxiom, probably the largest database marketing company, which collects, analyzes and sells customer and business information. So now the smartphone is associated with the Smith family. If the smartphone user spends weekdays at a business location, then the user is likely to be an adult. Other location-based behavior can then suggest whether the user is male or female, along with further demographic data points that narrow down the set. In this way, an increasingly detailed profile of the “anonymous” owner of handset 1234567 can be developed.

This is called a “probabilistic model”. Probabilistic models identify users by using anonymous information and connecting the dots to define profiles, often with accuracies greater than 85% and sometimes approaching 100%. In the case of combining a smartphone identity with real-world data, we can know with a fair degree of certainty that it’s the smartphone of a mother, Jane Smith (not just a phone in the Smith household).
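
To make the two steps above concrete, here is a minimal Python sketch of how a device profile and a likely home location might be derived without any PII. It is only an illustration under stated assumptions: the setting names, the sample coordinates, the 10 p.m. to 6 a.m. night window, and the rounding precision are all invented for the example and are not taken from any real app or SDK.

```python
import hashlib
from collections import Counter
from datetime import datetime


def device_fingerprint(params):
    """Hash device settings into a stable pseudonymous ID (no PII needed)."""
    # Sort keys so the same settings always produce the same ID.
    canonical = "|".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def infer_home(pings, night_start=22, night_end=6, precision=3):
    """Guess a home location: the most common night-time grid cell."""
    cells = Counter()
    for timestamp, lat, lon in pings:
        if timestamp.hour >= night_start or timestamp.hour < night_end:
            # Rounding to 3 decimals groups pings into cells of roughly 100 m.
            cells[(round(lat, precision), round(lon, precision))] += 1
    return cells.most_common(1)[0][0] if cells else None


# Hypothetical settings an app might read (names are illustrative only).
settings = {
    "model": "iPhone 8",
    "os_version": "12.3.1",
    "carrier": "ExampleCell",
    "keyboards": "en_US,de_DE",
    "bluetooth": "on",
}
device_id = device_fingerprint(settings)

# Hypothetical location pings for that device: (timestamp, lat, lon).
pings = [
    (datetime(2019, 7, 1, 23, 15), 41.0531, -73.5391),
    (datetime(2019, 7, 2, 2, 40), 41.0532, -73.5392),
    (datetime(2019, 7, 2, 13, 5), 41.1000, -73.4000),  # daytime, ignored
    (datetime(2019, 7, 2, 23, 50), 41.0533, -73.5389),
]
home_cell = infer_home(pings)
print(f"device {device_id} probably lives near {home_cell}")
```

Nothing in this sketch touches a name, an email address, or a phone number, yet the output is a stable device ID paired with a location precise enough to look up in an address database.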

But what if there is some point where Personally Identifiable Information and non-PII data intersect? That’s when the precise identity of an individual is established and it’s possible to determine that handset 1234567 = individual X. Game over.

Facebook uses a “deterministic” approach. Think about it. You register on Facebook and create a user profile. Each time you access Facebook, you log in. Facebook thus knows exactly who you are and what information you access (that’s what makes their approach “deterministic”). Facebook has been analyzing what you do, what you’re interested in, who you know, all kinds of things. It has a very detailed set of information about every one of its members, and it owns and monetizes that data. Facebook’s business model is almost entirely based on monetizing your data, primarily by using it to sell targeted ads to advertisers who want to know that they are showing car ads to people who are able to buy them, or women’s hair color ads to women and not men (or children). The ability to target advertising precisely enables Facebook to sell its ad space for a higher price. The more precise the targeting, the higher the revenue. The nefarious version of this is the Cambridge Analytica case: based on the Facebook data the company obtained, it could see who was more impressionable and influence public opinion in a manner that could only be dreamed of before.
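
The point where PII and non-PII intersect, described above, can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of how Facebook or any other company actually stores or joins its data: the event fields, app names, and account ID are invented. It assumes only that events from several apps carry the same device profile ID and that at least one of them was recorded while the user was logged in.

```python
def link_identities(events):
    """Map fingerprint -> account ID from any event that carries a login."""
    known = {}
    for event in events:
        account = event.get("account_id")
        if account is not None:
            known[event["fingerprint"]] = account
    return known


def resolve(events, known):
    """Attach the resolved account (or None) to every event."""
    return [
        {**event, "resolved": known.get(event["fingerprint"])}
        for event in events
    ]


# Hypothetical event log pooled from several apps.
events = [
    {"fingerprint": "1234567", "app": "weather", "account_id": None},
    {"fingerprint": "1234567", "app": "social", "account_id": "jane.smith"},
    {"fingerprint": "1234567", "app": "news", "account_id": None},
    {"fingerprint": "7654321", "app": "weather", "account_id": None},
]
known = link_identities(events)
resolved = resolve(events, known)
for row in resolved:
    print(row["fingerprint"], row["app"], row["resolved"])
```

A single logged-in session is enough: once one event ties the device profile to an account, every “anonymous” event from that device, past or future, resolves to the same person, while the device that never logged in stays unresolved.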

Of course, most developers and publishers have lengthy opt-ins and disclaimers which talk at length about their commitment to privacy. The European Union’s General Data Protection Regulation (GDPR) further tries to protect users and their data. But there’s a glaring gap that isn’t really talked about. Data has value, and a large database with extensive consumer information can be an extremely valuable asset. What happens when an unscrupulous actor gains access to that data and can use it to connect the dots between anonymous information and PII? Even worse, there’s another question I’ve been asking myself for a while now. What happens to the data collected by the vast number of tech companies that fail? Does it get scrupulously deleted, or might it end up on some sort of data black market where personal information is bought and sold, much as credit card numbers are reputed to be bought and sold in some Eastern European countries? I’ve never heard that possibility discussed. If that happens, and I’m pretty sure it does, then the whole argument that one’s information is private and not identifiable goes out the window.

There is no simple solution to this problem, especially since we’ve all been spoiled by the Faustian bargain of “free access to information in exchange for data”, and most people would be unwilling to give up the convenience of apps like Google Maps or Facebook. The New York Times article gives tips on how to limit one’s exposure, and I just went into my iPhone’s location settings to turn off location access for every app except while I’m actually using it. Nevertheless, there’s a deluge of data coming from my iPhone which I will never be able to control, short of turning the phone off, making sure the battery is dead, and leaving it in a drawer.

There are a lot of spy thriller scenarios we might imagine, many of which are probably true. We saw a lot of abuse in the 2016 election with companies like Cambridge Analytica. What will the future hold? Let’s hope governments, and end users, start being even more proactive about privacy protection. The European Union took a first step with GDPR. Let’s hope others follow suit soon.