At a dinner party the other night, a very accomplished business person told a story about how he and his wife were certain that their devices were listening to their conversations. “I was talking to my wife about a pair of designer shoes that she wanted to purchase, and not 10 minutes later while she was doing some online research for work, she saw an ad for that exact pair of shoes. She hadn’t searched for the shoes; the ad just appeared. Clearly, our computers or our phones are listening.” Some people nodded in agreement, and others began to chime in.
I listened politely for a few minutes more as the story was embellished and other guests shared their own versions of “surveillance state” anecdotes.
Then, I raised my hand like a school kid and said, “OK. Wait. Which do you think is more likely? (A) There is secret software that breaks about 20 different local, state, and federal surveillance and privacy laws, that neither I nor any of my clients know about but that is being secretly used by me, my clients, and other advertisers to put the right message in front of you at the right time in the right place?
Or, (B) Thanks to your online behaviors (and the privacy policies, terms, and conditions you have agreed to) we have access to enriched data sets and our predictive models and machine learning tools have evolved so quickly that we have an uncanny ability to understand your behaviors well enough to put the right message in front of you at the right time in the right place?”
Questions that followed included “What is an enriched data set?” “What is an online behavior?” “What is a predictive model?” “What kind of machine learning are you talking about? Is that AI?” And my favorite, “How do you know what I’ve been talking about with my friends?”
Enriched Data Sets
Data is more powerful in the presence of other data. If you have someone’s name and email address, you can send them a general offer via email. If you know where they live (phone book), what car they drive (warranty lists), if they own or rent their home (public records), where they work (location data from their phone, LinkedIn, or other public websites), what they do (LinkedIn or other public websites), what their hours are (location data from their phone or Yelp or Google), how many people they are responsible for (inferred from their purchasing data), what they ordered for dinner last night (their social media posts), where they had dinner last night (their credit card info – which is legal to obtain if the company has a business relationship with them), how much debt they carry (their credit report), their credit score (credit reporting organizations), etc., you can send them a more targeted offer. The more data you have, the more accurate your predictions can be. But there is more to enriched data sets than passive information. Let’s add in online behaviors.
Online Behaviors
When you click on something, you are exhibiting an online behavior. This includes links in search, links to articles, links on websites, visiting a website, stopping while scrolling a social media site to look at a meme or message, swiping left or right, tapping an icon on your smartphone, picking up your smartphone (accelerometer), walking or running using a health app (GPS), using Waze, Google Maps, or Apple Maps for wayfinding, talking to Siri, Alexa, Google, Cortana, or Bixby, or playing a game of any kind on any device. All of these behaviors are captured, logged, and used to enrich your profile.
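For the technically curious, here is a minimal sketch of what "enriching" a profile could look like in Python. Everything in it is an assumption made for illustration: the function name `enrich_profile`, the source names, the field names, and the sample records are all made up, and real systems are far larger and messier than this.

```python
from collections import Counter


def enrich_profile(base, sources, events):
    """Merge attributes from several (hypothetical) sources into one profile,
    then append a count of each logged behavior type."""
    profile = dict(base)
    for attributes in sources.values():
        for key, value in attributes.items():
            # Keep the first value seen for a field; later sources do not overwrite it.
            profile.setdefault(key, value)
    profile["behavior_counts"] = dict(Counter(event["type"] for event in events))
    return profile


# Made-up starting point: a name and an email address.
base = {"name": "Pat Example", "email": "pat@example.com"}

# Made-up passive data, keyed by where it might have come from.
sources = {
    "public_records": {"home": "owner"},
    "warranty_list": {"car": "sedan"},
    "credit_bureau": {"credit_score": 710},
}

# Made-up online behaviors.
events = [
    {"type": "click", "target": "shoe-review"},
    {"type": "scroll_pause", "target": "shoe-ad"},
    {"type": "click", "target": "handbag-review"},
]

profile = enrich_profile(base, sources, events)
print(profile)
```

The point of the sketch is only that each extra source and each logged behavior adds fields to the same record, which is what makes the later predictions sharper.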
Is All My Data in One Place?
Your enriched profile is not in one place. But every company that wants to send you a targeted message does everything it can to create a “single view of the customer.” This includes cobbling together the most robust, most enriched data profile possible. The better the profile, the better the predictions. The private profiles that big tech organizations such as Google, Facebook, Amazon, Netflix, Microsoft, and Apple have for each of us are unimaginably large, and the predictions they make are extraordinarily accurate. In China, the government has pretty much 100 percent of the data everyone creates. In the EU, GDPR has been enacted to protect people from this. It is too early to tell if GDPR works.
Predictive Models
Most predictive models fall into two general categories: classification and regression.
The goal of classification algorithms is to identify new data as belonging to a specific class or category. There are binary classifications (two possible outcomes such as male/female) and there are multi-class classifications (more than two possible classes or categories). This is roughly analogous to a person asking, “What is this?” then thinking about it and then declaring, “Oh, it’s a cup. Let me put it in the cupboard.”
You are part of several classes including your family members, your friends, and your communities of interest. If you have been mathematically placed in a class with people who are likely to be discussing designer shoes, you’re going to see ads for designer shoes. Is it a coincidence that you were “just” talking about designer shoes? No. The algorithm was 92 percent confident that you had a 71 percent chance of talking about designer shoes.
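To make classification a little more concrete, here is a toy sketch using a nearest-neighbor vote, which is just one simple classification technique and not necessarily what any advertiser actually runs. The function `knn_predict`, the two features, the class names, and every data point are assumptions invented for this example.

```python
import math


def knn_predict(training, features, k=3):
    """Return (label, share_of_votes) based on the k nearest labeled examples."""
    by_distance = sorted(training, key=lambda row: math.dist(row[0], features))
    votes = [label for _, label in by_distance[:k]]
    winner = max(set(votes), key=votes.count)
    return winner, votes.count(winner) / k


# Made-up examples:
# (fashion-site visits per week, luxury purchases last year) -> class
training = [
    ((9, 4), "shoe_shopper"),
    ((8, 5), "shoe_shopper"),
    ((7, 3), "shoe_shopper"),
    ((1, 0), "not_interested"),
    ((0, 1), "not_interested"),
    ((2, 0), "not_interested"),
]

# A new person whose behavior looks a lot like the shoe shoppers.
label, share = knn_predict(training, (8, 4))
print(label, share)
```

The new person lands in whichever class their closest neighbors belong to, and the share of votes acts as a crude confidence score. Nobody had to hear a conversation for that to happen.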
Regression analysis can be used to infer relationships between independent and dependent variables. If I know a bunch of stuff about you (such as your income, zip code, monthly mortgage payment, type of car you drive now, age, and gender), I can use regression analysis to predict what car you’ll want to buy or lease next.
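Here is a bare-bones sketch of the regression idea, assuming a single independent variable (income) and a made-up, deliberately tidy data set. Strictly speaking, regression predicts a number, so this example estimates how much someone might spend on their next car rather than which model they will pick. The helper `fit_line` and all of the figures are illustrative assumptions, not real market data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: annual income and price of the last car bought (both in $ thousands).
incomes = [40, 60, 80, 100, 120]
car_prices = [18, 26, 34, 42, 50]

slope, intercept = fit_line(incomes, car_prices)

# Estimate the next car price for someone earning $90K.
predicted_price = slope * 90 + intercept
print(round(predicted_price, 1))
```

A real model would use many variables at once (zip code, mortgage payment, current car, age, and so on), but the mechanics are the same: fit the relationship on people you know about, then apply it to the person you want to predict.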
Why is a PhD in data science worth $1.5 million per year to a data-rich organization? Because a PhD in data science knows how to creatively and efficiently apply analysis techniques to make super-accurate predictions.
Machine Learning and AI
If I showed you a 10 x 10 spreadsheet of data about your business, in only a few minutes you could tell me everything it represented. You know your business, your customers, your industry. The numbers would describe things you have experienced in real life, and you would be able to explain (using the language of arts and letters, not the language of mathematics) how the numbers spoke to you.
However, if that data set was 25,000 columns by 25 million rows, there is no way you or any other human being could ever look at or interpret the data. That’s why it’s called “big data.”
To look at big data, you need computers. And to make the data actionable, you can teach machines to do predictive analysis. Machines can now learn, and predictive analytics is one of the things machines learn to do very, very well.
Why You Think Your Devices Are Listening to You
First, we need to define “listening.” With respect to your private audible conversations, meaning spoken words that might be recorded and interpreted, unless you are under surveillance by a government agency with a warrant or being illegally eavesdropped on, no one is listening to your conversations with any tool that will be used to put advertising messages or content in front of you. No matter what you think Alexa or Google Assistant does when you have not said a wake word, it’s just not happening. (Note: There have been some sensationalist headlines recently about Alexa and Siri quality assurance (QA) workers who have heard things they should not have. They certainly were not using the data for anything nefarious; the company policies around QA were just poorly thought through.)
However, every other device in your world (including Alexa and Google Assistant after you say the wake words) takes whatever behaviors you exhibit and whatever data can be gathered about you and uses it to make predictions about your behaviors.
So, in practice, everything is “listening” to you. Not humans in rooms with headphones, but rather computers in data centers using AI. The data you create about yourself is being gathered, analyzed, and used all the time – 24/7/365.
What to Do About It
Now comes the hard part. We have to figure out if the benefits of accurate messaging and the convenience of our machines knowing us at the most intimate level are worth the risks. Designer drugs created from our own DNA seem great. Tools that can read our emotions and help us cope with complex issues seem scary, but also great. Custom movies with custom soundtracks created in real time specifically for us individually seem like science fiction, but they are only a few years away. Self-driving cars that know where we are going and how we like to travel seem awesome too.
All of this requires big tech to have unfettered access to our data. Should it? If so, what data? Whose data? In the next few years we are going to have to vote – mostly with our wallets – about this. Our elected officials will have to deal with it too.
Do yourself, your kids and grandkids, and your unborn future descendants a favor: make this a personal priority. Our future depends on it!
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. (This article was originally published on August 5, 2019; it was revised on June 26, 2022, and revised again on November 26, 2022.)