3 Facts I Bet You Didn’t Know about Data Science and Scientists:
- Data scientists are not mystical practitioners of magical arts.
- Data scientists are “sexy” according to a recent Harvard Business Review article.
- Data science can “call presidential races, reveal more about your buying habits than you’d dare tell your mother, and predict just how many years those chili cheese burritos have been shaving off your life.”
I learned these facts minutes after picking up John Foreman’s new book Data Smart: Using Data Science to Transform Information into Insight. Data Smart is the textbook for anyone wanting to turn raw data into action that makes a difference.
John is the Chief Data Scientist for MailChimp.com, the email service that powers subscription marketing campaigns. MailChimp also powers blogs like this one, allowing you to sign up and receive blog posts in your inbox. John has also worked with a range of organizations from the FBI and Department of Defense to global corporations including Coca-Cola and Intercontinental Hotels. You can follow him on Twitter @John4man.
John, who did you write this book for?
I wrote Data Smart for anyone who wants to learn the cutting-edge analytic techniques that businesses like Amazon and Facebook are using to turn their data into revenue.
And when I say “learn,” I don’t mean just “learn about.” In Data Smart, readers use actual techniques, such as artificial intelligence and data mining, to solve real business problems. That way the reader can get a sense of how to apply them to their own work. Think of the book as on-the-job training.
That’s why each chapter works through a data science technique in Excel – spreadsheets are a safe environment that readers feel comfortable working and following along in.
I wrote Data Smart for anyone from business intelligence analysts to programmers to quantitative marketers to sports analysts to C-suite executives. For anyone who truly wants to learn analytics, this is the most accessible book for gaining a foothold in the discipline.
Misconceptions about Data Science
Give us your definition of data science. What’s the biggest misconception people have about your field?
Data science is the use of transactional business data (think sales data, website traffic, social interactions, ad conversion data, employee performance data, etc.) to make decisions that result in revenue growth for the business.
There are a few big misconceptions about data science. First, the field isn’t just for those who do online advertising (e.g. Facebook, Twitter, or Google). No, a brick-and-mortar mom-and-pop shop can benefit from artificial intelligence models too. For instance, if you run a hotel, being able to forecast demand in light of your prices and competitors’ prices is invaluable. And that’s true whether you’re a single hotel or Intercontinental.
Second, you don’t need a Ph.D. to do data science. Some of these techniques, like customer segment detection, are analytically tough, but anyone with the motivation and some spreadsheet skills can learn how to do them.
How to Use Data Science
What are a few of its uses?
There are so many it’s hard to choose! Here are some data science applications that I have personally worked on in my day job:
- Forecast demand in light of price adjustments (revenue management)
- Optimize the timing of supply shipments from China to increase inventory availability and reduce costs
- Detect when an online account has been compromised by hackers
- Discover hidden customer segments in sales data to better target individuals in future marketing campaigns
- Predict demographic data of your customers (age, gender, marital status, etc.)
- Make product recommendations to customers automatically, or predict what a customer is likely to be interested in purchasing
Why are predictive analytics and data science gaining momentum?
In a word, revenue.
Data science isn’t something that companies should be doing just for fun. In the 90s and early 2000s, the business intelligence movement was all about the capture, storage, and basic reporting of data. Data science is an extension of this. There is more value in data than just in basic reporting.
But to extract that value, you need to understand practices such as data mining and predictive analytics. Past customer behavior helps forecast and predict future behavior – historical data can guide your business’s actions in the present. You just have to know how to transform that data into something useful. That’s where data science comes in.
In one of your chapters, you discuss regression as the “granddaddy of artificial intelligence.” Target got into hot water, didn’t they, when they started using AI models and alerted an unknown parent that his teenager was pregnant? What are the uses of this and what are the boundaries?
Artificial intelligence isn’t magic. AI models just suck in signals and spit out predictions. “This teen is buying maternity clothing and folic acid supplements? She’s probably pregnant.” Those are sensible dots to connect.
But like any other analytical tool, the danger is not in the tool itself but in its execution and use by the business.
For instance, dating websites such as Match.com and eHarmony have been using predictive models for years to help match users with each other. Predicting compatibility is a valid use of AI – indeed, these companies’ businesses depend on it.
But what if they used these same models to tell a person, “Hey, you. You’re predicted to be compatible with no one. You’re hopeless!” That would be a bad move. The model wouldn’t be to blame though – the business would. And that’s why businesses need to think through exactly what data science models they need to build and how their customers will receive and perceive them.
Forecasting the Future
You provide great hope to the field when you say, “The only guarantee with forecasting is that your forecast is wrong.” I am guessing you didn’t use that line when looking for funds for your work.
I’m a realist. When predicting the future, you’re most likely going to be wrong. But Data Smart teaches you how to put bounds around your predictions, so you can tell the business, “I forecast demand will be no lower than $X with 95% confidence.” No one wins when people don’t account for risk in their decisions, so the book takes readers through how to do that.
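To make the idea of putting bounds around a forecast concrete, here is a minimal Python sketch. It is not the method from Data Smart (the book works in Excel). The function name, the demand figures, and the choice of a simple average as the forecast are all assumptions made for illustration. It also assumes forecast errors are roughly normal and ignores the extra uncertainty that comes from estimating the average from a small sample.

```python
from statistics import NormalDist, mean, stdev


def lower_bound_forecast(history, confidence=0.95):
    """Return (point_forecast, lower_bound) for the next period.

    Hypothetical helper: forecasts the next value as the historical
    average and places a one-sided lower bound below it, assuming
    normally distributed variation around that average.
    """
    point = mean(history)
    spread = stdev(history)  # sample standard deviation of past demand
    # One-sided bound: the z-score that leaves (1 - confidence) in the
    # lower tail. For 95% confidence this is about -1.645.
    z = NormalDist().inv_cdf(1 - confidence)
    return point, point + z * spread


# Made-up weekly demand figures, in dollars.
weekly_demand = [120, 135, 128, 142, 131, 125, 138, 133]

point, lower = lower_bound_forecast(weekly_demand)
print(f"Point forecast: ${point:.2f}")
print(f"Demand should be no lower than ${lower:.2f} with 95% confidence")
```

Asking for more confidence pushes the bound further below the point forecast, which is the trade-off a business weighs when it accounts for risk in a decision.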
John, you also end your book with perspective saying that data science is “not the most important function” of the organization. In the world of data science, is that the current thinking—that it is more important than any other part of the company?
Recently, a lot of publications have really been pushing this notion of “do data science or go out of business.” The hype is at a fever pitch. (http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/)
But most businesses are not Facebook or Google. They don’t survive based on massive data-driven ad targeting efforts. When you think about a more typical business, such as a clothing retailer, they’re not going to succeed or fail based on their analytics alone. But analytics can certainly help with forecasting demand, handling reordering and stocking decisions, adjusting pricing, and figuring out how to best market and discount items. Data science for that business can become a differentiator. But what’s the most important function of the business? That’d be the production and sale of clothing that people actually want.
I just think people need to take a step back and see data science for what it is: icing on the cake. If your business is already a great cake, then the icing can make it better. But if your business is a fruitcake, well, adding some data science icing on top of it isn’t going to change the nastiness inside. (My apologies to those who really like fruitcake!)