Big Data and Data Science are overused catchphrases that can mean anything anyone wants them to mean. But the hype doesn’t change the facts: we are being overwhelmed with data, and here’s what to do with it.
Data Science Can Help
The fundamental goal of data science, a mash-up of three foundational skills – domain expertise, mathematics and computer science – is to turn information into action. You can learn about how they work together in my article, Are You Ready for Data Science?
The truism “If you ask the wrong question, you’re guaranteed to get the wrong answer” is often tossed about in data science meetings as profound insight. It is not. It is an axiom.
Importantly, the converse is almost never true. Asking the right question might be a path to enlightenment, but it certainly does not guarantee finding the right answer. To attempt that, you need the appropriate analytic tools and techniques. This is where mathematics and computer science add value.
An Overview of Analytic Techniques
What are the technical members of your data science team going to do with your data? Transform, learn and predict. So, in the interest of searching for the right answers, let’s review a few common classes of analytic techniques and see how you might put them to good use.
Aggregation – a class of techniques used to summarize data, including basic statistics such as the mean, weighted average, median and standard deviation. Other aggregation techniques include probability distribution fitting – fitting a Gaussian or other distribution to repeated measurements of a variable phenomenon (remember “method of moments” and “maximum likelihood” from Stats class?) – and good, old-fashioned plotting points on a graph.
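As a minimal sketch of these aggregations, using Python’s standard `statistics` module and a made-up list of order totals:

```python
import statistics

# Hypothetical sample: order totals in dollars
orders = [9.75, 12.0, 15.5, 15.5, 18.25, 22.0]

mean = statistics.mean(orders)      # arithmetic mean
median = statistics.median(orders)  # middle value of the sorted data
stdev = statistics.stdev(orders)    # sample standard deviation

# Method-of-moments fit of a normal distribution: match the sample
# moments to the distribution parameters mu and sigma
mu_hat = statistics.mean(orders)
sigma_hat = statistics.pstdev(orders)
```

The same summaries scale up unchanged whether you have six rows or six million.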
Enrichment – a set of techniques employed to add information to, or fill gaps in, a data set – for example, adding ZIP+4 extensions to five-digit ZIP codes, appending purchase data or credit scores, or even simply standardizing prefixes and suffixes.
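A tiny sketch of the suffix-standardization case (the lookup table and addresses here are invented for illustration; real enrichment typically pulls from a licensed address or credit-data service):

```python
# Hypothetical table mapping street-suffix variants to standard forms
SUFFIX_MAP = {"street": "St", "st.": "St", "avenue": "Ave", "ave.": "Ave"}

def standardize_suffix(address: str) -> str:
    """Replace a trailing street suffix with its standard abbreviation."""
    head, _, last = address.rpartition(" ")
    replacement = SUFFIX_MAP.get(last.lower())
    return f"{head} {replacement}" if replacement and head else address

standardize_suffix("100 Main Street")  # "100 Main St"
```

Records that don’t match the table pass through untouched, which keeps the enrichment step safe to run over an entire data set.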
Processing – everything from data munging or data wrangling (the cleaning up of data) to entity extraction (identifying key terms in unstructured data that have value) to true feature extraction (building derived values from existing data).
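To make the munging-to-feature-extraction pipeline concrete, here is a sketch over invented customer records: the cleanup (trim, collapse whitespace, fix case) is the wrangling step, and deriving tenure from the signup date is a simple feature extraction (the field names and cutoff date are assumptions):

```python
from datetime import date

# Hypothetical raw records: messy names plus ISO-format signup dates
raw = [{"name": "  alice JONES ", "signup": "2023-01-15"},
       {"name": "Bob  smith", "signup": "2023-03-02"}]

def munge(record: dict) -> dict:
    """Clean a record, then derive a new feature from an existing field."""
    # Wrangling: trim, collapse internal whitespace, normalize case
    name = " ".join(record["name"].split()).title()
    # Feature extraction: build a derived value (tenure in days)
    signup = date.fromisoformat(record["signup"])
    tenure_days = (date(2023, 6, 1) - signup).days
    return {"name": name, "tenure_days": tenure_days}

clean = [munge(r) for r in raw]
```

Downstream models never see the raw mess, only the cleaned names and the derived `tenure_days` feature.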
Regression – a common way to predict the future based on the past by modeling the relationships between variables. There are many types of regression techniques, but all share the common goal of predicting the value of a dependent variable from related explanatory variables, or estimating the effect of an explanatory variable on the dependent variable.
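The simplest member of the family, ordinary least squares on one explanatory variable, fits in a few lines; the data below are invented points lying exactly on y = 1 + 2x:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x via the closed-form solution:
    slope = covariance(x, y) / variance(x), intercept = mean_y - slope * mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

a, b = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
prediction = a + b * 10  # predict the dependent variable at x = 10
```

The fitted slope `b` is also the estimated effect of a one-unit change in the explanatory variable, which is the second use of regression described above.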
Clustering – just what it sounds like. The goal is to group a set of data points so that the ones with the most in common are closest together. Importantly, clustering is not a specific formula; it is accomplished by any of a family of algorithms. And it is almost always an iterative process.
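One member of that family, k-means, shows the iterative character well. This sketch works on one-dimensional points with invented data and starting centers:

```python
def kmeans_1d(points, centers, iters=10):
    """One-dimensional k-means: assign each point to its nearest center,
    move each center to the mean of its assigned points, and repeat."""
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

# Two obvious groups, around 2 and around 10; the starting centers are a guess
result = kmeans_1d([1, 2, 3, 9, 10, 11], centers=[0.0, 5.0])
```

Even with poor starting centers, a couple of iterations pull them toward the true group means, which is why the process is run until the assignments stop changing.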
Classification – algorithms and other techniques used to identify the category or subpopulation to which a data point belongs. When speaking about classification, you must be careful to also identify the discipline you are speaking about: statisticians use the term differently than machine learning practitioners do.
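In the machine-learning sense, a minimal example is k-nearest-neighbors: label a new point by majority vote of the labeled points closest to it. The training values and labels below are invented, and distance is one-dimensional for simplicity:

```python
from collections import Counter

def knn_classify(samples, labels, query, k=3):
    """k-nearest-neighbors: rank labeled samples by distance to the query,
    then take a majority vote among the k closest."""
    ranked = sorted(zip(samples, labels), key=lambda sl: abs(sl[0] - query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: small values labeled "low", large ones "high"
label = knn_classify([1, 2, 3, 10, 11, 12],
                     ["low", "low", "low", "high", "high", "high"],
                     query=2.5)
```

Swapping in a different distance function or value of `k` changes the classifier's behavior without changing the voting logic.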
Simulation – a set of techniques used to create a simulated environment for testing predictive models.
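As one hedged sketch of the idea, a Monte Carlo generator can produce synthetic demand data for exercising a forecasting model before it ever touches real data (the base level, noise scale, and seed are all assumptions):

```python
import random

def simulate_daily_demand(base, noise, days, seed=42):
    """Generate a synthetic demand series: a stable base level plus
    Gaussian noise, reproducible via a fixed seed."""
    rng = random.Random(seed)
    return [base + rng.gauss(0, noise) for _ in range(days)]

series = simulate_daily_demand(base=100, noise=5, days=1000)
avg = sum(series) / len(series)  # should hover near the base level of 100
```

Because the true data-generating process is known here, you can measure exactly how far a predictive model's forecasts drift from it.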
Optimization – a wide-ranging tool set for making optimal selections from a set of alternatives. Commonly used for pricing and maximizing yield.
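For the pricing case, the simplest optimization is an exhaustive search over candidate prices against a demand model; the linear demand curve below is invented for illustration:

```python
def best_price(prices, demand):
    """Exhaustively evaluate candidate prices and return the revenue maximizer."""
    return max(prices, key=lambda p: p * demand(p))

# Hypothetical linear demand curve: quantity sold falls as price rises
def demand(p):
    return max(0, 100 - 2 * p)

optimal = best_price(range(10, 41), demand)  # revenue p*(100-2p) peaks at p = 25
```

Real pricing problems swap the toy demand curve for a fitted model and the brute-force search for a proper solver, but the shape of the problem – pick the best alternative from a set – is the same.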
We have a team ready to help you prepare to work with your data, understand the opportunities afforded by machine learning and pattern matching and even do a data science readiness assessment. Just shoot me an email and I’ll be happy to work with you to help you achieve your business goals.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it.