Data Science 101 – Definitions You Need to Know


Our 7th Annual Media Technology Summit is just a few weeks away, and data science is going to be front and center. Each and every one of us creates a wealth of data every day, but… information is not knowledge. The data must be wrangled and put in context to make it actionable. There are many different techniques one can apply to data to accomplish this goal, but an important part of the process falls into the multidisciplinary field of data science. Which is what, exactly?

  • Data Science – The analysis of data using the scientific method. (It may be the most overused term of the year, but you’re unlikely to have a meeting where the topic does not come up.)
  • Data Scientists – There are many who say that data scientists (people who practice data science) are not really scientists. That seems unfair. While there are a bunch of charlatans (people and organizations) passing themselves off as data scientists, I would argue that if the scientific method is applied (which, if you remember middle school science class, is generally a controlled, six-step process: question, research, hypothesize, experiment, analyze, conclude), the professionals doing the work qualify as scientists. Let’s not get caught up in whether or not data science is real science. There are people who use analytical tools to find patterns in data; let’s call them data scientists.
  • Data Wrangling or Data Munging – A laborious process of manually extracting, mapping, converting or generally cleaning up data in raw form. Data wranglers use algorithms (a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer) to parse disparate types of data and fit it into defined structures. The ultimate goal is to prep the data for storage and future use.
  • Big Data – This can mean anything that anyone wants it to mean. It is on my list of banned words and is really more of a concept than an agreed-upon definition. However, it is usually defined as sets of data that are too large and complex to manipulate or interrogate with standard methods or tools – in other words… big. If you collect big amounts of data, go ahead and call it big data. Good examples of big data sets are all the digital health records in the United States or all the viewer data from Comcast’s set-top boxes.
  • Hadoop – Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. It typically runs on “Hadoop clusters,” computational clusters designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment.
  • Multivariate statistical analysis – Analysis based on multivariate statistics, which involves observing and analyzing more than one statistical outcome variable at a time. Multivariate analysis has several uses, including capability-based design, inverse design (where any variable can be treated as an independent variable), Analysis of Alternatives (AoA) and identifying correlations across hierarchical levels.
  • Time-series analysis – The use of a model to predict future values based on previously observed values. It differs from regression analysis (which is often used to test theories that the current values of one or more independent time series affect the current value of another time series) in that time-series analysis focuses on comparing values of a single time series, or of multiple dependent time series, at different points in time.
  • Multidimensional array – A data structure that has the semantics of an array of arrays, all of which may be indexed with values of any data type, usually with supporting syntax built into a programming language.
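
To make the “experiment, analyze, conclude” steps from the data scientist definition a little more concrete, here is a minimal Python sketch of one common analysis technique, an exact permutation test. The old-versus-new program guide scenario and all the numbers are invented for illustration, and the function name is hypothetical. Enumerating every possible split only works for small samples; larger experiments would typically use random resampling or a statistics library instead.

```python
from itertools import combinations


def exact_permutation_test(control, treatment):
    """One-sided exact permutation test: is the treatment group's mean higher?

    Under the null hypothesis the group labels are interchangeable, so every
    way of splitting the pooled values into groups of the original sizes is
    equally likely. The p-value is the share of splits whose treatment-group
    total is at least as large as the total actually observed.
    """
    pooled = control + treatment
    k = len(treatment)
    # Group sizes are fixed, so ranking splits by total is the same as by mean.
    observed = sum(treatment)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), k):
        total += 1
        if sum(pooled[i] for i in idx) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical experiment: minutes watched with the old vs. new program guide.
control = [3, 4, 5, 4]
treatment = [6, 7, 6, 8]
p = exact_permutation_test(control, treatment)
print(f"p = {p:.4f}")  # → p = 0.0143

# Conclude: compare against a significance level chosen before the experiment.
significance_level = 0.05
print("reject the null hypothesis" if p < significance_level else "fail to reject the null hypothesis")
```

Here only one of the 70 possible splits puts a total as high as the observed one into the treatment group, which is why the p-value is so small.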
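
The data wrangling definition can be sketched in a few lines of Python. This is a hypothetical example, not a production pipeline: two made-up sources (a CSV export and a JSON feed that use different field names and date formats) are parsed and mapped into one defined structure, the `ViewingRecord` dataclass. Every field name and sample value here is an assumption chosen for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class ViewingRecord:
    """Hypothetical target structure that every raw source is mapped into."""
    household_id: str
    channel: str
    minutes: int
    day: date


def from_csv(text):
    """Parse a CSV export with columns 'hh', 'chan', 'mins', 'date' (YYYY-MM-DD)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        ViewingRecord(
            household_id=row["hh"].strip(),
            channel=row["chan"].strip().upper(),  # normalise stray spaces and case
            minutes=int(row["mins"]),
            day=date.fromisoformat(row["date"].strip()),
        )
        for row in rows
    ]


def from_json(text):
    """Parse a JSON feed that uses other field names, seconds, and MM/DD/YYYY dates."""
    records = []
    for item in json.loads(text):
        month, day, year = (int(part) for part in item["watch_date"].split("/"))
        records.append(
            ViewingRecord(
                household_id=str(item["household"]),
                channel=item["network"].strip().upper(),
                minutes=round(item["seconds"] / 60),  # convert seconds to minutes
                day=date(year, month, day),
            )
        )
    return records


# Made-up raw inputs in two different shapes.
csv_text = "hh,chan,mins,date\nA17, nbc ,30,2014-10-01\n"
json_text = '[{"household": 42, "network": "cnn", "seconds": 1800, "watch_date": "10/01/2014"}]'

cleaned = from_csv(csv_text) + from_json(json_text)
for record in cleaned:
    print(record)
```

Once both sources share one structure, the records can be stored, sorted or aggregated together without caring where each one came from.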
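
Hadoop’s classic processing model is MapReduce: a map step runs in parallel on each chunk of data, a shuffle groups the intermediate results by key, and a reduce step combines each group. The sketch below simulates that flow on a single machine in plain Python using the traditional word-count example. It does not call any Hadoop API; the function names and input text are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group every value emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


# Each string stands in for a block of a file that would be spread across nodes.
splits = [
    "big data is big",
    "data science is data analysis",
]

# On a real cluster, each split would be mapped on a different machine in parallel.
mapped = chain.from_iterable(map_phase(line) for line in splits)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)  # → {'big': 2, 'data': 3, 'is': 2, 'science': 1, 'analysis': 1}
```

Because mappers only see their own split and reducers only see one key at a time, the same logic can be spread across many commodity machines.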
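
One simple multivariate view of a data set is a correlation matrix, which measures how every pair of variables moves together. The sketch below computes Pearson correlations in plain Python for three hypothetical household variables. It is a common first look before fuller multivariate methods (principal component analysis, for example), not one of the design-oriented uses listed in the multivariate statistical analysis definition, and the sample numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlations for every ordered pair of named variables."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical per-household observations: several outcome variables at once.
data = {
    "minutes_watched": [120, 95, 180, 60, 150],
    "ads_skipped": [10, 8, 15, 4, 12],
    "purchases": [1, 1, 3, 0, 2],
}

matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a:>16} vs {b:<16} r = {r:+.2f}")
```

Values near +1 or -1 flag variables that move together; a correlation on its own says nothing about which one, if either, causes the other.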
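
To illustrate the time-series idea of predicting future values from previously observed values of a single series, here is a sketch of simple exponential smoothing, one of the most basic forecasting methods (chosen here for illustration; it is far from the only one). The weekly viewer counts are hypothetical, and `alpha`, the smoothing factor, is an assumed parameter that would be tuned in practice.

```python
def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing: forecast the next value of one series.

    Each smoothed level blends the newest observation with the previous level:
        level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    With no trend or seasonality in this model, the forecast for every
    future step is simply the final level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly viewer counts (thousands) for a single program.
weekly_viewers = [50, 52, 51, 55, 58, 57]
print(round(exponential_smoothing_forecast(weekly_viewers, alpha=0.5), 2))  # → 56.25
```

A larger `alpha` reacts faster to recent weeks; a smaller one smooths out noise but lags behind genuine shifts.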
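
In Python, the array-of-arrays idea from the multidimensional array definition maps naturally onto nested lists. Python lists are indexed by integers only, and many data scientists would reach for NumPy’s `ndarray` instead, but this standard-library sketch shows how one index selects an inner array and a second index selects an element, plus one common pitfall when building the structure. The grid size and values are arbitrary.

```python
# A 3 x 4 grid stored as an "array of arrays": a list whose elements are lists.
rows, cols = 3, 4
grid = [[0] * cols for _ in range(rows)]  # build each row as a separate list

# Fill each cell with row * 10 + column so the indexes are easy to read.
for r in range(rows):
    for c in range(cols):
        grid[r][c] = r * 10 + c

print(grid[2][3])  # first index picks the row, second picks within it → 23
print(grid[1])     # a single index returns a whole inner array → [10, 11, 12, 13]

# Pitfall: [[0] * cols] * rows repeats ONE inner list, so every "row" is the
# same object and writing to one of them changes all of them.
shared = [[0] * cols] * rows
shared[0][0] = 99
print(shared[1][0])  # → 99
```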

These are just a few of the terms that you should know if you’re going to discuss data science with your HR department or anyone else for that matter. There is a great deal of myth and mystery around this subject, such as:

  • Are data scientists just statisticians with fancy titles?
  • Don’t we need super-expensive data appliances to support a data science department?
  • Aren’t all these people just academics who don’t know anything about business?

Of course, everyone really wants to know: Where can I find one? These are all great questions. For great answers, come join us at the 7th Annual Media Technology Summit on October 23 at the Sheraton Times Square. BTW, if you’ve read this far, email me for a discount code – you deserve it!

About Shelly Palmer

Shelly Palmer is the Professor of Advanced Media in Residence at Syracuse University’s S.I. Newhouse School of Public Communications and CEO of The Palmer Group, a consulting practice that helps Fortune 500 companies with technology, media and marketing. Named LinkedIn’s “Top Voice in Technology,” he covers tech and business for Good Day New York, is a regular commentator on CNN and writes a popular daily business blog. He's a bestselling author, and the creator of the popular, free online course, Generative AI for Execs. Follow @shellypalmer or visit shellypalmer.com.
