We deal with trade-offs all the time. “You can have it good, fast or cheap. Pick any two.” The constraints of this choice are clear-cut. If you want it good and fast, it won’t be cheap. If you want it fast and cheap, it won’t be good. If you want it good and cheap, it won’t be fast.
The data science question you ask may be as simple as, “Do you want it good, fast or cheap?” but achieving a desired outcome will most likely require more than just the right mathematical and computer science tools. Successful implementation may require zero-sum trade-offs that have an artistic side to them.
Here are seven data science issues that will compete for your attention as you think about solving problems that enable data-driven business decisions.
Algorithmic Complexity – How quickly or slowly will the algorithm(s) perform? Algorithms are processes or rules followed to solve problems. They can be extremely simple or remarkably complex. How fast an algorithm runs depends both on its inherent complexity (how the amount of work grows as the input grows) and on the computing environment (how fast or slow is the CPU, front-side bus speed, storage seek time, etc.).
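To make the complexity point concrete, here is a minimal sketch that counts the steps two search algorithms take on the same sorted data. The function names and data sizes are made up for illustration. Counting steps rather than timing keeps the comparison independent of the hardware it runs on.

```python
def linear_search_steps(sorted_values, target):
    """Return how many items a left-to-right scan inspects before it stops."""
    steps = 0
    for value in sorted_values:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_values, target):
    """Return how many probes a binary search makes before it stops."""
    steps = 0
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            break
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


for size in (1_000, 1_000_000):
    data = range(size)      # already sorted: 0, 1, ..., size - 1
    target = size - 1       # last item, the worst case for the linear scan
    print(size, linear_search_steps(data, target), binary_search_steps(data, target))
```

Growing the data a thousandfold makes the linear scan do a thousand times more work, while the binary search needs only about ten more probes. A faster CPU shrinks both numbers by the same factor; it does not change that gap.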
Quantity of Data – How big is the data set? Big Data may be an overused marketing term, but in practice, the data set you need to analyze may be colossal. The size of the data set will have a substantial impact on the amount of computer power required to accomplish your analysis.
Input Speed – How fast will the data have to be ingested? Will they come in real time from the Twitter firehose or at hundreds of gigabits per second over a Fibre Channel connection from a data warehouse? Are they sitting on a storage device that is local to the CPU? Is there a wireless network connection involved? Are the data coming through various networks, bounced around the earth via satellite and reassembled via load-balancing tools before you can get to them? And on and on.
Output Speed – Do you need your results in real time, as in, right now? Do you want an hourly report, or will a daily or weekly digest suffice? I’ve been in meetings where executives have asked for real-time reports (because dashboards are cool) when a daily or weekly digest would more than suffice. The consequences of asking for something to be delivered in real time when real-time reporting is not a necessity are dire. It is not uncommon for this simple, ego-driven request to double, triple or even quadruple the cost of a system. Be careful about what you ask for; you just might have to pay for it.
Accuracy – What level of accuracy is required to achieve the desired outcome? Are approximations acceptable, or does the problem require nth-decimal-place accuracy? Put another way, is it OK to be 80 percent sure, or do you need to be 100 percent sure? This is a trade-off that is worth a Socratic discussion with your mathematician.
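As a rough illustration of trading accuracy for cost, the sketch below estimates an average from a 1 percent random sample and compares it with the exact answer. The data are synthetic, and the function name, sample size and seeds are assumptions chosen only for the example.

```python
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a random sample instead of scanning every value."""
    rng = random.Random(seed)   # fixed seed keeps the sketch repeatable
    sample = rng.sample(values, sample_size)
    return statistics.fmean(sample)


# Synthetic, made-up data: 200,000 values centered near 100.
data_rng = random.Random(42)
values = [data_rng.gauss(100, 15) for _ in range(200_000)]

exact = statistics.fmean(values)            # touches all 200,000 values
approx = approximate_mean(values, 2_000)    # touches 1 percent of them

print(f"exact={exact:.2f} approx={approx:.2f} error={abs(approx - exact):.2f}")
```

Whether the small error is acceptable is exactly the conversation to have with your mathematician: the sample is far cheaper to process, but the answer is an estimate, not the true value.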
Confidence – How confident do you need to be in the results? You can make your own scale for confidence levels, but a consistent, system-wide definition of “high confidence” is a very important component of any calculation.
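One common way to put a number on confidence is a confidence interval. The sketch below uses the textbook normal-approximation interval for a proportion; it is one possible scale, not the only one, and the function name and sample counts are hypothetical.

```python
import math


def proportion_confidence_interval(successes, trials, z=1.96):
    """Normal-approximation interval for a proportion; z=1.96 is roughly 95 percent."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    margin = z * math.sqrt(p * (1 - p) / trials)
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, p - margin), min(1.0, p + margin)


for successes, trials in ((80, 100), (8_000, 10_000)):
    low, high = proportion_confidence_interval(successes, trials)
    print(f"{successes}/{trials}: {low:.3f} to {high:.3f}")
```

Both samples show the same 80 percent rate, but the interval from 100 observations spans roughly 0.72 to 0.88, while the one from 10,000 observations narrows to roughly 0.79 to 0.81. Tighter confidence is bought with more data, which feeds straight back into the quantity and speed trade-offs above.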
Data-Set Complexity – How complex is the data set? Is it structured or unstructured? How much data overlap exists (annual financial reports in the presence of monthly reports, etc.)? Are component parts linearly separable? Are the data distributed in multidimensional arrays, etc.? There is literally no end to the hot mess of data-set complexity, and it will have a huge impact on the efficacy and even the feasibility of any analytic technique.
The Art of Selecting Analytic Techniques in a Zero-Sum Game
Pick any one of the above components as most important, and the other six will have to give. Selecting analytic techniques is a delicate balance of art and science.
But there’s more. While selecting analytic techniques is a zero-sum game, the process is often made more complex by the addition of business policies, legal restrictions or regulatory constraints. How will you deal with personally identifiable information (PII) or protected health information (PHI)? Both require special handling and a targeted approach.
We have a team ready to help you prepare to work with your data, understand the opportunities afforded by machine learning and pattern matching and even do a data science readiness assessment. Just shoot me an email and I’ll be happy to work with you to help you achieve your business goals.