While I was out at a meeting, my meticulously neat wife armed herself with a feather duster, donned protective pink rubber gloves, and viciously, mercilessly straightened, organized, and dusted the desk in my home office. My desk was clean, but the results were suboptimal.
Everyone is talking about data and how valuable it is. But do you know the names and functions of the most basic algorithms being used to turn data into action? Have you brushed up on your math skills and started to think about how data will flow through the systems you are asking your engineers to create?
As it has taken me less than a day to re-mess up my desk, I thought it would be fun to talk about some of the algorithms we use every day that are also used by computer scientists, data scientists, AI, and machine learning systems.
My wife unknowingly used an algorithm to sort my papers. She sorted the papers alphabetically by arranging them one by one in their appropriate position. This technique is called Insertion Sort, and while it is intuitive, the time spent sorting grows quadratically: double the number of papers and, in the worst case, the work roughly quadruples. It gets worse. After she sorted the documents, she grouped them by letter and put each group into corresponding hanging folders in my desk file drawer. Her choice to insertion sort the data on my desk (and then store the data alphabetically) did nothing to speed my access to the data or help me efficiently find the data I’m most likely to need next.
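For the curious, here is a minimal sketch of what that sorting job looks like in Python. The function name `insertion_sort` and the comparison counter are my own illustrative choices, not anything my wife would recognize. It places each paper into an already sorted pile, one at a time, and counts how many comparisons it took.

```python
def insertion_sort(papers):
    """Return a new sorted list plus the number of comparisons made."""
    sorted_pile = []
    comparisons = 0
    for paper in papers:
        # Walk back from the end of the sorted pile to find this paper's slot.
        position = len(sorted_pile)
        while position > 0:
            comparisons += 1
            if sorted_pile[position - 1] <= paper:
                break  # everything earlier is already in order
            position -= 1
        sorted_pile.insert(position, paper)
    return sorted_pile, comparisons


# Worst case: the papers arrive in exactly reverse order.
_, small = insertion_sort(list(range(10, 0, -1)))
_, large = insertion_sort(list(range(20, 0, -1)))
print(small, large)  # 45 and 190 comparisons
```

Doubling the pile from 10 papers to 20 takes the worst case from 45 comparisons to 190, roughly four times the work. That is what quadratic growth feels like.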
An Aesthetic Triumph, but a Predictive Disaster
While my desk was devoid of paper, was dust free, and looked great, all of my papers were now tucked away in a file drawer organized for first-letter alphabetic search. There was no respect or attention paid to frequency of the data usage or temporal constraints (such as an invoice with a due date). Sorting data by an arbitrary criterion may satisfy your aesthetic sensibility, but it may make no sense in a data-driven world.
In practice, there is a caching algorithm that is widely regarded as one of the most effective at anticipating your future data access needs, and my messy desk is a great example. The algorithm is known as LRU (Least Recently Used). When a cache runs out of room, LRU throws out whatever has gone unused the longest. It is based on the idea that the papers you most recently used are the ones you are most likely to use again. Conversely, the papers you have not used in a long time will probably remain unused.
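If you want to see the idea in code, here is a small sketch of an LRU cache built on Python's standard `collections.OrderedDict`. The class name `LRUCache` and its `get` and `put` methods are my own illustrative choices, and a production cache would need more care (thread safety, for one). Think of the capacity as the number of papers that fit on the desk before something has to go in the drawer.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny least-recently-used cache (illustrative sketch only)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # touching an item makes it the newest
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


desk = LRUCache(capacity=2)
desk.put("invoice", "due Friday")
desk.put("memo", "budget notes")
desk.get("invoice")                   # the invoice is now the freshest item
desk.put("contract", "draft 3")       # no room left, so the memo is evicted
print(desk.get("memo"))               # None
```

The memo goes because it is the item I touched longest ago, not because of where it falls in the alphabet.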
So a remarkably effective way to deal with papers on your desk is to put the paper you just finished working with on the top of the pile. Computer scientists call this the move-to-front rule. If you follow it (which requires no thought at all), the math says the total time you spend searching the pile over the long run will be at most about twice what you would have spent if you were clairvoyant. Messy desks are remarkably well optimized.
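The pile itself is easy to model. In this rough sketch, a Python list stands in for the stack of papers, with index 0 as the top. The function name `find_and_refile` is hypothetical, and I am assuming that search cost is simply the number of papers you flip through.

```python
def find_and_refile(pile, paper):
    """Search the pile from the top, then put the found paper back on top.

    Returns how many papers were examined. Raises ValueError if the
    paper is not in the pile at all.
    """
    depth = pile.index(paper)        # position counted from the top (0-based)
    pile.insert(0, pile.pop(depth))  # refile on top of the pile
    return depth + 1


pile = ["invoice", "memo", "contract", "receipt"]
print(find_and_refile(pile, "contract"))  # 3 papers examined
print(find_and_refile(pile, "contract"))  # 1, because it is now on top
print(pile)  # ['contract', 'invoice', 'memo', 'receipt']
```

Anything you touched recently stays near the top, so repeat lookups get cheap without any deliberate filing.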
If you need a practical test (if you don’t trust the math), time how long it takes to find something you worked on yesterday in the pile of papers in front of you. I would be surprised if it took more than a few seconds.
Now, try to find the same thing (with the same search criteria) in the file folder system. It will likely take much longer. Plus, unlike the cache in your messy pile, searching the folders will require you to remember the name of the vendor or title of the paper. Importantly, even if searching the alphabetic folders takes the same amount of time as searching the pile on your desk (which it won’t), you have to add back the hours spent organizing the files (doing meta-work) for no purpose at all.
Messy Is Awesome!
So the next time someone walks into your office and politely inquires, “How can you find anything in this mess?” you can tell them that your desk is organized for temporal search and follows one of the most effective caching strategies computer science has to offer.
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it.