New Year's Resolutions: Reflecting On 2019 Realities In All Things Data

Progress is Still Largely About the Availability of Data

I can distinctly remember the day I became fascinated with a paper towel dispenser. I was in high school physics, and my most amazing teacher, in a wonderfully Socratic way, asked the class to posit theories on the inner workings of the lab paper towel dispenser based on our external observations (we could use it, but not disassemble it). It was the old-fashioned, turn-the-silver-crank model that dispenses a stream of white paper with little square cutouts along the edge. The more you turn the crank, the more paper you get. What followed was a fascinating exercise in questions begetting questions. How does the circular motion of the crank translate into the linear motion of the paper? (Worm gears.) What is the significance of the notches in the paper? (A paper-holding mechanism with radial supports.) What is the explanation of the clicking sound? (A dog mechanism to prevent backward motion.) What about the relationship of number of turns to amount of paper dispensed? (A massive argument ensued.) I learned a lot about asking questions in the face of things that had been taken for granted. I learned a lot about making progress through observation and hypothesis. I learned… a lot.

So, as the new year is upon us once again, I find myself observing our digital world and wondering how another trip around the sun has changed our interaction with data and the technologies that consume it. Are we just turning the crank and taking the paper? Or is there anything we can learn from the experience if we step back and observe it a bit?

Realities: Some things are still true, and haven’t changed a lot

It’s always a good idea to get grounded in the aspects of a complex system that are reasonably constant or in some way universally true. The first such observation with Artificial Intelligence (AI), autonomous devices, and all the other amazing technologies that are advancing in modern times is that progress is still largely about the availability of data. AI methods that rely on prior data (e.g., supervised methods) are coming of age in large part because we have more data for them to learn from, and ultimately to substantiate increasingly complex use cases. Even methods that use newly curated data (e.g., convolutional methods that are used for classification, such as object recognition) have access to a substantially larger corpus of data to ingest. Imagine walking into a library with a thousand books vs. walking into a library with a billion books, with millions more being added daily.

We must be careful not to drown in the richness of our data. Is it all true? Certainly not. Is the data manipulated in some cases to influence outcomes? Probably so. Certainly, in some domains, like health care or product design, large amounts of historical data can be extremely helpful. Other areas, such as cybersecurity or consumer behavior, can actually be substantially confounded by the fact that the past simply doesn’t look enough like the present to use longitudinal data in a completely effective way. Methods based on the stability of a data-producing environment require a period with minimal perturbation to be effective, but these are turbulent times indeed.

Consider how we might be confounding future anthropologists. Our amazing technology is currently being used to produce incredible capabilities, like the Parker Solar Probe that is exploring closer to the sun than was ever before thought possible, and at the same time we have recently seen the introduction of a new AI-enabled pet-food dish to keep individual pets from stealing food intended for other pets. Are we placing our technology bets where we can quickly reap incremental benefits, or boldly going where no man has gone before? Probably, the answer is a little bit of both, but the ratio is left to each of us to ponder.

As we look at the truths that seem self-evident in our use of data and data-related technologies, we would do well to consider how much is changing as we get more of everything.

Learning: Are we getting smarter, or overwhelmed?

Some phenomena with respect to data are interesting, to say the least, and they raise the question of how our relationship with our data will progress. One of these is privacy. Virtually every region of the world has regulatory change or current events that are influenced by differing views on privacy and ownership of data. Who controls the data and what may be done with it in various contexts is a subject of much debate with no clear near-term resolution. At the same time, we know that more privacy generally means less data (or at least different data or less easily accessible data). Will data-consuming technology take a step backward in evolution with a change in the flow and character of available data? What are the implications for critical applications?

Privacy and security are close cousins. We have certainly learned that merely collecting information about past cyber-events is insufficient to mount a defense against future malfeasance. New methods of classification and some very promising technologies for projecting risk across a set of observations are examples of how defensive capabilities are advancing. Encryption continues to be a carefully considered sector where enough is never enough. A recent development that uses observed plant characteristics to formulate more opaque encryption keys is but one of many examples of how we continue to search for more robust ways to protect our data. At the same time, technologies like quantum computing serve as a clear and present reminder that this field will likely continue to change for quite some time.

The Internet is not yet 30 years old. For a person, this would be an age of enlightenment, self-discovery, and new opportunities. But our Internet is actually old. It is being asked to do things it was never designed for, and at volumes and speeds never imagined. A second generation of digital natives (those who were born with an Internet in their lives) has expectations about ubiquity (wi-fi everywhere), inter-connectivity (everything has an address and can be part of the web), and parity (we can do what we want where we want equally) that are simply unrealistic given the current pace of progress. As nations around the world contemplate the future of the Internet, options are wide and varied. Many would argue that we haven’t begun to implement the best of our advice from years gone by, while expectations continue to outpace reality. Couple this with the fact that most information is not searchable on the Internet (e.g., because it resides behind a password, behind a firewall, or on the Dark Web), and it seems clear that the future is not easily predictable in terms of where we will keep our data and how we will get to it.

How information will be shared in the future, and to what extent it will be compartmentalized and controlled, is far from certain. What is certain, it seems, is that what we are doing today will not be sufficient for long.

Resolutions: What to expect in 2019

With these observations of what is, and trends for where things seem to be going with respect to data and related technologies, some 2019 resolutions seem in order. Here are mine.

I will look more closely at hype vs. reality, especially with regard to technologies that consume and produce data.
I will challenge others to have good answers, not only to what data they are using, but also to what data they are not using and why.
I will contribute to the dialogue on important issues such as data privacy, the future of digital work, and cyber-everything.

It has been aptly pointed out that the only constant in nature is change. We have progressed to drying our hands with high-speed air and touchless paper-towel dispensers. That progress is interesting, but evolutionary. In data, it seems, this constant is not a constant at all. With data, the rate of change is so great that it is difficult to describe. We continue to see amazing opportunity and new types of ominous risk associated with the discovery and use of data. Here’s to another year of wonderful surprises!

To read more thought leadership and predictions by Anthony Scriffignano, visit his Perspectives page at dnb.com.

This article was also published on LinkedIn.