Caroline Sinders, Fellow, The Mozilla Foundation + Visiting Fellow, digital HKS

Data Ingredients: A Provocation Towards Making Algorithms Human Readable

We need easy-to-understand warning labels for products using AI, machine learning and algorithms, the same way we have caloric information or ingredient lists on food products.



Data and algorithms go hand in hand; they are inextricably linked. Data can be analyzed, served up and misused by algorithmic systems inside everyday products. But data and data sets are also what create the algorithm in the first place: they are used to train the data model that makes the algorithm work. I like to think of data as the ingredients inside of an algorithm, an idea also shared by the machine learning technologist and artist Hannah Davis. How does an algorithm use my data, especially inside of a product? And what if that idea was shared more with consumers?


In a time of proprietary algorithms and personal data misuse, I believe we need new kinds of labeling systems for products that use machine learning, even if the use case is incredibly benign or innocuous. Much like food labels for calories and ingredients, we need something similar for AI, machine learning and algorithms to create transparency that is easy for everyone to understand.

Who is this for?

  • Let’s take a benign example, and a personal favorite of mine: Spotify’s Discover Weekly. I, like a lot of users (according to Spotify), look forward to my Discover Weekly playlist, a playlist created from a secret sauce of user data and algorithmic inference.

  • The users in this case are curious Spotify users, perhaps with a few more concerns about how companies use user data in this post-Cambridge Analytica world.

Questions from the person using this:

  • How did my playlist come to exist?

  • What data was it trained on, both holistically as an algorithm and personally from me as a user?

  • Does it infer and create just from my own listening habits? Does it look only at repeated plays, or does it weight just the most popular?

  • Is one genre weighted over another? What about time?

  • Do my friends’ playlists affect my playlist?

  • Can I ever just say, “No, this was wrong, please auto-generate a new one, STAT”?
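
One way to picture a label that answers these questions is as a small, structured record that a product could render in plain language. The sketch below is only an illustration of that idea: the class name, every field name, and all of the example values are hypothetical, and none of them describe how Spotify or any real service actually works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataIngredientsLabel:
    """Hypothetical label; all field names are illustrative assumptions."""
    product: str
    training_data: List[str]            # what the algorithm as a whole was trained on
    personal_signals: List[str]         # what it draws from this user's own activity
    history_window_days: Optional[int]  # how far back history counts; None = not disclosed
    uses_friends_playlists: bool        # do friends' playlists affect the result?
    user_can_regenerate: bool           # can the user reject the result and ask for a new one?

    def render(self) -> str:
        """Return the label as plain text, one 'ingredient' per line."""
        def yes_no(flag: bool) -> str:
            return "yes" if flag else "no"

        if self.history_window_days is None:
            window = "not disclosed"
        else:
            window = f"last {self.history_window_days} days"

        lines = [
            f"DATA INGREDIENTS: {self.product}",
            f"Trained on: {', '.join(self.training_data)}",
            f"From your activity: {', '.join(self.personal_signals)}",
            f"Listening history considered: {window}",
            f"Friends' playlists used: {yes_no(self.uses_friends_playlists)}",
            f"You can request a new result: {yes_no(self.user_can_regenerate)}",
        ]
        return "\n".join(lines)


# Placeholder values only; these are invented for the example.
label = DataIngredientsLabel(
    product="Weekly recommended playlist (example)",
    training_data=["aggregate listening data", "public playlists"],
    personal_signals=["repeated plays", "skipped tracks"],
    history_window_days=None,
    uses_friends_playlists=False,
    user_can_regenerate=True,
)
print(label.render())
```

Keeping the label as plain fields rather than free text is a deliberate choice in this sketch: each question a user might ask maps to one line they can read, and a value the company will not share shows up explicitly as "not disclosed" instead of being silently left out.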

Scenario: Discover Weekly playlist

This concept can also work for more harmful products, such as ones that use data sets for facial recognition or any kind of predictive analysis. We deserve to know what is in the data sets used to generate those results. These kinds of labels can offer insight into how data interacts with systems, and provide clearer warnings when data sets are biased. Users deserve to know if a product they are using relies on an algorithm trained on a small or biased data set.

A slide from a talk I gave at Interaction 2019 Conference on Designing for Transparency in Machine Learning.

By showing the data, Spotify could also offer an area for me to intervene as the user, as well as provide context into the creation and tailoring of my playlist. Does my playlist keep suggesting neo-folk because four years ago I went through a breakup and exclusively binge-listened to Mumford and Sons? What I’ve described is something that actually happened, and it affected my Discover Weekly playlists for a year. I’m using this example as one of heightened hilarity to highlight something important: there is a lack of space for me as a user to intervene in these algorithmic findings and provide context, across all products from food ordering to entertainment to social networks. I’m not a neo-folk fan per se, and what I listened to during one specific time may not be something I want to see long-term algorithmic residual effects of.

Show me how inferences are made, show me the data ingredients even for the enjoyable algorithms in my life, and then, more importantly, give me the designed space to suggest alternatives to that algorithm. Spotify, if you’re listening, I’d love some recommendations that call to mind 1970s punk, and some electroclash dance music, please. What kinds of suggestions can you give from that input? I promise, the folk was just a short, passing phase.