Developing a Product Mindset for Machine Learning

Sharon Ibejih
Jun 23, 2022
Photo by Christina @ wocintechchat.com on Unsplash

As the buzz around the importance of data persists, more companies are finding ways to make it part of what their businesses are known for, while others that have long used it are expanding their horizons. Data is vast, way more than we think. This article is aimed at people who currently use data for machine learning, whether for business or personal projects. While the points here may not match your exact reality, they should help you centre model development on users.

Again, data is vast and has been processed in different ways to arrive at a meaningful end, or maybe no end [yet]. But we just want to make sure that it solves a problem. Just as every product person would, machine learning engineers should think of their end users. This starts with understanding the problems prospective users face in relation to the proposed solution, and how many users may see a “solution” in the product.

Many of us data scientists were trained in a way that leaned away from a product mindset, so we are often unaware of the fears and challenges that come with disappointing users. This may have affected the way we approach problems. There are so many fun projects done, thanks to the tons of data available on Kaggle and Zindi. But here’s the thing: in reality, we work for production, for users. So what’s my point?

The data scientist’s product mindset starts with tempering the excitement to jump on any project or gig that provides a dataset, and spending more time understanding the existing use case. It comes from a point of interrogation:

  1. What are you doing? This is very simple. It’s the task your team lead or contractor gives you. It could be something like “Can we develop a model that predicts or forecasts xxx?”
  2. Why are you trying to do that? Every project has a reason behind it. “I need to feel the impact of what I am doing; otherwise, I guess I’m just doing it to get paid or have fun, and I don’t care if it’s useful or not.” Interviewers are sometimes curious to know which project we found most interesting. It starts from here. Disclaimer: since it’s ML, you might hear funny responses. These reasons can help you judge whether it’s really a machine learning problem and how effective a model might be.
  3. Where did you get this data from? The source of the data determines the longevity of the model. Models are not developed to last forever. They wear out. It’s therefore important to project how soon they may become irrelevant. Take the finance sector, for example: a lot has changed in the past few years, and a model trained on data more than a couple of years old may fail in real life. Understanding the context of your data in depth will be a good guide to whether this project will end up merely pleasing your boss or actually solving the problem.
  4. Will this data always be available, even in a couple of years or more? If you like what you are hearing from numbers 1 to 3, or maybe you just have to, then another important thing to find out is whether the source will keep supplying data. Can I get fresh data in the future? As stated earlier, models wear out. When they do, new data is used to update the model. So if there are many limitations on reproducing the data, that might be a problem in the future. Bear this in mind and recommend a likely lifespan for the developed model.
  5. What features in the data are most important? When experimenting in a notebook, we are taught to make certain preprocessing decisions based on statistical analysis. While this is very helpful, it is still crucial to know which features in your data an average user can comfortably provide. Say, for example, you are to develop a house price forecast model where users input the age of the house, the square footage of the house and parking lot, etc. A product-inclined person would want to know whether these are realistic for a house seeker to know; in some parts of the world, they are not. However well your model may perform with some of these features included, you should optimise for ease on the user end. Features that are difficult to provide will likely reduce the number of users you have, which is not good for the product.
  6. Speed or accuracy? It’s also important to know what your product should be better at: a quick response or a very accurate one. If the model will be served as a web service rather than as batch predictions, then a faster model may be the better choice. This part is technical, so it’s also your call. Whatever you decide will boil down to which metrics you track and what score you use as a baseline. Just as in the other arms of product development, you do not need to produce a perfect model at the first launch.
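
To make point 6 a bit more concrete, here is a minimal sketch in Python of how the speed-or-accuracy decision could be written down. Everything in it is an assumption for illustration: the `Candidate` class, the `pick_model` function, the model names and all the numbers are hypothetical, and in a real project the accuracy and latency values would come from your own offline evaluation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Offline evaluation results for one trained model (hypothetical numbers)."""
    name: str
    accuracy: float        # score on a held-out set, between 0.0 and 1.0
    p95_latency_ms: float  # 95th-percentile response time per request


def pick_model(candidates, baseline_accuracy, latency_budget_ms=None):
    """Return the best candidate that meets the baseline and fits the budget.

    latency_budget_ms=None stands for batch prediction, where per-request
    response time is not a constraint. Returns None if nothing qualifies.
    """
    eligible = [
        c for c in candidates
        if c.accuracy >= baseline_accuracy
        and (latency_budget_ms is None or c.p95_latency_ms <= latency_budget_ms)
    ]
    if not eligible:
        return None
    # Among eligible models, prefer accuracy; break ties with lower latency.
    return max(eligible, key=lambda c: (c.accuracy, -c.p95_latency_ms))


# Made-up candidates, purely for illustration.
candidates = [
    Candidate("logistic_regression", accuracy=0.84, p95_latency_ms=5),
    Candidate("gradient_boosting", accuracy=0.89, p95_latency_ms=40),
    Candidate("large_ensemble", accuracy=0.91, p95_latency_ms=900),
]

web_choice = pick_model(candidates, baseline_accuracy=0.80, latency_budget_ms=100)
batch_choice = pick_model(candidates, baseline_accuracy=0.80)
print(web_choice.name)    # gradient_boosting
print(batch_choice.name)  # large_ensemble
```

The same three models give a different answer depending on how the product is served. That is the point: the baseline score and the latency budget are product decisions, and the model choice follows from them.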

Getting feedback from users is also part of the loop: do they complain about speed, accuracy, or features? Do they use the model frequently and find it helpful?
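
One quiet form of feedback about features is what users leave blank. Below is a small Python sketch, assuming you log the inputs users submit to a house price form like the one in point 5. The `missing_rate_by_field` function, the field names, the sample records and the 50% threshold are all hypothetical and only meant to illustrate the idea.

```python
def missing_rate_by_field(requests, fields):
    """Share of requests in which each input field was left empty (None)."""
    total = len(requests)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in requests if r.get(field) is None) / total
        for field in fields
    }


# Hypothetical logged inputs from a house price form.
requests = [
    {"bedrooms": 3, "location": "area_a", "house_age": None, "square_feet": None},
    {"bedrooms": 2, "location": "area_b", "house_age": 10, "square_feet": None},
    {"bedrooms": 4, "location": "area_c", "house_age": None, "square_feet": None},
    {"bedrooms": 3, "location": "area_a", "house_age": None, "square_feet": 1200},
]
fields = ["bedrooms", "location", "house_age", "square_feet"]

rates = missing_rate_by_field(requests, fields)
# Flag fields that more than half of users could not provide (assumed threshold).
hard_to_provide = [f for f in fields if rates[f] > 0.5]
print(hard_to_provide)  # ['house_age', 'square_feet']
```

Fields that show up in a list like this are candidates to drop or make optional in the next version of the model, even if they looked important in the notebook.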

In the coming weeks, I’ll be releasing a series of short articles on machine learning operations. I do hope, however, that this article has improved your perspective on developing models for users.

If you have felt inclined to work this way and would love to share an experience, I would be happy to read your comment :)

Stay safe!

Sharon Ibejih

I got into data science out of curiosity to explore how much one can do with data. I believe it should be the bedrock of all decision-making.