Nowadays, it has become relatively standard to state or hear that data scientists spend, or should spend, about 80% of their time on anything but data science. Rather than building models, doing feature engineering, or evaluating the performance of their output, some organizations expect them to collect, process, sanitize, and store data, productionize models, and meet with product managers or various stakeholders to understand the business. Sadly, it has become a joke, and I’m tired of reading the same point repeatedly, so I thought it might be the right time to share an unpopular opinion: I believe it need not be that way.
When you’re in charge of a football team and you hire the best football players, you hire them because they’re good at playing football and at winning games. Their aspiration is precisely to win games. When they’re not playing football, you expect them to exercise to stay fit, rigorously follow a healthy diet, and join the training sessions on time. When you don’t have a considerable budget, the most you can do is tell them to pay attention and do their best. Conscientious players will understand that to be the best athlete on the pitch, they should not follow their friends to that once-in-a-lifetime rave party happening next Friday night with free beer and pizza. Instead, they will show discipline, buy and cook high-quality fresh vegetables, exercise when they can, and so on.
Now, when you have the resources of a professional team playing in the first division and want to lift trophies, you hire specialists to drive the gym sessions. You hire chefs to prepare healthy (and hopefully delicious) meals and physicians to track the players’ physical health. That’s all good. Not only are these people much more competent at what they do than the football players themselves, but the players can, in return, focus on and enjoy what they desire the most and what they excel at: playing football, winning games, and lifting trophies along with the rest of the team.
I appreciate that football players don’t dedicate 100% of their time to playing championship matches. In most professional teams, it is during training sessions that they play the most. Similarly, data scientists spend some of their time training, deploying, and analyzing models in production, and a significant part of their time conducting offline experiments. In my view, this is where the most substantial portion of their time should go. Collecting, joining, sanitizing, and storing data should be like cooking meals: it’s more effective when there is a professional to provide adequate support. In the case of our 20% data scientist, that support should probably come from a group of data engineers. When it comes to talking with teams and finding the right opportunities, football players have their agents. Data scientists should ideally have their product managers or technical product managers. I find this analogy quite relevant: football players are commonly advised to attend negotiation meetings so they can have their say and influence the decisions (most of them don’t), although they have to trust their knowledgeable agent to negotiate and decide what’s best. The same goes for the partnership between data scientists and product managers.
This is my long way of saying that yes, indeed, data scientists won’t spend all of their time practicing data science. They must have some knowledge of adjacent fields and feel accountable for doing what it takes to shine. Still, the same could probably be said of pretty much any other role. Morose statements such as “80% of the job is processing data” are, in my opinion, signs that those organizations need to hire for a different role. They don’t need to hire more football players when what they need is professional chefs to prepare meals for the rest of the team, even though players would probably be capable of helping. It’s more than likely that some companies need more generalist profiles, but we should never take this 80% for granted.