Finding value in data, integrating open-source software, a small talent pool, and ethical concerns around data were identified as trouble areas in a new state of data science report.
A report on the state of data science from software firm Anaconda finds that data science is anything but a stable part of the enterprise. In fact, it has several serious challenges to overcome.
Anaconda’s report also offers recommendations for addressing the four problem areas it found in its survey of data science professionals: a lack of value realization, concerns over the use of open-source tools, trouble finding and retaining talent, and ethical concerns about bias in data and models.
“The institutions which rely on [data science] are still developing an understanding of how to integrate, support, and leverage it,” the report said.
The four trouble areas Anaconda identified are key to the continued evolution of data science from an emerging part of enterprise business to a fundamental part of planning for the future of work.
1. Getting value out of data science
This problem stems mainly from production roadblocks like managing dependencies and environments, a lack of organizational skills needed to deploy production models, and security problems.
Combined, those three problems lead 52% of data science professionals to say they have trouble demonstrating the impact data science has on business outcomes. The figure varies by sector, from healthcare, where data pros have the most trouble proving benefits (66% said they can only sometimes or never do so), to consulting, where only 29% said the same.
“Getting data science outputs into production will become increasingly important, requiring leaders and data scientists alike to remove barriers to deployment and data scientists to learn to communicate the value of their work,” the report recommends.
2. Difficulty integrating open-source data science tools
According to the report, open-source programming language Python dominates among data scientists, with 75% saying they frequently or always use it in their jobs.
Despite the popularity of open-source software in the data science world, 30% of respondents said they aren’t doing anything to secure their open-source pipeline. Respondents prefer open-source analytics software because they see it as innovating faster and as better suited to their needs, but Anaconda concluded that the security gaps may indicate that organizations are slow to adopt open-source tools.
“Organizations should take a proactive approach to integrating open-source solutions into the development pipeline, ensuring that data scientists do not have to use their preferred tools outside of the policy boundary,” the report recommended.
There’s a caveat here: Anaconda makes a Python-based open-source data science platform, and the results of its survey may be tilted in favor of open-source products, since respondents were recruited via social media and Anaconda’s email database.
3. Trouble finding and keeping qualified data scientists
There are several layers of problems to parse through here. First, the report found that what students are learning and what universities are teaching isn’t necessarily what enterprises need from new data scientists.
The two skill gaps businesses cited most often, big data management and engineering skills, didn’t even rank among the top 10 skills universities are offering their data science students.
Another layer of problems involves talent retention, which the report found is closely tied to how often data science professionals are able to prove the value of their work. Across the board, 44% of data scientists said they plan to look for a different job within the next year.
Anaconda makes three recommendations to address this problem:
- Businesses need to collaborate with educational institutions to ensure their programs are teaching students the skills businesses need.
- Employers should design holistic data science retention plans that include helping employees learn to articulate the value of their work and providing opportunities for training and growth.
- Employers should give data scientists the opportunity to cross-train to increase the value of their contributions.
4. Eliminating bias and explaining machine learning
“Of all the trends identified in our study, we find the slow progress to address bias and fairness, and to make machine learning explainable the most concerning,” the report said.
Ethics, responsibility, and fairness are concerns that have started to surface around machine learning and artificial intelligence, and Anaconda said enterprises “should treat ethics, explainability, and fairness as strategic risk vectors and treat them with commensurate attention and care.”
Despite the importance of addressing bias inherent in machine learning models and data science, few organizations are doing so: Only 15% of respondents said they had implemented a bias mitigation solution, and only 19% had implemented one for explainability.
Thirty-nine percent of enterprises surveyed said they had no plans to address bias in data science and machine learning, and 27% said they have no plans to make the process more explainable.
“Above and beyond the ethical concerns at play, a failure to proactively address these areas poses strategic risk to enterprises and institutions across competitive, financial, and even legal dimensions,” the report said.
The solution that Anaconda recommended is for data scientists to act as leaders and try to drive change in their organizations. “Doing so will increase the discipline’s stature in the organizations which depend on it, and more importantly, it will bring the innovation and problem-solving, for which the profession is known, to address critical problems impacting society.”