Data science is still at an early stage, and despite the surge in AI/ML hype, the field faces many challenges. We have compiled a list of interesting facts and statistics about the current state of data science, drawn from the “2020 State of Data Science Survey Results” released by Anaconda.
Popular Data Science and Machine Learning Languages
75% of the Respondents use Python for Data Science Work
75% of the survey respondents use Python for data science work either always or frequently. This statistic is in line with the popularity Python has gained in recent years, and the trend is likely to continue in data science through 2020 and 2021.
27% of the Respondents prefer the R Language for Data Science Work
R comes second, with 27% of respondents using it. R is a scripting language that is powerful yet simple, and data scientists who do not come from a software engineering background find it easy to adopt for their machine learning and data science work.
Below are the usage statistics for other data science and machine learning languages:
- 4% of data scientists are using C# regularly.
- 9% of data scientists are using C/C++ regularly.
- 10% of respondents prefer Java for regular data science work.
With over 20 million users, Anaconda has been the first choice as a Data Science Platform
The survey also revealed that Anaconda has a user base of more than 20 million across the globe. Its popularity as a first choice is credited to its ease of use, its single point of access to different tools, and the range of options it provides to users.
Departments of Data Science
28% of Professionals work in a dedicated Data Science Center of Excellence (DSCoE) in various Organizations
The survey says that 28% of the respondents work in an organization that has a data science department made up of several team members at different levels of the hierarchy. This suggests that many companies are still reluctant to invest in setting up a dedicated center of excellence for data science.
Almost 60% of Data Science related jobs are in R&D centers, Business, and IT Companies
As per the survey, 22% of data scientists work in the research and development department of their company, 21% work as solitary data scientists in a specific line of business (e.g. HR, Marketing, etc.), and 15% work in IT companies. Clearly, there is still a long way to go before we see a dedicated data science team in every company.
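As a quick check on the “almost 60%” figure, the three shares quoted above can be summed. This is only a minimal sketch; the dictionary keys are illustrative labels, not the survey's exact wording.

```python
# Shares quoted in the paragraph above (percent of respondents).
# The labels are illustrative, not the survey's exact category names.
shares = {
    "R&D department": 22,
    "Single line of business": 21,
    "IT companies": 15,
}

# Sum the three shares to get the combined figure behind "almost 60%".
total = sum(shares.values())
print(f"Combined share: {total}%")
```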
Time spent on various Data Science Tasks
Data Scientists spend over 60% of total time just on data management tasks
Data science aspirants are often attracted by the hype around machine learning frameworks, libraries, and model training, but the reality is quite different and an eye-opener. According to the survey responses, more than 60% of data scientists' time goes to tasks like data loading, data cleaning, and data visualization.
Data Scientists devote just 34% of total effort to creating and deploying ML models
Only 34% of the total time is spent on selecting the model, building and tuning it, and finally deploying it. This statistic strongly emphasizes that machine learning model creation is a small part of a data science project. Data science enthusiasts should also work on their data handling and data management skills before jumping into ML model creation.
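To give a feel for what data loading and cleaning involve, here is a minimal sketch using only the Python standard library. The sample data, column names, and the helper functions `load_rows` and `clean_rows` are hypothetical and are not taken from the survey; real projects would typically rely on a dedicated data library.

```python
import csv
import io
import statistics

# Hypothetical raw data: stray whitespace, a missing value, and a duplicate row.
RAW_CSV = """name,age
 Alice ,34
Bob,
Alice,34
Carol,29
"""

def load_rows(text):
    """Data loading: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Data cleaning: strip whitespace, drop rows with no age, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        age = row["age"].strip()
        if not age:
            continue  # skip rows with a missing value
        key = (name, age)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"name": name, "age": int(age)})
    return cleaned

rows = clean_rows(load_rows(RAW_CSV))
mean_age = statistics.mean(row["age"] for row in rows)
print(rows)
print(mean_age)
```

Even in this toy case, most of the code deals with messy input rather than analysis, which mirrors the time split the respondents describe.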
Use of Open Source Technology in Data Science
47% of Data Science Professionals use Open-Source Tools for their speed and usefulness
Open-source technology is popular in the market, and data scientists also prefer it because it increases the pace of development and there are several open-source tools that are easily accessible for their needs.
37% of Data Scientists prefer Open Source Tools as they are economical and avoid vendor lock-in
Another reason for the popularity of open source is that the tools are available for free, so data scientists can use them for experimentation without paying anything or worrying about vendor lock-in.
Roadblocks faced by Data Scientists in Model Deployment
39% of Data Scientists consider managing Dependencies/Environments as a major challenge in Model Deployment
Data science and machine learning are not limited to building models in a Jupyter Notebook; the models must also be deployed to a production environment. In the survey, 39% of data scientists highlighted the management of dependencies and environments as their biggest concern during model deployment.
38% of Data Scientists cite skill gaps in ML Deployment as a challenge
The report says that 38% of data scientists agree that there is a huge skill gap in the technologies used for machine learning deployment, such as Docker and Kubernetes. This is a major challenge in data science at the moment.
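One small part of managing dependencies is confirming that the packages installed in an environment match the versions a model was built against. The sketch below is one possible approach, not a method described in the survey. It uses the standard library's `importlib.metadata`; the helper `find_pin_mismatches` and the package name in the example are hypothetical.

```python
from importlib import metadata

def find_pin_mismatches(pins):
    """Return {package: (pinned, installed)} for every pin that is not satisfied.

    `installed` is None when the package is missing from the environment.
    """
    mismatches = {}
    for package, pinned in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[package] = (pinned, installed)
    return mismatches

# Hypothetical pin; in practice the pins would come from a requirements
# or environment file kept alongside the model.
example_pins = {"hypothetical-package-not-installed": "1.0.0"}
print(find_pin_mismatches(example_pins))
```

Running a check like this before deployment surfaces version drift early, although it only compares exact version strings and does not resolve dependency ranges.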
34% of Data Scientists struggle to meet IT Security Standards for production
Production environments have their own IT security standards to safeguard them against attacks that exploit vulnerabilities, and 34% of data scientists find it difficult to meet these standards when deploying ML models. Security is a much-neglected area in the current state of data science.
Data Scientists’ concern for managing Security and Vulnerabilities of Open-Source Technology is rated 2.9 on a scale of 5
When asked about the security concerns around open-source tools in their organizations, data scientists gave a rating of 2.9 out of 5, which is on the lower side. This suggests that, because data scientists usually end up working in silos, they are oblivious to the organizational security concerns around open-source tools. Instead, they focus more on experimentation without worrying much about the security aspects of their tools.
Respondents from academia attached negligible importance to the security of Open-Source Technologies
One of the major survey findings is that respondents from the academic sector are the least concerned about the level of security of their open-source tools. This raises some serious questions about the data science curriculum followed in educational institutions today. Institutions must realize that, apart from working with data, one must also consider the potential security vulnerabilities of open-source software.
Effectiveness of Business Communications
Around 50% of respondents find it easy to convey the effectiveness of Data Science in Business related decisions
Around 50% of the respondents find it easy to showcase the importance of data science in making crucial business decisions. The report also indicated that individuals from IT organizations consider it to be a tough job.
70% of Data Professionals in the Consulting Industry showcase the impact of Data Science in Decision-making processes with ease
Industry-wise, almost 70% of professionals from consulting backgrounds said that they can successfully convey the positive effects of data science to the business team, whereas for data scientists in the healthcare sector the figure drops to a meager 34%.
Nearly 34% of Individuals in Data Science Departments are looking for a job change within the next year
The data science industry has always faced a scarcity of experienced professionals. At the same time, due to dissatisfaction with their current jobs, almost 34% of data professionals are looking for a change. This is a worrying situation for companies, which should improve employee engagement to retain their data scientists.
Skill Gaps in Industry Standards and Educational Institutions
40% of Enterprises expect aspiring data scientists to possess knowledge of Big Data Management and Engineering Skills
The report reveals a huge gap between the skills aspiring data scientists have and the skills the industry expects of them. 40% of companies expect budding data scientists to have experience with big data management and engineering skills. In contrast, students and university curricula mainly focus on Python, data visualization, probability, and statistics. Clearly, more awareness is required to bridge this gap.
Challenges in Getting a Data Science Job
66% of Beginners believe that Lack of Experience and Inadequate Technical Skills are the biggest barriers in finding their ideal Data Science job
The survey also included data science aspirants. Going by the stats, 66% of job seekers think they do not have adequate industry experience or lack the required skills, especially technical ones. This highlights the lack of appropriate industry training in university curricula.
Ethics, Bias and Explainability in the Data Science Industry
Only 15% of professors teach AI/ML ethics and only 18% of students care about this
Artificial intelligence and data science are not free from problems. Two major ones, bias in datasets and models and the explainability of results, have been a serious concern in the AI community. When asked about the inclusion of such topics in the university curriculum, only 15% of instructors said they include them in their teaching, and only 18% of students are studying such subjects. The situation is alarming for the current state of data science, and more focus is needed on AI/ML ethics.
39% of Enterprises do not have a plan to implement solutions for bias mitigation and 27% have no plans to tackle explainability
The problems encountered at the grassroots level are found at higher levels as well. Almost 65% of companies have no plans either to reduce bias in their models or to reduce the “black box” nature of their models (explainability). Unfortunately, this is a very concerning state of affairs for data science.
49% of Respondents think the biggest concerns in AI/ML are social impact from bias and data privacy
People are becoming more aware of the social and data privacy issues around AI/ML, and this is reflected in the survey. 27% of respondents consider social issues arising from bias to be the major concern in AI/ML, and 22% regard data privacy as a major challenge in data science.