This year, for the first time, we conducted an industry-wide survey to establish a comprehensive view of the state of data science and machine learning. We received over 16,000 responses and learned a ton about who is working with data, what’s happening at the cutting edge of machine learning across industries, and how new data scientists can best break into the field. The report below shares some of our key findings and includes interactive visualizations so you can easily cut the data to find out exactly what you want to know. Here are some sample takeaways:
There are a lot of ways to look at who is working with data, but we’ll get started with some demographic info on the jobs and backgrounds of people doing data science today:
How old are you?
On average, survey respondents were around 30 years old, but this value varies between countries. For instance, the average respondent from India was about 9 years younger than the average respondent from Australia.
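If you want to reproduce this kind of cut yourself, the idea is simply to group ages by country and take the mean of each group. Here is a minimal sketch using only the Python standard library. The rows below are made-up toy values for illustration, not actual survey responses, and `mean_age_by_country` is a hypothetical helper name rather than anything from the survey tooling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows of (country, age); NOT actual survey responses.
responses = [
    ("India", 24),
    ("India", 28),
    ("Australia", 33),
    ("Australia", 37),
    ("United States", 30),
]


def mean_age_by_country(rows):
    """Group ages by country and return a {country: mean age} mapping."""
    ages = defaultdict(list)
    for country, age in rows:
        ages[country].append(age)
    return {country: mean(values) for country, values in ages.items()}


averages = mean_age_by_country(responses)
overall = mean(age for _, age in responses)

# Print the per-country means next to the overall mean.
for country, avg in sorted(averages.items()):
    print(f"{country}: {avg:.1f}")
print(f"Overall: {overall:.1f}")
```

Comparing the per-country means with the overall mean shows how a single headline average can hide large differences between groups.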
While we define “data scientist” as someone who uses code to analyze data, we found that there are a ton of job titles that fall into the realm of data science. For example, in both Iran and Malaysia, the most popular job title for those doing data science work is “Scientist or Researcher”.
Although “compensation and benefits” was ranked as slightly less important than “opportunities for professional development” in our survey, it’s still good to know what should be considered reasonable compensation. In the US, Machine Learning Engineers bring home the most bacon (on average).
So, should you get that next degree? In general, the highest percentage of people working in data science have obtained a Master’s degree. But those people in the highest salary ranges ($150K - $200K and $200K+) are just as likely to have a doctoral degree.
The average survey respondent was a 30-year-old with a Master’s degree, a job as a Data Scientist, and a salary of about $55,000 per year. But people are not averages. These first few demographic questions give just the surface-level view of how diverse Kaggle’s data science community is in age, gender, country of residence, job title, salary, experience level, and formal education.
We define a data scientist as someone who ‘writes code to analyze data’. We asked these people what fills their day-to-day, and here are some highlights of what we found out:
Logistic regression is the most commonly reported data science method used at work for all industries except Military and Security, where Neural Networks are used slightly more frequently.
Python was the most commonly used data analysis tool across employed data scientists overall, but more Statisticians are still loyal to R.
Relational data is the most commonly reported type of data used at work for all industries except Academia and the Military and Security industry, where text data is used more.
Although the highest percentage of respondents share their code at work using Git, people in large companies are more likely to stay off the cloud and use file-sharing tools like email. Those in the smallest companies are staying more agile by sharing in the cloud.
Ah, dirty data, we meet again. It looks like, in general, dirty data is the most common problem for workers in the data science realm. One exception is those necessarily meticulous Database Engineers. After dirty data, company politics and a lack of management and/or financial support are the real thorns in a data scientist’s side.
When you’re starting a new career path, it’s helpful to find out how other people managed to find success. We asked people working in the data science industry how they ‘made it’. Here are a few of our favorite pieces of advice:
Every data scientist has an opinion on what language you should learn first. As it turns out, people who solely use Python or R feel like they made the right choice. But if you ask people who use both R and Python, they are twice as likely to recommend Python.
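A "twice as likely" comparison like this comes down to counting how often each language is recommended within a group and dividing one count by the other. A rough sketch, assuming one recommendation per respondent: the answers below are invented placeholders, not survey data, and `recommendation_ratio` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy answers from people who use both languages;
# NOT actual survey responses.
recommendations = ["Python", "Python", "R", "Python", "Python", "R"]


def recommendation_ratio(answers, a="Python", b="R"):
    """Return how many times more often `a` is recommended than `b`."""
    counts = Counter(answers)
    if counts[b] == 0:
        raise ValueError(f"no recommendations for {b!r}; ratio is undefined")
    return counts[a] / counts[b]


ratio = recommendation_ratio(recommendations)
print(f"Python is recommended {ratio:.1f}x as often as R in this toy sample")
```

Because both counts come from the same group of respondents, the ratio of counts equals the ratio of shares.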
Data science is a quickly changing field and there are a lot of valuable resources to help you learn and stay at the top of your game so you’re always eminently employable. Those already working in the data science field are using Stack Overflow Q&A, Conferences, and Podcasts more frequently to stay up to date than people entering the field. If you’re making content or open source software, keep in mind that people entering the field are more commonly using official documentation and watching YouTube videos.
There’s no data science without the data. When it comes to learning data science skills, knowing how to find clean open datasets to use for practice and projects is incredibly valuable. We’re glad that dataset aggregators like ours tend to be used the most frequently by members of the data science community.
When you’re job hunting, it may be tempting to look for work on company websites or tech-specific job boards, but according to people who are employed in the data science realm, these are among the least helpful ways to find work. Instead, try to contact recruiters or build up your network to break into the field.
https://www.kaggle.com/surveys/2017