Data Science Survey Visualizations

A must-read for data science aspirants!!

iManassa
Jun 22, 2021

Data science is becoming an essential part of nearly every industry because of the huge influx of data across all of them. Its growing popularity has led working professionals to switch careers and students to learn all the fancy terms related to data science. Companies have started to create more data-related jobs and hire professionals to implement data science techniques and grow their businesses. Even schools have started including data science in their curricula. It is truly living up to its reputation as the “sexiest job” of the 21st century!

So, if you really fancy becoming a data scientist, do you know what you should do to become one? The best way to find out is to ask the people currently working in the industry. In this article, we will go through the results of a project that I did with the Kaggle 2020 Machine Learning and Data Science Survey dataset. For newbies who do not know what Kaggle is, it is a great platform used by data scientists and other techies to solve data science, machine learning, and analytical problems.

Kaggle conducted an industry-wide survey that presents a comprehensive view of the state of data science and machine learning in 2020. The survey was live for 3.5 weeks in October, and after the data was cleaned it ended up with 20,036 responses, which is indeed a very large number! Those who want to view the data set can click on the link below.

The results of my project will be useful for students who are thinking of pursuing data science or data analytics as a career, and also for working people who are looking to switch to this profession. We will answer their frequent questions, such as: What level of formal education is required to become a data scientist or a data analyst? What skills are preferred, for example how many years of programming experience should a person have, and which programming languages should they use? And many more.

I will not be explaining the technical aspects of the project, rather only the conclusions that I derived from it. So let's get started!!

First of all, I wanted to know which countries dominated the survey. I found that most of the participants were from India and the US, as you can see from the graph below.

Major Insights From The Survey:

  • Programming Experience

One of the major things for data science aspirants to learn is coding. People who shy away from coding should think twice before choosing this profession, since it is an integral part of being a data scientist. However, one need not be a hardcore programmer. Knowing and understanding the basics, such as writing conditional statements, using loops and defining functions, is enough. A good understanding of data structures is also of paramount importance. This will help you write faster and more efficient code during all the major steps of a data science project: data collection, data cleaning and data visualization.
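To give a feel for how little is needed to get going, here is a minimal Python sketch that uses a conditional, a loop, a function and a dictionary together. The numbers and the bucket labels are made up for illustration and are not taken from the survey.

```python
# Toy data, made up for illustration; not taken from the survey.
years_of_coding = [0.5, 1, 3, 4, 7, 12]


def experience_bucket(years):
    """Map a number of years to a coarse experience label (hypothetical buckets)."""
    if years < 1:
        return "< 1 year"
    elif years < 3:
        return "1-2 years"
    elif years <= 5:
        return "3-5 years"
    else:
        return "5+ years"


# Loop over the list and count how many values fall into each bucket,
# using a dictionary as the data structure that holds the counts.
counts = {}
for years in years_of_coding:
    label = experience_bucket(years)
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

The same pattern of bucketing values and counting them shows up constantly in data cleaning, which is why these basics go a long way.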

The heatmap below suggests that people generally need at least 2 to 5 years of programming experience to enter the industry.

  • Preferred Programming Language

To get a more general view, we will concentrate on three main roles in the industry: data scientists, data analysts, and software engineers. This is mainly because people often wonder whether data scientists and data analysts have the same requirements.

We see that Python is one of the most preferred programming languages for all three roles (Data Scientist, Data Analyst and Software Engineer) alike. But we should note that Python alone is not enough for a person to excel in this field. They will need at least basic knowledge of other languages such as C++, R and SQL, with Python being the most common and preferred one. All this knowledge together will help them manage structured and especially unstructured datasets.

Do not worry about having to learn all of these programming languages. Once you learn one general-purpose language such as Python or Java, picking up another is much easier than you might think. This is because you will already be familiar with the base concepts such as variables, conditionals and loops; only the syntax is different.

  • Preferred Database

To become a data scientist or a data analyst, you must be ready to work with very large datasets often. Datasets can be both structured and unstructured, and there is really nothing to be afraid of. Handling large datasets becomes easy once you get somewhat familiar with database management.

We can see that the most preferred big data product (relational databases, data warehouses, data lakes, or similar) among students is MySQL, probably because it is a free, open-source database that provides stable and reliable solutions. So if you want to get familiar with managing such large datasets, it is high time that you start learning SQL.
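For readers who have never written SQL, the sketch below shows what a basic counting query looks like. It is only an illustration under stated assumptions: it uses SQLite through Python's built-in sqlite3 module instead of MySQL so that it runs without installing anything, and the table name, column names and rows are hypothetical rather than taken from the survey.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table of survey-style answers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (role TEXT, database_used TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [
        ("Student", "MySQL"),
        ("Student", "MySQL"),
        ("Student", "PostgreSQL"),
        ("Data Analyst", "MySQL"),
        ("Data Analyst", "Microsoft SQL Server"),
    ],
)

# Count how often each database appears among students, most common first.
rows = conn.execute(
    """
    SELECT database_used, COUNT(*) AS n
    FROM responses
    WHERE role = 'Student'
    GROUP BY database_used
    ORDER BY n DESC, database_used
    """
).fetchall()
conn.close()

for database_used, n in rows:
    print(database_used, n)
```

The SELECT, WHERE, GROUP BY and ORDER BY clauses used here are standard SQL, so the query should carry over to MySQL with little or no change.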

  • Preferred IDE

An IDE, or integrated development environment, is software that combines various tools into a single user interface. Because of the features an IDE offers, such as code completion and code insight that help you write and debug code more quickly, it is extremely helpful for data scientists.

As before, we are comparing the IDEs used by Data Scientists, Data Analysts and Software Engineers only. We can infer from the graph that Jupyter (JupyterLab, Jupyter Notebooks, etc.) is more prevalent among Data Scientists. Data Analysts, on the other hand, seem to prefer Jupyter and Visual Studio Code, followed by PyCharm and Notepad++.

  • Education for different roles

One of the major dilemmas for students who complete their Bachelor's degree is whether to continue with higher education or search for a job right away. They are not sure whether they need a Master's degree to get a job.

From the graph below, we can see that about 51% of the Data Scientists who took part in this survey have completed a Master's degree. So it looks like a Master's is more or less expected for becoming a Data Scientist. Among Data Analysts, 48% have a Master's degree and 33% have a Bachelor's degree. Although students can opt for Data Analyst jobs right after completing their Bachelor's degree, a Master's degree will give them a better chance. Finally, for Software Engineers, people with a Master's degree and people with a Bachelor's degree have a roughly equal chance of landing jobs. Also, with or without a Master's, having good subject knowledge as well as practical knowledge is really important.
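Percentages like the ones above are shares within each role, not shares of the whole sample. The following Python sketch shows one way such a share could be computed. The rows are hypothetical and the helper name degree_shares is my own, so the output does not reproduce the survey figures.

```python
from collections import Counter

# Hypothetical (role, highest degree) pairs; not the actual survey data.
responses = [
    ("Data Scientist", "Master's"),
    ("Data Scientist", "Master's"),
    ("Data Scientist", "Bachelor's"),
    ("Data Scientist", "Doctoral"),
    ("Data Analyst", "Master's"),
    ("Data Analyst", "Bachelor's"),
]


def degree_shares(pairs, role):
    """Return {degree: percentage of respondents in `role`}, rounded to 1 decimal."""
    degrees = Counter(degree for r, degree in pairs if r == role)
    total = sum(degrees.values())
    if total == 0:
        return {}  # no respondents with this role
    return {degree: round(100 * n / total, 1) for degree, n in degrees.items()}


scientist_shares = degree_shares(responses, "Data Scientist")
analyst_shares = degree_shares(responses, "Data Analyst")
print(scientist_shares)
print(analyst_shares)
```

Dividing by the number of respondents in each role is what makes roles of very different sizes comparable, and the same idea applies to the proportional comparisons in the gender analysis.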

  • Learning platforms for data science

Nowadays, various learning platforms offer data science courses and boot camps. According to the participants of this survey, Coursera is the most preferred platform for learning data science, followed by Kaggle Learn Courses and Udemy.

  • Gender Analysis

It is common to see a gender gap in many industries today. This part looks at how the gender gap has affected the data science and machine learning industry.

The graph below tells us that proportionally more women in the sample have a Master's or Doctoral degree, while far fewer women, proportionally, have no formal education past high school.

From the graph below, we see that Malaysia, Tunisia, Iran, etc. have proportionally more women, while in countries like Japan, Korea and Italy the profession is very heavily male-dominated.

We can see that there are proportionally far more women who are Students, Statisticians, Data Analysts, Research Scientists, Business Analysts, or currently unemployed, whereas proportionally far fewer women work as Product Managers, ML Engineers, Database Engineers and Data Scientists.

We see that women proportionally have less coding experience than men.

Women across the industry also have proportionally lower salaries than men.

Conclusion:

Data scientists are often referred to as ‘unicorns’ because of the diverse skill set that they must possess, and this is why they are so highly valued. However, learning data science can be challenging, and the right training is always the building block for success!

iManassa

Aspiring data scientist and ML enthusiast. Curious about life and people.