Want to become a professional Data Scientist?

If you do, then you quite simply HAVE to learn Python.

Proficiency with Python is far and away the most important technical skill for any data scientist.

But let me prove it to you with data.

Kaggle.com runs an annual industry-wide survey called the "Machine Learning & Data Science Survey." All kinds of data enthusiasts respond, including students and new learners, but many working professionals respond as well.

The data that I'm about to show you comes from the 1,929 respondents who indicated that they were currently employed with the job title of "data scientist".

Programming languages used on a "regular basis" by professional data scientists

[Figure: Programming languages used on a "regular basis" by professional data scientists]

This graph shows the programming languages that are used on a "regular basis" by professional data scientists. Nearly 94% of them said that they use Python on a regular basis, which is roughly 19 out of every 20 data scientists. Not only that, but this already very high percentage has been trending upward slightly in recent years.
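
As a quick sanity check on the "19 out of every 20" figure, here is a minimal sketch in Python. It assumes the rounded 94% share quoted above (the exact respondent count isn't given here), and the variable names are purely illustrative.

```python
# Rounded share of data scientists who use Python regularly.
# This is the approximate figure quoted in the text, not an exact survey count.
python_share = 0.94

# Scale that share to a group of 20 data scientists.
per_twenty = python_share * 20

print(f"About {per_twenty:.1f} out of every 20 data scientists")
print(f"Rounded to a whole person: {round(per_twenty)} out of 20")
```

So "19 out of 20" is a slight round-up of 94%, but it's close enough to make the point.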

SQL, or "sequel" as it's often called, is also a very important technical skill that you should absolutely be competent with before you start interviewing for jobs, but even so, it is dwarfed by Python in its usage.

I also want to point out the downward trend in the use of the R programming language by data scientists. This doesn't mean that it isn't a useful tool, just that its use is becoming less common among professionals who hold this specific job title.

I hope this data makes it very clear that if you want to be a data scientist, you have to learn Python.

And if that's what you want to do, then you're in the right place.

Why is Python so popular?

But why is Python so popular in this profession? Well, Python's dominance as the language of choice for data and machine learning applications stems from decades of dedicated work by the Python community. In my opinion, Python has the most powerful set of packages and tools of any programming language for scientific computing, data science, machine learning, and, overall, just doing cool things with applied math.

Some of the incredible Python packages that we get to benefit from include:

  • NumPy - "The fundamental package for scientific computing with Python."
  • SciPy - "Fundamental algorithms for scientific computing in Python."
  • Pandas - A very popular data analysis and manipulation tool.
  • Matplotlib - A fundamental yet quirky tool for creating graphs and visualizations.
  • Scikit-Learn - The most popular machine learning toolkit.
  • TensorFlow and Keras - For machine learning, particularly neural networks. Keras makes the code required to work with TensorFlow simpler and more approachable. They're commonly used together.
  • PyTorch - Also a machine learning platform (similar to TensorFlow). Quite popular.

[Figure: Machine learning framework usage, from the 2021 Kaggle Survey]

Take a look at this graph from the 2021 Kaggle Survey that shows the popularity of different machine learning frameworks among professionals. We'll be learning about and using many of these tools in upcoming courses.

Python is popular because you can do so many cool things with it. And that's without even mentioning the popular Python web development frameworks like Django and Flask (among others); those are cool too. My overarching goal is to help you make cool data science stuff with Python.

Is this really the right place for me to start?

You may be saying to yourself: "Ryan, Python sounds really cool, but that's a lot of tools to learn, and applied math? This sounds intimidating. Is learning Python really the right place for a beginner to start?"

Yes! Absolutely!

Take it from someone who has taught thousands of beginner data scientists. Learning Python is square #1.

And you're in luck, because not only is Python very popular among data professionals, it is also well known for being one of the most approachable programming languages for beginners, and for being one of the most-loved programming languages among people who use it day in and day out.

Don't take it from me, though. Check out some of the results of Stack Overflow's 2022 Developer Survey. Please note that the respondents to this survey were not only data scientists but all kinds of programmers.

Python is Popular Among Professionals

In this first graph I want to show you that among developers (people who write code for a living) Python is the fourth most common programming language. It's popular in general, not only among data scientists.

[Figure: Python is popular among professionals]

Python is Popular Among Learners

Python is also popular among new learners. If you want to make websites, then HTML/CSS and JavaScript should be what you start with, but they're not really top priority for us data enthusiasts. Right after those comes Python, which really is very beginner friendly. It would be a great choice for anybody's first programming language.

[Figure: Python is popular among learners]

Python is one of the most loved languages

Among developers who use Python day in and day out, a very high percentage (relative to other programming languages) say that they love working with it.

You'll particularly love it if you have experience with some of the other dreaded programming languages. I started with some dreaded programming languages and Python was a breath of fresh air.

[Figure: Python is one of the most loved languages]