How to Become a Data Scientist: The Ultimate Guide (2020)

In this article, I share a list of online resources that will guide you on how to become a data scientist. Data science is a complex, multidisciplinary field with subjects that range from statistics to software engineering and business. I built this guide around the skills that I think are essential for a data scientist. The resources on this list are broad: some cover programming, others statistics and mathematics. The list is especially useful if you are a beginner, because many of the courses have no prerequisites. At the same time, I included advanced programs for those who already have some data science knowledge, so everyone can use this guide.

I divided this guide into five parts:

  1. Introduction
  2. Statistics and Probability
  3. Mathematics
  4. Programming
  5. Full "How to Become a Data Scientist" Programs

Let's get started! 💪🚀


How to use the resources in this guide

To make the most of this guide, start with the resource in the introduction section and then choose at least one course from sections two, three, and four. Alternatively, you can go straight to section five and choose one of the full programs there. Bookmark this guide so you can come back to the resources later.

For each part of this guide, I list my favorite courses, and you can choose the one that best fits your profile. Some courses are for beginners, and others are for advanced students with more experience. For most items on this list, I give a brief description of the resource and of the platform that hosts it. For example, if I suggest a course from Khan Academy, you will see an explanation of what Khan Academy is, followed by a description of the course.

These are the sections of this guide and their subsections:

Introduction on how to become a data scientist

Statistics and Probability

Mathematics

Programming

Introduction to coding

Python Programming

The Command-Line, Git & GitHub

Full "How to Become a Data Scientist" Programs

Coursera Specializations

Udacity Nanodegrees

Datacamp Career Tracks

Codecademy Career Path


Introduction on How to Become a Data Scientist

Data Science from Scratch by Joel Grus

I will start this guide by sharing a book: Data Science from Scratch: First Principles with Python by Joel Grus. If you have never opened a book or watched a course about data science, I suggest starting with this one. I think this book is an excellent introduction to the key principles of data science.

You can do most of the work the book demonstrates with Python libraries, but I find it important to know the basic building blocks. The book includes a crash course in Python, visualization principles, linear algebra, statistics, probability, and lots of examples. In my opinion, it is a great place to start. Besides, with a book you can set your own pace and take the time you need to learn.
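To make the idea of building blocks concrete, here is a minimal sketch in plain Python of the kind of vector helpers you might write by hand before switching to a library like NumPy. It is my own illustration, not code taken from the book, and the function names add, dot, and magnitude are my choices.

```python
import math
from typing import List

Vector = List[float]


def add(v: Vector, w: Vector) -> Vector:
    """Add two vectors element by element."""
    assert len(v) == len(w), "vectors must have the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]


def dot(v: Vector, w: Vector) -> float:
    """Sum of the element-wise products of two vectors."""
    assert len(v) == len(w), "vectors must have the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def magnitude(v: Vector) -> float:
    """Euclidean length of a vector, computed as sqrt(v . v)."""
    return math.sqrt(dot(v, v))


if __name__ == "__main__":
    print(add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
    print(dot([1, 2, 3], [4, 5, 6]))  # 32
    print(magnitude([3, 4]))          # 5.0
```

Writing these once by hand makes it much easier to understand what a call like NumPy's dot is doing for you later.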

To wrap up the introduction, I want to mention that several types of professionals work with data: data analysts, data scientists, data engineers, machine learning engineers, and so on. You can read more about what each of these professionals does here. In the meantime, you can take a career path quiz from Coursera and discover which job profile best matches your skills and interests.

Career path quiz

Coursera has a 7-question quiz that helps you decide which data career path suits you best. It takes only a few minutes, and you can take it for free here.

Additional readings:


How to Become a Data Scientist with Statistics and Probability: 11 Courses

Statistics and probability deserve special attention in this guide. Data science itself combines three fields: statistics, mathematics, and computer science. In this section and the next, I focus on statistics and math; to read more about how computer science relates to data science, take a look here.

In this section, I list eleven fantastic courses. For each recommendation, I briefly introduce the platform that hosts it and describe the course.

1. Statistics and probability by Khan Academy

Khan Academy is an educational platform that provides quality education for anyone around the world. It is a nonprofit organization, so using Khan Academy is free.

Their videos are on YouTube, but it is worth registering on their official website, where besides watching the lectures you can build a profile, track your progress, earn badges, and participate in discussions and projects. The platform also has a vast collection of courses, not only in statistics and mathematics but also in history and the arts.

The statistics and probability course includes 16 topics with more than 70 videos and interactive exercises. You can access the course via this link.

2. Statistics by CrashCourse on YouTube

Crash Course is an educational YouTube channel. At the time of writing, it has more than 10 million subscribers. They create engaging, eye-catching videos on subjects such as physics, philosophy, games, economics, history, and computer science.

Their statistics series has 45 videos with many real-life examples. You can access the course via this link.

3. Introduction to Statistics by Udacity

Udacity is an online education platform that offers massive open online courses (MOOCs). While its catalog includes free classes, you have to pay for most of them. It offers programs in several interesting subjects like Artificial Intelligence, Blockchain Development, Computer Vision, Virtual Reality, Flying Cars & Autonomous Flight, etc. In the last section, I share some of Udacity's full programs, which they call Nanodegrees. Check the Nanodegrees here.

Udacity offers a free course called Intro to Statistics with seven lessons. It aims to teach you how to identify relationships in data, as well as probability, estimation, outliers, the normal distribution, inference, and regression. Click this link to access the course.

4. Introduction to Descriptive Statistics by Udacity

Intro to Descriptive Statistics is also a free course on the Udacity platform. It is a beginner-level course that teaches the basic concepts of describing data. It is excellent for those who want to begin a career in data science or data analysis, or in fields such as psychology and economics.

As the title says, this course introduces you to descriptive statistics. It contains seven lessons on the following subjects: research methods, data visualization, measures of central tendency, measures of variability, standardizing, and the normal and sampling distributions. You can watch the course for free via this link.
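As a rough illustration of these descriptive measures, the sketch below uses Python's built-in statistics module on a small, made-up dataset. The data is hypothetical and the code is not taken from the course; it only shows central tendency, variability, and standardizing (z-scores) side by side.

```python
import statistics

# Hypothetical sample, e.g., daily study hours reported by eight students
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value (average of the two middle values here)
mode = statistics.mode(data)      # most frequent value

# Variability
population_sd = statistics.pstdev(data)  # divides the sum of squared deviations by n
sample_sd = statistics.stdev(data)       # divides by n - 1

# Standardizing: a z-score says how many standard deviations a value lies from the mean
z_scores = [(x - mean) / population_sd for x in data]

print(mean, median, mode, population_sd)  # 5 4.5 4 2.0
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

pstdev treats the data as the whole population, while stdev is the one to use when the data is a sample drawn from a larger population.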

5. Introduction to Inferential Statistics by Udacity

Intro to Inferential Statistics is the follow-up to the previous course. It is a free, beginner-level course that covers the inferential side of statistics. Its seven lessons show how to perform estimation, hypothesis testing, t-tests, ANOVA, correlation, regression, and chi-squared tests. You can watch the course for free via this link.
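To give a feel for what a t-test computes, here is a sketch of Welch's two-sample t statistic using only the standard library. Welch's variant (which does not assume equal variances) is my choice for the example, and the data and the function name welch_t_test are hypothetical. Turning t into a p-value requires the t-distribution's CDF, which is available, for example, through SciPy's scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
import statistics


def welch_t_test(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Welch's version of the two-sample t-test does not assume that the two
    groups have equal variances.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard error of each sample mean: sample variance / n
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical scores for two independent groups
group_a = [5, 6, 7]
group_b = [1, 2, 3]
t, df = welch_t_test(group_a, group_b)
print(round(t, 3), round(df, 3))  # 4.899 4.0
```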

6. Introduction to Probability and Statistics by MIT

MIT OpenCourseWare is an online platform that publishes lectures and online textbooks from the Massachusetts Institute of Technology (MIT). Its mission is to make quality education available worldwide. It launched in 2001 and has inspired more than 250 other institutions to make their course material available online for free.

There is no need to sign up, enroll, or pay to watch the courses! Everything is open and free. Don't miss the chance to browse all their classes on their website. Note that the course does not provide a certificate of completion.

Introduction to Probability and Statistics is a highly rated introductory course that covers topics like basic combinatorics, random variables, probability distributions, Bayesian inference, hypothesis testing, confidence intervals, and linear regression. Besides the lectures, the course includes readings, class slides, assignments, and exams. You can also download all the course content for offline use. You can access the course via this link.
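Bayesian inference rests on Bayes' rule, P(H | E) = P(E | H) P(H) / P(E). The short sketch below applies it to a classic diagnostic-test setting. The numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) and the function name posterior are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' rule for a binary hypothesis H.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Law of total probability: P(E) = P(E | H) P(H) + P(E | not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false-positive rate
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

The result, about 16%, often surprises people: when a condition is rare, most positive results come from the much larger group that does not have it.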

7. Introduction to Probability by Harvard on edX

edX is an online platform that offers massive open online courses (MOOCs). The classes are provided by top-rated universities. Most courses are free to watch, and you can receive a certificate after completing a course by paying a fee. Some courses are credit-eligible. The platform also offers complete master's degrees.

edX was created by the Massachusetts Institute of Technology and Harvard University in May 2012. Besides its teaching purpose, the activity on the website is used to research student behavior: the site also aims to collect data to improve retention, course completion, and learning outcomes in education.

Introduction to Probability is offered by Harvard University (HarvardX) on edX. The course is for beginners and includes seven units that range from an introduction to probability to Markov chains.

The course is free to follow, and it costs $99 to receive a verified certificate. You can enroll and access the course via this link.
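Since the Harvard course ends with Markov chains, here is a minimal sketch of the core idea: the next state depends only on the current one, and repeatedly applying the transition matrix to a distribution drives it toward a stationary distribution (for chains that are irreducible and aperiodic). The two-state weather chain, its probabilities, and the helper names are hypothetical.

```python
def step(distribution, transition):
    """One step of a Markov chain: new[j] = sum over i of distribution[i] * transition[i][j]."""
    n = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(transition, start, steps=1000):
    """Approximate the stationary distribution by applying the chain many times."""
    dist = start
    for _ in range(steps):
        dist = step(dist, transition)
    return dist


# Hypothetical two-state weather chain: state 0 = sunny, state 1 = rainy.
# Row i holds the probabilities of moving from state i to each state.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P, start=[1.0, 0.0])
print([round(x, 4) for x in pi])  # [0.8333, 0.1667]
```

For this chain, solving π = πP by hand gives π = (5/6, 1/6), which matches the iterated result.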

8. Statistics with Python Specialization by the University of Michigan on Coursera

Coursera is an online platform founded in 2012 that offers MOOCs. Its courses are created by top universities and big companies. Like edX, Coursera also offers full master's degrees.

Some of the content on the platform is free to watch, and students can pay a fee to receive a verified certificate of successful course completion.

Statistics with Python is a Coursera specialization. A specialization is a collection of courses created to provide a complete overview of the subject. I list more specializations from Coursera in the last section of this guide.

The University of Michigan (USA) offers this specialization on Coursera. It includes three courses:

  1. Understanding and Visualizing Data with Python
  2. Inferential Statistical Analysis with Python
  3. Fitting Statistical Models to Data with Python

This specialization teaches statistics using Python, so I recommend it if you already have some Python programming knowledge. You can access the course via this link.

9. Statistics with R by Duke University on Coursera

Duke University offers this beginner-level specialization on Coursera. It includes the following five courses:

  1. Introduction to Probability and Data
  2. Inferential Statistics
  3. Linear Regression and Modeling
  4. Bayesian Statistics
  5. Statistics with R Capstone

You can access the course via this link.

10. Statistics for Business Analytics by SuperDataScience on Udemy

Udemy is an online learning platform that offers MOOCs. However, unlike platforms such as Coursera and edX, its courses are not produced by universities or big companies.

Anyone can create and submit a course to the platform. If the course meets Udemy's standards, Udemy sells it, and both the creator and Udemy get a cut of the revenue.

Statistics for Business Analytics and Data Science has 7 hours of video and 45 lectures. I like this course because it shows how to apply statistics to business. So, the course suits not only people who want to land a job in data science or data analytics, but also business owners who want to better understand what making data-driven decisions means.

The course covers topics such as variable types, the normal distribution, the central limit theorem, hypothesis testing, Z-scores, confidence intervals, standard deviation, statistical significance, p-values, etc. You can enroll in this course via this link.
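Several of these topics (the central limit theorem, Z-scores, standard deviation, and confidence intervals) come together in a normal-approximation confidence interval for a mean. The sketch below is a minimal version using the standard library's statistics.NormalDist (Python 3.8+). The sample and the function name z_confidence_interval are hypothetical, and for small samples a t-based interval would be more appropriate.

```python
import math
import statistics
from statistics import NormalDist


def z_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean.

    Relies on the central limit theorem, so it is only reasonable for
    fairly large samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value: the z-score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical sample: the numbers 1 to 100
low, high = z_confidence_interval(list(range(1, 101)))
print(round(low, 2), round(high, 2))  # 44.81 56.19
```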

11. Probabilistic Graphical Models by Stanford University on Coursera

Probabilistic Graphical Models is an advanced-level specialization. If you have taken any of the courses above or already have a sound knowledge of statistics and probability, you should be able to follow this program without major problems. It includes the following three courses:

  1. Probabilistic Graphical Models 1: Representation
  2. Probabilistic Graphical Models 2: Inference
  3. Probabilistic Graphical Models 3: Learning

The main content of the courses is:

  • Introduction to probabilistic graphical models
  • Bayesian network representation and its semantics
  • Dynamic Bayesian Networks
  • Structured CPDs for Bayesian Networks
  • Markov Networks
  • Belief Propagation Algorithms
  • MAP Algorithms
  • Inference in Temporal Models
  • Parameter Estimation in Bayesian Networks

You can access this course via this link.
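To show what a Bayesian network and its conditional probability tables (CPDs) look like in practice, here is a sketch of a tiny, hypothetical network in which Rain and Sprinkler both influence WetGrass. The probabilities and names are made up. The query is answered by brute-force enumeration over the joint distribution, which is the simplest exact method but scales exponentially with the number of variables; the specialization covers more efficient algorithms such as belief propagation.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (Rain and Sprinkler independent)
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# CPD for WetGrass: P(WetGrass=True | Rain, Sprinkler)
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.80,
    (False, True): 0.90,
    (False, False): 0.00,
}


def joint(rain, sprinkler, wet):
    """Joint probability from the network's factorization P(R) P(S) P(W | R, S)."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1 - p_wet)


def p_rain_given_wet():
    """P(Rain=True | WetGrass=True) by summing out Sprinkler (exact enumeration)."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(p_rain_given_wet(), 3))  # 0.695
```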


How to Become a Data Scientist with Math: 11 Courses

The mathematics topics I chose for this guide focus on linear algebra and multivariate calculus. Most of the courses are free to watch, and for some of them you can receive a verified certificate by paying a fee. I introduced the platforms that host these courses in the previous section. Under every course title, I give a brief explanation of the course content.

Khan Academy

1. Linear Algebra

The Linear Algebra course covers topics such as vectors and spaces, matrix transformations, and alternate coordinate systems. Linear algebra is essential for data science and machine learning. You can access this program via this link.
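A matrix transformation is simply a matrix applied to a vector. The sketch below, in plain Python with hypothetical helpers mat_vec and rotation, builds the standard 2x2 rotation matrix and uses it to rotate the unit vector (1, 0) by 90 degrees counter-clockwise.

```python
import math


def mat_vec(matrix, vector):
    """Apply a linear transformation to a vector: result[i] = sum over j of M[i][j] * v[j]."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def rotation(theta):
    """2x2 matrix that rotates vectors counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


rotated = mat_vec(rotation(math.pi / 2), [1.0, 0.0])
print([round(x, 6) for x in rotated])  # [0.0, 1.0]
```

The rounding only hides the tiny floating-point residue of cos(π/2), which is not exactly zero in binary arithmetic.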

2. Multivariable Calculus

The Multivariable Calculus course covers multivariable functions, derivatives and their applications, and integration of multivariable functions. It also teaches Green's, Stokes', and the divergence theorems. You can access this program via this link.
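Partial derivatives can be checked numerically, which is a handy way to build intuition for multivariable calculus. The sketch below approximates the gradient with central differences. The helper name gradient, the step size h, and the function f(x, y) = x^2 * y + 3y are hypothetical choices; the exact partial derivatives of f are 2xy and x^2 + 3.

```python
def gradient(f, point, h=1e-6):
    """Approximate the gradient of f at point with central differences:
    df/dx_i is estimated as (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(p):
    """Example function f(x, y) = x^2 * y + 3y."""
    x, y = p
    return x ** 2 * y + 3 * y


# Exact gradient at (1, 2) is (2 * 1 * 2, 1 ** 2 + 3) = (4, 4)
print([round(g, 4) for g in gradient(f, [1.0, 2.0])])  # [4.0, 4.0]
```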

Krista King Algebra and Calculus Courses on Udemy

3. Become an Algebra Master

This course has 346 lectures and several practical exercises. You can watch the course via this link.

Krista King has three courses on calculus. You probably won't need all of this content for data science, but it never hurts to learn. The calculus courses are the following:

4. Become a Calculus 1 Master

This course has 18.5 hours of videos. Among the subjects, you will find precalculus, derivatives, and limits & continuity. You can access the course via this link.

5. Become a Calculus 2 Master

This course has 32 hours of videos. You will learn Integrals, Polar & Parametric, and Sequences & Series. You can access the course via this link.

6. Become a Calculus 3 Master

This course has 32.5 hours of video and covers partial derivatives, vectors, multiple integrals, and differential equations. You can access the course via this link.

7. Data Science Math Skills by Duke University on Coursera

Data Science Math Skills is a single course, not a specialization. It is designed to teach the mathematics that you need to succeed in Data Science. Topics in the course:

  • Set theory
  • Properties of the real number line
  • Interval notation and algebra with inequalities
  • Uses for summation and Sigma notation
  • Math on the Cartesian plane
  • Graphing and describing functions and their inverses
  • The concept of instantaneous rate of change and tangent lines to a curve
  • Exponents, logarithms, and the natural log function.
  • Probability theory

The course does not dive deep into the subjects, so it is useful if you want to refresh your mathematics knowledge or are just beginning to learn it. You can access it via this link.

8. Introduction to Discrete Mathematics for Computer Science on Coursera

The National Research University Higher School of Economics offers the Introduction to Discrete Mathematics for Computer Science specialization. It has five courses:

  1. Mathematical Thinking in Computer Science
  2. Combinatorics and Probability
  3. Introduction to Graph Theory
  4. Number Theory and Cryptography
  5. Delivery Problem

The courses use the Python programming language, so make sure you have some Python knowledge before you start. I like this specialization because it includes a course on mathematical thinking, which helps you develop reasoning in mathematical terms.

I don't consider it a beginner course, though, so make sure you feel comfortable with the mathematics covered by the previous courses on this list.

You can access this specialization via this link.

9. Mathematics for Machine Learning by Imperial College London on Coursera

Imperial College London offers the Mathematics for Machine Learning specialization. It has three courses:

  1. Mathematics for Machine Learning: Linear Algebra
  2. Mathematics for Machine Learning: Multivariate Calculus
  3. Mathematics for Machine Learning: PCA

This specialization focuses on the math needed for machine learning, covering topics you will already have seen in Khan Academy's courses.

I consider it an intermediate-level course. It also requires basic Python knowledge. You can access this specialization via this link.

Multivariable Calculus and Linear Algebra by MIT

10. Multivariable Calculus

This course consists of recorded MIT lectures. The course syllabus is:

  • Vectors and Matrices
  • Partial Derivatives
  • Double Integrals and Line Integrals in the plane
  • Triple Integrals and Surface Integrals in 3-space

Besides the lectures, you can download lecture notes, do the assignments, and take the exams (and check the solutions afterward). The course is free and does not offer a certificate.

You can access the course via this link.

11. Linear Algebra

This course also consists of recorded MIT lectures. It is extensive, with more than 30 lectures. As with the Multivariable Calculus course, you can download the course materials, assignments, and related resources. The course is free to watch and does not offer a certificate of completion.

You can access the course via this link.


Programming for Data Science

Which programming languages should I learn to become a data scientist? That is a good question. The truth is, there is no single right answer: different tasks demand different tools, so for each data science task, some language will perform better than others.

In this guide, I list resources for learning Python, because I believe its many essential libraries and its large community make it an excellent language for data science.

Introduction to coding

If you are entirely new to coding, I recommend the following courses, which introduce programming in general.

1. Learn how to code by Codecademy

Codecademy is an online platform that offers free and paid coding lessons. The Pro (paid) version gives you access to learning plans called Paths, which are collections of courses similar to Coursera's specializations and Udacity's Nanodegrees. Check the last section to see my list of Codecademy's Paths.

Learn How to Code is a free course with six hours of lessons and zero prerequisites! It is perfect for someone who wants to start now but doesn't know where to begin. You can find the course via this link.

Codecademy also offers a full career path called Code Foundations in its Pro version. It is not strictly necessary for data science, but it is really cool if you are interested in coding!

2. Introduction to Programming Nanodegree by Udacity

This is a more general Nanodegree that gives a full overview of programming, from web and app development to artificial intelligence. You can find this Nanodegree via this link.

7 Courses to Learn Python

Python is the number one language when people talk about data science. I am a personal fan of Python too, so I couldn't leave it out of this guide 😊. Besides, Python is easy to pick up whether you're a first-time programmer or experienced with other languages.

1. Python tutorial by SoloLearn

SoloLearn is an online and mobile learning platform that offers free coding classes. It is very dynamic and interactive, and you can even complete the lessons on your mobile phone.

Their Python Tutorial is a great way to start learning Python. It has 92 lessons and 275 quizzes, and more than 4 million people have already taken it. That's really impressive! You can access this tutorial via this link.

2. Python 3 Programming Specialization by the University of Michigan on Coursera

This specialization has five courses, from the basics through a final capstone project. You can access and enroll in this specialization via this link.

3. Python for Everybody Specialization by the University of Michigan on Coursera

This specialization has five courses including data structures, web scraping, databases, and a final capstone project. You can access and enroll for this specialization via this link.

4. Introduction to Scripting in Python Specialization by Rice University on Coursera

This specialization has four courses which are:

  1. Python Programming Essentials
  2. Python Data Representations
  3. Python Data Analysis
  4. Python Data Visualization

You can access and enroll for this specialization via this link.

5. Applied Data Science with Python by the University of Michigan on Coursera

This specialization will not only improve your Python coding skills but also show you how to use Python for data science. There are five courses in this specialization:

  • Introduction to Data Science
  • Applied Plotting, Charting & Data Representation in Python
  • Applied Machine Learning in Python
  • Applied Text Mining in Python
  • Applied Social Network Analysis in Python

You can enroll in this specialization via this link.

6. Python Programmer by DataCamp

Click this link to access this fantastic course.

7. Programmer for Data Science with Python by Udacity

This is a new Nanodegree from Udacity to learn programming for data science with Python. Use this link to access this program.

Learning how to use the Command Line, Git & GitHub and Stack Overflow

Regardless of which career path you take, you will need to know how to use the command line, Git, and GitHub.

1. Learn the Command Line on Codecademy

The command line is an interface for interacting with your computer using text (commands). With the command line, you can perform tasks more efficiently and access programs or functionality that are not available through graphical user interfaces.

Learn the Command Line is a free, 10-hour course with no prerequisites, offered by Codecademy. You can access the course and enroll via this link.

2. Introduction to Git & GitHub by freeCodeCamp on YouTube

freeCodeCamp is a nonprofit organization that aims to make learning web development accessible to everyone. It is useful for data science because you can learn databases (good for data engineering skills), Git & GitHub, and D3.js (for data visualization). You can access the lectures on their online platform or on their YouTube channel.

Git & GitHub is a series of 11 videos on freeCodeCamp's YouTube channel, which has more than 800k subscribers and awesome content! You can access this playlist via this link.

3. How to Use Git and GitHub by Udacity

If you still have questions about Git and GitHub, you can take Udacity's free How to Use Git and GitHub course. You can access the course via this link.

Stack Overflow

This is not a course but a platform where developers from all over the world post their coding questions and struggles, and other developers answer with their own solutions. Stack Overflow goes beyond that, as users can build a reputation on the platform, so it is worth registering on the website and getting familiar with how things work there: you will almost certainly use it in the future.

By the way, don't expect to get answers there without showing that you have made an active effort to find a solution yourself. You can access Stack Overflow via this link.


Full "How to Become a Data Scientist" Programs

A full program, in my view, is one that aims to teach you how to become a data scientist. It does not cover only one subject like statistics, mathematics, or programming; instead, it aims to teach all the subjects you need to master to become a data scientist.

I organized this section by platform. Each platform gives these programs a different name: Coursera calls them Specializations, Udacity calls them Nanodegrees, Codecademy calls them Paths, and DataCamp calls them Career Tracks.

Coursera Specializations

Beginner Specializations

1. Data Science Specialization by Johns Hopkins University

Access the specialization via this link.

2. IBM Data Science Professional Certificate Specialization

Access the specialization via this link.

IBM also offers two other specializations on Coursera. They are shorter because each contains only some of the courses from the Data Science Professional Certificate mentioned above. Those specializations are:

1. IBM Introduction to Data Science Specialization

Access the specialization via this link.

2. IBM Applied Data Science Specialization

Access the specialization via this link.

3. Executive Data Science Specialization by Johns Hopkins University

Access the specialization via this link.

4. Business Analytics Specialization by the University of Pennsylvania on Coursera

Access the specialization via this link.

Intermediate Specializations

1. Machine Learning by Stanford University and Andrew Ng on Coursera

Access the specialization via this link.

2. Data Mining Specialization by the University of Illinois at Urbana-Champaign on Coursera

Access the specialization via this link.

3. Deep Learning Specialization by deeplearning.ai on Coursera

Access the specialization via this link.

4. Machine Learning specialization by the University of Washington

Access the specialization via this link.

5. Machine Learning with TensorFlow on Google Cloud Platform Specialization by Google Cloud on Coursera

Access the specialization via this link.

6. Machine Learning and Reinforcement Learning in Finance Specialization by New York University Tandon School of Engineering on Coursera

Access the specialization via this link.

7. Data Engineering, Big Data, and Machine Learning on GCP Specialization

Access the specialization via this link.

Advanced Specializations

1. Advanced Data Science with IBM Specialization by IBM on Coursera

Access the specialization via this link.

2. Advanced Machine Learning Specialization by the National Research University Higher School of Economics on Coursera

Access the specialization via this link.

3. Advanced Machine Learning with TensorFlow on Google Cloud Platform Specialization by Google Cloud on Coursera

Access the specialization via this link.

Udacity Nanodegrees

Nanodegrees are fantastic programs! They answer the question of how to become a data scientist because they are all-in-one programs.

01. Become a Data Scientist

Access the Nanodegree via this link.

02. Data Analyst

Access the Nanodegree via this link.

03. Data Engineer

Access the Nanodegree via this link.

04. AI Programming with Python

Access the Nanodegree via this link.

05. Deep Learning

Access the Nanodegree via this link.

06. Artificial Intelligence

Access the Nanodegree via this link.

07. Machine Learning Engineer

Access the Nanodegree via this link.

08. Intro to Machine Learning with PyTorch

Access the Nanodegree via this link.

Codecademy Paths

01. Data Science Path

Access the path via this link.

Datacamp Career Tracks

01. Data Scientist with Python

Access the career track via this link.

02. Data Science for Everyone

Access the career track via this link.

03. Data Analyst with Python

Access the career track via this link.

04. Data Engineer with Python

Access the career track via this link.

05. Machine Learning Scientist with Python

Access the career track via this link.

06. Machine Learning for Everyone

Access the career track via this link.


I hope you enjoyed this guide on how to become a data scientist! Don't hesitate to contact me if you have constructive comments or suggestions on how to improve it 🤝

If you want more content like this, follow me on Instagram or subscribe to my newsletter! 👩🏽‍💻