Getting into ML/Data Science

I’m going to take several data science courses at my university. It’s a fast-moving field with a lot of innovation, which is why I’m excited to learn about it.

To prepare for my lectures, I decided to take a crash course. I chose one by Santiago Valdarrama, because he has worked in the field for many years. Santiago also posts a lot on Twitter, sharing great insights into the workflow and evolution of Data Science / ML. I will probably write another article about this in the future.

Unfortunately, I couldn’t find his course again, so I can’t provide a link. It seems the course was a one-time event. But here are my takeaways:

How do I get started?

  • many positions/career paths possible
    • researcher
    • data analyst
    • data scientist
    • MLOps engineer (like DevOps, but for ML)
    • ML Engineer
  • you don’t really “transition” to ML, you add it to your skillset
  • take advantage of existing work
  • focus on the practical aspects of ML → you don’t need to understand them fully right away; build your understanding step by step
  • there is always a better way, focus on providing value
  • it is still a new field: ML is mostly happening in academia → practical information is still scarce!

Different Approaches to learn ML:

Theory-First-Approach:

Math → Machine Learning Theory → Applications of Machine Learning

  • Focus is on theory
  • Math is important, but it shouldn’t be the main part

better:

Problem-First-Approach:

Problem → What do I need to solve it? → Go as deep as needed → add complexity (algorithms, …)

  • You don’t need to know all the math or every algorithm → look them up when you need them

Start “playing” with ML

  • Programming skills are fundamental
  • Analysis > Code → it’s not about how, but why
    • e.g. finding an item in a list: sort + binary search, or just a linear search? Which is faster?
    • you can always google the implementation (how), but you should know which solution to choose (why)
    • always ask “Why are you doing it this way?”
  • don’t worry about hardware (GPU), use Google Colab
  • use notebooks! E.g. Jupyter, Google Colab, Kaggle
  • Literature focuses heavily on the models → ~90% of all models don’t make it into production → the skills you definitely need are about deploying the model
    • Containers (Docker)
    • REST APIs (FastAPI, Flask)
    • cloud (Hosting)
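
The “how vs. why” question from the list above (linear search, or sort + binary search?) can be sketched in a few lines. This is only my own illustration, not material from the course; the list `data` and the queries are made-up values. The point is that the answer depends on how often you search: for a single lookup, sorting first costs more than it saves, while for many lookups on the same list, sorting once pays off.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Check every element in turn: O(n) per lookup, no preparation needed."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range each step: O(log n) per lookup.

    The input must already be sorted, otherwise the result is meaningless.
    """
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


# Made-up example data.
data = [42, 7, 19, 3, 88, 61]

# One lookup: a linear scan is enough, because sorting alone costs O(n log n).
found_once = linear_search(data, 88) != -1

# Many lookups on the same list: sort once, then each lookup is O(log n).
sorted_data = sorted(data)
queries = [3, 19, 100]
found_many = [binary_search(sorted_data, q) != -1 for q in queries]
```

Both functions are easy to look up (the “how”); knowing when the sort is worth its cost is the “why”.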

Resources to follow

For complete beginners:

Machine Learning Crash Course by Google

more challenging:

Machine Learning on Coursera – Andrew Ng

For Deep Learning (Subset of ML):

fast.ai – Deep Learning for Coders (Book)

Solve Problems with Kaggle

Solve in this order:

  • classification
  • data engineering
  • decision trees
  • regression
  • “Hello World” of regression problems
  • depending on how you frame it, it’s either a regression problem or a classification problem
  • unsupervised learning (clustering)
  • “Hello World” of Computer Vision
  • Neural Networks
  • Computer Vision
  • Deep Learning
  • Convolutional Neural Networks (CNNs)
  • NLP
  • Sentiment Analysis on IMDB
  • Time Series Analysis
  • Recommendation System
  • Computer Vision
  • Object Detection
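
The note in the list above, that the same problem can be a regression problem or a classification problem depending on how you frame it, can be illustrated with a tiny sketch. This is my own example, not one from the course: the prices, the threshold, and the labels “affordable”/“expensive” are all made up. The same raw values become either numeric targets (regression) or category labels (classification).

```python
# Hypothetical house prices (made-up numbers, for illustration only).
prices = [120_000, 250_000, 310_000, 95_000, 480_000]


def as_regression_targets(values):
    """Regression framing: the model should predict the number itself."""
    return [float(v) for v in values]


def as_classification_targets(values, threshold):
    """Classification framing: the model should predict a category,
    here which side of an (assumed) price threshold a value falls on."""
    return ["expensive" if v >= threshold else "affordable" for v in values]


regression_y = as_regression_targets(prices)
classification_y = as_classification_targets(prices, threshold=300_000)
```

Which framing is better depends on what you want to do with the prediction, which ties back to the problem-first approach.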