Getting into ML/Data Science

I’m going to take several Data Science courses at my university. It’s a fast-moving industry where a lot of innovation happens, which is why I’m excited to learn about it.

To prepare for my lectures, I decided to take a crash course. I chose one by Santiago Valdarrama because he has been in the field for many years. Santiago also posts a lot on Twitter, sharing incredible insights into the workflow and evolution of Data Science / ML. I will probably write another article about this in the future.

Unfortunately, I couldn’t find his course again, so I can’t provide a link. It seems the course was a one-time event. But here are my takeaways:

How do I get started?

many positions and career paths are possible:
- researcher
- data analyst
- data scientist
- MLOps engineer (like DevOps, but for ML)
- ML Engineer
you don’t really “transition” to ML, you add it to your skillset
take advantage of existing work
focus on the practical aspects of ML → you don’t need to understand everything up front, build your understanding step by step
there is always a better way, focus on providing value
ML is still a new field and mostly happens in academia → practical information is still scarce!

Different Approaches to learn ML:

Theory-First-Approach:

Math → Machine Learning Theory → Applications of Machine Learning

Focus is on theory
Math is important, but it shouldn’t be the main part

better:

Problem-First-Approach:

Problem → What do I need to solve it? → Go as deep as needed → Add complexity (algorithm, …)

You don’t need to know all the math or every algorithm → look them up when you need them

Start “playing” with ML

Programming skills are fundamental
Analysis > Code → it’s not about how, but why
- e.g. finding an item in a list: sort + binary search, or just linear search? Which is faster?
- you can always google the implementation (how), but you should know which solution to choose (why)
- always ask “Why are you doing it this way?”
don’t worry about hardware (GPU), use Google Colab
use notebooks! E.g. Jupyter, Google Colab, Kaggle
Literature focuses heavily on the models → ~90% of all models don’t make it into production → the skills you definitely need are about deploying the model:
- Containers (Docker)
- REST APIs (FastAPI, Flask)
- Cloud (hosting)
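
Coming back to the “Analysis > Code” point: the list-search question can be made concrete with a small sketch. This is my own illustration, not material from the course, and the function names (linear_search, binary_search, count_hits_linear, count_hits_sorted) are made up for this example. It assumes the items are comparable values such as numbers.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Return the index of target in items, or -1 if it is absent.

    Needs no preparation, but each lookup is O(n).
    """
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Return the index of target in an already sorted list, or -1.

    Each lookup is O(log n), but the list must be sorted first.
    """
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


def count_hits_linear(items, targets):
    """Count how many targets occur in items, scanning the list each time."""
    return sum(1 for t in targets if linear_search(items, t) != -1)


def count_hits_sorted(items, targets):
    """Count how many targets occur in items, sorting once up front."""
    sorted_items = sorted(items)  # O(n log n), paid a single time
    return sum(1 for t in targets if binary_search(sorted_items, t) != -1)
```

The “why” here: for a single lookup, the linear scan is the better choice, because sorting alone costs more than one pass over the list. Sorting only pays off when the same list is searched many times, since the one-time sort is then spread across many fast lookups. Both helpers give the same answer; which one to pick depends on how often you search.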