I’m going to take several Data Science courses at my university. It’s a fast-moving field with a lot of innovation, which is why I’m excited to learn about it.
To prepare for my lectures, I decided to take a crash course. I chose one by Santiago Valdarrama, because he has been in the field for many years. Santiago also posts a lot on Twitter, sharing incredible insights about the workflow and evolution of Data Science / ML. I will probably write another article about this in the future.
Unfortunately, I couldn’t find his course again, so I can’t provide a link. It seems the course was a one-time event. But here are my takeaways:
How do I get started?
- many positions/career paths possible
- researcher
- data analyst
- data scientist
- MLOps engineer (like DevOps, but for ML)
- ML Engineer
- you don’t really “transition” to ML, you add it to your skillset
- take advantage of existing work
- focus on the practical aspects of ML → you don’t need to understand them fully up front, understand them step by step
- there is always a better way, focus on providing value
- ML is still a new field and mostly happens in academia → practical information is still scarce!
Different Approaches to learn ML:
Theory-First-Approach:
Math → Machine Learning Theory → Applications of Machine Learning
- Focus is on theory
- Math is important, but it shouldn’t be the main part
better:
Problem-First-Approach:
Problem → What do I need to solve it? → Go as deep as needed → add complexity (algorithms, …)
- You don’t need to know all the math or every algorithm → look them up when you need them
Start “playing” with ML
- Programming skills are fundamental
- Analysis > Code → it’s not about how, but why
e.g. find an item in a list: sort + binary search, or only linear search? Which is faster?
→ you can always google the implementation (how), but you should know which solution to choose (why)
→ always ask “Why are you doing it this way?”
- don’t worry about hardware (GPU), use Google Colab
- use notebooks! E.g. Jupyter, Google Colab, Kaggle
- Literature focuses heavily on the models → ~90% of all models don’t make it into production → the skills you definitely need are about deploying the model
- Containers (Docker)
- REST APIs (FastAPI, Flask)
- cloud (Hosting)
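To make the "Analysis > Code" question from the list above concrete, here is a minimal Python sketch of the list-search example. The function names, the shuffled test data and the number of repetitions are assumptions made for illustration, not material from the course.

```python
import bisect
import random
import timeit


def linear_search(items, target):
    """Scan the list once: O(n) per lookup, works on unsorted data."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range each step: O(log n) per lookup, needs sorted data."""
    index = bisect.bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


# Hypothetical data: 10,000 distinct even numbers in random order.
rng = random.Random(0)
items = list(range(0, 20_000, 2))
rng.shuffle(items)
target = items[-1]  # last element: worst case for the linear scan

# One lookup: sorting first costs O(n log n), more than a single O(n) scan.
one_linear = timeit.timeit(lambda: linear_search(items, target), number=1)
one_sorted = timeit.timeit(lambda: binary_search(sorted(items), target), number=1)

# Many lookups: sort once up front, then every binary search is O(log n).
sorted_items = sorted(items)
many_linear = timeit.timeit(lambda: linear_search(items, target), number=200)
many_binary = timeit.timeit(lambda: binary_search(sorted_items, target), number=200)

print(f"1 lookup:    linear {one_linear:.6f}s, sort + binary {one_sorted:.6f}s")
print(f"200 lookups: linear {many_linear:.6f}s, binary (pre-sorted) {many_binary:.6f}s")
```

The "why" here is the number of lookups: for a single search, sorting first is extra work, while for many searches on the same list, sorting once pays off. The exact timings depend on the machine.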
Resources to follow
For complete beginners:
Machine Learning Crash Course by Google
more challenging:
Machine Learning by Andrew Ng (Coursera)
For Deep Learning (Subset of ML):
fast.ai: Deep Learning for Coders (book)
Solve Problems with Kaggle
Solve in this order:
- classification
- data engineering
- decision trees
- regression
- “Hello World” of regression problems
- depending on how you frame it, it’s either a regression problem or a classification problem
- unsupervised learning (clustering)
- “Hello World” of Computer Vision
- Neural Networks
- Computer Vision
- Deep Learning
- Convolutional Neural Networks (CNNs)
- NLP
- Sentiment Analysis on IMDB
- Time Series Analysis
- Recommendation System
- Computer Vision
- Object Detection
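The note in the list above, that the same problem can be either regression or classification depending on how you frame it, can be illustrated with a small Python sketch. The prices, the thresholds and the `price_band` helper are hypothetical and only serve as an example; they do not come from the course or from a Kaggle dataset.

```python
# Hypothetical house prices in thousands; the values are made up for illustration.
prices = [120.0, 250.0, 310.0, 95.0, 480.0]

# Regression framing: the target is the number itself.
regression_targets = prices


def price_band(price, cheap_below=150.0, expensive_from=300.0):
    """Classification framing: map the number to one of three categories."""
    if price < cheap_below:
        return "cheap"
    if price >= expensive_from:
        return "expensive"
    return "mid"


classification_targets = [price_band(price) for price in prices]

print(regression_targets)      # [120.0, 250.0, 310.0, 95.0, 480.0]
print(classification_targets)  # ['cheap', 'mid', 'expensive', 'cheap', 'expensive']
```

The input data stays the same in both cases; only the target changes, and with it the kind of model and metric that fit the problem.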