Data Science Projects

In May 2022, I worked on several data science projects as part of my lectures on data science and machine learning. Here’s a collection of the algorithms and approaches I experimented with:

HuberAdrian/DataScience-Lectures

Some data science projects I worked on while learning about machine learning, including classification, regression, recommendation systems, and neural networks.

Naive Bayes Classification

I used the MNIST dataset of handwritten digits to experiment with Naive Bayes classification. The project compared several supervised classification algorithms on this image recognition task, looking at how accurately each one identified the handwritten digits and what trade-offs separated the approaches.
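
A comparison like this can be sketched in a few lines. The example below is only an illustration, not the notebook's code: it assumes scikit-learn, uses the small 8x8 digits dataset bundled with scikit-learn as a stand-in for MNIST, and the choice of classifiers and settings is my own assumption.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: 8x8 digit images bundled with scikit-learn (not MNIST itself).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical selection of classifiers to compare.
classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Multinomial Naive Bayes": MultinomialNB(),  # needs non-negative features
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=3),
}

# Fit each model on the same split and record its test accuracy.
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")
```

Using one fixed train/test split for every model keeps the accuracies comparable. Naive Bayes trains very quickly but assumes the pixels are independent given the class, which is one of the trade-offs a comparison like this makes visible.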

MNIST Handwritten Digits Samples
View Naive Bayes Classification Notebook

Random Forest Regression

This project predicted New York taxi fares using Random Forest Regression. The goal was to predict fares accurately from trip data such as distance, time of day, and passenger count. To tune the model, I tested different combinations of hyperparameters and compared how well each one performed.
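
Testing parameter combinations is commonly done with a grid search. The sketch below shows one way to set that up with scikit-learn. It is an assumption-heavy illustration: the trip data is synthetic (generated from a made-up fare formula, not the real NYC dataset), and the parameter grid is hypothetical rather than the one used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for trip data; the fare formula is invented for this example.
rng = np.random.default_rng(0)
n = 600
distance_km = rng.uniform(0.5, 20.0, n)
hour = rng.integers(0, 24, n)
passengers = rng.integers(1, 7, n)
rush_hour = ((hour >= 7) & (hour <= 9)) | ((hour >= 16) & (hour <= 19))
fare = 2.5 + 1.8 * distance_km + 1.5 * rush_hour + rng.normal(0.0, 1.0, n)

X = np.column_stack([distance_km, hour, passengers])
X_train, X_test, y_train, y_test = train_test_split(
    X, fare, test_size=0.25, random_state=0
)

# Hypothetical grid: every combination is scored with 3-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, 8, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Evaluate the best combination on data the search never saw.
test_r2 = search.best_estimator_.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"held-out R^2: {test_r2:.3f}")
```

Scoring on a held-out test set after the search matters because the cross-validation score of the winning combination tends to be slightly optimistic.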

NYC Taxi Fare Prediction Model
View Random Forest Regression Notebook

Movie Recommender System

For this project, I built a movie recommender system using the MovieLens dataset. I tried different approaches to collaborative filtering, mainly focusing on how to recommend movies based on similar users’ ratings. The system measures how similar movies are to each other based on how people rated them and uses that to suggest new movies you might like.
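
The idea of comparing movies by their ratings can be sketched with item-based collaborative filtering. The example below is a minimal illustration with NumPy, not the notebook's implementation: the rating matrix is made up, the function names are hypothetical, and treating unrated movies as zeros in the cosine similarity is a simplifying assumption.

```python
import numpy as np

# Made-up ratings: rows are users, columns are movies, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 5, 0, 1],
    [1, 0, 1, 5, 4],
    [0, 1, 0, 4, 5],
    [5, 5, 4, 1, 0],
], dtype=float)


def item_similarity(ratings):
    """Cosine similarity between movie columns (unrated entries count as 0)."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated movies
    normalized = ratings / norms
    return normalized.T @ normalized


def predict_scores(ratings, user):
    """Similarity-weighted average of the user's own ratings, per movie."""
    sim = item_similarity(ratings)
    user_ratings = ratings[user]
    rated = user_ratings > 0
    weights = sim[:, rated]
    denom = weights.sum(axis=1)
    denom[denom == 0] = 1.0
    return weights @ user_ratings[rated] / denom


def recommend(ratings, user, k=2):
    """Indices of up to k unrated movies, best predicted score first."""
    scores = predict_scores(ratings, user)
    scores[ratings[user] > 0] = -np.inf  # never recommend already-rated movies
    unrated_count = int((ratings[user] == 0).sum())
    order = np.argsort(scores)[::-1]
    return [int(i) for i in order[:min(k, unrated_count)]]


recommendations = recommend(ratings, user=0)
print(recommendations)
```

Here user 0 rated the first two movies highly and the fourth one poorly. The third movie is rated similarly to the first two by other users, so it ranks ahead of the fifth movie, whose ratings resemble those of the fourth.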

Movie Recommender System
View Movie Recommender System Notebook

Image Classification with CNNs

I experimented with building and training the AlexNet CNN architecture for image classification using the CIFAR-10 dataset. The project involved setting up the layers of the neural network and tuning the learning parameters to improve accuracy. I varied these settings to see how each change affected the model’s ability to correctly identify the ten image categories.
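
To give an idea of what setting up the layers looks like, here is a rough sketch of an AlexNet-style network in TensorFlow/Keras. It is an assumption, not the notebook's model: the original AlexNet was designed for much larger images, so the filter sizes, strides, dense-layer widths, and optimizer settings below are my own scaled-down choices for 32x32 CIFAR-10 inputs.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def build_alexnet_style(num_classes=10, input_shape=(32, 32, 3)):
    """AlexNet-like stack: five conv layers, three max-pools, three dense layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),  # 32x32 -> 16x16
        layers.Conv2D(192, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),  # 16x16 -> 8x8
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),  # 8x8 -> 4x4
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])


model = build_alexnet_style()

# Hypothetical learning parameters; these are the kind of settings to tweak.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Forward pass on a random batch to check shapes; no training happens here.
dummy_batch = np.random.rand(2, 32, 32, 3).astype("float32")
probabilities = model(dummy_batch, training=False).numpy()
print(probabilities.shape)
```

To actually train it, the images could be loaded with tf.keras.datasets.cifar10.load_data(), scaled to the range 0 to 1, and passed to model.fit. The learning rate, momentum, dropout rate, and batch size are the usual candidates for experimentation.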

CNN Architecture for CIFAR-10
View CNN Image Classification Notebook

All of these projects were part of my data science learning process. The full code and notebooks are available in my DataScience-Lectures GitHub repository if you want to check them out.

Technologies Used

  • Python
  • pandas
  • numpy
  • matplotlib
  • sklearn
  • TensorFlow
  • Google Colab