Julia Butler

Introductory Machine Learning Project: Iris Flowers

Link to GitHub Repository

Introduction
I have always wanted to learn more about machine learning and machine intelligence, so I found a brief introductory project to familiarize myself with the discipline over the summer. I followed project instructions from Machine Learning Mastery and completed the project on a weekend when I had some downtime from my internship.

Process
Since this project was meant to show me the basics of machine learning, the process was simple and straightforward. I began by setting up a virtual environment with Python 3.7.6, scipy, numpy, matplotlib, pandas, and scikit-learn (sklearn). Then I read through the instructions and downloaded the data, which describes iris flowers. As the writer, Jason, promised, the project was very easy to follow, and it did a great job of highlighting important concepts and tools for machine learning.

In this tutorial, Jason guided me through basic data analysis and machine learning skills and tools. In particular, we experimented with 6 different models (both linear and non-linear): Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Gaussian Naive Bayes (NB), and Support Vector Machines (SVM). Although I have studied Logistic Regression in my Statistical Data Analysis class, I am not yet familiar with the other 5 models and hope to research these algorithms in the future, whether in an academic setting or through self-study. (Note: I may update this page with my personal research on these modeling algorithms in the future.)
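
As a rough illustration of how the six models can be compared, here is a minimal sketch using scikit-learn. It is not the tutorial's exact code: it assumes the copy of the iris data bundled with scikit-learn (load_iris) rather than the downloaded file, and the hyperparameters, the 10-fold cross-validation setup, and the random seed are my own assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: use the iris data bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# The six models named above; all settings here are illustrative choices.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=1),
    "NB": GaussianNB(),
    "SVM": SVC(gamma="auto"),
}

# 10-fold stratified cross-validation gives each model an accuracy estimate.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {fold_scores.mean():.3f} (+/- {fold_scores.std():.3f})")
```

The model with the highest mean accuracy is the natural candidate to carry forward, although on a dataset this small the differences between models are often within the spread across folds.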

I've learned that the basic process for this project has 4 main steps: visualizing the data, splitting the data into a training set and a validation set, running the modeling algorithms on the training set to determine the best one, and verifying the best model on the validation set. If the model proves to be accurate on the validation set, it can then be used to make predictions on new data.
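
The four steps above can be sketched as follows. This is a simplified outline under stated assumptions, not the tutorial's code: it uses the iris data bundled with scikit-learn, prints per-class feature means in place of plots, holds out 20% of the rows for validation, and uses SVC purely as an example of a chosen model (which model actually scored best is not recorded here).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# Step 1: look at the data. Per-class feature means stand in for plots here.
for label, name in enumerate(iris.target_names):
    print(name, np.round(X[y == label].mean(axis=0), 2))

# Step 2: hold out part of the data as a validation set (20% is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=1, stratify=y
)

# Step 3: estimate accuracy using the training set only (10-fold cross-validation).
# In the full project this is repeated for every candidate model.
model = SVC(gamma="auto")  # example model, chosen for illustration
cv_accuracy = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy").mean()
print(f"cross-validated accuracy: {cv_accuracy:.3f}")

# Step 4: fit on the full training set and verify on the held-out validation set.
model.fit(X_train, y_train)
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_accuracy:.3f}")
```

Keeping the validation set out of step 3 is the point of the split: the validation accuracy is then an independent check on the model that was selected.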

Conclusion
Overall, this was an informative introduction to machine learning. As stated above, I went through this tutorial to learn more about the discipline. If I had more time outside of my internship, I would have researched each of the other modeling algorithms. For now, I am happy with what I have learned from this tutorial and glad that I can see the overlap between my mathematics studies and computer science.