
How We Used Machine Learning to Classify Images of Dogs and Applications in Biology Research

16 May

By Rebecca Salcedo (rebeccasophiasalcedo@gmail.com), Carmen Le, Jeremiah Ets-Hokin, and Saul Gamboa

What can we do with Machine Learning?

Machine learning with images is a powerful tool that can aid in analyzing large data sets. In biology, machine learning has many different applications.

Examples of Applying Machine Learning 

Identifying mutant embryos

One application is the ability to identify mutant embryos. Here are two images of frog embryos, one with a mutation in muscle development and one that is “normal”. (Photos courtesy of Dr. Julio Ramirez from Dr. Carmen Domingo’s group at San Francisco State University: https://biology.sfsu.edu/domingo-lab). You can see how similar these images are and how having to classify them manually would be both difficult and time consuming! This is a perfect place where machine learning and image classification become a great asset to developmental biology research.

Mutant:

Wildtype:

Coastal ecology/Marine biology research

Machine learning can also be useful in the world of marine ecology. One example is quantifying the abundance of different organisms underwater. Traditionally, this would be done by SCUBA diving with a clipboard, a measuring tape, and some sort of quadrat. You would individually count the abundance of whatever you are interested in and use that measurement as a subsample of the larger area. The two major limitations of this method are, one, that there is a lot of room for human error and, two, that SCUBA diving is limited to a very narrow window of time and hand surveying takes a long time. A method many researchers are moving to is taking images of underwater areas. The limitation here is that they then end up with huge numbers of images that need to be analyzed. This is where machine learning comes in: you can write code that goes through huge numbers of images and classifies types of organisms. This method allows for larger and higher-resolution data sets for an underwater world that is so hard to see.

Our project 

As a group we wanted to test how accurate machine learning with images can be. For this project we decided to see if a machine can identify whether a dog appears in an image. We noticed that Google Photos also has image recognition, but it makes occasional errors: searching for images of dogs sometimes returns a sea lion or an alligator. So we wanted to see if we could code an even more accurate image recognizer. Here are some of the images, with dogs and without dogs, that we wanted to see if a computer could tell apart.

Images with Dogs:



Images without dogs:

About the code

To recognize images of dogs, we could build on code that already existed. For our project we decided to use VGG16, a machine learning model developed by the Visual Geometry Group at the University of Oxford. VGG16 is a neural network capable of image recognition. You can read about it here.

We used VGG16 as a starting point and trained it to recognize the images associated with our project. We had help from two professors at San Francisco State University, Illmi Yoon and Pleuni Pennings, who assisted in adjusting the code to load our image dataset, output the results, and analyze the learning accuracy of the training done with VGG16. We also had some help from Twitter user @Ana_Caballero_H, whose blog post showed how to split the images into multiple folders.

Our changes to the code

Even though an image recognition model already existed, we couldn’t use it directly without some minor modifications to both the code and our data. There were two main steps to prepare our data. The first was some data wrangling so we could train the machine with our existing images. In order to train the machine, you have to split your images into three different groups: training, test, and validation. Training images provide examples, while the test and validation images allow the model to try out its skills, see if it’s right, and fine-tune its decision making.
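As a rough sketch, a split like this can be scripted in Python. The folder layout, file extension, and 70/15/15 ratio below are illustrative assumptions, not necessarily the exact values from our project:

```python
import random
import shutil
from pathlib import Path

# Hypothetical layout: all images start out in data/dogs and data/no dogs.
SOURCE = Path("data")
DEST = Path("split")

random.seed(42)  # fixed seed so the split is reproducible

for class_dir in SOURCE.iterdir():
    if not class_dir.is_dir():
        continue
    images = sorted(class_dir.glob("*.jpg"))
    random.shuffle(images)
    n = len(images)
    n_train = int(0.70 * n)  # 70% training
    n_val = int(0.15 * n)    # 15% validation, the remainder is the test set
    groups = {
        "train": images[:n_train],
        "validation": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    for split_name, files in groups.items():
        target = DEST / split_name / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for image in files:
            shutil.copy(image, target / image.name)
```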

While working to split our images, we encountered a number of errors. Eventually we realized that all of the images had to be in the same format. Even though all of our images came from our cell phones, they were in a variety of formats (JPEG, PNG, HEIC, and more). To fix this, we exported all of the images in the same format. Eventually, we were able to split our images successfully!
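For illustration, a conversion step along these lines, using the Pillow library, can standardize a folder of mixed-format photos to JPEG. The folder names here are hypothetical, and opening HEIC files requires an extra plugin:

```python
from pathlib import Path

from PIL import Image  # pip install Pillow
# HEIC files need a plugin such as pillow-heif:
# from pillow_heif import register_heif_opener; register_heif_opener()

SOURCE = Path("raw_photos")  # hypothetical folder of mixed-format images
DEST = Path("data/dogs")     # destination folder for one class
DEST.mkdir(parents=True, exist_ok=True)

for path in SOURCE.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".heic"}:
        continue
    image = Image.open(path).convert("RGB")  # drop alpha channels for JPEG
    image.save(DEST / (path.stem + ".jpg"), "JPEG")
```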

We also made some additional changes to the code. We had to make sure that the two options the model was deciding between were labeled “dogs” and “no dogs”. Importantly, these labels have to match the names of the folders we used to group the images the model was trained on.
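To sketch why those names matter: Keras-style data loaders read the class labels straight from the subfolder names, so the labels in the code and the folders on disk must agree. The paths, batch size, and rescaling here are illustrative:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A simple 0-1 rescale; VGG16's own preprocess_input is an alternative.
datagen = ImageDataGenerator(rescale=1.0 / 255)

# The entries in `classes` must literally match the subfolder names.
train_batches = datagen.flow_from_directory(
    "split/train",            # hypothetical path from the split step
    target_size=(224, 224),   # VGG16's expected input size
    classes=["dogs", "no dogs"],
    batch_size=32,
)
val_batches = datagen.flow_from_directory(
    "split/validation",
    target_size=(224, 224),
    classes=["dogs", "no dogs"],
    batch_size=32,
)
```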

 

Once all of that was done, our model was ready to go! But does it work?

Is our Machine ‘Learning’?

Our results suggest that our model is overfitting our data, i.e., it is too specific!

After the code was optimized and debugged, the pre-trained VGG16 model was imported to create a new model in which all layers except the output layer were copied from VGG16. This new model was trained to distinguish between two conditions, “Dogs” and “No Dogs.” Next, the model was compiled so that its performance could be assessed; for this, we used “categorical_crossentropy” as the “loss” parameter. Each time the training data is evaluated, the model is adjusted to minimize this loss. We set the number of epochs, i.e., the number of passes through the whole training set, to 10. After running the model, we compared validation loss to training loss.
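In Keras, that setup looks roughly like the sketch below. This is an illustration of the steps described above rather than the exact code from our repository; train_batches and val_batches are the generators from the earlier sketch:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Load VGG16 pre-trained on ImageNet, including its fully connected layers.
vgg = VGG16(weights="imagenet", include_top=True)

# Copy every layer except the original 1000-class output layer,
# freezing them so that only the new output layer is trained.
model = Sequential()
for layer in vgg.layers[:-1]:
    layer.trainable = False
    model.add(layer)

# New output layer with two classes: "dogs" and "no dogs".
model.add(Dense(2, activation="softmax"))

model.compile(
    optimizer="adam",  # "adam" is an illustrative choice of optimizer
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# 10 epochs, i.e., 10 passes through the whole training set.
history = model.fit(train_batches, validation_data=val_batches, epochs=10)
```

The training and validation losses we compared are then available in history.history["loss"] and history.history["val_loss"].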

Our results indicate a much higher validation loss than training loss, which is a clear sign that our model is overfitting. Next, we completed a final test of the model, in which we determined the percentage of images classified correctly and produced a confusion matrix. Our accuracy was 100%, and this number was supported by the confusion matrix, which shows that of the subset of 30 images, 17 were classified as true positives (Dogs) and the rest as true negatives (No Dogs).
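An evaluation of this kind takes only a few lines with scikit-learn. As a sketch, assuming test_batches is a generator over the held-out test images built like the ones above but with shuffle=False, so that predictions and labels stay aligned:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Predict class probabilities for every test image,
# then take the most probable class for each.
predictions = model.predict(test_batches)
predicted_labels = np.argmax(predictions, axis=1)
true_labels = test_batches.classes  # labels in file order (shuffle=False)

print("Accuracy:", accuracy_score(true_labels, predicted_labels))
print(confusion_matrix(true_labels, predicted_labels))
```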

Lastly, we completed our project by plotting our images together with their true and predicted labels, to visually assess how the model did.
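A minimal matplotlib sketch of such a plot, reusing the predictions from the previous step (the grid size and styling are our own illustrative choices):

```python
import matplotlib.pyplot as plt

class_names = ["dogs", "no dogs"]
images, _ = next(test_batches)  # one batch of test images (shuffle=False)

# Show the first 15 images with their true and predicted labels.
fig, axes = plt.subplots(3, 5, figsize=(15, 9))
for ax, image, true, pred in zip(axes.flat, images, true_labels, predicted_labels):
    ax.imshow(image)
    ax.set_title(f"true: {class_names[true]}\npred: {class_names[pred]}")
    ax.axis("off")
plt.tight_layout()
plt.show()
```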

And that’s how we made our model! If you’d like to take a look at our code or the images we used, check out our repository on GitHub.

This was all done as part of the GOLD program at SFSU, a wonderful program that provides graduate students the chance to develop their coding and data science skills.
