Using a Convolutional Neural Net to differentiate Bagels from Donuts

16 May

Article by: Adrian Barrera-Velasquez, Rudy Cheong, Joel Martinez, Huy Do

Why Bagels and Donuts?

Our group was initially torn on what to use for our classification assignment, but we ended up deciding to do something fun outside of the usual science data and image sets, given that we’ve all been working with those all semester. The initial suggestion was McDonald’s vs. Burger King’s chicken nuggets, but that seemed like it wouldn’t work too well. Keeping with the food theme, we decided on donuts vs. bagels, which is actually an interesting pair to compare. Morphologically, these two items are very similar, but as foods they are very different. We as humans can tell donuts and bagels apart pretty easily, so we wanted to see whether a neural net could do the same.

Nature of the Image Sets

As we mentioned, donuts and bagels are very similar in shape but are clearly distinct as foods. As a result, they are presented differently, and we can see this even in our image set. We acquired our images by writing a Python script that automatically downloaded Google Image search results for donuts and bagels, along with their links. From a cursory glance, both items are usually shown in groups, but one of the biggest differences is that the donuts are more colorful. In addition, the bagels are often presented as sandwiches with things like cream cheese and smoked salmon. There is a lot of variety within each set of images, but we felt this made it more exciting to see how well the neural net would perform.

What is VGG16?

Convolutional networks have made it easier than ever to conduct large-scale image and video recognition. In particular, the VGG16 convolutional neural network has demonstrated strong recognition performance, largely because of its architecture. By using small 3 × 3 convolution filters in every convolutional layer, the network can be made deeper (16 weight layers, hence the name) without the number of parameters growing out of hand. This increase in depth is what ultimately lets VGG16 achieve very high accuracy in classification and localization tasks.
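To make the point about small filters concrete, here is a rough back-of-the-envelope sketch. It compares a stack of three 3 × 3 convolutions against a single 7 × 7 convolution covering the same area of the input. The function names and the channel width of 64 are our own illustrative assumptions, and biases are ignored.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Side length (in pixels) seen by one output unit after
    `num_layers` stacked stride-1 convolutions of size `kernel`."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(num_layers, kernel, channels):
    """Weight count (biases ignored) for `num_layers` convolutions,
    each with `channels` input and `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative width, chosen only for this example

# Three stacked 3x3 layers see the same 7x7 patch as one 7x7 layer...
print(stacked_receptive_field(3))   # 7

# ...but need fewer weights (27 * C^2 versus 49 * C^2).
print(conv_weights(3, 3, channels))  # 110592
print(conv_weights(1, 7, channels))  # 200704
```

The stacked version also applies a non-linearity after each of its three layers instead of just once, which is part of why the deeper design tends to work better.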

Results

The VGG16 neural network correctly classified all 10 tested bagel images and all 10 tested donut images. The fraction of images classified correctly is 1.0, indicating perfect accuracy on this test set. The confusion matrix illustrates this performance: zero true bagels were misclassified as donuts (bottom left quadrant), and zero true donuts were misclassified as bagels (top right quadrant).
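For readers who want to see how the accuracy figure and the confusion matrix relate, here is a minimal sketch in plain Python. It is not our original evaluation code. The helper names are hypothetical, and the prediction list is simply built to match the all-correct result reported here. Rows are true labels and columns are predicted labels, ordered donut then bagel so that the layout matches the quadrants described in the text.

```python
LABELS = ["donut", "bagel"]  # row/column order: donut first, bagel second


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Count (true, predicted) pairs. Rows are true labels,
    columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# 10 donut and 10 bagel test images, every prediction correct,
# mirroring the reported outcome.
true_labels = ["donut"] * 10 + ["bagel"] * 10
predicted_labels = list(true_labels)

matrix = confusion_matrix(true_labels, predicted_labels)
print(matrix)            # [[10, 0], [0, 10]]
print(accuracy(matrix))  # 1.0
```

Any misclassification would show up off the diagonal: a bagel labeled as a donut would move one count from the bottom right cell to the bottom left.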

The tested bagel images vary widely in composition: a single bagel or a whole group, different viewing angles, and with or without fillings or cream cheese spreads. Regardless of this variety, VGG16 predicted the true labels of the bagel images with perfect accuracy (bottom right quadrant). The following table shows the set of 10 tested bagel images:

The tested donut images also vary widely in composition, and VGG16 predicted the true labels of the donut images with perfect accuracy (top left quadrant). The following table shows the set of 10 tested donut images:

It is interesting to note that VGG16 accurately labeled bagel and donut image pairs that lack any obvious features marking one image as clearly a bagel and the other as clearly a donut. Such a pair is shown here:

The ability to make such a distinction with so few distinguishing features shows the power of the VGG16 neural network for image classification.

Discussion

The neural net performed so well, in fact, that we were left wondering if it had found a very simple way of classifying these images. As humans, we thought the color and toppings were an immediate dead giveaway, so we suspect it might be relying on a separation in color space or on some kind of edge density that captures surface texture. Unfortunately, we could not peer into the black box to find out, but this was nonetheless a very satisfying project and result.
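One cheap way to probe the color hypothesis would be to measure how saturated each image is and check whether donuts and bagels separate on that number alone. The sketch below is not something we ran for this project. The function name is our own, and it assumes the image has already been loaded into a list of (r, g, b) tuples with values from 0 to 255.

```python
import colorsys


def mean_saturation(pixels):
    """Average HSV saturation (0.0 to 1.0) over an iterable of
    (r, g, b) tuples with channel values in 0..255."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        total += saturation
        count += 1
    return total / count if count else 0.0


# Toy pixels, made up purely to show the behavior:
gray_pixel = (128, 128, 128)  # no color at all
red_pixel = (255, 0, 0)       # fully saturated

print(mean_saturation([gray_pixel]))             # 0.0
print(mean_saturation([red_pixel]))              # 1.0
print(mean_saturation([gray_pixel, red_pixel]))  # 0.5
```

If a simple threshold on this value already split the two image sets well, that would suggest color alone carries much of the signal. If it did not, texture or shape cues would be the more likely explanation.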
