FruitBooth: Factory Output Prediction Using Deep Learning

Last Update: 2019-08-01


In a fruit processing factory, an easy way to optimize the final manufacturing output is to select the best fruits as input.

Some specimens of the same variety have distinctive characteristics which make them better inputs for our process.

We believe some of these characteristics can be detected by analysing the appearance of individual fruit. There is probably a way to visually inspect the fruits and compute metrics allowing us to predict factory outputs like the sugar content and final yield.

One use case: a fruit buyer could use the device to select and buy only the best batches of fruit for their factory. That is the final goal of our FruitBooth.

The FruitBooth could also be used inside the factory, acting as a sensor at any stage where visual inspection can provide reliable feedback to optimize a process.

Using deep learning and computer vision, we attempt to predict factory outputs such as sugar content and yield from pictures of individual cherries captured at the beginning of the process.

This page is a quick introduction to our problem and a brief explanation of the system we built. Technical details will be added someday.

Team members are Étienne Dubeau, Nicolas Turcotte and myself.

Contact me for any questions!

FruitBooth Summary

This summary presents the best solution we tested.

Keep in mind the FruitBooth is a prototype. We completed this project in less than 100 hours. Dataset capture was completed in less than 3 months. Many tests are not included in this report. We did not perform an extensive search for the best model architecture and the final result analysis is minimal and incomplete.


No existing dataset pairs images of fruit with the final output and metrics of our factory, so we first build one using our prototype FruitBooth. More information on the capture device and segmentation might be added later. Segmentation is a big part of this project. Our capture setup is not optimal for this task but allows possible future work in 3D.

Our main problem is data scarcity. We capture thousands of images of cherries but only have access to 23 output data points (factory output). A production batch can last days and contain thousands of cherries, yet each production yields only a single final value. Every label is therefore shared by thousands of images, which makes the problem much more complicated.

computer vision segmentation of cherries
4 lights same cherries

Deep Learning Encoder

To mitigate our data scarcity problem, we split the problem in two. First, we encode the cherry picture into a small vector using deep learning. Second, we use this latent vector to predict our metrics with classical machine learning.

High Level Diagram

The first model is a Variational Auto-Encoder (VAE) with a custom pose loss. This network is trained with self-supervision to compress and decompress an image of a cherry. Our custom loss forces the latent vectors to be similar when two pictures of the same cherry are provided with different rotations and different lighting. After training, we only reuse the encoder part. In practice, the encoder takes an image and outputs 128 values representing the cherry that was captured.

We can perform mathematical operations in this latent space. For example, averaging the latent vectors of multiple cherries gives the "average cherry". This averaged vector summarizes the distribution of cherries in a batch and can later be used to predict factory outputs. The vector of a single cherry can also be used, but capturing many cherries and averaging gives a better representation of a batch. We could also compute other statistics to analyse the distribution of a batch of many cherries.
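The averaging step described above can be sketched in a few lines of NumPy. This is only an illustration: `batch_descriptor` is a hypothetical helper name, and the random vectors stand in for real encoder outputs, which are not shown on this page. Only the latent size of 128 comes from the text.

```python
import numpy as np

LATENT_DIM = 128  # latent size stated in the text


def batch_descriptor(latents):
    """Average per-cherry latent vectors into a single vector for the batch.

    `latents` is expected to have shape (n_cherries, LATENT_DIM).
    """
    latents = np.asarray(latents, dtype=np.float64)
    if latents.ndim != 2 or latents.shape[1] != LATENT_DIM:
        raise ValueError(f"expected shape (n_cherries, {LATENT_DIM})")
    return latents.mean(axis=0)


# Placeholder data: random vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
fake_latents = rng.normal(size=(500, LATENT_DIM))

avg = batch_descriptor(fake_latents)   # the "average cherry" of the batch
spread = fake_latents.std(axis=0)      # one possible extra batch statistic
print(avg.shape, spread.shape)
```

The per-dimension standard deviation is one example of the extra batch statistics mentioned above; whether it helps prediction is untested here.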

Encoding results VAE GAN different resolution
interpolation in latent domain
cherry math
Same cherry, different pose and different light used in pose loss
We allow 1/8 of the latent vector (16 of the 128 values) to encode the pose information of a cherry
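To make the idea of the pose loss more concrete, here is a minimal NumPy sketch. It assumes the pose information lives in the last 16 values of the latent vector and that similarity is measured with a mean squared error; both choices, as well as the function names, are assumptions made for illustration and not necessarily what our network uses.

```python
import numpy as np

LATENT_DIM = 128
POSE_DIM = LATENT_DIM // 8  # 1/8 of the vector, i.e. 16 values


def split_latent(z):
    """Split a latent vector into (identity part, pose part).

    Assumption: the pose part is stored in the last POSE_DIM values.
    """
    z = np.asarray(z, dtype=np.float64)
    return z[..., :-POSE_DIM], z[..., -POSE_DIM:]


def pose_consistency_loss(z_a, z_b):
    """Penalize differences between two views of the same cherry.

    Only the identity part is compared, so the pose part remains free
    to change with rotation and lighting. MSE is an assumed choice.
    """
    ident_a, _ = split_latent(z_a)
    ident_b, _ = split_latent(z_b)
    return float(np.mean((ident_a - ident_b) ** 2))


# Two fake views of one cherry that differ only in their pose values.
view_a = np.zeros(LATENT_DIM)
view_b = view_a.copy()
view_b[-POSE_DIM:] = 5.0
print(pose_consistency_loss(view_a, view_b))
```

In a real training loop this term would be added to the usual VAE reconstruction and KL losses and written in the deep learning framework used for the encoder.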

Classification and Results

The second part is a simple random forest. We first encode cherry images into latent vectors using the previous step and use them as input for this second model. The goal of this model is to predict the final yield. Using a 50/50 split for validation, we obtain surprisingly decent results! The focus of our work is on the encoder. We did not spend much time on this analysis because 23 data points are not enough. However, the results are really encouraging!

The Random Forest is the first model we tried. We train on 50% of the dataset and show results on the remaining 50%. Besides yield, we also try to identify the pallet ID of each cherry from its latent vector (we think cherries from the same pallet might have received a similar treatment during transport and freezing). The yield prediction is decent, and pallet prediction reaches a top-1 score of 31% with 72 classes.
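The second stage can be sketched as follows, assuming scikit-learn's `RandomForestRegressor` and `train_test_split`. The library, the hyperparameters and the random placeholder data are all assumptions for illustration; only the 23 data points, the 128-value latent vector and the 50/50 split come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

N_BATCHES = 23    # number of factory output data points
LATENT_DIM = 128  # size of the averaged latent vector per batch

# Placeholder data: random vectors and yields, NOT real measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(N_BATCHES, LATENT_DIM))
y = rng.uniform(0.0, 1.0, size=N_BATCHES)

# 50/50 split between training and validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Hyperparameters are arbitrary defaults, not tuned values.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_val)
mae = float(np.mean(np.abs(pred - y_val)))
print(f"validation MAE on placeholder data: {mae:.3f}")
```

Because the data here is random, the printed error is meaningless; the point is only the shape of the pipeline. With so few training points and 128 input values, any real score should be read with caution.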

prediction of cherry yield with random forest
Pallet classification. 76 pallets total for 12000 cherries
Variational Auto-Encoder Diagram

It would be interesting to determine which of the 128 variables in the latent vector relate to the sugar content and which relate to the yield. This would allow us to find which cherries are best for maximizing our factory output.

A lot more could be written on this project.

If you have any ideas for projects you would be interested in doing with us, contact me!

Cherry background

Special thanks to Mathieu, Jinsong, Marc-André, JF and Denis for their help.

Code is available here: