Before the hackathon started, we divided the contenders into teams that mixed beginning and experienced data scientists, giving the beginners a chance to learn from the best. Each team got access to a Python notebook containing a pre-built convolutional neural network (CNN) written with the Python library Keras. The model used transfer learning: it started from the VGG16 network pre-trained on ImageNet with its top classification layers removed, and three densely connected layers were added on top to make it specific to our dataset. The combined deep CNN extracts important features from the tomato seedling images to help predict the quality of the tomato plant.
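For readers who have not built this kind of model before, the setup can be sketched roughly as follows. This is a minimal sketch and not the notebook the teams received: the input size, the widths of the three dense layers, the frozen VGG16 base, the optimizer and the function name `build_model` are all assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16


def build_model(input_shape=(224, 224, 3), weights="imagenet"):
    """VGG16 convolutional base plus three dense layers for normal/abnormal."""
    # include_top=False drops VGG16's own fully connected classifier.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # assumption: keep the pre-trained filters fixed

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)  # layer widths are assumptions
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # P(plant is abnormal)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use. Passing `weights=None` builds the same architecture with random weights, which is handy for a quick shape check without a download.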
To make sure the teams could experiment quickly with this computationally heavy model, each team had its own Azure Databricks cluster with a powerful GPU. The goal for the teams was to get the highest possible accuracy with their model on the test set, and to present their approach at the end of the hackathon. The creativity of the approach was of course measured with our famous applause meter.
Which approach results in the most accurate prediction?
Once on their way, the teams quickly diverged in their approach. Where some teams immediately started tweaking the hyperparameters of the pre-built CNN, others took more time to try to bring in domain knowledge and extract better features from the tomato plant images. Some teams also focused on manipulating the images in different ways to artificially increase the size of the relatively small training set of 1,000 images and to counter overfitting.
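To give an idea of what such image manipulation can look like, here is a minimal sketch using NumPy. The specific transformations (random horizontal and vertical flips) and the helper names `augment` and `augmented_batch` are assumptions for illustration; we are not claiming these are the manipulations the teams used, and whether a flipped seedling image keeps its label is something to check with a domain expert.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped copy of a height x width x channels image."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]  # vertical flip
    return np.ascontiguousarray(image)


def augmented_batch(images, copies, seed=0):
    """Create `copies` randomly flipped variants of every image."""
    rng = np.random.default_rng(seed)
    return np.stack(
        [augment(img, rng) for img in images for _ in range(copies)]
    )
```

Flips keep the image shape unchanged, so the augmented images can be stacked and fed to the same network. With `copies=4`, a training set of 1,000 images yields 4,000 variants, although these are of course far less informative than 4,000 genuinely new photos.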
However, one thing many teams forgot to take into account was the class imbalance in the dataset. Normal tomato plants were a lot more common in the dataset than abnormal tomato plants, just like in real life: 78.4 percent of the plants were classified as normal by an expert, while only 21.6 percent were classified as abnormal. Unsurprisingly, many of the teams' accuracy scores on the test set matched one of these two numbers, which is what a model scores when it assigns every plant to the same class.
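The arithmetic behind this is worth spelling out. The sketch below computes the accuracy of always predicting the most common class, plus one common way to compensate for imbalance: weighting each class inversely to its frequency. The label list is made up to match the reported 78.4 / 21.6 split, and the function names are hypothetical; the weighting scheme is a standard heuristic, not necessarily what the teams applied.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (average weight is 1)."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: total / (len(counts) * count)
        for label, count in counts.items()
    }


# Illustrative labels with the split reported above (assumed size: 1,000).
labels = ["normal"] * 784 + ["abnormal"] * 216

baseline_label, baseline_accuracy = majority_baseline(labels)
weights = balanced_class_weights(labels)
```

Any model that does not beat `baseline_accuracy` has learned nothing more useful than "guess normal". Weights like these can typically be passed to a training routine so that mistakes on the rare class count more heavily, for example through the `class_weight` argument of Keras `model.fit`, which expects integer class indices as keys.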
The final score
Three teams had a model that seemed to classify every plant as normal, while two teams had a model that classified every plant as abnormal. Only two teams beat the accuracy of always predicting the majority class, with the winning team reaching 84.1 percent. Both of these teams spent most of their time dealing with the class imbalance and tuning the model's hyperparameters.
In the end, the teams gave us valuable insights into how to incorporate domain knowledge and curb overfitting, which will help us improve our own model for the WUR even further. Do you think you can do better than us on this challenge? Download it below!