After a short tour of the notebook, and once everybody had logged in to the Databricks environment, the teams were ready to start hacking. Most teams began by exploring the notebook to identify the quick wins, which took a surprising amount of time. To recharge, the teams took a short break to enjoy a nice meal and each other’s company.
Upon returning, many teams split the work among their members. Some focused on engineering more advanced features from the images, while others tuned the parameters of the XGBoost model. The advanced features came from gauging the degree of coloring in top-view images of the tomato seedlings, or from measuring the height of the plant in side-view images. Parameters such as the learning rate were tuned with a grid search, using the validation accuracy to pick the best values.
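To make the grid search idea concrete, here is a minimal sketch in plain Python. It is not the code the teams used: `train_and_score` is a hypothetical callable that would train a model with the given parameters and return its validation accuracy, and `toy_score` is a made-up stand-in so the example runs on its own. The parameter names `learning_rate` and `max_depth` are common XGBoost settings, but the values in the grid are illustrative assumptions.

```python
from itertools import product


def grid_search(train_and_score, param_grid):
    """Try every combination in param_grid and return the best one.

    train_and_score is a hypothetical callable: it takes a dict of
    parameters and returns the validation accuracy for that setting.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        # Keep the combination with the highest validation score so far.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Made-up score that peaks at learning_rate=0.1 and max_depth=4.

    In a real run this would train the model and evaluate it on a
    held-out validation set.
    """
    return (
        1.0
        - abs(params["learning_rate"] - 0.1)
        - 0.01 * abs(params["max_depth"] - 4)
    )


# Illustrative grid; the values are assumptions, not the teams' settings.
grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [2, 4, 6]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

Because the grid grows multiplicatively with every parameter added, it pays to keep the number of candidate values per parameter small during a time-boxed hackathon.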
Feature engineering pays off
Once the teams had finished their tweaked versions, it was time to present the models. It was great to see the different ideas the teams came up with. For instance, some presented solutions to handle the class imbalance, such as oversampling the minority class, while others focused solely on the main task of feature engineering, either by concentrating on the extracted features themselves, such as color frequencies in the leaves and stem area, or by using images from other angles (i.e., side-view images). The latter approach, used by multiple teams, proved to be the most effective: including the features extracted from those images in the XGBoost model improved the accuracy from 94 to 97 percent.
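As an illustration of the oversampling idea mentioned above, the following sketch duplicates randomly chosen minority-class samples until every class is as large as the biggest one. It is a simple random-oversampling sketch under stated assumptions, not the teams' implementation: the function name `oversample`, the feature rows, and the class labels are all made up for the example.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate samples until all classes have equal size."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is topped up to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Made-up feature rows and labels, purely for illustration.
samples = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["healthy", "healthy", "healthy", "healthy", "abnormal"]

new_samples, new_labels = oversample(samples, labels)
print(Counter(new_labels))
```

Oversampling should only be applied to the training split. Duplicating samples before splitting would leak copies of training rows into the validation set and inflate the reported accuracy.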
The key takeaway of the evening was that domain knowledge plays an important role when training machine learning models. Combining feature extraction from the plant images using PlantCV with the XGBoost model outperformed the previously used CNNs, as became evident when the accuracies were compared. With the highest accuracy, 97.2%, the team using this approach won eternal fame and dinner at our Itility home base: restaurant La Fontana, another place where they surely take their tomatoes seriously.
Excited for the next hackathon? Sign up for the Meetup group to get your invite.