Morgan Cunningham

  • BSc (University of Victoria, 2021)

Notice of the Final Oral Examination for the Degree of Master of Science

Topic

Benchmarking Algorithms for Analysis of the Honey Bee Gut Microbiome

Department of Computer Science

Date & location

  • Wednesday, April 10, 2024

  • 12:00 P.M.

  • Virtual Defence

Reviewers

Supervisory Committee

  • Dr. Hosna Jabbari, Department of Computer Science, University of Victoria (Co-Supervisor)

  • Dr. Marta Guarna, Department of Computer Science, UVic (Co-Supervisor) 

External Examiner

  • Dr. Lauren Davey, Department of Biochemistry and Microbiology, University of Victoria 

Chair of Oral Examination

  • Dr. Andrew Rowe, Department of Mechanical Engineering, UVic

Abstract

Machine learning has emerged as a pivotal analysis technique in bioinformatics, offering insights that traditional statistical methods often fail to uncover. This is particularly relevant to the study of the honey bee gut microbiome, a critical factor in bee health and immunity that has not yet been extensively analyzed with these computational approaches. Given the significant yet poorly understood colony losses of recent years, using machine learning to study the role of the microbiome could provide essential clues for improving bee health and preventing colony losses.

This thesis focuses on benchmarking four machine learning algorithms (random forest, ridge regression, lasso regression, and elastic net regression) for their efficacy in analyzing compositional changes in the honey bee gut microbiome. These algorithms were applied to a metagenomic dataset collected during the highbush blueberry pollination season to classify samples according to various metadata parameters. Among them, the random forest algorithm outperformed the others across several key performance metrics, including accuracy, area under the curve (AUC), kappa, and log loss, highlighting its potential as a superior tool for microbiome analysis.
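
The following Python sketch illustrates the four evaluation metrics named above. It computes accuracy, Cohen's kappa, log loss, and AUC for a binary classifier using only the standard library. It assumes 0/1 labels and a predicted probability for class 1, and it takes AUC to mean ROC AUC. The input values are made up. This is not the thesis's evaluation code, and multiclass metadata would need extended versions of these functions.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    # Observed agreement corrected for the agreement expected by chance
    # from the marginal label frequencies (undefined if p_e == 1).
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def log_loss(y_true, p_pos, eps=1e-15):
    # Mean negative log-likelihood of 0/1 labels given the predicted
    # probability of class 1; probabilities are clipped away from 0 and 1.
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def roc_auc(y_true, scores):
    # Probability that a random positive outscores a random negative
    # (ties count as one half): the Mann-Whitney form of ROC AUC.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up values (not thesis data).
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

print(accuracy(y_true, y_pred))     # 4 of 6 predictions correct
print(cohen_kappa(y_true, y_pred))
print(log_loss(y_true, probs))
print(roc_auc(y_true, probs))       # 7 of 9 positive/negative pairs ranked correctly
```

Accuracy and kappa depend only on the thresholded labels. Log loss and AUC use the probabilities directly. This is why reporting all four can separate models that tie on accuracy alone.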

Our study further explores the challenges of applying machine learning in this context, such as the risk of overfitting during hyperparameter tuning when statistical methods indicate only minimal differences between sample groups. To mitigate these issues, strategies such as data augmentation, stratified sampling, and careful partitioning of the data into training and testing sets were examined. These additional analyses contribute to a set of best practices for applying machine learning to the exploration of the honey bee gut microbiome.
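
Stratified sampling combined with a held-out test partition can be sketched in a few lines of Python. This is a minimal illustration assuming a single categorical label per sample. The function name `stratified_split`, the 25% test fraction, and the "early"/"late" labels are all hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    # Hypothetical helper: split sample indices into train and test sets
    # so that each class keeps roughly the same proportion in both sets.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                       # random order within each class
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for 12 hypothetical samples (not thesis data).
labels = ["early"] * 8 + ["late"] * 4
train_idx, test_idx = stratified_split(labels)

print(Counter(labels[i] for i in train_idx))  # 6 early, 3 late
print(Counter(labels[i] for i in test_idx))   # 2 early, 1 late
```

In a setup like this, hyperparameters would be tuned using only the training indices, for example with cross-validation inside the training set. The test indices are then reserved for the final comparison, so tuning cannot quietly fit the held-out samples.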

Ultimately, by benchmarking machine learning algorithms and delineating best practices for their application, this thesis aims to advance the analytical techniques available for investigating the complex associations within the honey bee gut microbiome. Such advances are crucial for uncovering the dynamics that influence bee health and for developing strategies to mitigate the decline of bee populations that are critical to our agricultural systems.