Independent projects - Machine Learning

In my spare time during the last year of my doctoral studies, I have been exploring data and ways to analyze and model it. Some of the projects I have worked on are the following:

  1. Given the ongoing pandemic, I was curious about how the spread of a disease within a community can be modeled. Realizing the inadequacies of the SIR epidemiological model, I tried modeling the spread of an epidemic using my own rules, which I felt described reality more aptly. This model highlights the importance of social distancing and personal protective equipment.

  2. I analyzed histology tiles from colorectal cancer patients: I used clustering algorithms to understand the data better and eventually built a Convolutional Neural Network (CNN) model to predict the tissue type of a given image. I used this as a testing ground to learn some of the basics of clustering and CNNs, and to practice analyzing image data.

  3. I did this project as part of the course ‘Introduction to Machine Learning’, which I took in the Fall semester of 2019. In this computer vision problem, a set of points is normally distributed about a line segment in 3 dimensions, and the points are observed as viewed by a camera (2 dimensions). Given these 2D observations, I used a Metropolis-Hastings (Markov chain Monte Carlo) sampler to obtain the posterior distribution of the end points of the line segment. I wrote the MH sampler from scratch in Python and made the relevant plots. I repeated this for the same points as viewed from a different camera (different angle). Combining the likelihoods from both datasets, I inferred the line segment with more confidence.

  4. To get familiar with classification, I built and compared classification models to predict a patient’s risk of being diagnosed with cervical cancer.

  5. I’m currently building regression models to predict drug response in cancerous tissues based on genomic data from the Genomic Data Commons Data Portal of the NIH. I am comparing different feature selection methods on a high-dimensional feature space to balance the interpretability and accuracy of the supervised learning model, with the aim of identifying the biological pathway causing the cancer.
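
As an illustration of the sampler mentioned in project 3, a random-walk Metropolis-Hastings sampler can be sketched as follows. This is a minimal sketch and not the code from the course project. It makes several simplifying assumptions: the problem is purely 2D (no camera projection), the position `t` of each point along the segment is known, the noise level `sigma` is known, and the prior on the end points is flat. All names (`metropolis_hastings`, `log_posterior`, `true_ends`, and so on) are hypothetical.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    current = log_post(theta)
    samples = np.empty((n_samples, theta.size))
    accepted = 0
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_post(proposal)
        # The proposal is symmetric, so the acceptance ratio reduces
        # to the ratio of posterior densities.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        samples[i] = theta
    return samples, accepted / n_samples

# Toy data (assumed for illustration): noisy 2D points along a segment.
rng = np.random.default_rng(0)
true_ends = np.array([0.0, 0.0, 2.0, 1.0])  # (x0, y0, x1, y1)
t = np.linspace(0.0, 1.0, 50)               # known positions along the segment
sigma = 0.05                                # known noise standard deviation
clean = np.outer(1 - t, true_ends[:2]) + np.outer(t, true_ends[2:])
points = clean + sigma * rng.standard_normal(clean.shape)

def log_posterior(theta):
    # Flat prior on the end points, Gaussian likelihood around the segment.
    model = np.outer(1 - t, theta[:2]) + np.outer(t, theta[2:])
    return -0.5 * np.sum((points - model) ** 2) / sigma**2

samples, acc_rate = metropolis_hastings(log_posterior, np.zeros(4), 5000, 0.02, rng)
posterior_mean = samples[2000:].mean(axis=0)  # discard the first 2000 draws as burn-in
```

Under the same assumptions, a second view of the points could be combined by adding its log-likelihood term inside `log_posterior`, which corresponds to multiplying the two likelihoods.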