Machine Learning Project Update: Preprocessing data

This first entry focuses on my experiments with preprocessing images: finding the mean and standard deviation, normalizing the set of images, applying a convolution kernel, and sorting the images using TensorFlow.

First, I wrote a Python script that scraped the first 100 images returned for the search term ‘ballet dancer’. Here are a few of those images:

Next, I cropped every image to a square and resized it to 100 x 100 pixels. I then stacked the 100 images into a 4-dimensional array (N x H x W x C). This is a plot of the resulting dataset.
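
The crop, resize, and stack steps can be sketched roughly as follows. This is a minimal NumPy sketch, not the script I actually ran: the centered crop, the nearest-neighbor resize, and the helper names (`center_crop_square`, `resize_nearest`, `build_dataset`) are all assumptions made for illustration. An image library with proper interpolation would normally do the resizing.

```python
import numpy as np

def center_crop_square(img):
    """Crop an H x W x C image to its largest centered square (assumed crop)."""
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]

def resize_nearest(img, size=100):
    """Resize a square image to size x size by nearest-neighbor sampling."""
    side = img.shape[0]
    # Source row/column index for each output pixel.
    idx = np.arange(size) * side // size
    return img[idx][:, idx]

def build_dataset(images, size=100):
    """Stack preprocessed images into one N x H x W x C array."""
    return np.stack([resize_nearest(center_crop_square(im), size) for im in images])
```

With 100 RGB inputs, `build_dataset` would return an array of shape (100, 100, 100, 3).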


















Next, I created a TensorFlow session and found the mean of the images. If you look closely at the mean image, you can see the outline of a dancer on one leg.
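
For reference, the mean image is just the average over the first (image) axis of the 4D array. Here is a small NumPy sketch of that computation; `mean_image` is a hypothetical helper name, and in TensorFlow the same reduction is available through `tf.reduce_mean` with `axis=0`.

```python
import numpy as np

def mean_image(images):
    """Average an N x H x W x C batch over the image axis, giving H x W x C."""
    # Cast first so uint8 pixel data is averaged in floating point.
    return np.mean(images.astype(np.float64), axis=0)
```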


















After that, I found the standard deviation of the images.
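
Written out, the standard deviation image is the square root of the mean squared difference from the mean image, taken per pixel and per channel across the batch. The sketch below assumes that per-pixel definition (the population form, dividing by N); `std_image` is a hypothetical name.

```python
import numpy as np

def std_image(images):
    """Per-pixel standard deviation of an N x H x W x C batch (H x W x C result)."""
    images = images.astype(np.float64)
    mean = images.mean(axis=0)
    # Mean squared deviation from the mean image, then square root.
    return np.sqrt(((images - mean) ** 2).mean(axis=0))
```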


















I then normalized each image by subtracting the mean image and dividing by the standard deviation image.
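
The normalization relies on broadcasting: the H x W x C mean and standard deviation are applied to every image in the N x H x W x C batch. A minimal sketch follows. The small `eps` term is my own addition here, an assumption to avoid dividing by zero where a pixel has the same value in every image, and `normalize` is a hypothetical name.

```python
import numpy as np

def normalize(images, eps=1e-8):
    """Subtract the mean image and divide by the standard deviation image."""
    images = images.astype(np.float64)
    mean = images.mean(axis=0)   # H x W x C
    std = images.std(axis=0)     # H x W x C
    # eps (assumed, not from the original experiment) guards against std == 0.
    return (images - mean) / (std + eps)
```

After this step, each pixel position has roughly zero mean and unit variance across the dataset.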


















I then built a kernel for each of the 3 input color channels and stacked the kernels into a single 4D tensor of shape [100, 100, 3, 1] (kernel height, kernel width, input channels, output channels).
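
One way to get a kernel into that shape is sketched below. The Gaussian kernel is purely an assumption for illustration, since any 2D kernel would be stacked the same way. The names `gaussian_kernel_2d` and `to_4d_kernel` and the `sigma` parameter are hypothetical.

```python
import numpy as np

def gaussian_kernel_2d(ksize=100, sigma=1.0):
    """Build a ksize x ksize Gaussian kernel that sums to 1 (assumed kernel type)."""
    x = np.linspace(-3.0, 3.0, ksize)
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)           # separable 2D Gaussian
    return k / k.sum()

def to_4d_kernel(kernel_2d, in_channels=3):
    """Copy a 2D kernel across input channels and add an output-channel axis."""
    k3d = np.stack([kernel_2d] * in_channels, axis=2)  # (k, k, in_channels)
    return k3d[..., np.newaxis]                        # (k, k, in_channels, 1)
```

The trailing axis of size 1 means the three color channels are combined into a single output channel.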











I then convolved the normalized images with the 4D kernel tensor.
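
In TensorFlow this is the job of `tf.nn.conv2d`, which takes an N x H x W x C input and a filter of shape [filter_height, filter_width, in_channels, out_channels]. To show what that operation computes, here is a plain NumPy sketch. It assumes a stride of 1 and zero padding so the output keeps the input's height and width; neither setting is recorded above. The function name `conv2d_same` is hypothetical, and the loop is written for clarity, not speed.

```python
import numpy as np

def conv2d_same(images, kernel):
    """Stride-1 2D convolution with zero padding (output H x W matches input).

    images: N x H x W x C, kernel: KH x KW x C x OUT. Like TensorFlow's
    conv2d, this is a cross-correlation: the kernel is not flipped.
    """
    n, h, w, c = images.shape
    kh, kw, kc, out_c = kernel.shape
    assert kc == c, "kernel input channels must match image channels"
    pad_h, pad_w = kh - 1, kw - 1
    top, left = pad_h // 2, pad_w // 2
    padded = np.pad(
        images,
        ((0, 0), (top, pad_h - top), (left, pad_w - left), (0, 0)),
        mode="constant",
    )
    out = np.zeros((n, h, w, out_c), dtype=np.float64)
    # Accumulate one kernel tap at a time over shifted views of the input.
    for i in range(kh):
        for j in range(kw):
            patch = padded[:, i:i + h, j:j + w, :]   # N x H x W x C
            out += patch @ kernel[i, j]              # (C x OUT) mixes channels
    return out
```

With a [100, 100, 3, 1] kernel and 100 normalized images, the result would have shape (100, 100, 100, 1).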


















Finally, I flattened the convolved images so that instead of many 3D images, I had many 1D vectors.
To do this, I converted the 4D representation of N x H x W x C to a 2D representation of N x (H*W*C). Using TensorFlow, I then attempted to organize the dataset by sorting the images on the mean value of each convolved image’s output.
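
The flatten-and-sort step can be sketched in NumPy as follows. The ascending order is an assumption, and `flatten_images` and `sort_by_mean` are hypothetical names rather than the TensorFlow calls I used.

```python
import numpy as np

def flatten_images(images):
    """Reshape N x H x W x C into N x (H*W*C), one row vector per image."""
    return images.reshape(images.shape[0], -1)

def sort_by_mean(images):
    """Return the sort order and the images ordered by mean value (ascending)."""
    flat = flatten_images(images)
    order = np.argsort(flat.mean(axis=1))  # one mean per image
    return order, images[order]
```

Reversing `order` (for example with `order[::-1]`) would give the images from highest mean to lowest instead.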