
Machine Learning Project Update: Preprocessing Data
This first entry covers my experiments with preprocessing images: computing the mean and standard deviation of a set of images, normalizing the set, convolving it with a kernel, and sorting the images using TensorFlow.
First, I wrote a Python script that scraped the first 100 images returned for the search term ‘ballet dancer’. Here are a few of those images:
Next, I cropped every image to a square and resized it to 100 x 100 pixels. I then stacked these 100 images into a 4-dimensional array (N x H x W x C). This is a plot of the resulting dataset.
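The crop, resize, and stack steps can be sketched roughly as follows. This is only an illustration in plain NumPy, not my original script: the helper names center_crop_square and resize_nearest are hypothetical, the crop is assumed to be centered, the resize uses simple nearest-neighbor sampling, and random arrays stand in for the scraped images.

```python
import numpy as np

def center_crop_square(img):
    """Crop the longer side so the image becomes square, keeping the center."""
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]

def resize_nearest(img, size=100):
    """Resize an image to size x size using nearest-neighbor sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols]

# Stand-in data (assumption): random RGB arrays of varying sizes
# instead of the scraped image files.
rng = np.random.default_rng(0)
raw_images = [
    rng.integers(0, 256, size=(rng.integers(120, 300), rng.integers(120, 300), 3),
                 dtype=np.uint8)
    for _ in range(100)
]

# Crop and resize every image, then stack into one N x H x W x C array.
imgs = np.stack([resize_nearest(center_crop_square(im)) for im in raw_images])
print(imgs.shape)  # (100, 100, 100, 3)
```

A real pipeline would probably use an image library with proper interpolation for the resize; nearest-neighbor is used here only to keep the sketch dependency-free.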
Next, I created a TensorFlow session and computed the mean image across the dataset. Here’s the mean (if you look closely, you can see the outline of a dancer on one leg):
After that, I found the standard deviation of the images:
I then normalized each image by subtracting the mean and dividing by the standard deviation.
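Here is a minimal sketch of the mean, standard deviation, and normalization steps. It uses NumPy instead of a TensorFlow session so that it runs on its own, and it assumes both statistics are computed per pixel across the batch axis. The random imgs array and the small eps guard against division by zero are my additions for illustration.

```python
import numpy as np

# Stand-in dataset (assumption): random values in place of the real images.
rng = np.random.default_rng(0)
imgs = rng.random((100, 100, 100, 3)).astype(np.float32)  # N x H x W x C

# Per-pixel statistics across the batch axis; each result is H x W x C.
mean_img = imgs.mean(axis=0)
std_img = imgs.std(axis=0)

# Normalize: subtract the mean image, divide by the standard deviation image.
eps = 1e-8  # avoids dividing by zero where a pixel never varies
norm_imgs = (imgs - mean_img) / (std_img + eps)

print(mean_img.shape, std_img.shape, norm_imgs.shape)
```

After this step, every pixel position has roughly zero mean and unit standard deviation across the dataset.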
I then built 3 kernels, one for each input color channel, and combined them into a single 4D tensor of shape [100, 100, 3, 1] (kernel height, kernel width, input channels, output channels).
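As a rough illustration of how a kernel with the shape [100, 100, 3, 1] can be assembled, the sketch below builds one 2D kernel, repeats it for each of the 3 color channels, and adds a trailing output-channel axis. The Gaussian is a hypothetical choice made only for this example, and the names ksize, kernel_2d, kernel_3d, and kernel_4d are illustrative.

```python
import numpy as np

ksize = 100  # kernel height and width, matching the shape quoted above

# Hypothetical 2D Gaussian kernel built from the outer product of a 1D Gaussian.
x = np.linspace(-3.0, 3.0, ksize)
g = np.exp(-x ** 2 / 2.0)
kernel_2d = np.outer(g, g)
kernel_2d /= kernel_2d.sum()  # normalize so the weights sum to 1

# One copy of the kernel per input color channel: (ksize, ksize, 3).
kernel_3d = np.stack([kernel_2d] * 3, axis=2)

# Add the output-channel axis: (ksize, ksize, 3, 1).
kernel_4d = kernel_3d[..., np.newaxis]
print(kernel_4d.shape)  # (100, 100, 3, 1)
```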
I then convolved the normalized images with the 4D kernel tensor.
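To show what this convolution computes, here is a small NumPy sketch. It assumes stride 1, zero padding that keeps the output the same height and width as the input, and the sliding-window sum without kernel flipping that deep learning libraries usually call convolution. The function name conv2d_same is hypothetical, and the example uses tiny shapes because a plain loop like this is slow at 100 x 100.

```python
import numpy as np

def conv2d_same(images, kernel):
    """Slide a kH x kW x C x 1 kernel over N x H x W x C images.

    Zero-pads the borders so the output is N x H x W x 1 (stride 1).
    """
    n, h, w, c = images.shape
    kh, kw, kc, kout = kernel.shape
    assert kc == c and kout == 1
    ph, pw = kh - 1, kw - 1  # total padding needed along each spatial axis
    padded = np.pad(
        images,
        ((0, 0), (ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2), (0, 0)),
    )
    out = np.zeros((n, h, w, 1), dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            # Window shifted by (i, j), weighted per channel and summed over channels.
            patch = padded[:, i:i + h, j:j + w, :]
            out[..., 0] += patch @ kernel[i, j, :, 0]
    return out

# Tiny stand-in example (assumption): 4 random images and a 5 x 5 averaging kernel.
rng = np.random.default_rng(0)
small_imgs = rng.standard_normal((4, 16, 16, 3))
small_kernel = np.ones((5, 5, 3, 1)) / (5 * 5 * 3)
convolved = conv2d_same(small_imgs, small_kernel)
print(convolved.shape)  # (4, 16, 16, 1)
```

Because the kernel has a single output channel, the three color channels are summed into one, so each convolved image has just one channel.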
Finally, I flattened the convolved images so that instead of many 3D images, I had many 1D vectors.
To do this, I converted the 4D representation of N x H x W x C to a 2D representation of N x (H*W*C). Using TensorFlow, I then attempted to organize the dataset by sorting the images on the mean value of each convolved image’s output.
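The flattening and sorting steps can be sketched like this. Again it uses NumPy rather than TensorFlow, with a random array standing in for the convolved images. Ascending order is an assumption on my part, and the names flattened, means, and order are illustrative.

```python
import numpy as np

# Stand-in for the convolved images (assumption): N x H x W x C with C = 1.
rng = np.random.default_rng(0)
convolved = rng.standard_normal((100, 100, 100, 1))

# Flatten each image: N x H x W x C -> N x (H*W*C).
n = convolved.shape[0]
flattened = convolved.reshape(n, -1)
print(flattened.shape)  # (100, 10000)

# One mean value per image, then the image indices ordered by that value.
means = flattened.mean(axis=1)
order = np.argsort(means)  # ascending; use order[::-1] for descending

# Reorder the dataset along the batch axis.
sorted_imgs = convolved[order]
```

Sorting by a single summary value is a crude way to organize images, but it makes it easy to lay the sorted images out in a grid and check whether similar ones end up near each other.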