How To Make K Means Clustering Algorithm With C#

K means clustering is a optimization method of partitioning an image by measuring Euclidean distances between pixels and cluster means.


Andraz Krzisnik
How To Make K Means Clustering Algorithm...

We can use k means clustering for optimally dividing data into separate groups. Furthermore, we’re going to use it to partition an image into a certain number of regions.

The name of this operation pretty much tells us what’s the essence of it. Basically, we assign each pixel to a cluster with nearest mean, which acts as clusters center. Furthermore, we’re going to use Euclidean distance to measure to which cluster we’re going to assign it to.

We’re also going to represent image data in a multi-dimensional vector for further processing. Therefore, when we’re working with grayscale images, we can represent images in 2D vectors, where intensity of each pixel acts as a sample.

However, when we’re working with color images, each pixel holds values for each color channel, so we need to represent images in 3D vectors.

This is also an iterative process, with which we refine mean values to their optimal values. In other words, we slightly change them with each step until they converge and stop changing.

How does region segmentation with k means clustering work?

As we mentioned before, we’re going to use Euclidean distance to partition all pixels to clusters. In the following example I implemented how to partition a grayscale image. Reason for that is, because it’s simpler and more useful for demonstrational purposes.

However, if you want to try and extend the capabilities of my code for color images, you’ll need to use the sum of Euclidean distances from all color channels between pixel values and mean values. In our case, we’ll only need to compute Euclidean distance of pixel intensity and mean intensities of each cluster.

First of all, we need to initialize the process by selecting initial mean values. We can do that by generating random intensity values. However, we need to keep in mind so we don’t generate duplicate values.

Secondly, we need to assign pixel intensities to their clusters. Thirdly, we need to compute new mean values for each cluster by using newly assigned intensities.

And for the last step, we need to compute collective error by summing Euclidean distances between means from previous step and from current step. Furthermore, we’re going to use this value to compare it to a threshold.

If the error is larger than the threshold, we repeat the process by returning to second part of the process. Therefore, it’s going to keep iterating through the process until the error reaches that threshold.

In my case here, I stoped the iteration when the error from previous step and newly computed error mached.

Code

public static Bitmap KMeansClusteringSegmentation(this Bitmap image, int clusters)
     {
         int w = image.Width;
         int h = image.Height;

         BitmapData image_data = image.LockBits(
             new Rectangle(0, 0, w, h),
             ImageLockMode.ReadOnly,
             PixelFormat.Format24bppRgb);

         int bytes = image_data.Stride * image_data.Height;
         byte[] buffer = new byte[bytes];

         Marshal.Copy(image_data.Scan0, buffer, 0, bytes);
         image.UnlockBits(image_data);

         byte[] result = new byte[bytes];
         int[] means = new int[clusters];
         Random rnd = new Random();

         for (int i = 0; i < clusters; i++)
         {
             int init_mean = rnd.Next(1, 255);
             while (means.Contains((byte)init_mean))
             {
                 init_mean = rnd.Next(1, 255);
             }
             means[i] = (byte)init_mean;
         }

         double error = new double();
         List<byte>[] samples = new List<byte>[clusters];

         while (true)
         {
             for (int i = 0; i < clusters; i++)
             {
                 samples[i] = new List<byte>();
             }

             for (int i = 0; i < bytes; i += 3)
             {
                 double norm = 999;
                 int cluster = 0;

                 for (int j = 0; j < clusters; j++)
                 {
                     double temp = Math.Abs(buffer[i] - means[j]);
                     if (norm > temp)
                     {
                         norm = temp;
                         cluster = j;
                     }
                 }
                 samples[cluster].Add(buffer[i]);

                 for (int c = 0; c < 3; c++)
                 {
                     result[i + c] = (byte)means[cluster];
                 }
             }

             int[] new_means = new int[clusters];

             for (int i = 0; i < clusters; i++)
             {
                 for (int j = 0; j < samples[i].Count(); j++)
                 {
                     new_means[i] += samples[i][j];
                 }

                 new_means[i] /= (samples[i].Count() + 1);
             }

             int new_error = 0;

             for (int i = 0; i < clusters; i++)
             {
                 new_error += Math.Abs(means[i] - new_means[i]);
                 means[i] = new_means[i];
             }

             if (error == new_error)
             {
                 break;
             }
             else
             {
                 error = new_error;
             }
         }

         Bitmap res_img = new Bitmap(w, h);
         BitmapData res_data = res_img.LockBits(
             new Rectangle(0, 0, w, h),
             ImageLockMode.WriteOnly,
             PixelFormat.Format24bppRgb);

         Marshal.Copy(result, 0, res_data.Scan0, bytes);
         res_img.UnlockBits(res_data);

         return res_img;
     }

Conclusion

I hope this tutorial was helpful in giving you a better understanding on region segmentation with k means clustering method.

You can also download the demo project and try it out for yourself.

Related Articles

C# Basics

C# Arrays And Everything You Need To Know About Them

C# arrays are data structures for storing multiple items of the same type. This guide shows you how to use different varieties of arrays.

Posted on by Andraz Krzisnik
C# Tutorial

Histogram Equalization And Magic Behind It

This tutorial is meant to demonstrate how histogram equalization works. This is a continuation on contrast stretch articles I’ve made in the past. An article on histogram...

Posted on by Andraz Krzisnik