Computer Vision is a subject I have really enjoyed playing with for some time. Vision is one of those problems that is so intractable one can’t help but feel excited by making even slight progress in automatic processing. There are many excellent texts on applications in this arena, but they usually focus on imperative solutions driven by C style code or Matlab. For all of the rich advances in this field, I see few who have chosen a functional language as their teaching tool. I believe this is unfortunate because functional programming could bring a great deal of clarity to the student, who often spends much of their time trying to figure out how to translate equations to code.In this series I focus on clarity rather than efficiency. I believe a motivated reader could take what I offer here and design a very efficient library by making a few fundamental adjustments in the approach. However, I do not intend to be inefficient in ways that are unnecessary. If you see that I have been, please do leave a comment.
First the dirty work, we need to get our images into a format we can easily manipulate.
In order to accomplish that we are going to pull the data out of a Bitmap object. To do that quickly, we are going to use a pointer to access the memory directly. Before I reveal that solution, I will preface it with the following disclaimer. I would avoid using direct memory access in F# like the plague. The F# team put a lot of effort into providing a safe and verifiable programming language, and you throw most of that away when you do direct memory access. The documentation is also a little sparse for direct memory access in F#, although Expert F# has a fantastic chapter on C and COM interop which was very helpful for me. Generally, the combination of all of that safety being discarded, and the lack of experience out there with native pointers in F# would lead me to try other approaches. However, I am going to illustrate how I would do it in F#, as I think it will be an illuminating exercise. This code set will be designed so that the direct memory access only happens in one place. The place which converts a bitmap into our internal format, and vice versa.
We will be using grayscale images to start, but I may post something about color later. So the interface I am providing here will return a representation that is the mean of the red green and blue values. I will access these byte by byte after retrieving a BitmapData object. I am assuming a 4 byte per pixel format, so if you want to play with this code on other images in different formats you will have to extend that logic.
Our first function, fromBitmap, takes a Bitmap and creates a 2 dimensional array of short integers of the same size. Note the first dimension of the array is height and the second is width. This is transposed from most people’s typical view of an image on the screen, but we will address that later. It then populates the array by moving the pointer through the entire image in a nested for loop. It depends on the helper function getAvgPosition which takes a pointer and returns the mean of the three succeeding bytes (red green and blue).
// Average the three bytes at the pointer (r,g,b)
let getAvgPosition p =
(int16 (NativePtr.get p 0) +
int16 (NativePtr.get p 1) +
int16 (NativePtr.get p 2))
/ (int16 3)
let fromBitmap (image:Bitmap) =
// Create the array
let (arr2:int16[,]) = Array2.create image.Height image.Width (int16 0)
let bd = image.LockBits(Rectangle(0,0,image.Width,image.Height),ImageLockMode.ReadWrite,PixelFormat.Format32bppArgb)
let mutable (p:nativeptr<byte>) = NativePtr.of_nativeint (bd.Scan0)
for i=0 to image.Height-1 do
for j=0 to image.Width-1 do
// Populate our 2 dim array with the average of the rgb bytes.
arr2.[i,j] <- getAvgPosition p
// Go to the next position
p <- NativePtr.add p 4
done
// The stride - the whole length (multiplied by four to account
// for the fact that we are looking at 4 byte pixels
p <- NativePtr.add p (bd.Stride - bd.Width*4)
done
// Unlock the image bytes
image.UnlockBits(bd)
// Return the array
arr2
Once we have completed our image manipulation, we write it back to a bitmap so that it can be output to a display or file. This is essentially the same procedure as above in reverse. One difference is that, where before we had three color values to average together, here we write the same value for each of the three colors. The function is called toBitmap. It depends on the helper function setPosition which writes the value to the pointer location in memory.
/// Sets the three bytes following the given pointer to v
let private setPosition p i j v =
NativePtr.set p 0 (byte v)
NativePtr.set p 1 (byte v)
NativePtr.set p 2 (byte v)
let toBitmap (arr:int16[,]) =
// Create the bitmap
let image = new Bitmap(arr.GetLength(1),arr.GetLength(0))
// Get the bitmap data for a 32 bpp bitmap with a Read Write lock
let bd = image.LockBits(Rectangle(0,0,image.Width,image.Height),ImageLockMode.ReadWrite,PixelFormat.Format32bppArgb)
// Setup the pointer
let mutable (p:nativeptr<byte>) = NativePtr.of_nativeint (bd.Scan0)
//
for i=0 to image.Height-1 do
for j=0 to image.Width-1 do
setPosition p i j (arr.[i,j])
p <- NativePtr.add p 4
done
// The stride - the whole length (multiplied by four to account
// for the fact that we are looking at 4 byte pixels
p <- NativePtr.add p (bd.Stride - bd.Width*4)
done
// Unlock the image bytes
image.UnlockBits(bd)
image
Now, we can use the interactive session to play with this. You can do this in Visual Studio by opening up the entire code listing below, highlighting the entire thing and pressing Alt-Enter (you can open the interactive session window in Visual Studio under Tools -> Add-in Manager…). I store all of this in a module called NativeImageHandler so we have to open that.I flew into Hong Kong airport recently, and being curious I looked up a birds eye view on Google Earth. We can use that as our example image.
Now, executing the following code on that image …
> open NativeImageHandler;; > let arr = fromBitmap(new Bitmap(@"c:\users\chance\pictures\HongKongAirport.jpg"));; > let b = toBitmap(arr);; > b.Save(@"C:\users\chance\pictures\HongKongAirportBW.jpg",System.Drawing.Imaging.ImageFormat.Jpeg);;
Results in this image …
In writing this we have accomplished a very basic image processing task. We can now take color images and convert them to grayscale. In future posts I will include gathering the histogram, inverting the image, filters and edge detection. Time permitting I may even get into some object detection subjects and dealing with color. All of that will be based on this foundation in which we bring the data into a format for manipulation.
Complete code listing is below.
#light
open System
open Microsoft.FSharp.NativeInterop
open System.Drawing
open System.Drawing.Imaging
module NativeImageHandler = begin
// Average the three bytes at the pointer (r,g,b)
let getAvgPosition p =
(int16 (NativePtr.get p 0) +
int16 (NativePtr.get p 1) +
int16 (NativePtr.get p 2))
/ (int16 3)
let fromBitmap (image:Bitmap) =
// Create the array
let (arr2:int16[,]) = Array2.create image.Height image.Width (int16 0)
let bd = image.LockBits(Rectangle(0,0,image.Width,image.Height),ImageLockMode.ReadWrite,PixelFormat.Format32bppArgb)
let mutable (p:nativeptr<byte>) = NativePtr.of_nativeint (bd.Scan0)
for i=0 to image.Height-1 do
for j=0 to image.Width-1 do
// Populate our 2 dim array with the average of the rgb bytes.
arr2.[i,j] <- getAvgPosition p
// Go to the next position
p <- NativePtr.add p 4
done
// The stride - the whole length (multiplied by four to account
// for the fact that we are looking at 4 byte pixels
p <- NativePtr.add p (bd.Stride - bd.Width*4)
done
// Unlock the image bytes
image.UnlockBits(bd)
// Return the array
arr2
/// Sets the three bytes following the given pointer to v
let private setPosition p i j v =
NativePtr.set p 0 (byte v)
NativePtr.set p 1 (byte v)
NativePtr.set p 2 (byte v)
let toBitmap (arr:int16[,]) =
// Create the bitmap
let image = new Bitmap(arr.GetLength(1),arr.GetLength(0))
// Get the bitmap data for a 32 bpp bitmap with a Read Write lock
let bd = image.LockBits(Rectangle(0,0,image.Width,image.Height),ImageLockMode.ReadWrite,PixelFormat.Format32bppArgb)
// Setup the pointer
let mutable (p:nativeptr<byte>) = NativePtr.of_nativeint (bd.Scan0)
//
for i=0 to image.Height-1 do
for j=0 to image.Width-1 do
setPosition p i j (arr.[i,j])
p <- NativePtr.add p 4
done
// The stride - the whole length (multiplied by four to account
// for the fact that we are looking at 4 byte pixels
p <- NativePtr.add p (bd.Stride - bd.Width*4)
done
// Unlock the image bytes
image.UnlockBits(bd)
image
end

