Brightness

Starting from an RGB value, there are various ways of determining the brightness of a pixel.  The simplest is to compute the average of the R, G and B channels.

        private double GetBrightness(Color clr)
        {
            return (clr.R + clr.G + clr.B) / 3.0; 
        }

Another is to use the HSV hexcone model, which takes the maximum of the three channels.

        private double GetBrightnessHSV(Color clr)
        {
            return Math.Max(Math.Max(clr.R, clr.G), clr.B);
        } 

Another is to use the HSL method, which averages the maximum and minimum of the three channels.

        private double GetBrightnessHSL(Color clr)
        {
            return 0.5 * Math.Max(Math.Max(clr.R, clr.G), clr.B) + 0.5 * Math.Min(Math.Min(clr.R, clr.G), clr.B);
        }
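To see how far these three measures can disagree, here is a small self-contained sketch that applies them to the same color.  The class name BrightnessComparison and its method names are made up for this illustration; the formulas are the ones shown above.

```csharp
using System;
using System.Drawing;

// Illustrative helper (hypothetical name) collecting the three measures above.
public static class BrightnessComparison
{
    // Mean of the three channels.
    public static double Average(Color clr)
    {
        return (clr.R + clr.G + clr.B) / 3.0;
    }

    // HSV "value": the largest of the three channels.
    public static double Hsv(Color clr)
    {
        return Math.Max(Math.Max(clr.R, clr.G), clr.B);
    }

    // HSL "lightness": midpoint of the largest and smallest channels.
    public static double Hsl(Color clr)
    {
        int max = Math.Max(Math.Max(clr.R, clr.G), clr.B);
        int min = Math.Min(Math.Min(clr.R, clr.G), clr.B);
        return 0.5 * (max + min);
    }
}
```

For pure blue, Color.FromArgb(0, 0, 255), the average is 85, the HSV value is 255 and the HSL lightness is 127.5.  For yellow, Color.FromArgb(255, 255, 0), the average rises to 170 but the HSV and HSL results are unchanged, even though yellow looks far brighter than blue to most people.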

There is also luma, which is a weighted average of gamma-corrected RGB values.  Naturally there are different weightings depending upon the standard.  Rec. 601 covers standard-definition television (such as NTSC sources), Rec. 709 covers HDTV (and shares its primaries with sRGB) and Rec. 2020 covers UHDTV.  Note that in the .NET Framework each color is represented with 8 bits per channel.  Rec. 2020 calls for 10 or 12 bits per sample, expanding the range of possible color representation, so it requires a different “color” object to represent the pixel’s value.

        private double GetBrightnessLumaRec2020(UHDColor clr)
        {
            return 0.2627 * clr.R + 0.6780 * clr.G + 0.0593 * clr.B; // rec 2020 Luma coefficients
        }

        private double GetBrightnessLumaRec601(Color clr)
        {
            return 0.299 * clr.R + 0.587 * clr.G + 0.114 * clr.B; // rec 601 Luma coefficients
        }

        private double GetBrightnessLumaRec709(Color clr)
        {
            return 0.2126 * clr.R + 0.7152 * clr.G + 0.0722 * clr.B; // rec 709 Luma coefficients
        } 
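The UHDColor type used in the Rec. 2020 method is not part of .NET.  One possible shape for it is sketched below; the properties BitsPerChannel and MaxValue, and the UhdLuma helper class, are assumptions made for this illustration.  The sketch also assumes full-range code values (0 up to the maximum for the bit depth).  Dividing by MaxValue puts the result on a 0 to 1 scale so it can be compared with luma computed from 8-bit colors.

```csharp
using System;

// Hypothetical stand-in for the UHDColor type; not a .NET type.
public readonly struct UHDColor
{
    public ushort R { get; }
    public ushort G { get; }
    public ushort B { get; }
    public int BitsPerChannel { get; }

    public UHDColor(ushort r, ushort g, ushort b, int bitsPerChannel)
    {
        // Rec. 2020 calls for 10 or 12 bits per sample.
        if (bitsPerChannel != 10 && bitsPerChannel != 12)
            throw new ArgumentOutOfRangeException(nameof(bitsPerChannel));

        int max = (1 << bitsPerChannel) - 1;
        if (r > max || g > max || b > max)
            throw new ArgumentOutOfRangeException(nameof(r), "Channel value exceeds the bit depth.");

        R = r;
        G = g;
        B = b;
        BitsPerChannel = bitsPerChannel;
    }

    // Largest value a channel can hold: 1023 for 10 bits, 4095 for 12 bits.
    public int MaxValue
    {
        get { return (1 << BitsPerChannel) - 1; }
    }
}

public static class UhdLuma
{
    // Rec. 2020 luma in the channel's own units.
    public static double Rec2020(UHDColor clr)
    {
        return 0.2627 * clr.R + 0.6780 * clr.G + 0.0593 * clr.B;
    }

    // The same luma scaled to the range 0..1, independent of bit depth.
    public static double Rec2020Normalised(UHDColor clr)
    {
        return Rec2020(clr) / clr.MaxValue;
    }
}
```

With this shape, a 10-bit white, new UHDColor(1023, 1023, 1023, 10), and a 12-bit white, new UHDColor(4095, 4095, 4095, 12), both give a normalised luma of approximately 1.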

All of these approximations have flaws.  The most accurate representation we have appears to be the latest standard from the International Commission on Illumination (yes, it really exists!), called CIECAM02.  This appears to be implemented in Windows from Vista onwards but is not yet available in .NET.

Working with an Image

Our budding DevOps engineer will need to be able to look at diagrams and understand them.  To read a diagram we need to be able to pick out objects from the background and capture the text associated with those objects.  The most general case is for the diagram to be an image.  (Visio plugins could help pull out the underlying object structure, but not necessarily the text that goes with each object, nor necessarily the associations between objects, which might be joining or non-joining lines or based upon underlays or overlaps of other images.)

If we think about it, although the image is itself flat, we humans are able to see the difference between the objects in the picture and determine where the boundaries are.  If such a picture is initially held as a byte array and we are able to understand the displayed picture as a layered image, potentially with depth and shadow, then all the information needed to do so must be in the byte array that we started with.

So first let’s convert our bitmap into a byte array so we can do something more fancy with it than we can with it as a bitmap.  This also has the side effect of being much faster to operate on than directly manipulating the bitmap with .NET.  Perhaps it will be more useful as a 2-dimensional byte array.

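        // Requires System.Drawing, System.Drawing.Imaging and System.Runtime.InteropServices
        public byte[,] GetImage(string filename)
        {
            using (Bitmap bmap = new Bitmap(filename))
            {
                int colorDepth = Bitmap.GetPixelFormatSize(bmap.PixelFormat);
                int sizex = bmap.Width;
                int sizey = bmap.Height;
                int bytesPerPixel = colorDepth / 8; // assumes a format with at least 8 bits per pixel
                int rowBytes = sizex * bytesPerPixel;
                byte[] pixels = new byte[rowBytes * sizey];

                Rectangle rect = new Rectangle(0, 0, sizex, sizey);
                // We only read the pixels, so a read-only lock is enough
                BitmapData bitmapData = bmap.LockBits(rect, ImageLockMode.ReadOnly,
                      bmap.PixelFormat);
                try
                {
                    // Copy one scan line at a time: each line occupies Stride bytes in memory,
                    // which can include padding beyond the rowBytes of real pixel data
                    for (int y = 0; y < sizey; y++)
                    {
                        IntPtr row = IntPtr.Add(bitmapData.Scan0, y * bitmapData.Stride);
                        Marshal.Copy(row, pixels, y * rowBytes, rowBytes);
                    }
                }
                finally
                {
                    bmap.UnlockBits(bitmapData);
                }

                // One row of the grid per scan line of the image
                return pixels.ToSquare2D(rowBytes);
            }
        }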
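Once the image is a grid of bytes, a pixel has to be reassembled from several adjacent bytes in a row.  The helper below is a sketch of one way to do that, not part of the original code: the class name PixelGrid and the method GetPixel are made up for this illustration.  It assumes each grid row holds exactly one scan line with no padding, and that the pixel format is Format24bppRgb or Format32bppArgb, both of which store their bytes in blue, green, red (and then alpha) order.

```csharp
using System;
using System.Drawing;

// Illustrative helper (hypothetical name) for reading pixels back out of the byte grid.
public static class PixelGrid
{
    // Returns the color at column x, row y of the image.
    // bytesPerPixel must be 3 (Format24bppRgb) or 4 (Format32bppArgb).
    public static Color GetPixel(byte[,] grid, int x, int y, int bytesPerPixel)
    {
        if (grid == null)
            throw new ArgumentNullException(nameof(grid));
        if (bytesPerPixel != 3 && bytesPerPixel != 4)
            throw new ArgumentOutOfRangeException(nameof(bytesPerPixel));

        // The first index of the grid is the row, the second is the byte offset within the row.
        int offset = x * bytesPerPixel;
        byte b = grid[y, offset];
        byte g = grid[y, offset + 1];
        byte r = grid[y, offset + 2];
        byte a = bytesPerPixel == 4 ? grid[y, offset + 3] : (byte)255;

        return Color.FromArgb(a, r, g, b);
    }
}
```

The resulting Color can be passed straight to any of the brightness methods from the previous section, for example GetBrightnessLumaRec709(PixelGrid.GetPixel(grid, x, y, 3)).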

The ToSquare2D extension method I used was originally posted by ZenLulz.  I have kept the extension method but replaced its insides with Buffer.BlockCopy, which appears to be faster.

        // The count argument is in bytes, so array.Length is only correct for a byte array
        Buffer.BlockCopy(array, 0, buffer, 0, array.Length);
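For completeness, here is a sketch of what a byte-only version of the extension method could look like around that call.  This is a reconstruction for illustration rather than the originally posted code: the containing class name ArrayExtensions and the argument checks are assumptions, and the method is restricted to byte arrays because Buffer.BlockCopy counts in bytes.

```csharp
using System;

// Illustrative container class (hypothetical name) for the extension method.
public static class ArrayExtensions
{
    // Reshapes a flat byte array into a grid whose rows are 'size' bytes wide.
    public static byte[,] ToSquare2D(this byte[] array, int size)
    {
        if (array == null)
            throw new ArgumentNullException(nameof(array));
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));

        // Round up so that a partial final row still fits; unused cells stay zero.
        int rows = (array.Length + size - 1) / size;
        byte[,] buffer = new byte[rows, size];

        // A 2-dimensional array is stored row by row in memory,
        // so a single block copy fills the rows in order.
        Buffer.BlockCopy(array, 0, buffer, 0, array.Length);
        return buffer;
    }
}
```

Calling new byte[] { 1, 2, 3, 4, 5, 6 }.ToSquare2D(3) gives a grid of 2 rows by 3 columns, with 1, 2, 3 in the first row and 4, 5, 6 in the second.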

One of the first things we notice if we open up a picture is that nearly every single pixel is different.  Even areas which look visually identical can have little variations: RGB(255,0,0) looks extremely similar, if not identical, to RGB(254,1,2).  If two values look identical to me and I can still read the picture, then the byte array holds more information than I need.  Of course, those subtle differences might be what helps us determine orientation and depth in an otherwise flat image.
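One simple way to put a number on “looks extremely similar” is the straight-line distance between two colors in RGB space, treating anything under a chosen tolerance as the same color.  The sketch below is only one possible approach and not something this post settles on; the names ColorSimilarity, Distance and LooksSame are made up for the illustration, and the tolerance is an arbitrary value the caller has to pick.  Distance in RGB space is also only a rough proxy for how different two colors appear to a person.

```csharp
using System;
using System.Drawing;

// Illustrative helper (hypothetical name) for comparing two colors with a tolerance.
public static class ColorSimilarity
{
    // Euclidean distance between two colors in RGB space; 0 means identical channels.
    public static double Distance(Color a, Color b)
    {
        int dr = a.R - b.R;
        int dg = a.G - b.G;
        int db = a.B - b.B;
        return Math.Sqrt(dr * dr + dg * dg + db * db);
    }

    // True when the two colors are within 'tolerance' of each other.
    public static bool LooksSame(Color a, Color b, double tolerance)
    {
        return Distance(a, b) <= tolerance;
    }
}
```

For the two reds mentioned above the distance is the square root of 6, roughly 2.45, so they count as the same color with a tolerance of, say, 10, while red and blue are about 360 apart.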

Introduction

Artificial intelligence is a topic which has fascinated me for a long time.  In my early days of programming the universe expanded for me when I first wrote some code which wrote code, and I thought at the time “if software can write software, then where is the limit?”.  My naivety at the time is now somewhat sobering, but the excitement remained.  The obvious question is how the computer knows what to write.  Well, the same way we humans do: by getting requirements.  Ahh, but computers can’t understand written English.  So began a journey of discovery for me as I started to explore exactly that sentence.

Computers can’t understand written English.

There are 2 key items here: understanding and written English.  What is “understanding” exactly and how does it come about?  Is it codifiable, or is it emergent from a set of abilities such as logical inference and relational analysis?  These are questions to which we have no answers yet, only many theories and many clever people working on different approaches.  The answer may come, but for now the 2nd key item looks far easier to deal with.

Written English: it brings to mind the expression “written in plain English” to indicate something should be simple to understand.  However, when the starting point is a non-intelligent computer this is some kind of vastly amusing joke.  Our simplest written language is anything but plain; actually it is incredibly complex.  Researchers across the world have made great inroads into extracting this complexity and then got stuck, for the simple reason that language outside of context is meaningless.  The text “I bought a new red car” can have its grammatical elements isolated with good accuracy, but it has no meaning if you don’t have a concept of self, do not understand the passage of time such that “new” is significant and what it implies, and have no idea what “red” is, let alone what the noun “car” refers to.

So perhaps we need to work on concepts first and how to represent those.  Concepts are also the subject of much research, with great efforts being made to model words and their relationships, establish hierarchies of concepts and the like.  Alternatively, perhaps we need to emulate the way humans learn with our senses, and extract the concepts from the sense information.  Vision is a key aspect of this and a number of companies are already working on the first practical aspects of image recognition technology.

There are theories which say that we generate a virtual world in our head, and that this world is updated not only by sense information but also by our own thoughts, in which we run “simulations” in this virtual world.  With robotics we are in a way reverse engineering how humans operate in our environment and gaining valuable insight from that, and at the same time we are learning ever more about the brain and how it functions.  Lots of research, lots of different approaches with a myriad of different aims.  Hence this blog.

This blog is an exploration of the various techniques being developed by researchers, the code (in C#), and the journey of an attempt to assemble software which can emulate a DevOps engineer.