Yeah, things are getting more interesting, huh? In the last posts we covered linear regression, where we fit a straight line to represent a set of points as well as possible. This is a simple and powerful way to predict values. But sometimes, instead of predicting a value, we want to classify it.

Why would we do that? It may seem easy to understand, but for those who didn't catch it, why is this interesting?

We humans classify almost everything in our lives without noticing it. We are able to recognize, differentiate and categorize things. We know that a dog is a dog and not a wolf. We can recognize people, cars, buildings, etc. Now, imagine if we could make a program able to do the same. Wouldn't it be nice?

The **classification problem is just like the regression problem**, except that the values we now want to predict take on only a small number of discrete values. We'll first cover the binary classification problem, which means we have only two classes.

For example, we could classify a house as expensive or not expensive depending on its number of rooms, area, price, city, etc.

In linear regression we used a line equation to fit the best curve to a given set of points.

In **logistic regression** we don't want to fit a curve to a set of points as in linear regression, but instead classify data into categories. For that we'll use the sigmoid function, which is expressed as

g(z) = 1 / (1 + e^(-z))

and has the following shape:

Figure 1 – Sigmoid function

This function is interesting since it maps any value of z into a number between 0 and 1. So g(z) will actually represent the probability that our output is 0 or 1, where 0 and 1 are our two classes.

As our solution is discrete, we may "round" the result as follows: predict y = 1 when g(z) ≥ 0.5 and y = 0 when g(z) < 0.5.
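As a toy sketch of the sigmoid and this rounding (plain Python; the function names are mine, not from the post):

```python
import math

def sigmoid(z):
    # Maps any real z into a number between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

def classify(z):
    # "Round" the probability: class 1 if g(z) >= 0.5, else class 0
    return 1 if sigmoid(z) >= 0.5 else 0

print(classify(3.2))   # a large positive z falls in class 1
print(classify(-1.5))  # a negative z falls in class 0
```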

Looking at Figure 1 we can see that at z = 0 we have y = 0.5, so:

y = 1 when z ≥ 0 and y = 0 when z < 0

where (again) z is the function of the input features that we are fitting.

Now, if we pay attention, we have two inequalities. This means that we now have a curve and two regions: one where z ≥ 0 and another where z < 0, which are our two possible classes.

It's worth noticing that z doesn't need to be linear; it can be any polynomial function of the features.

As in linear regression, we must minimize the error. In logistic regression the error function is defined as

J = (1/m) Σ Cost(h(x_i), y_i)

where the cost function is

Cost(h(x), y) = -log(h(x)) if y = 1

Figure 2 – Cost error for y = 1

Cost(h(x), y) = -log(1 - h(x)) if y = 0

Figure 3 – Cost error for y = 0

The two cost functions above indicate that h(x) must be equal to y to have zero error. If not, the error grows towards infinity as the prediction approaches the wrong class.

We can simplify and merge both equations into one like this:

Cost(h(x), y) = -y log(h(x)) - (1 - y) log(1 - h(x))

Note that using this equation guarantees that the error will be convex for logistic regression, which means we’ll converge to a global minimum when using gradient descent.
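A sketch of this merged cost in plain Python (a toy function of mine; h is assumed to already be the sigmoid output):

```python
import math

def cost(h, y):
    # Merged logistic cost: -y*log(h) - (1 - y)*log(1 - h)
    # Reduces to -log(h) when y = 1 and to -log(1 - h) when y = 0
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

print(cost(0.99, 1))  # confident and correct: near zero
print(cost(0.01, 1))  # confident but wrong: large error
```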

Finally, our error function will be:

J = -(1/m) Σ [ y_i log(h(x_i)) + (1 - y_i) log(1 - h(x_i)) ]

As in linear regression, we can use gradient descent to minimize the error function (eq 3) and find the constants of our hypothesis.

We still need to calculate the derivatives and follow the same algorithm.

In each iteration we must update parameters as we did in linear regression

In this post we covered the theory of logistic regression, which is very close to linear regression. Despite the name, it's a classification algorithm which allows us to classify data into two different classes.

In the next post we'll get our hands dirty testing this algorithm to classify some data and check how it performs.

Seeya!!!

]]>

In this post we'll show a different way to solve the problem of error minimization in linear regression. Instead of using gradient descent, we can solve this linear system using matrices.

As we know, a system of linear equations can be solved with matrices. In this case, we don't need to define a learning rate, we don't need to iterate, and we don't need to scale features.

We minimize the squared errors by taking the derivatives and setting them to zero, and then we solve this linear system with the following matrix equation:

θ = (XᵀX)⁻¹ Xᵀ y

**Remember**: θ is a vector of parameters.

How does this equation work and why? As seen before, once we define the error function, the whole idea is to find a way to minimize it. In our case we are using the mean squared error (MSE). We can get the minimum error by taking the derivatives:

and setting them to zero

Working on the equation we have

and finally

Now, we can solve this using matrices:

i) XᵀXθ = Xᵀy

ii) (XᵀX)⁻¹XᵀXθ = (XᵀX)⁻¹Xᵀy

iii) θ = (XᵀX)⁻¹Xᵀy

You can find a very good (and very long) explanation on how it works here.
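For the simple straight-line case, the normal equation can be sketched in plain Python without any matrix library, since XᵀX is only a 2x2 matrix (a toy example of mine, not the post's Octave script):

```python
def normal_equation(xs, ys):
    # Fit y = t0 + t1*x by solving theta = (X^T X)^-1 X^T y,
    # where X has a column of ones and a column of the x values.
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 X^T X
    t0 = (sxx * sy - sx * sxy) / det
    t1 = (n * sxy - sx * sy) / det
    return t0, t1

print(normal_equation([0, 1, 2, 3], [1, 3, 5, 7]))  # exact fit: y = 1 + 2x
```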

I wrote a script in Octave to calculate the parameters using the normal equation method with both datasets used for simple and multiple linear regression. Here are the results compared to those from gradient descent:

**Simple linear regression**

Starting the gradient descent algorithm with 10000 iterations:

| | Normal equation | Gradient descent |
|---|---|---|
| | 1.2921 | 1.4826 |
| | 9.7777 | 0.065248 |
| Error | 53.776 | 55.529 |

We can see that gradient descent finally converged to a local minimum.

**Multiple linear regression**

Starting the algorithm with 10000 iterations:

| | Normal equation | Gradient descent |
|---|---|---|
| | 77.09660 | 73.632116 |
| | 0.33745 | 0.652041 |
| | -1.01820 | -0.859574 |
| Error | 11.529 | 11.716 |

I did another test with 100,000 iterations, and for multiple linear regression it finally converged to the global minimum, since I got the very same parameters as the normal equation.

As we can see, the normal equation is another way to solve linear regression. It's more direct, it finds the optimal solution and no parameters need to be tuned. But why don't we always use it? Why do we need gradient descent?

Well, as nothing is perfect, the normal equation needs much more processing than gradient descent. For a small number of features (fewer than 10,000, according to Andrew Ng) it's acceptable, but beyond that we should start thinking about gradient descent. Gradient descent can be thought of as an approximation of the ideal solution, since it may or may not converge to the global minimum.

See ya!

]]>

The last two posts were about linear regression. I explained a little about the theory and I left an example to test the algorithm which actually works but could be improved. How can we do this?

One of the problems with gradient descent appears when two features are not on the same scale, for example house price ($200,000 – $1,000,000) vs number of bedrooms (1 – 5). When that happens, we descend quickly along small ranges and very slowly along large ranges, which leads to oscillation and slow convergence.

The main idea of scaling the features is to put them all on the same (or a close) scale so that we converge faster.

Figure 1 – Feature scaled (left) vs unscaled feature (right)

*Image taken from internet*

So we can imagine that a learning rate could work well for one feature but be too high for another. It's easy to imagine that if we throw a small marble into the image on the right, it will oscillate a lot until it stops in the middle, but it would stop much more quickly in the left image.

The idea of scaling a feature is to have the feature's values between 0 and 1, so that all features' values are pretty close. But how can we scale a feature?

There are several techniques (honestly, I don't know why one is better than another; if anyone knows, please leave a comment =D):

**i) Rescaling**

In this technique, we subtract the input by the minimum and divide by the range of values (max – min):

x' = (x - min(x)) / (max(x) - min(x))

**ii) Mean normalization**

In this technique, we first subtract the input values by the mean of the input values and divide them by the range of values (max – min):

x' = (x - mean(x)) / (max(x) - min(x))

**iii) Standardization**

In this technique, we first subtract the input values by the mean of the input values and divide them by the standard deviation:

x' = (x - mean(x)) / σ
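The three techniques can be sketched in plain Python (toy functions of mine; the sample values are made up):

```python
def rescale(xs):
    # i) (x - min) / (max - min): results lie in [0, 1]
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def mean_normalize(xs):
    # ii) (x - mean) / (max - min): results centered around 0
    mu = sum(xs) / len(xs)
    lo, hi = min(xs), max(xs)
    return [(x - mu) / (hi - lo) for x in xs]

def standardize(xs):
    # iii) (x - mean) / standard deviation
    mu = sum(xs) / len(xs)
    sd = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mu) / sd for x in xs]

prices = [200000, 450000, 1000000]
print(rescale(prices))  # all values now between 0 and 1
```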

It's worth remembering that you have to save the values used in feature scaling, as you'll need them after processing the data to convert back to real values.

Another issue with gradient descent is that it can oscillate and never converge, or overflow. We use the learning rate parameter to tune the speed of convergence. If it's too high, gradient descent will oscillate and overflow, or never converge to the minimum. If it's too small, it will take too much time (too many iterations) to converge.

Personally, I start with 0.001, 0.01… to first check what happens, and then increase/decrease the learning rate in multiples of 3. I found that multiplying by 10 was very often too much.

As mentioned before, there is a way to check if gradient descent is converging. We can always plot the error vs iteration and see what is going on. The error should decrease over time.

We saw that gradient descent can work well, but in some conditions it can be unstable or too slow. The techniques discussed in this post can mitigate or solve these problems, allowing better performance of the algorithm.

See you next time.

]]>

In the last post we talked about simple linear regression, where we calculated the “trend line” of a set of points for a single variable. But what if we have more than a single variable? How can we solve it?

Imagine that we have a database of student grades with the following data: grade, country, school name, grade level, student sex and student age. With multiple linear regression we could try to predict the grade of a student knowing the student's country, school name, grade level, sex and age. Of course, we don't need to use all of this information; actually, one of the main qualities of a data scientist is finding the best data to use. Most of the time we have irrelevant data, or even data that must be worked on to be useful. An obvious example is the student's name, which is not (or shouldn't be) correlated with the grade.

In simple linear regression we had h(x) = ax + b. In the last post's example, y was the car price and x was the year of manufacture.

Now, for our student grades, we would have something like

h(x) = b + a1x1 + a2x2 + a3x3 + a4x4 + a5x5

where:

y = student grade

x1 = country

x2 = school name

x3 = grade level

x4 = student sex

x5 = student age

Do you remember the error function that we had to minimize?

We can still use it with gradient descent, as in simple linear regression. But instead of using

h(x) = ax + b

we'll use

h(x) = b + a1x1 + a2x2 + a3x3 + a4x4 + a5x5.

Or even better, we'll use

h(x) = a0x0 + a1x1 + a2x2 + a3x3 + a4x4 + a5x5

where x0 = 1 and a0 = b. This will simplify the partial derivatives and the calculation in Octave later.
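With the x0 = 1 convention above, the hypothesis is just a dot product between the parameter vector and the feature vector. A tiny sketch in Python (names and values are mine):

```python
def h(a, x):
    # x already includes x0 = 1, so h(x) = a0*x0 + a1*x1 + ... + a5*x5
    return sum(ai * xi for ai, xi in zip(a, x))

x = [1, 2, 7, 3, 0, 17]               # x0 = 1, then x1..x5
a = [0.5, 1.0, 0.1, 0.2, -0.3, 0.05]  # a0 plays the role of b
print(h(a, x))
```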

It’s the same approach with the only difference that we’ll have more partial derivatives as shown below:

In the partial derivative with respect to an, we use n, which can be 0, 1, 2, 3, 4 or 5, to avoid repetition. Don't forget that x0 is always 1.

Updating the parameters follows the same rule as for simple linear regression:

an := an - α ∂E/∂an

where α is the learning rate and n can be 0, 1, 2, 3, 4 or 5.

It's hard to visualize what happens since we have more than 3 dimensions, so I ran tests with two features (age and sex); the results are in the images below.

First we can see how grade depends on age and sex.

Figure 1 – grade vs age and sex

Of course, in this example sex is not a continuous value (the only two possible values are 0 and 1), but the idea for continuous variables is still the same.

We can see that our algorithm is converging over the iterations, which is a good sign. By the way, for this example, the learning rate was 0.01 instead of the 0.001 of the last post.

As we are doing a linear regression, we expect to have a plane which passes through the sample points right in the "middle", in other words, right where the mean squared error (MSE) is minimized. Never forget that we could find a local minimum instead of the global minimum.

Figure 3 – Linear regression result (age)

Figure 4 – Linear regression result (sex)

In figure 3 we can see how the plane is dividing the age samples, and in figure 4, how the plane is dividing the sex samples. So, in multiple linear regression, we try to find the equation which minimizes the MSE according to all selected features.

I did everything in Octave, and if you're interested in the source code, check my GitHub page at https://github.com/marcelojo/multi_linear_regression.git.

Bye

]]>

After almost two years of personal hard work, I’m back to share a little about linear regression and gradient descent.

What is linear regression?

From wikipedia

In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. […] Linear regression can be used to fit a predictive model to an observed data set of y and X values. After developing such a model, if an additional value of X is then given without its accompanying value of y, the fitted model can be used to make a prediction of the value of y.

**Simple linear regression** is when *y* is a function of a single variable *x*, and **multiple linear regression** is when *y* is a function of multiple variables, like x1, x2, x3, etc. In this post we'll cover simple linear regression.

In other words, and going straight to the point, linear regression is the linear trend line that we know from Excel, as in the image below.

Figure 1 – Linear regression

As we can see, we have a set of points where y changes accordingly to a single variable x. Take as an example, the price of a car versus its year of manufacture.

We can find a line equation which represents all the sampled points with the least possible error. With this equation it will be possible to predict y from new x values. The general equation of a straight line is y = ax + b. **So, our problem is to find the best constants a and b.**

We know what linear regression is and what we can do with it; now, how can we find the best fit with the least error?

First, how can we calculate the error? There are several ways. You could just subtract the predicted y from the real y. Or you could take the absolute value of the difference, or you could take the difference, square it and take the average. This last one is the mean squared error (MSE), which is commonly used for data fitting, which is our case.

The error equation is

E = (1/2m) Σ (h(x_i) - y_i)²

h in our case is the line equation: h(x) = ax + b. So, basically, this formula subtracts y from the predicted value, squares it to have only positive numbers, and takes the average. The factor of 1/2 will be useful later for gradient descent, but is not mandatory.

Now, we must find a way to minimize this error. As we know, we'll predict y with the line equation h(x) = ax + b. This is where gradient descent kicks in. From Wikipedia:

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.

From calculus class, we know that with the first derivative we can find the local minimum of a curve. If the curve is convex, the local minimum is actually the global one. The idea is very simple but still very powerful, and it's used very often in machine learning.

Since E depends on a and b, we must find the partial derivatives of the error (equation 1) with respect to **a** and **b**:

∂E/∂a = (1/m) Σ (h(x_i) - y_i) x_i

∂E/∂b = (1/m) Σ (h(x_i) - y_i)

Equations 2 and 3 give us the direction towards the minimum. The size of each step is determined by a parameter called the **learning rate**.

So, the new constants **a** and **b** are given by equations 4 and 5:

a := a - α ∂E/∂a

b := b - α ∂E/∂b

Every time we calculate this, we take a step towards the minimum.

- Define the function h. In our case, simple linear regression, we used h(x) = ax + b.
- Define the error equation. In our case, we used the mean squared error.
- Calculate ∂E/∂a and ∂E/∂b.
- Define the learning rate α.
- Calculate the new parameters **a** and **b**. In the next iteration, use these new parameters and recalculate.
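The steps above can be sketched in plain Python (a toy version, not the post's Octave script; the learning rate and iteration count are my own choices):

```python
def gradient_descent(xs, ys, alpha=0.05, iters=2000):
    # Fit h(x) = a*x + b by stepping against the MSE gradient
    a, b = 0.0, 0.0
    m = len(xs)
    for _ in range(iters):
        errs = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = sum(e * x for e, x in zip(errs, xs)) / m
        grad_b = sum(errs) / m
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

a, b = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(a, 3), round(b, 3))  # converges close to a = 2, b = 1
```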

In the images below, we can see the final result of our algorithm. We have a line which passes close to the middle of all the points, hence minimizing the mean squared error. Here are the parameters found after 1000 iterations:

Figure 2 – Linear regression result

A good way to debug the code is to check the error at every iteration. Here is the plot of error vs iteration, showing the error decreasing, which is a good sign. After about 100 iterations, the error almost stopped decreasing.

We can also see at every step how we improve our final curve. We started with a = b = 0, which is the x axis. With the update of *a* and *b* at each iteration, the line gets closer to the final result.

Figure 4 – Linear regression steps

What happens if we set **a** and **b** to different start values?

Figure 5 – Not so optimal result

Although the error decreases and the algorithm converges, we can see that we actually found a local minimum, since this line seems to split the set of points in half, but not in an optimal manner. We can even try 10000 iterations and we'll get the same result. So, the start values for **a** and **b** matter.

Here some tips:

- Try different learning rates – α. It makes the algorithm converge faster. If your error increases after some steps, try decreasing α.
- Try different start values for **a** and **b**, since you can end up in a local minimum.
- Test this code with a different error function, play with different data, start points for *a* and *b*, and different values of alpha; understand the code and, above all, have fun!

As we can see, linear regression is quite simple, but powerful enough to solve simple problems. The most important thing is the concept, which will be used in other algorithms that we'll discuss later on the blog.

You can download this algorithm done in Octave from my github repository at https://github.com/marcelojo/simple_linear_regression.git

]]>

Despite all the techniques created to date, every piece of software can have bugs. Computer software can always be updated to a new version that fixes those bugs, and it doesn't take more than a few mouse clicks. Embedded software or bare-metal firmware can have bugs too, but updating to a new version is not always as simple as it is for computer software.

In embedded systems we need to develop a piece of code which (in theory) is programmed only once and is the very first program to run after a reset. This program is known as the bootloader. If for any reason we need to upgrade a bootloader, care must be taken, because if the upgrade fails, nothing will work after a reset.

In bigger embedded systems, the bootloader normally already exists and is quite stable and robust, e.g. [1] Das U-Boot. Nowadays, lots of MCUs already have an embedded bootloader with which we can update the main firmware over some serial interface, e.g. SPI or UART. But sometimes there is a need to develop a custom bootloader, and this is the subject of this post.

In this post we’ll use the microcontroller STM32F030CC, an ARM Cortex-M0 with 256k of flash. As mentioned before, the bootloader is the very first program to be executed after a reset. This MCU has the following memory map.

Figure 1 – STM32F0 memory map

After a reset, this microcontroller fetches instructions from address 0x0000 0000. Notice that the flash memory starts at address 0x0800 0000. This MCU has the capability to remap address 0x0000 0000 to the flash, the RAM or the system memory, which contains the embedded bootloader.

This means that if the RAM is remapped to 0x0000 0000, accessing address 0x0000 1000 is the same as accessing address 0x2000 1000. After a reset, by default, the internal flash is mapped to 0x0000 0000. If no data is programmed at the first address, the microcontroller will remap the system memory and then execute the internal bootloader automatically.

Once we know all this, we must prepare the custom bootloader to be [2] linked at the beginning of the internal flash, and the main firmware at some address after the bootloader. As an example, we could have the first 4k for the bootloader and the rest of the flash for the main app.

Figure 2 – Flash split in two regions

The STM32F030CC has a [3][4] interrupt vector table located in the first 192 bytes of memory.

Figure 3 – Interrupt vector table STM32F030CC

As we can see above, the first address actually stores the initial stack pointer value. The second address stores the address of the reset interrupt handler. This is important to understand, because later we'll see how we remap the interrupt vectors in the main firmware.

Every compiled program is linked. The linker needs a linker script (a [5] .ld file), from which it gets all the instructions to place functions, variables, etc. at the correct addresses. For the GNU linker, the memory is defined in the MEMORY section. In our example, we'll leave 4k for the bootloader and the rest for the main application. Just remember that this script can and should be used by both the bootloader and the main firmware.

```
/* Memory definition */
MEMORY
{
  RAM (xrw)       : ORIGIN = 0x200000C0, LENGTH = 32K - 192 /* 192 for vector table */
  BOOTLOADER (rx) : ORIGIN = 0x08000000, LENGTH = 4K
  FIRMWARE (rx)   : ORIGIN = 0x08001000, LENGTH = 256K - 4K
}

/* Main app start address symbol */
_main_app_start_address = 0x08001000;

/* For main application, change BOOTLOADER by FIRMWARE */
REGION_ALIAS("ROM", BOOTLOADER);

/* Sections */
SECTIONS
{
  /* The startup code into ROM memory */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >ROM

  /* The program code and other data into ROM memory */
  .text :
  {
    . = ALIGN(4);
    *(.text)      /* .text sections (code) */
    *(.text*)     /* .text* sections (code) */
    *(.glue_7)    /* glue arm to thumb code */
    *(.glue_7t)   /* glue thumb to arm code */
    *(.eh_frame)
    KEEP (*(.init))
    KEEP (*(.fini))
    . = ALIGN(4);
    _etext = .;   /* define a global symbols at end of code */
  } >ROM
  ...
```

There are two regions defined: BOOTLOADER, which starts at 0x0800 0000 with 4k, and FIRMWARE, which starts at 0x0800 1000 with 252k.

We defined a symbol to indicate where the main application starts. This is useful to avoid having [6] magic numbers, and it makes things easier if we need to change the memory layout later.

Just pay attention that in our script the internal flash is defined as ROM. We can see in the script that .isr_vector and .text are placed in flash (>ROM). Sometimes it can have another name; just check the correct name, set REGION_ALIAS, and you are all set.

It's important to tell the linker not to use the first 192 bytes of RAM, since they will be used to remap the interrupt vector table. That's why, in the MEMORY section, our RAM starts at address 0x2000 00C0 and has a length of 32k – 192 bytes.

The bootloader, in most cases, must be a simple and robust program with only one goal: receive an image and reprogram the internal flash. Keeping it simple reduces code size and decreases the possibility of bugs.

The image can be a [7] .hex file, which is generated by the toolchain. We won't describe in detail how we receive the image, how we store it and how we write it to flash, because this process is straightforward. But in general, the process consists of sending the image to the MCU, which can save it to an external EEPROM/flash or even to a region of the internal flash. It's always a good idea to check the image's CRC to be sure we received everything correctly. Once the image is received, it's only a matter of copying it to the internal flash at the addresses indicated in the .hex file.
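Just to illustrate the CRC idea (a hypothetical host-side helper in Python, not the bootloader code; on the MCU the equivalent check would be done in C over the received bytes):

```python
import zlib

def append_crc(image: bytes) -> bytes:
    # Host side: append a CRC32 of the image so the target can verify it
    crc = zlib.crc32(image) & 0xFFFFFFFF
    return image + crc.to_bytes(4, "little")

def verify(blob: bytes) -> bool:
    # Target side (conceptually): recompute the CRC over the payload
    # and compare it against the stored value at the end
    payload, stored = blob[:-4], int.from_bytes(blob[-4:], "little")
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored

blob = append_crc(b"\x00\x01\x02firmware image bytes")
print(verify(blob))                # intact image passes
print(verify(b"\x99" + blob[1:]))  # a corrupted byte is detected
```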

Now, after the image is written to flash, the bootloader must take some actions before jumping to the main program:

- Reconfigure stack pointer
- Disable global and specific peripheral interrupts

```c
/**
 * Jump to application
 */
void JumptoApp(void)
{
    // disable global interrupt
    __disable_irq();

    // Disable all peripheral interrupts
    CPU_NVIC_DisableIRQ(SysTick_IRQn);
    CPU_NVIC_DisableIRQ(USART2_IRQn);
    CPU_NVIC_DisableIRQ(WWDG_IRQn);
    CPU_NVIC_DisableIRQ(RTC_IRQn);
    CPU_NVIC_DisableIRQ(FLASH_IRQn);
    CPU_NVIC_DisableIRQ(RCC_IRQn);
    CPU_NVIC_DisableIRQ(EXTI0_1_IRQn);
    CPU_NVIC_DisableIRQ(EXTI2_3_IRQn);
    CPU_NVIC_DisableIRQ(EXTI4_15_IRQn);
    CPU_NVIC_DisableIRQ(DMA1_Channel1_IRQn);
    CPU_NVIC_DisableIRQ(DMA1_Channel2_3_IRQn);
    CPU_NVIC_DisableIRQ(DMA1_Channel4_5_IRQn);
    CPU_NVIC_DisableIRQ(ADC1_IRQn);
    CPU_NVIC_DisableIRQ(TIM1_BRK_UP_TRG_COM_IRQn);
    CPU_NVIC_DisableIRQ(TIM1_CC_IRQn);
    CPU_NVIC_DisableIRQ(TIM3_IRQn);
    CPU_NVIC_DisableIRQ(TIM6_IRQn);
    CPU_NVIC_DisableIRQ(TIM7_IRQn);
    CPU_NVIC_DisableIRQ(TIM14_IRQn);
    CPU_NVIC_DisableIRQ(TIM15_IRQn);
    CPU_NVIC_DisableIRQ(TIM16_IRQn);
    CPU_NVIC_DisableIRQ(TIM17_IRQn);
    CPU_NVIC_DisableIRQ(I2C1_IRQn);
    CPU_NVIC_DisableIRQ(I2C2_IRQn);
    CPU_NVIC_DisableIRQ(SPI1_IRQn);
    CPU_NVIC_DisableIRQ(SPI2_IRQn);
    CPU_NVIC_DisableIRQ(USART1_IRQn);
    CPU_NVIC_DisableIRQ(USART2_IRQn);
    CPU_NVIC_DisableIRQ(USART3_6_IRQn);

    // main app start address defined in linker file
    extern uint32_t _main_app_start_address;
    uint32_t MemoryAddr = (uint32_t)&_main_app_start_address;
    uint32_t *pMem = (uint32_t *)MemoryAddr;

    // First address is the stack pointer initial value
    __set_MSP(*pMem); // Set stack pointer

    // Now get main app entry point address
    pMem++;
    void (*pMainApp)(void) = (void (*)(void))(*pMem);

    // Jump to main application (0x0800 0004)
    pMainApp();
}
```

This is the program which will be executed after the bootloader. Since the bootloader was running before, none of the interrupt vectors reference the main application; instead, they point to the interrupt handlers in the bootloader. So we need to remap the interrupt vector table to point to the main firmware. On Cortex-M0 microcontrollers, this is done by copying the interrupt vector table to the RAM and then remapping address 0x0000 0000 to the RAM, as explained before. When an interrupt occurs, the core accesses an address between 0x0000 0000 and 0x0000 00C0, which is now the vector table in RAM.

This remap must be done as soon as possible in the main firmware and, of course, before any interrupt is enabled. Below is an example; just keep in mind that the code uses some API functions from [8] STM32CubeMX.

```c
#define FIRMWARE_START_ADDR (uint32_t)(&_main_app_start_address)

void Remap_Table(void)
{
    // Copy interrupt vector table to the RAM.
    volatile uint32_t *VectorTable = (volatile uint32_t *)0x20000000;
    uint32_t ui32_VectorIndex = 0;

    for(ui32_VectorIndex = 0; ui32_VectorIndex < 48; ui32_VectorIndex++)
    {
        VectorTable[ui32_VectorIndex] =
            *(__IO uint32_t*)((uint32_t)FIRMWARE_START_ADDR + (ui32_VectorIndex << 2));
    }

    __HAL_RCC_AHB_FORCE_RESET();

    // Enable SYSCFG peripheral clock
    __HAL_RCC_SYSCFG_CLK_ENABLE();

    __HAL_RCC_AHB_RELEASE_RESET();

    // Remap RAM into 0x0000 0000
    __HAL_SYSCFG_REMAPMEMORY_SRAM();
}
```

Figure 4 – Interrupt vector table remap diagram

UPDATE: Don't forget to enable global IRQs in the main app after initialization, since they were disabled before jumping to the main app:

__enable_irq();

Embedded systems have increasing complexity, and it's very easy to have bugs. We need a way to update the firmware to fix problems without a specific programmer or tool and, if possible, remotely. A bootloader can make this kind of task much easier.

When developing a bootloader, we need to pay attention to a few items, listed below:

- Create two separate memory regions in the linker script.
- Process an image and send it to the bootloader, which must save it somewhere. The use of a CRC is a must to avoid corrupted images.
- After reprogramming the flash, disable all interrupts, set the stack pointer and jump to the main application.
- In the main application, remap the interrupt vector table as soon as possible.

In this post we just discussed the main steps to develop a custom bootloader. There are some important issues not discussed here, like security: the image can be encrypted before being saved to flash to avoid illegal copies. There are also mechanisms to increase the robustness of the bootloader, but this is a personal choice and sometimes a system constraint.

]]>

In computer vision, the field of stereo vision is getting more and more attention. Mobile robotics navigation, augmented reality, 3D sensing, 3D scanning, 3D tracking are some examples of stereo vision applications.

In order to have accurate measurements of the 3D world, the cameras must be calibrated. This is the basis of every stereo vision system. But why do we need to calibrate the cameras? First, a camera's sensor and lenses are not perfect, nor perfectly assembled. Each and every camera has different errors and tolerances which can't be generalized. Second, in a stereo system, we must know where both cameras are installed, and at which distance and angle from each other.

As mentioned before, in a computer vision system the cameras must be calibrated. But what actually must be calibrated? Using the simplest camera model, the pinhole model, there are two sets of parameters, known as intrinsic and extrinsic parameters.

Intrinsic parameters are related to all internal camera parameters such as: focal length, principal point, lens distortions, etc.

Using homogeneous coordinates, we can model the intrinsic parameters using the following matrix:

K = | αx  S   u0 |
    | 0   αy  v0 |
    | 0   0   1  |

Where αx is the product of the physical focal length (fx) of the pinhole and the size of the individual imager elements (σx), given in pixels per millimeter. The same holds for αy, with focal length fy and σy.

u0 and v0 model the displacement between the optical axis and the center of coordinates on the projection screen.

For a square imager, fx and fy are equal. For low-cost cameras the pixels can be rectangular, and then fx is different from fy.

Sometimes K also has a skew parameter, here represented by S.
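As a toy illustration of what K does (made-up values of mine; zero skew assumed), applying K to a 3D point in camera coordinates and dividing by the homogeneous coordinate gives the pixel coordinates:

```python
def project(K, point):
    # Pixel coordinates = (K * [X Y Z]^T) / Z
    X, Y, Z = point
    u = K[0][0] * X + K[0][1] * Y + K[0][2] * Z
    v = K[1][1] * Y + K[1][2] * Z
    return (u / Z, v / Z)

# alpha_x, alpha_y: focal lengths in pixels; (u0, v0): principal point
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
print(project(K, (0.1, -0.05, 2.0)))  # roughly (360.0, 220.0)
```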

Extrinsic parameters describe the position of the camera relative to the object coordinate system. In a multi-camera system, the extrinsic parameters also describe the relationship between the cameras.

Figure 1 – Camera versus world coordinate

In a 3D world, a camera can be rotated around all three axes and have a displacement relative to the object coordinate system.

The three possible rotations have the following matrices:

We define R as the product of Rx, Ry and Rz.

The translation vector can be represented by T, and then the extrinsic parameters of the camera can be modeled as the matrix [R | T].

In the next posts we’ll cover some classic camera calibration methods. See you soon!

]]>

In the last post I explained briefly what histograms are, and today I'll go a little further and show how they can be useful for processing images. Some examples of how histograms can be used are: image enhancement, texture classification, image segmentation, etc. In this post I'll cover the most common use for histograms, which is histogram equalization.

As discussed before, a histogram gives us a rough idea of what an image looks like, e.g. whether it's too dark or too bright. In the first case we'll notice that most pixels have low values; in the second case, most pixels have high values, and maybe they are not equally spread over the range of 0 to 255. The idea of this technique is to maximize the spread of an image's histogram and thus improve the image's contrast. Figure 1 shows the waveform of an ideal histogram equalization.

Figure 1

Now take a look at figure 2, which is dark and therefore has a histogram with only low values, less than 100.

Figure 2

The histogram can be plotted in Octave using the following commands:

```octave
img = imread('lena_dark.png');
h = imhist(img);
plot(h);
```

Figure 3 shows figure 2 after histogram equalization:

We can clearly see that figure 3 has improved contrast. Note what happened to its histogram: the pixels are now spread from 0 to 255, with roughly the same quantity of pixels at each level.
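This spreading is the classic CDF-based remapping. A sketch in plain Python on a toy list of pixel values (my own simplified version, not what Octave's functions do internally):

```python
def equalize(pixels, levels=256):
    # Histogram equalization: remap each level through the normalized
    # cumulative distribution function (CDF) of the histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]

dark = [10, 10, 20, 20, 20, 30, 40]  # a "dark image": all values below 100
print(equalize(dark))  # values now span the full 0..255 range
```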

Figure 4 is the result of histogram equalization on the image which was brighter in the last post.

Note that the images in figures 3 and 4 are not equal, and neither are their histograms.

Finally, let's look at the original image in grayscale and after histogram equalization. We can see that we have better contrast, as expected.

This technique is very useful to improve an image's contrast. A practical use could be improving fingerprint images to get better minutiae detection and thus better fingerprint recognition. It's worth mentioning that this technique has some limitations, and that some variations and extensions exist, like histogram matching and local histogram equalization, where we process only part of an image instead of the whole image at once.

Bye

]]>

Today I'll cover a simple subject: the image histogram. What is a histogram? A histogram is a graphical representation of a set of data separated into different classes. It is represented by vertical bars, where the base represents the class and the height represents the frequency, i.e. how many times it occurred. Yes, it seems more complicated when we try to explain something really easy. Look at figure 1 below.

Figure 1

This is an example of a histogram where we count the (fictitious) quantity of engineers who are still working in R&D, from 25 to 55 years old. We can see that the older and more experienced people get, the less they work in R&D, and maybe the more they work as project managers, for example.

Ok, how does this apply to images? That's simple. We take a grayscale image and count how many pixels there are at each intensity level, from 0 to 255. Below are 3 examples:

Figure 2

Figure 3

Figure 4

In figure 2, we can see that the image has pixel values from about 25 up to 240, and they are evenly distributed. This means the image isn't too dark or too bright in general, as we can see in the Lena photo on the left. But in figure 3, the histogram tells us that we only have pixel values from about 15 up to 90, which are low intensity values. This means the image is probably a little dark overall. In figure 4, the image has pixels at almost all values, as in figure 2, from about 45 to 255, but we can see that most of the pixels, about 35000, have the maximum intensity, so we can conclude that the image should be too bright.
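Counting pixels per intensity level is straightforward. A tiny sketch in plain Python (toy pixel list; names are mine):

```python
def histogram(gray_pixels, levels=256):
    # Count how many pixels fall at each intensity level (0..255)
    hist = [0] * levels
    for p in gray_pixels:
        hist[p] += 1
    return hist

image = [0, 0, 12, 12, 12, 255]  # a tiny "image" as a flat pixel list
h = histogram(image)
print(h[0], h[12], h[255])  # 2 3 1
```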

As you can see, histograms can tell us a little about an image. They don't give us detailed information, but an overview of the image's intensity. In the next post, I'll show you how we can use the information from histograms to improve image quality with simple techniques.

Bye bye

]]>

Today it's a small tutorial on installing SimpleCV in Ubuntu. SimpleCV is an open-source framework for computer vision application development. It's not a replacement for OpenCV; indeed, it uses OpenCV among other libraries. As they say on their website: **This is computer vision made easy.**

Personally, I think it's a good tool for creating prototypes and trying new ideas and algorithms, since we write applications in Python. If the idea is approved, you can optimize your solution with OpenCV. It provides an interactive shell, like Octave/MATLAB, where we can write small scripts easily and quickly.

This project is very well maintained, with lots of docs, how-tos and tutorials. You can visit the project page here.

**Installing SimpleCV**

As mentioned before, SimpleCV uses OpenCV, so you need to install it before continuing. You can check this post to learn how to install OpenCV in Ubuntu. A little note here: as of November 2014, SimpleCV wasn't compatible with OpenCV 3.0, so in this tutorial we'll use OpenCV 2.4.9 and SimpleCV 1.3.

1) First, install the dependencies:

sudo apt-get install ipython python-opencv python-scipy python-numpy python-pygame python-setuptools python-pip

2) Download the latest SimpleCV version here. If you prefer, you can clone it directly from GitHub.

3) In the directory where you saved SimpleCV, type:

sudo dpkg -i ./SimpleCV-1.3.deb

or

```
$ git clone https://github.com/sightmachine/SimpleCV.git
$ cd SimpleCV/
$ sudo pip install -r requirements.txt
$ sudo python setup.py install
```

4) Testing is really easy:

```
$ mkdir test_simplecv
$ cd test_simplecv
$ wget http://marcelojo.org/marcelojoeng/wp-content/uploads/2014/11/lenna.png
$ simplecv
SimpleCV:1> img = Image("lena_result-e1408715225232.png")
SimpleCV:1> img.show()
```

If everything is right, you will see the classic picture of Lenna.

As you can see, SimpleCV is really easy. In the next posts we'll make use of these 3 tools: Octave, OpenCV and SimpleCV, showing some theory and implementation as well.

]]>