Genstego: Image Steganography Based on a Genetic Algorithm

I don't really know what should someone have to say as the first words on a blog. So… I think I'm going to say a simple: 'Hello there!', and write about this project I made on February as a final assignment for the Biologically inspired computation subject of my master degree. The project is made with Python and is inspired on ¹. If you are interested, you can find it on GitHub.

For those that don't know what steganography means, I'm going to try to summarize what I know.

Modern steganography is the practice of concealing digital information: such as messages, audio or images, within another digital information. The main objective is sending concealed messages in such way the fact of sending data is disguised.

This implies, that the data can't be discovered by the bare eye. And only the interlocutors know how to look the secret message.

Genstego is an image steganography package based on deap. It is able to embed/decode secret images into/from a host image. The main idea of genstego, is to model steganography as a search and optimization problem using genetic algorithms. The genetic algorithm allows finding an optimal stochastic solution. Therefore, making the steganalysis more complex.

If you don't know anything about image steganography, don't worry about what I said on the last paragraph and read the short introduction below.

Basic image steganography

In order to conceal an image inside another image we need to know how data is represented. In our case we are talking about media. And, How digital media are represented? I think you guess it: bits.

So, we need to alter the host image bits and embed inside, the bits of the secret image, in such way the host image seems unaltered.

Images are represented as a matrix of pixels within values between 0-255. That's equivalent to 8 bits. One of the most basic techniques of image steganography consists on altering the LSB (Least Significant Bit) of each pixel of the host image changing it with the sequence of bits of the secret image. That means that we need 8 pixels of the host in order to conceal 1 secret pixel. So, in order to conceal a 16x16 image you need at least a 46x46 host image. Finally, if you want to read the secret image you just need to look at the LSB bits of the host pixels and reconstruct the secret image.

However, this isn't always the best way to embed the secret image. The reason is that someone with a little bit of knowledge about this technique can easily decode our secret message. Moreover, using only the LSB we are wasting a lot of space in our host image that can be used by our secret image. That's why modeling steganography as a search and optimization problem makes sense because we can find an optimal way to embed the secret message.

Steganography as a search and optimization problem

I used a genetic algorithm in order to achieve this. This has some advantages. You can obtain different optimal solutions on the last generations. And the cool thing is that the results are stochastic, so it doesn't rely on a specific method to embed the secret image. On the other hand, genetic algorithms are iterative and converge too slow.

In order to run the genetic algorithm we need to define how a chromosome is and what means each gene. Each chromosome will depict a different way to encode and decode the secret image. Also we need a fitness function that measure how good a chromosome is. I chose the PSNR function used to measure the quality of compressed images.

Genetic codification

The following chromosome is composed of 7 genes as you can see in the next table.

Gene name	Range	Length	Description
Direction	0-7	3 Bits	Direction of host image pixel scanning
X-offset	0-255	8 Bits	x-offset of the starting point
Y-offset	0-255	8 Bits	y-offset of the starting point
Bit-Planes	0-15	4 Bits	Used LSBs for bit insertion
SB-Pole	0-1	1 Bit	Secret bit inversion
SB-Dire	0-1	1 Bit	Secret bit direction
BP-Dire	0-1	1 Bit	Bit planes direction

The full chromosome has a length of 26 bits. I'm going to explain in detail what means each gene.

Direction

Represents the direction in which the image is scanned in order to embed and decode the secret bits.

X-offset and Y-offset

Represents the column and row where the scan starts. Has a length of a Byte because I've used host images of 256x256 width and height. But can be tweaked easily.

Bit-Planes

Has 4 bits and depict which bits of the host image are used to embed the secret bits. For example if Bit-Planes are 0001 we use the LSB technique. If there are 0101 the third and less significative bit are used.

SB-Pole

If the value is 1, the one's compliment is applied to the secret bit sequence. For example the sequence 1001 should be changed to 0110.

SB-Dire

If the value is 1 the sequence is inverted. For example the 0001 should be changed to 1000.

BP-Dire

When the value is 1 the selected Bit-Planes are embedded in the 4 LSB of the pixel. When is 0, the 4 MSB (Most Significant Bits) are used.

Fitness: PSNR function

The fitness function is the Peak Signal to Noise Ratio. Is measured by Decibels (dB) and makes us able to measure the quality of the different stego images. It relates \(MAX\) the maximum energy representation of a signal, in our case 255 for a pixel, with the noise introduced in the host image, measured by the \(MSE\) mean squared error.

\[ PSNR = 10 \cdot log_{10}(\frac{MAX_i^2}{MSE}) \]

\[ MSE = \frac{1}{MN} \sum_{i=0}^{M-1}\sum_{j=0}^{N-1}||I(k,j) - K(i,j)||^2 \]

In conclusion, we want to maximize this function. Thus higher values depict a less noisy representation of the host image.

Aspects of the genetic algorithm

The algorithm is so simple an I'm going to shortly summarize some of the main details.

Firstly, the individuals are randomly initialised and evaluated. Then, a new population of individuals are selected by tournament. Next, the individuals are mate executing a two-point crossover. And finally, the new individuals mute with a probability by flipping random genes of their chromosomes applying the not operand.

As you can see below, the algorithm is very simple and you can find it here on deap.

1evaluate(population)
2for g in range(ngen):
3    population = select(population, len(population))
4    offspring = varAnd(population, toolbox, cxpb, mutpb)
5    evaluate(offspring)
6    population = offspring

Embedding and reading a secret message

In order to embed the message using one of the individuals, host and secret images, are flatten into a sequence of bits. For the host image, the sequence is flatten using the correspondent Direction and starting from x and y offset. Then the secret raw bits are changed depending on BP-Pole and SB-Dir. At this point, if the sequence of secret bits is larger than the available bits in the host image, the individual is set a 0 fitness. In other case, the bits are embed following the configuration depicted by Bit-Planes and BP-Dir.

Read the secret message is the same operation, but first we need to extract the secret raw bits with the Bit-Planes and BP-Dir and turn it to something readable with BP-Pole and SB-Dir.

Results

I used the Lenna image with a shape of 256x256 to embed different images of different shapes. The number of generations is set to 80, with a probability to mate and mute of 0.7 and 0.25.

In the below table are the best individuals obtained at the end of the algorithm, in order to create the secret message.

Shape	Capacity		PSNR (%)
	%	bpp (bits per pixel)	Baboon	Jet	Pepper
64x64	6.25	0.50	30.38	30.17	30.16
81x81	10.0	0.80	28.27	28.13	28.12
115x115	20.1	1.61	21.64	20.97	21.38
127x127	24.6	1.96	20.73	20.10	20.51
140x140	29.9	2.39	15.31	15.12	15.11
162x162	40.0	3.20	9.61	8.92	9.26
180x180	49.4	3.95	8.67	7.96	8.32

Logically the PSNR get higher values when the secret message is smaller. Therefore, as you can see, the stego images have worst quality when the embedded message is larger.

Future work

Genstego can be improved in several ways. First of all, there are 8 implemented scanning directions. Adding more directions will expand the search space for an optimal solution. Also, the secret image is always scanned in raster order and that can be changed.

There are some other mandatory improvements like, embed the chromosome into a placeholder inside of the host image. Otherwise, the receptor won't know how to read the secret image unless the chromosome will be given by other channel.

Also, Genstego only use gray-scale images as host and stego. But, it can be modified to use color images or hide different types of secret messages.

Footnotes

1. Kanan and B. Nazeri. (2004). A novel image steganography scheme with

high embedding capacity and tunable visual image quality based on a genetic algorithm.