I don't really know what should someone have to say as the first words on a blog. So… I think I'm going to say a simple: 'Hello there!', and write about this project I made on February as a final assignment for the Biologically inspired computation subject of my master degree. The project is made with Python and is inspired on 1. If you are interested, you can find it on GitHub.
For those that don't know what steganography means, I'm going to try to summarize what I know.
Modern steganography is the practice of concealing digital information: such as messages, audio or images, within another digital information. The main objective is sending concealed messages in such way the fact of sending data is disguised.
This implies, that the data can't be discovered by the bare eye. And only the interlocutors know how to look the secret message.
Genstego is an image steganography package based on deap. It is able to embed/decode secret images into/from a host image. The main idea of genstego, is to model steganography as a search and optimization problem using genetic algorithms. The genetic algorithm allows finding an optimal stochastic solution. Therefore, making the steganalysis more complex.
If you don't know anything about image steganography, don't worry about what I said on the last paragraph and read the short introduction below.
Basic image steganography
In order to conceal an image inside another image we need to know how data is represented. In our case we are talking about media. And, How digital media are represented? I think you guess it: bits.
So, we need to alter the host image bits and embed inside, the bits of the secret image, in such way the host image seems unaltered.
Images are represented as a matrix of pixels within values between 0-255. That's equivalent to 8 bits. One of the most basic techniques of image steganography consists on altering the LSB (Least Significant Bit) of each pixel of the host image changing it with the sequence of bits of the secret image. That means that we need 8 pixels of the host in order to conceal 1 secret pixel. So, in order to conceal a 16x16 image you need at least a 46x46 host image. Finally, if you want to read the secret image you just need to look at the LSB bits of the host pixels and reconstruct the secret image.
However, this isn't always the best way to embed the secret image. The reason is that someone with a little bit of knowledge about this technique can easily decode our secret message. Moreover, using only the LSB we are wasting a lot of space in our host image that can be used by our secret image. That's why modeling steganography as a search and optimization problem makes sense because we can find an optimal way to embed the secret message.
Steganography as a search and optimization problem
I used a genetic algorithm in order to achieve this. This has some advantages. You can obtain different optimal solutions on the last generations. And the cool thing is that the results are stochastic, so it doesn't rely on a specific method to embed the secret image. On the other hand, genetic algorithms are iterative and converge too slow.
In order to run the genetic algorithm we need to define how a chromosome is and what means each gene. Each chromosome will depict a different way to encode and decode the secret image. Also we need a fitness function that measure how good a chromosome is. I chose the PSNR function used to measure the quality of compressed images.
Genetic codification
The following chromosome is composed of 7 genes as you can see in the next table.
Gene name | Range | Length | Description |
---|---|---|---|
Direction | 0-7 | 3 Bits | Direction of host image pixel scanning |
X-offset | 0-255 | 8 Bits | x-offset of the starting point |
Y-offset | 0-255 | 8 Bits | y-offset of the starting point |
Bit-Planes | 0-15 | 4 Bits | Used LSBs for bit insertion |
SB-Pole | 0-1 | 1 Bit | Secret bit inversion |
SB-Dire | 0-1 | 1 Bit | Secret bit direction |
BP-Dire | 0-1 | 1 Bit | Bit planes direction |
The full chromosome has a length of 26 bits. I'm going to explain in detail what means each gene.
Direction
Represents the direction in which the image is scanned in order to embed and decode the secret bits.
X-offset and Y-offset
Represents the column and row where the scan starts. Has a length of a Byte because I've used host images of 256x256 width and height. But can be tweaked easily.
Bit-Planes
Has 4 bits and depict which bits of the host image are used to embed the secret
bits. For example if Bit-Planes are 0001
we use the LSB technique. If
there are 0101
the third and less significative bit are used.
SB-Pole
If the value is 1
, the one's compliment is applied to the secret bit
sequence. For example the sequence 1001
should be changed to 0110
.
SB-Dire
If the value is 1
the sequence is inverted. For example
the 0001
should be changed to 1000
.
BP-Dire
When the value is 1
the selected Bit-Planes are embedded in the 4 LSB of
the pixel. When is 0
, the 4 MSB (Most Significant Bits) are used.
Fitness: PSNR function
The fitness function is the Peak Signal to Noise Ratio. Is measured by Decibels (dB) and makes us able to measure the quality of the different stego images. It relates \(MAX\) the maximum energy representation of a signal, in our case 255 for a pixel, with the noise introduced in the host image, measured by the \(MSE\) mean squared error.
\[ PSNR = 10 \cdot log_{10}(\frac{MAX_i^2}{MSE}) \]
\[ MSE = \frac{1}{MN} \sum_{i=0}^{M-1}\sum_{j=0}^{N-1}||I(k,j) - K(i,j)||^2 \]
In conclusion, we want to maximize this function. Thus higher values depict a less noisy representation of the host image.
Aspects of the genetic algorithm
The algorithm is so simple an I'm going to shortly summarize some of the main details.
Firstly, the individuals are randomly initialised and evaluated. Then, a new
population of individuals are selected by tournament. Next, the individuals are
mate executing a two-point crossover. And finally, the new individuals mute
with a probability by flipping random genes of their chromosomes applying the not
operand.
As you can see below, the algorithm is very simple and you can find it here on deap.
1evaluate(population)
2for g in range(ngen):
3 population = select(population, len(population))
4 offspring = varAnd(population, toolbox, cxpb, mutpb)
5 evaluate(offspring)
6 population = offspring
Embedding and reading a secret message
In order to embed the message using one of the individuals, host and secret
images, are flatten into a sequence of bits. For the host image, the sequence
is flatten using the correspondent Direction
and starting from x
and y
offset.
Then the secret raw bits are changed depending on BP-Pole
and SB-Dir
. At
this point, if the sequence of secret bits is larger than the available bits in
the host image, the individual is set a 0 fitness. In other case, the bits are
embed following the configuration depicted by Bit-Planes
and BP-Dir
.
Read the secret message is the same operation, but first we need to extract the
secret raw bits with the Bit-Planes
and BP-Dir
and turn it to something
readable with BP-Pole
and SB-Dir
.
Results
I used the Lenna
image with a shape of 256x256 to embed different images
of different shapes. The number of generations is set to 80
, with a
probability to mate and mute of 0.7
and 0.25
.
In the below table are the best individuals obtained at the end of the algorithm, in order to create the secret message.
Shape | Capacity | PSNR (%) | |||
---|---|---|---|---|---|
% | bpp (bits per pixel) | Baboon | Jet | Pepper | |
64x64 | 6.25 | 0.50 | 30.38 | 30.17 | 30.16 |
81x81 | 10.0 | 0.80 | 28.27 | 28.13 | 28.12 |
115x115 | 20.1 | 1.61 | 21.64 | 20.97 | 21.38 |
127x127 | 24.6 | 1.96 | 20.73 | 20.10 | 20.51 |
140x140 | 29.9 | 2.39 | 15.31 | 15.12 | 15.11 |
162x162 | 40.0 | 3.20 | 9.61 | 8.92 | 9.26 |
180x180 | 49.4 | 3.95 | 8.67 | 7.96 | 8.32 |
Logically the PSNR get higher values when the secret message is smaller. Therefore, as you can see, the stego images have worst quality when the embedded message is larger.
Future work
Genstego can be improved in several ways. First of all, there are 8 implemented scanning directions. Adding more directions will expand the search space for an optimal solution. Also, the secret image is always scanned in raster order and that can be changed.
There are some other mandatory improvements like, embed the chromosome into a placeholder inside of the host image. Otherwise, the receptor won't know how to read the secret image unless the chromosome will be given by other channel.
Also, Genstego only use gray-scale images as host and stego. But, it can be modified to use color images or hide different types of secret messages.
Footnotes
-
-
Kanan and B. Nazeri. (2004). A novel image steganography scheme with
-
high embedding capacity and tunable visual image quality based on a genetic algorithm.