My Journey to Cracking Steganography Mission 15 at HackThisSite
by Ivan Ivanov Petrov (Keeper)
FIRST EDITION
About the underground
The website was founded by Jeremy Hammond in late 2003. For a long time, it has been targeted by many different organizations trying to gain control over it and destroy the community.
"In November 2004 the (now defunct) HackThisSite-based HowDark Security Group notified the phpBB Group, makers of the phpBB bulletin software, of a serious vulnerability in the product. The vulnerability was kept under wraps while it was brought to the attention of the phpBB admins, who after reviewing, proceeded to downplay its risks. Unhappy with the Group's failure to take action, HowDark then published the bug on the bugtraq mailing-list. Malicious users found and exploited the vulnerability which led to the takedown of several phpBB-based bulletin boards and websites. Only then did the admins take notice and release a fix. Slowness to patch the vulnerability by end-users led to an implementation of the exploit in the Perl/Santy worm which defaced upwards of 40,000 websites and bulletin boards within a few hours of its release."
- Wikipedia, the free encyclopedia
The community is dedicated to facilitating an open learning environment by providing a series of hacking challenges, articles, resources, and discussion of the latest happenings in hacker culture. It is also an online movement of artists, activists, hackers and anarchists who are organizing to create new worlds. Since several of the hacking challenges are simulated web defacements, the ethics of hacking is a recurring question. The community considers hacking itself to be a tool, a skill that is neutral in itself, a means without end. It can be used for good (for the benefit of all) or for bad (mindless destruction or theft). They do not encourage negative use of the information they provide. They are more concerned with the greater risks of not distributing this information and are ready to accept the consequences.
About Steganography Mission 15
Starting from the very beginning: the mission originally had a fairly simple solution until a follow-up update changed the concept of the entire challenge. The mission drew attention because many famous and lesser-known steganographers have tried to figure out the idea behind it, but none has succeeded so far. Since 2008, the challenge has been solved by only eighteen people worldwide (whose identities remain unknown). Some claim that a few of them were the site's own administrators, who have access to the answer to every challenge on the board. Others believe that the solutions were the result of extensive exhaustive search attacks (a.k.a. brute-force attacks). My involvement in this mission started back in 2012, when I was first introduced to steganography. At first, I thought there was nothing special about it, but after I took it to a higher level and was unable to solve it, I found out that it was an underground competition. Before going any further, let us lay out the key details, beginning with the image itself.
The steganographied image (.PNG) has a divided IDAT structure of 12 blocks (the last one slightly smaller). The data seems to have been concealed by altering the enhanced LSB values, eliminating the high-level bits of each pixel except for the least significant bit. After that, every byte is either 0 or 1, and since 0 or 1 on a 0 to 255 scale does not produce any visible color, the values are stretched: a 0 stays at 0, and a 1 becomes the maximum value, 255. Initial analyses of the image did not show anything particularly odd, apart from the complete absence of one value from all three color channels (RGB) and the heightened frequency of another value in one third of the color values. Studying these and replacing bytes gave me nothing, however, and I was at a loss as to whether this avenue was even worth pursuing at all.
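Counting the IDAT blocks does not require a dedicated tool. Below is a minimal Python sketch that walks the chunk layout defined by the PNG specification (an 8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC) and checks each CRC. The function name `list_png_chunks` is hypothetical, and the sketch only lists chunks; it does not decompress the image data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_chunks(data: bytes):
    """Return (chunk_type, data_length, crc_ok) for every chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    chunks = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk header: 4-byte big-endian length followed by the 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and data fields, not the length field.
        crc_ok = (zlib.crc32(ctype + body) & 0xFFFFFFFF) == crc
        chunks.append((ctype.decode("ascii"), length, crc_ok))
        pos += 12 + length
        if ctype == b"IEND":
            break
    return chunks
```

For example, `list_png_chunks(open("mission15.png", "rb").read())` (the file name is an assumption) would return one `("IDAT", length, crc_ok)` entry per block, which makes it easy to compare the size of the last block with the others.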
Chi-square analysis (Java module)
Hence, I looked into developing a script in Python, PHP or C/C++ that would reverse the process and "restore" the enhanced LSBs. Automating the process improves the chances of success, since a number of different analyses can be carried out in a matter of seconds, whereas it would take a single person quite a while to conduct these experiments by hand. After converting the image to a 24-bit .BMP and following the red curve of a chi-square steganalysis, I am certain that there is hidden data within the file, so none of this has been or will be in vain. First, there are a little more than 8 vertical zones, which means that the hidden data is a little more than 8 kB in size. One pixel can be used to hide three bits (one in the LSB of each RGB color channel), so we can hide (98 × 225) × 3 bits. To get the number of kilobytes, we divide by 8 and by 1024: ((98 × 225) × 3)/(8 × 1024) ≈ 8.07, i.e. around 8.1 kilobytes. However, that is not the case here.
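The capacity arithmetic above can be checked in a few lines. This is a minimal sketch: the function name is hypothetical, the 98 × 225 figures are taken from the text, and 1 kilobyte is taken as 1024 bytes, as in the formula.

```python
def lsb_capacity_kib(width: int, height: int, bits_per_pixel: int = 3) -> float:
    """Capacity in kilobytes when one bit is hidden in the LSB of each R, G, B channel."""
    total_bits = width * height * bits_per_pixel
    # bits -> bytes (divide by 8) -> kilobytes (divide by 1024)
    return total_bits / (8 * 1024)


print(f"{lsb_capacity_kib(98, 225):.2f} kB")
```

This prints `8.07 kB` (66150 bits, or 8268.75 bytes), which matches the roughly 8.1 kilobytes estimated above.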
Chi-square analysis (Batch module)
The analysis of the APP0 and APP1 markers of a .JPG version of the file also gave some awkward output:
Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
  OFFSET: 0x00000000
*** Marker: APP0 (xFFE0) ***
  OFFSET: 0x00000002
  length     = 16
  identifier = [JFIF]
  version    = [1.1]
  density    = 96 x 96 DPI (dots per inch)
  thumbnail  = 0 x 0
*** Marker: APP1 (xFFE1) ***
  OFFSET: 0x00000014
  length          = 58
  Identifier      = [Exif]
  Identifier TIFF = x[4D 4D 00 2A 00 00 00 08 ]
  Endian          = Motorola (big)
  TAG Mark x002A  = x[002A]
  EXIF IFD0 @ Absolute x[00000026]
    Dir Length = x[0003]
    [IFD0.x5110 ] =
    [IFD0.x5111 ] = 0
    [IFD0.x5112 ] = 0
    Offset to Next IFD = [00000000]
*** Marker: DQT (xFFDB) ***
  Define a Quantization Table.
  OFFSET: 0x00000050
  Table length = 67
  ---
  Precision=8 bits
  Destination ID=0 (Luminance)
    DQT, Row #0:   2   1   1   2   3   5   6   7
    DQT, Row #1:   1   1   2   2   3   7   7   7
    DQT, Row #2:   2   2   2   3   5   7   8   7
    DQT, Row #3:   2   2   3   3   6  10  10   7
    DQT, Row #4:   2   3   4   7   8  13  12   9
    DQT, Row #5:   3   4   7   8  10  12  14  11
    DQT, Row #6:   6   8   9  10  12  15  14  12
    DQT, Row #7:   9  11  11  12  13  12  12  12
    Approx quality factor = 94.02 (scaling=11.97 variance=1.37)
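A marker listing like the one above comes from walking the JPEG segment structure: every marker is 0xFF followed by a code byte, and most segments carry a 2-byte big-endian length that includes the length field itself. The offsets in the dump are self-consistent in exactly this way (0x2 + 2 + 16 = 0x14, and 0x14 + 2 + 58 = 0x50). The following is a minimal Python sketch of such a walker; the function names are hypothetical, and it reports only offsets and lengths without decoding the Exif or DQT contents.

```python
import struct

# Markers that stand alone and carry no length field.
STANDALONE_MARKERS = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))
KNOWN_NAMES = {0xD8: "SOI", 0xD9: "EOI", 0xDA: "SOS", 0xDB: "DQT",
               0xC4: "DHT", 0xC0: "SOF0", 0xC2: "SOF2", 0xFE: "COM"}


def marker_name(marker: int) -> str:
    """Human-readable name for the byte that follows 0xFF."""
    if 0xE0 <= marker <= 0xEF:
        return f"APP{marker - 0xE0}"
    return KNOWN_NAMES.get(marker, f"0xFF{marker:02X}")


def list_jpeg_segments(data: bytes):
    """Return (offset, marker_name, length) for each segment up to SOS or EOI.

    `length` is the big-endian length field (it includes its own two bytes);
    it is None for stand-alone markers such as SOI and EOI.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos:#x}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if marker in STANDALONE_MARKERS:
            segments.append((pos, marker_name(marker), None))
            pos += 2
            if marker == 0xD9:
                break
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, marker_name(marker), length))
        if marker == 0xDA:
            break  # entropy-coded image data follows SOS; stop walking
        pos += 2 + length
    return segments
```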
Since I was nearly convinced that no encryption algorithm had been applied, and therefore that no key was involved in the concealment, my idea was to write a script that would shift the LSB values and restore the originals. The file was put through several structural analyses, statistical attacks, BPCS and a few others.
The histogram of the image shows a specific color with an unusual spike. I manipulated it as best I could to try to reveal any hidden data, but to no avail. The histograms of the RGB values are as follows:
Then there are the multiple IDAT chunks. I put together a similar image by assigning random color values to each pixel location, and I too ended up with several of these. Unfortunately, very little was found inside them. Even more interesting is the way color values are repeated in the image. It seems as though the frequency of reused colors could hold some clue, yet I did not fully understand that relationship, if one exists at all. Additionally, there is only a single column and a single row of pixels whose alpha channel is not at the full value of 255. I even interpreted the X, Y, A, R, G and B values of every pixel in the image as ASCII, but ended up with nothing particularly legible. Even the green curve (the average of the LSBs) cannot tell us anything: there is no evident break. Here are several other histograms which show the odd curve of the blue channel:
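The checks described in this paragraph (values that never occur, a value that spikes, pixels that are not fully opaque, reading values as ASCII) are easy to script. This is a minimal sketch, not the exact tooling behind the histograms; it assumes the pixels have already been loaded as a row-major list of (R, G, B, A) tuples, and all function names are hypothetical.

```python
from collections import Counter


def channel_report(pixels):
    """For each of R, G, B, A: the values 0-255 that never occur and the most frequent value."""
    if not pixels:
        raise ValueError("no pixels given")
    report = {}
    for index, name in enumerate("RGBA"):
        counts = Counter(p[index] for p in pixels)
        missing = [v for v in range(256) if counts[v] == 0]
        spike_value, spike_count = counts.most_common(1)[0]
        report[name] = {"missing": missing, "spike": (spike_value, spike_count)}
    return report


def non_opaque_positions(pixels, width):
    """(x, y) positions of pixels whose alpha is not 255, for a row-major pixel list."""
    return [(i % width, i // width) for i, p in enumerate(pixels) if p[3] != 255]


def as_ascii(values):
    """Render byte values as text, replacing non-printable ones with '.'."""
    return "".join(chr(v) if 32 <= v < 127 else "." for v in values)
```

With Pillow installed, such a list can be obtained with `list(Image.open(path).convert("RGBA").getdata())`; a single non-opaque row and column would then show up as a run of positions sharing one x or one y value.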
The red curve does show some difference: it can see something that we cannot spot (yet). Statistical detection is more sensitive than our eyes, and I guess that was my final point. However, there is also a sort of latency in the red curve. Even without hidden data, it starts at the maximum and stays there for some time, which is close to a false positive. It looks like the LSBs in the image are very close to random, and the algorithm needs a large population (keep in mind that the analysis was carried out on a steadily growing population of pixels) before it reaches a threshold where it decides whether the red curve goes down or up, depending on the state of the pixels (which are never randomized). The same sort of latency occurs when data is hidden. If you hide 1 kB or 2 kB of data, the red curve does not change direction right after that amount of data; it waits a little (in our situation, until around 1.3 kB and 2.6 kB respectively). Here is a representation of the data types from a hex editor:
Here is another spectrum that confirms the behavior of the blue (RGB) value. Notice the sudden curve at the beginning. As mentioned above, there is no evident clue to the original values of the RGB and alpha channels: they are set to either 255 or 0 depending on their least significant bit. The other option on my mind at that moment was that the mission was intended to implement a protocol for quantum steganography. Matlab and a few other steganalysis techniques seem tempting, but only up to a point. In my experience, the only steganalysis attack that can reveal whether anything has been concealed with the eLSB technique is the chi-square test. As for Matlab, the tools it offers are of little use, since they are limited to the information the user supplies, and we currently have no valid information to give it. In particular, I could easily reverse the process by pulling the least significant bit from every pixel channel, grouping the bits into 8-bit words and converting them back to text, but only if I knew the key or variable used for the layer encryption. Protocols such as those for hiding quantum information in a codeword of a quantum error-correcting code passing through a channel are more likely to be the case. In such schemes, someone who can monitor the channel but lacks the secret key cannot distinguish the message from the channel noise. In other words, there must be something else to it that I have not yet found. Also, "noise" does not only refer to the visual representation of the file; it could be a hex dump or whatnot, i.e. any unreadable or corrupted data as a whole.
The idea behind eLSB shifting is that each pixel is replaced with a different value, which makes the image totally unrecognizable. It is called enhanced because we eliminate the high-level bits of every pixel except the least significant one. This is also the case where we can most often evaluate the layer by looking into the structure of the image: in, say, an IDAT of 9 blocks, the last LSB will be either smaller than or equal to the previous ones (rarely equal, in fact), which means that the previous ones have been altered and there is literally no room left for the last LSB.
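The naive reversal mentioned above (pull the LSB from every pixel channel, group the bits into 8-bit words, convert to text) can be sketched as follows. This is a minimal illustration under loud assumptions: pixels are read in row-major order, channels in R, G, B order, one bit per channel, no key or pixel permutation, and both bit orders are offered because the correct one is unknown. All function names are hypothetical. As the text notes, without knowing which pixels were altered the output will most likely be unreadable.

```python
def extract_lsb_bytes(channel_values, msb_first=True):
    """Take bit 0 of every channel value, in order, and pack the bits into bytes.

    Trailing bits that do not fill a whole byte are dropped.
    """
    bits = [v & 1 for v in channel_values]
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for start in range(0, usable, 8):
        group = bits[start:start + 8]
        if not msb_first:
            group.reverse()  # treat the first bit as the least significant one
        byte = 0
        for bit in group:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def flatten_rgb(pixels):
    """Turn (R, G, B[, A]) tuples into R, G, B, R, G, B, ... (alpha ignored)."""
    return [c for p in pixels for c in p[:3]]


def printable_ratio(data):
    """Fraction of bytes that are printable ASCII: a crude 'does this look like text' score."""
    if not data:
        return 0.0
    return sum(32 <= b < 127 or b in (9, 10, 13) for b in data) / len(data)
```

A quick check is `printable_ratio(extract_lsb_bytes(flatten_rgb(pixels)))`: plain text scores close to 1, while random-looking bits score far lower.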
One of the few techniques that can be used to detect eLSB steganography (and actually differentiate it from quantum steganography) is statistical analysis. The output of the chi-square module is described as follows:
"The program will output a graph with two curves. The first one in red is the result of the chi-square test. If it's close to one, then the probability for a random embedded message is high. So, if there is a random message embedded, this green curve will stay around 0.5. On the graph network, every vertical blue line represents 1 kilobyte of embedded data."
- Somnium, a.k.a. Guillermito
This is a sample representation of how the LSBs are enhanced and set to either 255 or 0. Basically, the noise level depends on how much data we want to hide and, of course, on the size of the image, its color depth, etc.
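The enhancement itself is a one-line transformation: keep a single bit of each channel and stretch it to 0 or 255. The minimal sketch below generalizes it to any bit plane (bit 0 gives the enhanced-LSB view; higher bits give the planes a BPCS-style inspection looks at). It assumes pixels as a list of (R, G, B) or (R, G, B, A) tuples, and the function name is hypothetical.

```python
def bit_plane_view(pixels, bit=0):
    """Keep one bit of each R, G, B value and stretch it to 0 or 255.

    bit=0 gives the enhanced-LSB view; other values show higher bit planes.
    Alpha, if present, is dropped.
    """
    if not 0 <= bit <= 7:
        raise ValueError("bit must be between 0 and 7")
    return [tuple(255 if (c >> bit) & 1 else 0 for c in p[:3]) for p in pixels]
```

With Pillow, the same enhanced-LSB image can be produced directly with `img.convert("RGB").point(lambda v: 255 * (v & 1))`, since `point` applies the function's lookup table to every band.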
Now let us say an audio file is involved as well. If we are good enough at steganography, we could combine eLSB with an audio rendering of an image and come up with an incredibly secure layer. Suppose we have a file called fuke.wav which has been altered and has some data hidden within it. One way to check for anything unusual is to put the file through a frequency analysis and see whether there is something worth pursuing. First, let's look at a temporal analysis alongside a TFFT. The difference between a plain FFT and a temporal analysis is that the TFFT (a short-time FFT) studies both the time and the frequency of the signal, while a single FFT of the whole signal shows only its overall frequency content (in other words, we need a spectrogram to see how the frequencies change over time).
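To make the FFT versus TFFT distinction concrete, here is a minimal Python sketch of a short-time Fourier transform: the signal is cut into frames, each frame is multiplied by a Hann window and transformed, and the resulting magnitude spectra form the columns of a spectrogram. A naive DFT is used instead of an FFT only to keep the example dependency-free (real tools use an FFT, which computes the same transform much faster); the function names and the choice of a Hann window are assumptions.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of bins 0..n/2 of the discrete Fourier transform (naive O(n^2))."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def stft_magnitudes(signal, window_size, hop):
    """Short-time Fourier transform: Hann-windowed DFTs of successive frames.

    Returns one magnitude spectrum per frame, i.e. the columns of a spectrogram.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / window_size) for i in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + window_size], hann)]
        frames.append(dft_magnitudes(frame))
    return frames
```

The hop size follows from the window overlap: hop = window_size × (1 - overlap), so, for example, a 1024-sample window at about 85% overlap corresponds to a hop of roughly 154 samples. Samples from a .wav file can be read with Python's standard `wave` module.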
If that does not suit us, we can use sox on Linux boxes and generate a similar spectrum. Note that sox handles .wav files out of the box (the format most such software expects), while support for compressed formats such as MP3 depends on how sox was built. Now, to output a spectrum we do the following:
Code:
We may have to use a converter such as ffmpeg to convert the file to .wav if we previously generated it in a different format. And so we end up with the following:
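A minimal Python sketch of this convert-then-plot workflow, assuming sox and ffmpeg are installed and on the PATH. The function names are hypothetical, and the sox options shown are only the basic form of its `spectrogram` effect (`-n` discards the audio output, `-o` names the PNG), not any particular settings used for the images here.

```python
import subprocess
from pathlib import Path


def sox_spectrogram_cmd(wav_path, png_path):
    """sox command line: read the audio, write no audio (-n), render a spectrogram PNG."""
    return ["sox", str(wav_path), "-n", "spectrogram", "-o", str(png_path)]


def ffmpeg_to_wav_cmd(src_path, wav_path):
    """ffmpeg command line converting another audio format to .wav (-y overwrites)."""
    return ["ffmpeg", "-y", "-i", str(src_path), str(wav_path)]


def render_spectrogram(audio_path, png_path):
    """Convert to .wav first if needed, then render the spectrogram with sox."""
    audio_path = Path(audio_path)
    if audio_path.suffix.lower() != ".wav":
        wav_path = audio_path.with_suffix(".wav")
        subprocess.run(ffmpeg_to_wav_cmd(audio_path, wav_path), check=True)
        audio_path = wav_path
    subprocess.run(sox_spectrogram_cmd(audio_path, png_path), check=True)
```

For instance, `render_spectrogram("fuke.wav", "fuke.png")` would run sox directly, while a non-.wav input would be converted by ffmpeg first.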
Similar to that spectrogram are the following two. The first uses a dBV^2 scale with a 1024-sample window at roughly 85% overlap, and the second a linear scale with a 2048-sample window at 90%+ overlap and logarithmic bins. Things are noticeably more visible, as we can plainly see. The same would apply if we could scale the sox spectrogram and manipulate it as much as we like, but frankly I do not think sox offers such possibilities. Frankly speaking, there is a lot of software for embedding and extracting data, but none of it is really effective when it comes to reversing the process. In this case, the only possible way to reverse it would be to pull the least significant bit from every pixel channel, group the bits into 8-bit words and convert them back to text, but that would only be possible if we had some clue as to which pixels were altered (which, as mentioned earlier, we do not). Matlab, however, is not the only option we are left with. There are numerous software packages for that purpose, though most people who are or have been capable of reaching this point will be experienced enough to write their own scripts (even if less optimized and less functional). That being said, the mission ultimately remains a mystery, and it has led me nowhere so far. The avenues one could pursue in this challenge are literally more than even an experienced steganographer can imagine.