next up previous
Next: Generalization Up: Attacking the algorithm Previous: Breaking the challenge

Going further

One can now argue that our attempt succeeded because the same pattern was used to mark both songs. First, we would like to point out that this is necessary because the detector needs to have this pattern in order to be able to work. If different patterns were used for every song, building a detector would be essentially impossible because it would have to test every possible patterns or recreate the pattern from the song. Recreating the pattern from the song, would require to solve either the problem of fuzzy hashing or of song classification, which are certainly both as hard as watermarking. Furthermore, even if song classification was realized, there would be a need to maintain a complicated database of all the existing songs together with their associated marks.

However, it is possible to use a small set of patterns, and it is also possible (and reasonable) that this pattern would not be the same in the version of the system deployed in real life.

We will now show that our attack still works. More precisely, we will show how to recover an unknown pattern from the marked song, without the original song.

Let us first assume, to simplify the exposition, that the mark starts at the first sample of the song. We use the same notation as in the rest of the article: $ s_i$ denote the $ i-th$ chunk of $ 1470$ samples of the unmarked song, and $ s'_i$ the corresponding chunks of the marked song, $ w=(w_1,\dots,w_{10})$ denotes the pattern (unknown here), and $ \beta $ denote the unknown function.

Let us also assume, again for simplicity, that the song $ s$ is exactly $ l$ chunks long. So, we have, for every $ i$ in $ \{1,\dots,l\}$, and every $ j$ in $ \{1,\dots,10\}$,

$\displaystyle s'_i[j] = s_i[j] + \beta(s,w,i,j) \vert\vert s_i[j]\vert\vert w[j]$ (3)

Let us divide by $ \vert\vert s'_i[j]\vert\vert$ and sum over $ i$. We will use the following notations:

We have, for every $ j$:

$\displaystyle S'[j] = S[j] + w[j] \sum_{i=1}^l \beta(s,w,i,j) \frac{\vert\vert s_i[j]\vert\vert}{\vert\vert s'_i[j]\vert\vert}$ (4)

The multiplicative term $ \beta(s,w,i,j) \frac{\vert\vert s_i[j]\vert\vert}{\vert\vert s'_i[j]\vert\vert}$ is not very problematic. First, it turned out that it was extremely close to one for every $ j$ (actually, it would almost disappear if we knew $ \beta $), second, is is not a real problem to recover the mark times a multiplicative constant. I would have been a problem if this sum happened to be very small, but that was not the case.

The more problematic term is $ S[j]$. We would like it to be small. However, it is very difficult to estimate the typical value of $ S[j]$. Naturally, if $ l$ is large enough, it should be very small. Having longer songs (the songs included in the challenge were only two minute long) would help. Also, if the same (now unkown) mark is used for several songs (which seems to be the case), we can actually perform the averaging on all the songs we can obtain, thus largely improving the chances for $ S[j]$ to be negligeable.

The problem is that the structure of the music plays an important part in the value of $ S[j]$. If, for example, a drum beat happens with a period which is synchronized with the period of the mark, $ S[j]$ might be very large on some specific points.

We have not had time to perform an analysis of the value of $ S[j]$ on a large number of songs, and we are not aware of a general statistical model for music. What we know, however, is that in the case of $ BM$, our technique works surprisingly well. It turned out that the average of the unmarked version of this song was especially small. We could recover the mark from $ BM$ (and from only $ BM$) with a very good precision. Figure 7 shows a part of the mark recovered from $ BM$ and the corresponding mark extracted from the difference $ D=AM-A$. Once we have recovered the mark, the rest of the attack works as previously.

Note that, especially when $ S[j]$ is not negligeable, it is possible to improve the precision on $ w$ by filtering $ S'[j]$ to attenuate $ S[j]$ in a very significant way. As a matter of fact, $ S[j]$ and $ w$ are very different in nature. $ S[j]$ is obtained by averaging the signal over periods of 147 samples and such a process is well known to be equivalent as low-pass filtering. On the contrary, the watermark $ w$ has the most important part of its information in the higher frequencies. Figure 8 illustrates very well this phenomenon on a different song than $ BM$. Therefore, applying a high-pass filter with an adequate cutoff frequency ($ 0.01$ in the case of figure 8) allows the extraction of $ w$ with a higher precision.

As a final note, let us recall that we had assumed, for simplicity, that the mark was starting at the first sample of the song. This was not the case in $ BM$. However, we simply needed to perform the averaging attack for every $ 1470$ possible starting positions.


next up previous
Next: Generalization Up: Attacking the algorithm Previous: Breaking the challenge
Julien Stern 2001-01-05