Introduction

In this post we will cover one of the most important transforms of digital signal processing: the Discrete Fourier Transform (DFT). If you don't know what a transform is, I recommend reading the introduction of my post about the z-transform. If you don't want to read it, here is the short version: the goal of most transforms is to apply some mathematical operation to data which then reveals information we could not see in its original representation.
In the case of the Fourier transform, a discrete signal is transformed in a way that lets us see which frequencies it comprises. This may sound insignificant, but it actually opens up a whole world of new opportunities. It is the essential mathematical ingredient of many applications such as guitar tuners (in my next post I'll show how to program a guitar tuner using Python and the DFT), noise filtering, or x-ray image sharpening.
As the name suggests, it is the discrete equivalent of the Fourier Transform. Many lectures and courses teach the "standard" Fourier Transform first and then move on to the discrete one. However, for this post you don't need to know anything about the Fourier Transform. We will start at nearly 0% and then try to derive, understand, and use the DFT. The only prerequisites are some basic knowledge of linear algebra and complex numbers. If you know what a matrix is and that \(e^{i x} = cos(x) + i \cdot sin(x) \), then you are already good to go ;)

A different perspective

As previously mentioned, the DFT can help us determine which frequencies a discrete signal is made of. To make things a little bit easier, we have to regard discrete signals from a new perspective. When imagining discrete signals, people (or at least I) often think about a bunch of values spread along a time axis: [Figure: a simple discrete signal] However, another way of representing signals is using vectors. So, if we have the signal \((2,1)\), the corresponding vector would look like this: [Figure: the signal as a 2d vector] Here, each dimension corresponds to a different point in time. If the signal has more values, we need to increase the number of dimensions. I have to admit that beyond a certain number of dimensions, it is hard to imagine what a vector would look like (for me, this is already the case at the humble number of 4 dimensions). But I hope the general concept of thinking of signals as vectors is clear.

The reason why we want to regard signals as vectors is that the DFT can be interpreted as a rotation of a signal/vector! This interpretation has some nice advantages. First, a rotation is something familiar from our everyday lives. Second, using rotation matrices, it is quite simple to derive the inverse Discrete Fourier Transform.


Rotations

The rotation of vectors can be expressed by multiplying them with a so-called rotation matrix. For example, rotating a 3-dimensional vector around the x-axis is achieved by the following matrix: $$ R_x = \begin{pmatrix} 1 & 0 & 0\\ 0 & cos(\varphi) & -sin(\varphi) \\ 0 & sin(\varphi) & cos(\varphi) \\ \end{pmatrix} $$ You can also combine multiple rotation matrices to achieve an arbitrary rotation. Using the sliders, you can rotate the chciken box and the corresponding vectors.
With all angles set to zero, the resulting rotation matrix is simply the identity matrix:
$$R_{xyz} = \begin{pmatrix} 1.00 & 0.00 & 0.00\\ 0.00 & 1.00 & 0.00 \\ 0.00 & 0.00 & 1.00 \\ \end{pmatrix}$$
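If you want to play with rotations yourself, here is a minimal Python sketch (assuming NumPy is available) that builds \(R_x\) from above and checks that a rotated vector keeps its length:

```python
import numpy as np

def rot_x(phi):
    """Rotation matrix around the x-axis, same layout as R_x above."""
    return np.array([
        [1, 0, 0],
        [0, np.cos(phi), -np.sin(phi)],
        [0, np.sin(phi),  np.cos(phi)],
    ])

v = np.array([2.0, 1.0, 0.5])
w = rot_x(np.pi / 4) @ v  # rotate v by 45 degrees around the x-axis

print(np.linalg.norm(v), np.linalg.norm(w))  # both ~2.2913: the length is preserved
```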

As the DFT is some kind of rotation, our next step is to explore the characteristics of rotation matrices. So, which mathematical properties does a rotation matrix have?
The answer is kind of intuitive: a rotation must not change the length of a vector, and the angle between two vectors has to stay the same. You can also see this in the example: no matter how you change the angles, the vectors of the chciken box keep their lengths and their relative angles. Expressing this mathematically leads us to the following relationship: $$R^{-1} = R^T$$ This means that the inverse of the rotation matrix exists, and it's simply the matrix transposed (this is really nice)! Generally, inverting arbitrary matrices might require some fancy algorithms, and in some cases an inverse might not even exist. Transposing, in contrast, is basically one of the simplest things to do: just mirror the elements at the diagonal, and you are done. Note that this concept is valid not only for 2 or 3, but for an arbitrary number of dimensions. If you want to see the derivation:

The statement of not changing lengths and angles can be expressed by the dot product. Assume we have two vectors \(v_1\) and \(v_2\). Their dot product results in: $$ \langle v_1 , v_2 \rangle = |v_1| \cdot |v_2| \cdot cos(\varphi) $$ where \(|v|\) is the length of a vector and \(\varphi\) the relative angle between them. If we rotate both vectors, the dot product must not change, as the lengths and the relative angle remain the same. So: $$ \langle Rv_1, Rv_2 \rangle = \langle v_1, v_2 \rangle$$ Using basic math we can reformulate this to: $$ \langle v_1, R^T Rv_2 \rangle = \langle v_1, v_2 \rangle$$ This means that multiplying the rotation matrix with its own transpose has to result in the identity matrix: $$R^T R = I$$ But this is also the definition of the inverse matrix, so we can write: $$R^{-1}=R^T$$


From this property we can derive a further one: the rows of a rotation matrix are perpendicular/orthogonal. To be more exact, they are orthonormal (orthogonal and of length 1). Therefore, the dot product of two different rows is 0, and the dot product of a row with itself is 1. The derivation can be found here:

We know that (with \(r_i\) being the \(i\)-th row of \(R\)): $$ R R^T = \begin{pmatrix} r_1 \\ r_2 \\ ... \end{pmatrix} \cdot \begin{pmatrix} r_1^{T} & r_2^{T} & ... \end{pmatrix} = \begin{pmatrix} 1 & 0 & ...\\ 0 & 1 & ...\\ ... & ... & ...\\ \end{pmatrix} = I $$ This means that: $$ r_i \cdot r_j^T = \langle r_i, r_j \rangle = \begin{cases} 0 & i \neq j\\ 1 & i = j\\ \end{cases} $$ Different rows are orthogonal, as the dot product between them is 0. Furthermore, as the length of each row is 1, the rows are even orthonormal.
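Here is a short numeric check of both properties for a combined rotation; the angles are made up for the example, and the rotation helpers are the same kind of sketch as above:

```python
import numpy as np

def rot_x(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

R = rot_z(1.1) @ rot_x(0.3)  # an arbitrary combined rotation

# R^T R should be the identity matrix ...
print(np.allclose(R.T @ R, np.eye(3)))  # True
# ... and the rows should be orthonormal: <r_i, r_j> = 1 if i == j else 0
print(np.round(R @ R.T, 10))            # prints the identity matrix
```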

The DFT

As a next step, we want to show that the DFT can be interpreted as a complex rotation. So, let's take a look at the definition of the DFT, which you can find in any textbook or online article: $$ X(n)=\sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}kn} $$ Basically, we have our discrete signal \(x(k)\) of length \(N\). This signal is multiplied with some expression and then summed up. The transformed signal is \(X(n)\) (this domain is also called the frequency domain) and has the same number of elements as its time-domain counterpart \(x(k)\). Note that the common convention is to represent the frequency domain by capital letters. The nice thing about this formula is that we can also represent it as a matrix multiplication: $$ \begin{pmatrix} X(0) \\ X(1) \\ ... \end{pmatrix} = \begin{pmatrix} e^{-i\frac{2\pi}{N}(0 \cdot 0)} & e^{-i\frac{2\pi}{N}(0 \cdot 1)} & ...\\ e^{-i\frac{2\pi}{N}(1 \cdot 0)} & e^{-i\frac{2\pi}{N}(1 \cdot 1)} & ...\\ ... & ... & ...\\ \end{pmatrix} \cdot \begin{pmatrix} x(0) \\ x(1) \\ ... \end{pmatrix} $$ If that matrix is a rotation matrix, then it should also be easy to define the inverse transform. Since \(R^{-1} = R^T\) applies to rotation matrices, we can simply do the following trick: $$X = R \cdot x $$ $$R^T \cdot X = x$$ So if we are in the frequency domain, we multiply by the transposed matrix to get back to the time domain!
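To convince ourselves that this matrix form really is the DFT, here is a small NumPy sketch that builds the matrix explicitly and compares the result against NumPy's FFT, which uses the same convention (negative exponent, no scaling):

```python
import numpy as np

N = 8
n, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = np.exp(-2j * np.pi * k * n / N)  # DFT matrix, entry (n, k) = e^{-i 2pi kn / N}

x = np.random.randn(N)               # some real test signal
X = W @ x                            # the DFT as a matrix multiplication

print(np.allclose(X, np.fft.fft(x)))  # True: same result as NumPy's FFT
```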

To prove that the matrix is a rotation matrix, we will check whether the rows are orthogonal using the dot product (see the derivation for orthogonality above to understand why we can use the dot product here): $$ \langle r_n, r_m \rangle = r_n \cdot r_m^T = \sum_{k=0}^{N-1} e^{-i\frac{2\pi}{N}km} \cdot e^{+i\frac{2\pi}{N}kn} = \sum_{k=0}^{N-1} e^{i\frac{2\pi}{N}k(n-m)} $$ Note that when transposing a complex vector, the values are additionally complex conjugated (the sign of the imaginary part switches); strictly speaking, \(R^T\) has to be read as the conjugate transpose for complex matrices. Next, we have to distinguish between two cases.
First we have \(m \neq n\) (two different rows).
Using the following formula (which is also used in the derivation of the geometric series): $$ \sum_{k=0}^{N-1} x^k = \frac{x^N-1}{x-1} $$ we get: $$ \sum_{k=0}^{N-1} e^{i\frac{2\pi}{N}k(n-m)} = \frac{e^{i2\pi(n-m)}-1}{e^{i\frac{2\pi}{N}(n-m)}-1} = 0 $$ The numerator is zero because \(e^{i2\pi(n-m)} = 1\) for any integer \(n-m\). As the dot product of two different rows is zero, we know that they are orthogonal. Nice!
Now we have to check the dot product of a row with itself (\(n=m\)), which can be interpreted as the squared length of the row vector: $$ \sum_{k=0}^{N-1} e^{0} = \sum_{k=0}^{N-1} 1 = N $$ As we can see, the squared length is not \(1\) but \(N\). This means that the length of each row vector is \(\sqrt{N}\) times larger than it needs to be for orthonormality (right now the matrix is only orthogonal). We can deal with this by simply putting a normalization factor in front of the DFT: $$ X(n)= \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}kn} $$
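A quick numeric check of both statements: the rows of the plain DFT matrix have length \(\sqrt{N}\), and the normalized matrix is orthonormal in the complex sense (using the conjugate transpose, as noted above):

```python
import numpy as np

N = 8
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N)  # plain DFT matrix

print(np.linalg.norm(W[0]))  # 2.828... = sqrt(8): the rows are too long

W_norm = W / np.sqrt(N)      # the normalized DFT matrix
print(np.allclose(W_norm @ W_norm.conj().T, np.eye(N)))  # True: orthonormal
```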

Using the normalization factor, the length of a row vector is now \(1\). In the literature, this is sometimes called the Normalized DFT (NDFT). While this seems more convenient from a geometrical point of view, the version without the scaling factor is the standard DFT in digital signal processing.
Why?
According to this site, the main reason for omitting the scaling factor is to save computations when calculating the DFT. This makes perfect sense, as digital signal processing is often about optimizing mathematical algorithms. We will use the DFT without the scaling, since following conventions is usually a good idea. Furthermore, having an orthogonal but not orthonormal matrix does not really have to bother us, as the next section will show.

The inverse DFT

In this section we will derive the inverse DFT. In many applications you want to transform your signal to the frequency domain, do some processing, and then transform it back to the time domain. This, of course, requires an inverse transformation.
As already explained above, we can simply use the inverse of the DFT matrix to get back to the time domain. Using the non-normalized DFT (which is the default DFT), the lack of orthonormality looks like an issue at first, because we cannot simply use \(R^T=R^{-1}\). But we can use a little trick: factor out a \(\sqrt{N}\) to make the matrix orthonormal. Starting with the DFT in matrix representation: $$ \begin{pmatrix} X(0) \\ X(1) \\ ... \end{pmatrix} = \begin{pmatrix} e^{-i\frac{2\pi}{N}(0 \cdot 0)} & e^{-i\frac{2\pi}{N}(0 \cdot 1)} & ...\\ e^{-i\frac{2\pi}{N}(1 \cdot 0)} & e^{-i\frac{2\pi}{N}(1 \cdot 1)} & ...\\ ... & ... & ...\\ \end{pmatrix} \cdot \begin{pmatrix} x(0) \\ x(1) \\ ... \end{pmatrix} $$ We can factor out a \(\sqrt{N}\): $$ \begin{pmatrix} X(0) \\ X(1) \\ ... \end{pmatrix} = \sqrt{N} \begin{pmatrix} \frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(0 \cdot 0)} & \frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(0 \cdot 1)} & ...\\ \frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(1 \cdot 0)} & \frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(1 \cdot 1)} & ...\\ ... & ... & ...\\ \end{pmatrix} \cdot \begin{pmatrix} x(0) \\ x(1) \\ ... \end{pmatrix} $$ Now the matrix is orthonormal, so we can use the relation \(R^T = R^{-1}\) (with the transpose again read as the conjugate transpose) to obtain: $$ \frac{1}{N} \begin{pmatrix} e^{+i\frac{2\pi}{N}(0 \cdot 0)} & e^{+i\frac{2\pi}{N}(1 \cdot 0)} & ...\\ e^{+i\frac{2\pi}{N}(0 \cdot 1)} & e^{+i\frac{2\pi}{N}(1 \cdot 1)} & ...\\ ... & ... & ...\\ \end{pmatrix} \cdot \begin{pmatrix} X(0) \\ X(1) \\ ... \end{pmatrix} = \begin{pmatrix} x(0) \\ x(1) \\ ... \end{pmatrix} $$ This leads to the following definition of the inverse DFT: $$ x(k)= \frac{1}{N} \sum_{n=0}^{N-1} X(n) \cdot e^{+i\frac{2\pi}{N}kn} $$ The inverse DFT is actually quite similar to the DFT itself: we just have to change the sign of the exponent and divide by \(N\).
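Again, a small sketch to verify this numerically: it reconstructs a random signal via the conjugate-transposed matrix divided by \(N\), and compares against NumPy's ifft, which implements exactly this definition:

```python
import numpy as np

N = 8
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N)  # forward DFT matrix

x = np.random.randn(N)
X = W @ x

# Inverse: conjugate transpose (sign flip in the exponent) divided by N
x_rec = (W.conj().T @ X) / N

print(np.allclose(x_rec, x))               # True: we get the signal back
print(np.allclose(x_rec, np.fft.ifft(X)))  # True: matches NumPy's inverse FFT
```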

At this point, we have covered the definition of the DFT, how it can be interpreted as an n-dimensional complex rotation, and how the inverse DFT can be derived using this interpretation. But we haven't covered yet what we actually gain from the DFT. Why do we do all of this?
The next section will hopefully answer this question, as we explore why the DFT lets us "see" the frequencies in signals.

The DFT as a basis

In this section, we show that the complex exponential functions used by the DFT can be regarded as a basis. This means we can reconstruct any signal by adding and scaling complex exponential functions. They are therefore some kind of mathematical atom or Lego brick: just by combining them in the right way, we can build arbitrary structures. To get an understanding of this, we will now look at the inverse DFT and its components and rearrange it as follows: $$ \frac{1}{N} \begin{pmatrix} \color{red}{e^{+i\frac{2\pi}{N}(0 \cdot 0)}} & \color{blue}{e^{+i\frac{2\pi}{N}(1 \cdot 0)}} & ...\\ \color{red}{e^{+i\frac{2\pi}{N}(1 \cdot 0)}} & \color{blue}{e^{+i\frac{2\pi}{N}(1 \cdot 1)}} & ...\\ \color{red}{...} & \color{blue}{...} & ...\\ \end{pmatrix} \cdot \begin{pmatrix} \color{red}{X(0)} \\ \color{blue}{X(1)} \\ ... \end{pmatrix} $$ $$ = \frac{1}{N} \cdot \color{red}{ X(0) \cdot \begin{pmatrix} e^{+i\frac{2\pi}{N}(0 \cdot 0)} \\ e^{+i\frac{2\pi}{N}(1 \cdot 0)} \\ ... \end{pmatrix}} + \frac{1}{N} \cdot \color{blue}{ X(1) \cdot \begin{pmatrix} e^{+i\frac{2\pi}{N}(1 \cdot 0)} \\ e^{+i\frac{2\pi}{N}(1 \cdot 1)} \\ ... \end{pmatrix}} + ... = \begin{pmatrix} x(0) \\ x(1) \\ ... \end{pmatrix} $$ As you can see, the DFT coefficients \(X(0), X(1), ... \) basically scale the vectors that form the columns of the inverse DFT matrix.
So basically, these columns are the mathematical Lego bricks, and the DFT coefficients are the instruction manual telling us how to combine them to obtain the time-discrete signal. It's really crucial to keep this in mind, as this is one of the key points of the DFT.

In the following, we look at these mathematical Lego bricks in more detail, as this will reveal some interesting secrets. Let's assume we have a DFT of size 4 as an example. Then we get the following matrix: $$ \begin{pmatrix} e^{+i\frac{2\pi}{4}(0 \cdot 0)} & e^{+i\frac{2\pi}{4}(1 \cdot 0)} & e^{+i\frac{2\pi}{4}(2 \cdot 0)} & e^{+i\frac{2\pi}{4}(3 \cdot 0)} \\ e^{+i\frac{2\pi}{4}(0 \cdot 1)} & e^{+i\frac{2\pi}{4}(1 \cdot 1)} & e^{+i\frac{2\pi}{4}(2 \cdot 1)} & e^{+i\frac{2\pi}{4}(3 \cdot 1)} \\ e^{+i\frac{2\pi}{4}(0 \cdot 2)} & e^{+i\frac{2\pi}{4}(1 \cdot 2)} & e^{+i\frac{2\pi}{4}(2 \cdot 2)} & e^{+i\frac{2\pi}{4}(3 \cdot 2)} \\ e^{+i\frac{2\pi}{4}(0 \cdot 3)} & e^{+i\frac{2\pi}{4}(1 \cdot 3)} & e^{+i\frac{2\pi}{4}(2 \cdot 3)} & e^{+i\frac{2\pi}{4}(3 \cdot 3)} \\ \end{pmatrix} = \begin{pmatrix} \color{#338cff}{1} & \color{#3d72b8}{1} & \color{#3c5f8c}{1} & \color{#374f6e}{1} \\ \color{#338cff}{1} & \color{#3d72b8}{i} & \color{#3c5f8c}{-1} & \color{#374f6e}{-i} \\ \color{#338cff}{1} & \color{#3d72b8}{-1} & \color{#3c5f8c}{1} & \color{#374f6e}{-1} \\ \color{#338cff}{1} & \color{#3d72b8}{-i} & \color{#3c5f8c}{-1} & \color{#374f6e}{i} \\ \end{pmatrix} $$ If we depict the real part of each column as a function, we get the following graph (remember: \(Re\{ e^{i \frac{2\pi}{N}kn} \} = cos( \frac{2\pi}{N}kn ) \)): [Figure: DFT cosine decomposition] Note that \(k\) is treated as a real (non-discrete) number in the graph to emphasize the underlying functions. This means that every signal is composed of cosine waves oscillating at different frequencies (plus a DC component, which is the straight line and basically a cosine at 0 Hz). Of course, there is also an imaginary part \(i \cdot sin(\frac{2\pi}{N}kn)\), but in many cases it can be omitted, as will be shown later. As the DFT coefficients scale these cosines, we can tell which frequencies are present in a signal simply by looking at the DFT coefficients! This is why the DFT lets us see the frequencies in signals!
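As a sanity check, here is a tiny sketch that evaluates these matrix entries numerically; the rounding is only there to suppress floating-point noise:

```python
import numpy as np

N = 4
k = np.arange(N)
W_inv = np.exp(2j * np.pi * np.outer(k, k) / N)  # inverse DFT matrix (without 1/N)

# The rows show the 1, i, -1, -i pattern from the matrix above.
print(np.round(W_inv.real).astype(int))  # real parts
print(np.round(W_inv.imag).astype(int))  # imaginary parts
```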

Since nothing explains things better than examples, we will now take a look at some example signals and their corresponding DFTs.

Example #1: Symmetry of Coefficients

Let's start with a really basic sampled cosine signal of length 8: [Figure: the sampled cosine signal] $$x(k)=(2, 1+2^{-0.5}, 1, 1-2^{-0.5}, 0, 1-2^{-0.5}, 1, 1+2^{-0.5})$$ $$x(k)=1+cos(\frac{2\pi}{8}k) $$ If we now calculate the DFT, we get the following coefficients: $$X(n)=(8, 4, 0, 0, 0, 0, 0, 4)$$ We can directly identify the DC component, one oscillation with the frequency \( \frac{2\pi}{8} \), and one oscillation with the frequency \(\frac{7 \cdot 2\pi}{8}\).
Wait, one with a frequency of \(\frac{7 \cdot 2\pi}{8}\)?!
Shouldn't there be only one oscillation, at \( \frac{2\pi}{8} \)? The short answer is: there are in fact two complex oscillations, but their imaginary parts cancel, resulting in only one real oscillation. This is a pretty important thing to know, so let's dive a little deeper. If we reconstruct the signal from the DFT coefficients using the inverse DFT, we get the following expression: $$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4e^{i \frac{2\pi}{8} k} + \frac{1}{8} \cdot 4e^{i\frac{7 \cdot 2\pi}{8} k} $$ We can reformulate this to: $$x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4e^{i\frac{2\pi}{8} k} + \frac{1}{8} \cdot 4e^{-i\frac{2\pi}{8} k} $$ $$x(k) = \frac{1}{8} \cdot 8 + \frac{8}{8} \cdot cos(\frac{2\pi}{8} k)$$ As you can see, the two complex oscillations merge into one real oscillation, as the imaginary parts cancel out. This is also quite intuitive if you think about it: how could a real signal be composed of complex basis functions if the imaginary parts did not cancel out?
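You can reproduce these coefficients with a few lines of NumPy; np.fft.fft uses the same convention as the DFT definition above:

```python
import numpy as np

N = 8
k = np.arange(N)
x = 1 + np.cos(2 * np.pi * k / N)  # the example signal from above

X = np.fft.fft(x)                  # unnormalized DFT
print(np.round(X.real, 10))        # [8. 4. 0. 0. 0. 0. 0. 4.]
print(np.round(X.imag, 10))        # all zeros: the imaginary parts cancel
```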

As a next step, let us look at the mathematical reasons why the imaginary parts cancel out.
First, the columns of the inverse DFT matrix come in complex-conjugate pairs: for every column there is another column where the exponent has the opposite sign.
Second, for real signals the DFT coefficients are complex conjugated such that \(X(n)=X^{*}(N-n)\). You can also see this in the example, where we have two "4"s; there is no way of having a 4 and a 3. Note that this only holds for real signals, not for complex signals. However, in most cases our signals are real, and for this whole chapter we will only assume real signals in the time domain. Only for some really fancy stuff would you use complex signals in the time domain. These two facts lead to many pairs such as (assuming the DFT coefficients to be real as well): $$X(n) \cdot e^{i\frac{2\pi}{N}nk} + X(N-n) \cdot e^{i\frac{2\pi}{N}(N-n)k}$$ $$=X(n) \cdot e^{i\frac{2\pi}{N}nk} + X^{*}(n) \cdot e^{-i\frac{2\pi}{N}nk}$$ $$=2 X(n) \cdot cos(\frac{2\pi}{N}nk)$$

At this point, let us summarize what we learned from this example. Basically, there are two important things. First, for real signals only one half of the DFT coefficients provides us with information; the other half can be reconstructed by complex conjugation.
Second, for real signals the imaginary parts of the complex exponential functions cancel out. Thus, every real signal can be reconstructed using only cosine waves!

First, the proof that the DFT coefficients are complex conjugated: $$ X(N-n)=\sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}k(N-n)} \\ =\sum_{k=0}^{N-1} x(k) \cdot e^{+i\frac{2\pi}{N}kn} \cdot e^{-i\frac{2\pi}{N}kN} \\ =\sum_{k=0}^{N-1} x(k) \cdot e^{+i\frac{2\pi}{N}kn} \\ =X^{*}(n) $$ Here we used \(e^{-i\frac{2\pi}{N}kN} = e^{-i2\pi k} = 1\) for integer \(k\). Note that the last step is only valid for real \(x(k)\).
Using the same approach, we can also show that the columns of the inverse DFT matrix are complex conjugated: in the proof above, we effectively showed that the rows of the DFT matrix are complex conjugated, and as the inverse DFT matrix is just the (conjugate) transposed DFT matrix, its columns must be complex conjugated as well!
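Here is a quick numeric check of the symmetry \(X(n)=X^{*}(N-n)\) for an arbitrary real signal:

```python
import numpy as np

x = np.random.randn(16)  # an arbitrary real signal
X = np.fft.fft(x)

# X(n) == conj(X(N - n)) for n = 1 ... N-1
print(np.allclose(X[1:], np.conj(X[:0:-1])))  # True
```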

Example #2: Complex DFT coefficients

So far, we have only considered real DFT coefficients. However, they can also be complex (even for real time-domain signals). But what is the meaning of \(1+i\) as a DFT coefficient?
To understand this, we will take a look at another example signal, which looks as follows: [Figure: the shifted cosine signal] This is basically the same signal as in the previous example, just shifted a little bit in time: $$x(k)=1+cos(\frac{2\pi}{8}k + \frac{\pi}{4})$$ The corresponding DFT coefficients look like this: $$X(n) = (8, \sqrt{8} + i \sqrt{8}, 0, 0, 0, 0, 0, \sqrt{8} - i \sqrt{8})$$ The DFT coefficients are also pretty similar to the previous example, but now the corresponding values are complex! From this example we can draw the intuitive conclusion that the complex part of the DFT coefficients is somehow related to how the cosine waves have to be shifted in the time domain. In the following part of this example, we will explore how this works mathematically.

So, I guess most of you know this, but complex numbers can be represented in Cartesian form or in polar form (thanks to Euler): [Figure: Cartesian and polar representation of a complex number] Both forms are mathematically equivalent and can be transformed into each other. For example, the DFT coefficients of this example can also be written as: $$X(n) = (8, 4 e^{i\frac{\pi}{4}} , 0, 0, 0, 0, 0, 4e^{-i\frac{\pi}{4}} )$$ The advantage of the polar representation is that we can directly see the magnitude \(r\) (the length of the vector) and the phase (the angle of the vector). Furthermore, from the polar representation we can easily see that multiplying two complex numbers results in their phases adding up: $$ r_1 \cdot e^{i\varphi_1} \cdot r_2 \cdot e^{i\varphi_2} = r_1 \cdot r_2 \cdot e^{i (\varphi_1 + \varphi_2) }$$ This property of adding phases will be pretty important in the following.
So let's assume that we have our DFT coefficients in polar form and we use the inverse DFT to reconstruct the signal in the time domain. For the given example this is: $$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4e^{i\frac{\pi}{4}} \cdot e^{i \frac{2\pi}{8} k} + \frac{1}{8} \cdot 4e^{-i\frac{\pi}{4}} \cdot e^{-i\frac{2\pi}{8} k} $$ $$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4 \cdot e^{i (\frac{2\pi}{8} k + \frac{\pi}{4})} + \frac{1}{8} \cdot 4 \cdot e^{-i(\frac{2\pi}{8} k + \frac{\pi}{4})} $$ $$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 8 \cdot cos(\frac{2\pi}{8} k + \frac{\pi}{4}) $$ As you can see, the phase of the DFT coefficient first moves into the complex exponential function and finally shows up as a time shift of the cosine, while the magnitude of the DFT coefficient tells us how to scale the cosine wave. These two statements are pretty important, so make sure to understand and remember them. To emphasize their importance, read them again in bold letters ;)

The magnitude of a DFT coefficient tells us how to scale the corresponding frequency.

The phase of a DFT coefficient tells us how to shift the corresponding frequency.
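Once more in NumPy: np.abs and np.angle extract exactly the magnitude and phase we just discussed from the shifted-cosine example:

```python
import numpy as np

N = 8
k = np.arange(N)
x = 1 + np.cos(2 * np.pi * k / N + np.pi / 4)  # the shifted cosine from above

X = np.fft.fft(x)
print(np.abs(X[1]))    # 4.0      -> magnitude: how much the cosine is scaled
print(np.angle(X[1]))  # 0.785... -> phase: pi/4, the time shift of the cosine
```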

Applications: The Spectrum

At this point, we have covered most of the essential theoretical aspects of the DFT. Of course, there are still some (important) topics left, but let's leave them for a future post and head over to some applications. Because what is the purpose of all this stuff if we don't apply it?
Whenever you obtain some results, displaying them in a meaningful way is at least as important as the results themselves. A great way of displaying DFT coefficients are so-called spectrums, which can be further classified into magnitude spectrums and phase spectrums.

A magnitude spectrum is created by plotting the absolute values of the DFT coefficients \(|X(n)|\) on the y-axis and the coefficient indices \(n\) on the x-axis. An example can look like this: [Figure: magnitude spectrum of playing an 'A' on a guitar for 1 s] It is the spectrum of a 1-second recording of me playing a note on my guitar. Note that as most signals are real, often only one half of the DFT coefficients is displayed. Furthermore, it is more common to plot the frequencies represented by the DFT coefficients instead of the coefficient indices on the x-axis. You find these frequencies by multiplying \(n\) with \(f_s/N\), where \(f_s\) is the sampling frequency and \(N\) is the number of samples. In my guitar example, the sound was sampled 44100 times per second, resulting in \(f_s = 44100\) Hz. As the signal is 1 second long, the number of samples is 44100. This results in roughly 22050 independent DFT coefficients, giving us frequencies from 0 Hz up to about 22050 Hz in 1 Hz steps.

The cool thing about magnitude spectrums is that we only use the absolute value of the DFT coefficients. Thus, we can easily see which frequencies are present in a time-discrete signal! If we zoom into the guitar example, we can see that I played the tone "A" at 440 Hz: [Figure: zoomed-in magnitude spectrum around 440 Hz] Just by looking at the spectrum, you can also tell that I probably played a string instrument, as there are many overtones at multiples of 440 Hz. But things get even more interesting: you can tell that I don't live in the United States or Canada and that I own shitty equipment, just by looking at the spectrum! If you zoom in even further, you can see a peak at 50 Hz: [Figure: zoomed-in magnitude spectrum around 50 Hz] This peak is there thanks to the German 50 Hz AC grid, which induces a 50 Hz noise signal called mains hum. If I lived in the US or Canada, this peak would probably be at 60 Hz.
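Here is a rough sketch of how such a magnitude spectrum can be created with NumPy and matplotlib. Note that the signal is a synthetic stand-in for my guitar recording (a 440 Hz tone plus a faint 50 Hz hum), not the real data:

```python
import numpy as np
import matplotlib.pyplot as plt

fs = 44100                           # sampling frequency in Hz
t = np.arange(fs) / fs               # 1 second -> N = 44100 samples

# Stand-in for the recording: an "A" at 440 Hz plus a faint 50 Hz mains hum
x = np.sin(2 * np.pi * 440 * t) + 0.01 * np.sin(2 * np.pi * 50 * t)

X = np.fft.rfft(x)                   # rfft returns only the non-redundant half
f = np.fft.rfftfreq(len(x), d=1/fs)  # frequency axis: n * fs / N

plt.plot(f, np.abs(X))
plt.xlabel("frequency in Hz")
plt.ylabel("|X(n)|")
plt.xlim(0, 1000)                    # zoom in on the interesting part
plt.show()
```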
And things get even crazier: just by using the spectrum, you can tell not only where I was, but also when! As I surfed through the endless realms of the Internet, I found a Wikipedia article about electrical network frequency analysis, which cited this newspaper article. According to the article, the German AC grid frequency varies between 49.95 Hz and 50.05 Hz, which obviously also affects the frequency of the mains hum. The trick is to have a rather long audio recording and to estimate the varying mains-hum frequency over time. (FYI: a minimum of 100 s is needed for a frequency resolution of 0.01 Hz. This can be calculated using \(f_s/N\) as stated above.) By comparing the noise signal with a network-frequency database, you may be able to determine when the audio signal was recorded. This is why the Bavarian police has been recording the network frequency 24/7 since 2010.
Of course, you have to create multiple spectrums over time and see how the frequency changes. So, if you have a 10-minute audio recording, you may take several 100 s intervals and determine their spectrums. When doing this, we speak of a short-time Fourier Transform (STFT), and as the resulting spectrum varies over time, we call it a spectrogram.
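As a sketch of this idea: the hypothetical helper stft_magnitudes below chops a signal into non-overlapping 100 s segments and computes one magnitude spectrum per segment. A real implementation would use overlapping, windowed segments, and the toy sampling rate is only there to keep the example small:

```python
import numpy as np

def stft_magnitudes(x, fs, seg_len):
    """Chop x into non-overlapping segments of seg_len samples and return
    one magnitude spectrum per segment (a bare-bones STFT, no windowing)."""
    n_segs = len(x) // seg_len
    segs = x[:n_segs * seg_len].reshape(n_segs, seg_len)
    return np.abs(np.fft.rfft(segs, axis=1)), np.fft.rfftfreq(seg_len, d=1/fs)

fs = 1000                            # toy sampling rate
t = np.arange(300 * fs) / fs         # 300 s of "recording"
hum = np.sin(2 * np.pi * 50.02 * t)  # mains hum, slightly above 50 Hz

# Three 100 s segments -> frequency resolution fs / N = 0.01 Hz
mags, f = stft_magnitudes(hum, fs, 100 * fs)
print(f[np.argmax(mags, axis=1)])    # [50.02 50.02 50.02]
```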

You may want to create some spectrums/spectrograms yourself, so take a look at this little JavaScript application I wrote:


If you click on the "Start" button, a DFT is applied to the audio signal captured from your device's microphone. Note that everything is processed locally on your device and no data is sent to some fishy servers (I'm cooler than Google, lol). The DFT coefficients are displayed as a magnitude spectrum, giving you information about how much of each frequency can be found in the signal. Try to sing, or play some tones if you have an instrument. You will see peaks in the spectrum for the corresponding pitches, possibly including some overtones. This does not work for mayonnaise, as it is not an instrument...

As mentioned above, there are also so-called phase spectrums. To create a phase spectrum, we use the same principles for the x-axis as explained above, but the y-axis now represents the phase of the DFT coefficients \(arg\{X(n)\}\). However, with the topics covered in this article, I could not find a good use case for it :( If you know some applications, please don't hesitate to contact me.

Summary

Congrats, you made it that far :)
You should now have a basic understanding of how the DFT works and how it can be applied. There are still a lot of things to discover about the DFT, which will be covered in my next posts. But for now, let us summarize the most important aspects of this post:
  • The Discrete Fourier Transform (DFT) is the discrete counterpart of the Fourier Transform and is described by: $$ X(n)=\sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}kn} $$
  • The DFT can be interpreted as a complex rotation of the time-discrete signal; thus the inverse transform can be derived as: $$ x(k)= \frac{1}{N} \sum_{n=0}^{N-1} X(n) \cdot e^{+i\frac{2\pi}{N}kn} $$
  • Every time-discrete signal can be reconstructed using shifted and scaled complex oscillations.
  • The DFT coefficients tell us how these oscillations have to be scaled (magnitude of the coefficients) and shifted (phase of the coefficients).
  • In the common case of real time-discrete data, only one half of the DFT coefficients provides us with useful information, as the other half is the complex-conjugated counterpart.
  • For real signals the complex oscillations break down to simple cosine waves.
  • By using magnitude spectrums, we can visualize the DFT coefficients in a meaningful way, telling us at first glance how much of each frequency is contained in a time-discrete signal.
  • There are millions of applications for the DFT, ranging from guitar tuning to criminal investigations.
Please let me know if there are any mistakes or how this Introduction can be improved. You can do this by writing an e-mail to me (see About).