Jekyll2020-08-23T18:08:52+02:00http://localhost:4000/feed.xmlchcikenThis my website :)A Minimal GDB Cheat Sheet2020-08-23T11:55:44+02:002020-08-23T11:55:44+02:00http://localhost:4000/cheat_sheet/2020/08/23/gdb-cheat-sheet<style type="text/css">
.center {
border: 1px solid grey;
}
</style>
<head><link rel="shortcut icon" href="/assets/favicon.ico"></head>
<h2>A Minimal GDB Cheat Sheet</h2>
<p>
I guess the title is pretty self-explanatory ;)
The most important features of this cheat sheet are:
</p>
<p>
<ul>
<li>The most popular <a href="https://stackoverflow.com/search?q=gdb" target="_blank">Stack Overflow</a>
questions are included (marked by the Stack Overflow logo)</li>
<li>It fits on only one page</li>
<li>It looks dope (inspired by <a href="https://www.vim.org/" target="_blank">vim</a>
and <a href="https://github.com/morhetz/gruvbox" target="_blank">gruvbox</a>)</li>
<li>Available in two themes (dark theme and a printer friendly light theme) </li>
</ul>
</p>
<h2>Dark Theme</h2>
The pdf version can be downloaded <a href="/assets/gdb_cheat_sheet/gdb_cheat_sheet_dark.pdf" target="_blank">here</a>.
<p>
<img class="center" width="90%" src="/assets/gdb_cheat_sheet/gdb_cheat_sheet_dark.svg">
</p>
<h2>Light Theme (printer friendly)</h2>
The pdf version can be downloaded <a href="/assets/gdb_cheat_sheet/gdb_cheat_sheet_light.pdf" target="_blank">here</a>.
<p>
<img class="center" width="90%" src="/assets/gdb_cheat_sheet/gdb_cheat_sheet_light.svg">
</p>AMAIX: A Generic Analytical Model for Deep Learning Accelerators2020-07-27T11:55:44+02:002020-07-27T11:55:44+02:00http://localhost:4000/papers/2020/07/27/my-first-paper<style type="text/css">
/* Tooltip container */
.tooltip {
position: relative;
display: inline-block;
}
/* Tooltip text */
.tooltip .tooltiptext {
visibility: hidden;
width: 300px;
background-color: grey;
color: #fff;
text-align: center;
padding: 10px;
border-radius: 6px;
position: absolute;
z-index: 1;
}
/* Show the tooltip text when you mouse over the tooltip container */
.tooltip:hover .tooltiptext {
visibility: visible;
}
.left-align{
text-align: left!important;
}
.center {
display: block;
margin-left: auto;
margin-right: auto;
margin-bottom: 2%;
margin-top: 2%;
}
.accordion {
background-color: #eee;
color: #444;
cursor: pointer;
padding: 18px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 15px;
transition: 0.4s;
margin-top: 20px;
}
.active, .accordion:hover {
background-color: #ccc;
}
.panel {
padding: 0 18px;
display: none;
background-color: #ccc;
overflow: hidden;
}
.slidecontainer {
text-align: center;
margin-top: 20px;
margin-left: auto;
margin-right: auto;
width: 80%;
}
.slider {
-webkit-appearance: none;
width: 50%;
height: 10px;
border-radius: 5px;
background: #d3d3d3;
outline: none;
opacity: 0.7;
-webkit-transition: .2s;
transition: opacity .2s;
}
.slider:hover {
opacity: 1;
}
.slider::-moz-range-thumb {
width: 25px;
height: 25px;
border-radius: 50%;
background: #338CFF;
cursor: pointer;
}
.slider::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: 15px;
height: 15px;
border-radius: 50%;
background: #338CFF;
cursor: pointer;
}
#scene3d {
width: 300px;
height: 300px;
margin-left: auto;
margin-right: auto;
}
canvas {
width: 80%;
display: block;
}
li{
margin: 12px 0;
}
monospace {
font-family:'Lucida Console', monospace
}
</style>
<head><link rel="shortcut icon" href="/assets/favicon.ico"></head>
<!-- He/she may not
use the Publisher’s PDF version, which is posted on the Publisher’s platforms, for the purpose of self-archiving or
deposit. Furthermore, Author may only post his/her own version, provided acknowledgment is given to the original
source of publication and a link is inserted to the published article on the Publisher’s website. The link must be
provided by inserting the DOI number of the article in the following sentence: “The final authenticated version is
available online at https://doi.org/[insert DOI].” The DOI (Digital Object Identifier) can be found at the bottom of the
first page of the published paper. -->
<h2>Introduction</h2>
<p>
Heyho! Here you can download my first paper <a href="/assets/analytical_model_for_dlas_paper.pdf" download>AMAIX: A Generic Analytical Model for Deep Learning Accelerators</a>
free of charge.
We successfully submitted this paper to the <a href="https://samos-conference.com/wp/" target="_blank"> SAMOS XX conference</a> where it won the best paper award.
I guess it will soon be published in the <a href="https://www.springer.com/gp/computer-science/lncs">Springer Lecture Notes in Computer Science (LNCS)</a>.
</p>
<p>
In the next section I'll briefly describe what we did in this paper using the kind of language I prefer (not that super duper fancy paper-language).
But of course you are also invited to read the paper ;)
</p>
<h2>Just a short description</h2>
<p>
As with many technical products, you want to know as early as possible in the development process how good the product you are developing actually is.
This is especially true when there are many competitors and the market is therefore highly competitive.
Such a situation can be found with so-called <b>deep learning accelerators (DLAs)</b> (hardware for the acceleration of AI applications).
This is fuelled by the extreme growth predicted for this market.
At their AI Day in 2019, Qualcomm said that they expect a tenfold growth in revenue from 1.8 billion US dollars in 2018 to 17 billion dollars in 2025.
And this is just for AI accelerators in data centers. <br>
Thus, many large tech-giants (actually all of them), but also small start-ups, are trying to establish a dominant position as early as possible.
If you don't believe it: here's a <a href="https://github.com/basicmi/AI-Chip" target="_blank">list</a> with companies currently trying to engage in this market.
</p>
<p>
So, long story short: if you want to be successful in this market, you need methods to estimate your design's <b>key performance indicators</b>
(chip area, power consumption, computing power, etc.).
Well-known methods include RTL-simulations (e.g. Verilog) or System-level-simulations (e.g. SystemC).
But if you have a Verilog or SystemC model of your DLA, your project is already in a progressed state.
</p>
<p>
A method which you can use even directly after you had the initial idea for your DLA are so-called <b>analytical models</b>.
These models try to estimate a system's KPIs using math or simple algorithms.
A pen and paper or an Excel spreadsheet is everything you need to get started with them.
The problem of analytical models is their extremely simplifying nature.
If your system contains any kind of non-determinism or dynamic behaviour, the obtained results will be pretty inaccurate for sure.
As most compute systems are very dynamic or include non-determinism (like caches), analytical models are usually not of great help.
But how well do they work for these emerging deep learning accelerators?
</p>
<p>
The paper provides you with all the details, so here's a summarized answer to this question:
At least for our examined case study (the <a href="http://nvdla.org/" target="_blank">NVDLA</a>) we could estimate the execution time pretty well.
There are many reasons for that, so let me list a few of them:
</p>
<p>
<ul>
<li>The NVDLA is quite simple. Convolutional Neural Networks (the workload of the NVDLA) may consist of many layers,
but there are only a few underlying operation types.</li>
<li>There is only a small control flow overhead. In a few cycles the pipelines are filled and then the operations start.</li>
<li>There are no significant dynamic effects. Rather than using caches, the NVDLA incorporates application managed buffers.</li>
<li>The NVDLA is either bottlenecked by the available memory bandwidth or its maximum compute power.</li>
</ul>
</p>
<p>
Especially the last point is of particular interest as it allows us to use the so-called
<a href="https://en.wikipedia.org/wiki/Roofline_model"><b>roofline model</b></a>.
A significant part of the paper is about how to rearrange this model for DLAs and apply it to the NVDLA.
The cool thing is that this model takes some of the NVDLA's configurable hardware parameters as an input and gives you the estimated execution time as an output.
If you pour this into a Python script, you can evaluate thousands of designs in a few seconds
and generate nice graphs like the one above, which you can use for <b>design space exploration</b>:
<img class="center" width="90%" src="/assets/alexnet_dse.svg">
</p>
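<p>
To give you an idea of the principle, here's a minimal sketch of such a roofline-style estimate in Python. Note that the accelerator parameters and the layer numbers below are made up for illustration and are not taken from the paper:
</p>

```python
# Minimal roofline-style estimate: execution time is bounded either by
# the accelerator's peak compute power or by its memory bandwidth.
def roofline_time(num_ops, data_bytes, peak_ops_per_s, mem_bw_bytes_per_s):
    compute_time = num_ops / peak_ops_per_s        # compute-bound limit
    memory_time = data_bytes / mem_bw_bytes_per_s  # memory-bound limit
    return max(compute_time, memory_time)          # the bottleneck wins

# Hypothetical layer: 1e9 operations moving 50 MB of data on a DLA
# with 2 TOPS peak compute and 25 GB/s memory bandwidth.
t = roofline_time(1e9, 50e6, 2e12, 25e9)
print(f"estimated execution time: {t*1e3:.2f} ms")  # memory-bound here
```

<p>
Sweeping such hardware parameters in a loop over all layers of a network is essentially what enables the kind of design space exploration plot shown above.
</p>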
<p>
Besides design space exploration there are still some open topics which we haven't studied yet.
I think that analytical models could be an interesting addition to DLA compilers.
For example, many DLAs support a so-called Winograd convolution which basically allows you to convolve with fewer operations compared to a standard convolution.
But the downside is that you need more weights, leading to a higher memory bandwidth consumption.
In my eyes a smart compiler could analyse the system and choose the right operation depending on the bottleneck of the system.
</p>
<p>
Anyway, this was the "short" summary of our paper.
If you have any questions, feel free to write an e-mail to me (see <a href="/about/">About</a> or use the address in the paper).
</p>
<script>
var acc = document.getElementsByClassName("accordion");
var i;
for (i = 0; i < acc.length; i++) {
acc[i].addEventListener("click", function() {
this.classList.toggle("active");
console.log(this);
var panel = this.nextElementSibling;
if (panel.style.display === "block") {
panel.style.display = "none";
} else {
panel.style.display = "block";
}
});
}
</script>Programming a Guitar Tuner with Python2020-05-13T11:55:44+02:002020-05-13T11:55:44+02:00http://localhost:4000/digital/signal/processing/2020/05/13/guitar-tuner<style type="text/css">
/* Tooltip container */
.tooltip {
position: relative;
display: inline-block;
}
/* Tooltip text */
.tooltip .tooltiptext {
visibility: hidden;
width: 300px;
background-color: grey;
color: #fff;
text-align: center;
padding: 10px;
border-radius: 6px;
position: absolute;
z-index: 1;
}
/* Show the tooltip text when you mouse over the tooltip container */
.tooltip:hover .tooltiptext {
visibility: visible;
}
.left-align{
text-align: left!important;
}
.center {
display: block;
margin-left: auto;
margin-right: auto;
margin-bottom: 2%;
margin-top: 2%;
}
.accordion {
background-color: #eee;
color: #444;
cursor: pointer;
padding: 18px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 15px;
transition: 0.4s;
margin-top: 20px;
}
.active, .accordion:hover {
background-color: #ccc;
}
.panel {
padding: 0 18px;
display: none;
background-color: #ccc;
overflow: hidden;
}
.slidecontainer {
text-align: center;
margin-top: 20px;
margin-left: auto;
margin-right: auto;
width: 80%;
}
.slider {
-webkit-appearance: none;
width: 50%;
height: 10px;
border-radius: 5px;
background: #d3d3d3;
outline: none;
opacity: 0.7;
-webkit-transition: .2s;
transition: opacity .2s;
}
.slider:hover {
opacity: 1;
}
.slider::-moz-range-thumb {
width: 25px;
height: 25px;
border-radius: 50%;
background: #338CFF;
cursor: pointer;
}
.slider::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: 15px;
height: 15px;
border-radius: 50%;
background: #338CFF;
cursor: pointer;
}
#scene3d {
width: 300px;
height: 300px;
margin-left: auto;
margin-right: auto;
}
canvas {
width: 80%;
display: block;
}
li{
margin: 12px 0;
}
monospace {
font-family:'Lucida Console', monospace
}
</style>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script>
<head><link rel="shortcut icon" href="/assets/favicon.ico"></head>
<h2>Introduction</h2>
<p>
Hello there! In this post we will program a guitar tuner with Python.
This project is a pure software project, so there is no soldering or tinkering involved.
You just need a computer with a microphone (or an audio interface) and Python.
Of course the algorithms presented in this post are not bound to Python, so feel free to use any other language if you don't mind the additional translation
(however, I recommend not to use <a href="https://www.tcl.tk/about/" target="_blank">tcl</a> as it is "the best-kept secret in the software industry" and we better keep it a secret, lol).
</p>
<p>
We will start with analyzing the problem we have, which is probably a detuned guitar, and then move on to solving this problem using math and algorithms.
The focus of this post lies on understanding the methods we use and what their pros and cons are.
For those who want to code a guitar tuner in under 60 seconds: <a href="https://github.com/not-chciken/guitar_tuner/blob/master/hps_tuner.py">my Github repo</a> ;)
</p>
<h2>Guitars & Pitches</h2>
<p>
Let's start with some really basic introduction to music theory and guitars.
But at first we have to define some important musical terms as an exact distinction will avoid some ambiguities:
<ul>
<li>The <b>frequency</b> is defined as the reciprocal of the period duration of a repeating event.
For example, if we have a sinusoidal signal with a period length of 2ms, the frequency is 500Hz.
</li>
<li>
<b>Pitch</b> is the perceived frequency of a sound. Thus, in contrast to frequency, which is a physical measure, the pitch is a psychoacoustical measure.
This distinction is needed as there are cases where we hear frequencies which are physically not there (or don't hear frequencies which are actually there)!
Don't worry, we will have a closer look on that subject later.
</li>
<li>
A <b>note</b> is just a pitch with a name. For example, the well known A<sub>4</sub> is a pitch at 440Hz.
It can also carry temporal information like whole notes or half notes, but this is rather uninteresting for us.
</li>
<li>
The term <b>tone</b> seems to be ambiguous, so we rather try to avoid it. The only kind of tone which will be used is a <b>pure tone</b>.
A pure tone is a sound with a sinusoidal waveform.
</li>
</ul>
(Sources: <a href="https://music.stackexchange.com/questions/3262/what-are-the-differences-between-tone-note-and-pitch" target="_blank">[1]</a>,
<a href="https://en.wikipedia.org/wiki/Pitch_(music)" target="_blank">[2]</a>,
<a href="https://en.wikipedia.org/wiki/Pure_tone" target="_blank">[3]</a>)
</p>
<p>
With these definitions in mind we will now look at how a guitar works on a musical level.
I guess most of you know this, but the "default" guitar has 6 strings which are usually tuned in the standard tuning <i>EADGBE</i>,
whereby each note refers to one of the strings. For example, the lowest string is tuned to the note E<sub>2</sub>.
This means that the string has a pitch of 82.41Hz, since this is how the note E<sub>2</sub> is defined.
If it had a pitch of 81Hz, our guitar would be out of tune and we would have to use the tuners on the headstock to get it back in tune.
Of course all other notes can be assigned to a certain pitch as well:
</p>
<p>
<img class="center" width="70%" src="/assets/guitar_tuner/guitar_tuning.svg">
</p>
<p>
Note, that for this post we assume an <a href="https://en.wikipedia.org/wiki/Equal_temperament" target="_blank">equal temperament</a>
and a concert pitch of A<sub>4</sub>=440Hz which covers probably 99% of modern music.
The cool thing about the equal temperament is that it defines the notes and pitches in half step fashion described by the following formula:
$$f(i) = f_0 \cdot 2^{i/12} $$
So, if you have a pitch \(f_0\), for example A<sub>4</sub> at 440Hz, and you want to increase it by one half step to an A#<sub>4</sub> then you have to multiply
the pitch 440Hz with \(2^{1/12}\) resulting in 466.16Hz.<br>
We can also derive an inverse formula which tells us how many half steps lie between an examined pitch \(f_i\) and the reference pitch \(f_0\):
$$12 \cdot log_2 \left( \frac{f_i}{f_0} \right) = i $$
This also allows us to assign a note to a pitch. Or at least a note which is close to the pitch.
As you can imagine, this formula will be of particular interest for us.
Because if we can extract the pitch from a guitar recording, we want to know the closest note and how far away it is.
</p>
<p>
This leads us to the following Python function <monospace>find_closest_note(pitch)</monospace>. If we give it a pitch in Hz, it will return
the closest note and the corresponding pitch of the closest note.
</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">CONCERT_PITCH</span> <span class="o">=</span> <span class="mi">440</span>
<span class="n">ALL_NOTES</span> <span class="o">=</span> <span class="p">[</span><span class="s">"A"</span><span class="p">,</span><span class="s">"A#"</span><span class="p">,</span><span class="s">"B"</span><span class="p">,</span><span class="s">"C"</span><span class="p">,</span><span class="s">"C#"</span><span class="p">,</span><span class="s">"D"</span><span class="p">,</span><span class="s">"D#"</span><span class="p">,</span><span class="s">"E"</span><span class="p">,</span><span class="s">"F"</span><span class="p">,</span><span class="s">"F#"</span><span class="p">,</span><span class="s">"G"</span><span class="p">,</span><span class="s">"G#"</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">find_closest_note</span><span class="p">(</span><span class="n">pitch</span><span class="p">):</span>
<span class="n">i</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span> <span class="n">np</span><span class="o">.</span><span class="nb">round</span><span class="p">(</span> <span class="n">np</span><span class="o">.</span><span class="n">log2</span><span class="p">(</span> <span class="n">pitch</span><span class="o">/</span><span class="n">CONCERT_PITCH</span> <span class="p">)</span><span class="o">*</span><span class="mi">12</span> <span class="p">)</span> <span class="p">)</span>
<span class="n">closest_note</span> <span class="o">=</span> <span class="n">ALL_NOTES</span><span class="p">[</span><span class="n">i</span><span class="o">%</span><span class="mi">12</span><span class="p">]</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="mi">4</span> <span class="o">+</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">9</span><span class="p">)</span> <span class="o">//</span> <span class="mi">12</span><span class="p">)</span>
<span class="n">closest_pitch</span> <span class="o">=</span> <span class="n">CONCERT_PITCH</span><span class="o">*</span><span class="mi">2</span><span class="o">**</span><span class="p">(</span><span class="n">i</span><span class="o">/</span><span class="mi">12</span><span class="p">)</span>
<span class="k">return</span> <span class="n">closest_note</span><span class="p">,</span> <span class="n">closest_pitch</span></code></pre></figure>
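<p>
As a quick sanity check, here is the same function in a self-contained form (note that the octave is computed with floor division so that notes below A<sub>4</sub> land in the right octave), applied to the detuned 81Hz E-string example from above:
</p>

```python
import numpy as np

CONCERT_PITCH = 440
ALL_NOTES = ["A","A#","B","C","C#","D","D#","E","F","F#","G","G#"]

def find_closest_note(pitch):
    # Number of half steps between the given pitch and the concert pitch A4
    i = int(np.round(np.log2(pitch / CONCERT_PITCH) * 12))
    closest_note = ALL_NOTES[i % 12] + str(4 + (i + 9) // 12)
    closest_pitch = CONCERT_PITCH * 2 ** (i / 12)
    return closest_note, closest_pitch

# The 81Hz string from the example above is closest to E2 at 82.41Hz.
note, pitch = find_closest_note(81)
print(note, round(pitch, 2))  # E2 82.41
```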
<p>
As next step we need to record the guitar and determine the pitch of the audio signal.
This is easier said than done as you will see ;)
</p>
<h2>Pitch Detection</h2>
<p>
After reading the following section you will hopefully know what is meant by pitch detection and which algorithms are suited for it.
As already mentioned above pitch and frequencies are not the same. This might sound abstract at first, so let's "look" at an example.
</p>
The example is a short recording of me playing the note A<sub>4</sub> with a pitch of 440Hz on a guitar.
<audio controls class="center"> <source src="/assets/guitar_tuner/example1.mp3" type="audio/mp3"> Your browser does not support the audio element.</audio>
<button class="accordion"><b>Code for recording</b></button>
<div class="panel">
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">sounddevice</span> <span class="k">as</span> <span class="n">sd</span>
<span class="kn">import</span> <span class="nn">scipy.io.wavfile</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="n">SAMPLE_FREQ</span> <span class="o">=</span> <span class="mi">44100</span> <span class="c1"># Sampling frequency of the recording
</span><span class="n">SAMPLE_DUR</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># Duration of the recording
</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Grab your guitar!"</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Gives you a second to grab your guitar ;)
</span>
<span class="n">myRecording</span> <span class="o">=</span> <span class="n">sd</span><span class="o">.</span><span class="n">rec</span><span class="p">(</span><span class="n">SAMPLE_DUR</span> <span class="o">*</span> <span class="n">SAMPLE_FREQ</span><span class="p">,</span> <span class="n">samplerate</span><span class="o">=</span><span class="n">SAMPLE_FREQ</span><span class="p">,</span> <span class="n">channels</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span><span class="n">dtype</span><span class="o">=</span><span class="s">'float64'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Recording audio"</span><span class="p">)</span>
<span class="n">sd</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="n">sd</span><span class="o">.</span><span class="n">play</span><span class="p">(</span><span class="n">myRecording</span><span class="p">,</span> <span class="n">SAMPLE_FREQ</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Playing audio"</span><span class="p">)</span>
<span class="n">sd</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="n">scipy</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">wavfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">'example1.wav'</span><span class="p">,</span> <span class="n">SAMPLE_FREQ</span><span class="p">,</span> <span class="n">myRecording</span><span class="p">)</span>
</code></pre></figure>
</div>
<br><br>
<p>
The same example, but now visualized as a time/value graph, looks as follows:
<img class="center" width="70%" src="/assets/guitar_tuner/example1.svg">
</p>
<button class="accordion"><b>Code for creating a signal/time plot</b></button>
<div class="panel">
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">scipy.io.wavfile</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">sampleFreq</span><span class="p">,</span> <span class="n">myRecording</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">wavfile</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s">"example1.wav"</span><span class="p">)</span>
<span class="n">sampleDur</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">myRecording</span><span class="p">)</span><span class="o">/</span><span class="n">sampleFreq</span>
<span class="n">timeX</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">sampleDur</span><span class="p">,</span> <span class="mi">1</span><span class="o">/</span><span class="n">sampleFreq</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">timeX</span><span class="p">,</span> <span class="n">myRecording</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'x(k)'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'time[s]'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></figure>
</div>
<br><br>
<p>
As you can see the signal has a period length of roughly 2.27ms which corresponds to a frequency of 440Hz.
So far so good.
But you can also see that the signal is far away from being a pure tone. So, what is happening there?
</p>
<p>
The all-round tool of a digital signal processing engineer is the so-called <b>Discrete Fourier Transform (DFT)</b>.
From a mathematical point of view it shows how a discrete signal can be decomposed into a set of cosine functions
oscillating at different frequencies.<br>
Or in musical terms: the DFT shows which pure tones can be found in an audio recording.
If you are interested in the mathematical details of the DFT, I recommend you to read my previous
<a href="/digital/signal/processing/2020/04/13/dft.html" target="_blank">post</a>.
But no worries, the most important aspects will be repeated in this post.<br>
The cool thing about the DFT is that it provides us with a so-called magnitude spectrum. For the given example it looks like this:
</p>
<img class="center" width="75%" src="/assets/guitar_tuner/example1_dft.svg">
<button class="accordion"><b>Code for creating a DFT plot</b></button>
<div class="panel">
<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">scipy.io.wavfile</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">scipy.fftpack</span> <span class="kn">import</span> <span class="n">fft</span>
<span class="n">sampleFreq</span><span class="p">,</span> <span class="n">myRecording</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">wavfile</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s">"example1.wav"</span><span class="p">)</span>
<span class="n">sampleDur</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">myRecording</span><span class="p">)</span><span class="o">/</span><span class="n">sampleFreq</span>
<span class="n">timeX</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">sampleFreq</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span> <span class="n">sampleFreq</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">myRecording</span><span class="p">))</span>
<span class="n">absFreqSpectrum</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">fft</span><span class="p">(</span><span class="n">myRecording</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="n">absFreqSpectrum</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">timeX</span><span class="p">,</span> <span class="n">absFreqSpectrum</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="n">myRecording</span><span class="p">)</span><span class="o">//</span><span class="mi">2</span><span class="p">])</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'|X(n)|'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'frequency[Hz]'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></figure>
</div>
<br><br>
<p>
On the x-axis you can see the frequencies of the pure tones, while the y-axis displays their intensity.
</p>
<p>
The spectrum reveals some interesting secrets which you couldn't see in the time domain.
As expected there is a strong intensity of the pure tone at 440Hz.
But there are other significant peaks at integer multiples of 440Hz. For example, 880Hz, 1320Hz, etc.
If you are familiar with music you may know the name of these peaks: harmonics or overtones.
</p>
<p>
The reason for the overtones is quite simple. When you hit a guitar string you excite it to vibrate at certain frequencies.
Especially frequencies which form standing waves can vibrate for a long time.
These fulfill the boundary condition that the string cannot move at the points where it is attached to the guitar (bridge and nut).
Thus, multiple overtones are also excited, which are all integer multiples of the fundamental frequency. The following GIF visualizes this:
<img class="center" width="70%" src="/assets/guitar_tuner/standing_waves.gif">
</p>
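<p>
You can mimic this effect in a few lines of NumPy by summing a fundamental and some of its integer multiples. The overtone amplitudes below are made up for illustration; a real guitar's timbre is distributed differently:
</p>

```python
import numpy as np

SAMPLE_FREQ = 44100
t = np.arange(0, 1.0, 1 / SAMPLE_FREQ)  # one second of sample times

f0 = 440  # fundamental frequency in Hz
amplitudes = [1.0, 0.6, 0.4, 0.25, 0.1]  # invented overtone weights

# Sum the fundamental (k=0) and its overtones (k>0) at (k+1)*f0.
signal = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
             for k, a in enumerate(amplitudes))

# The result still repeats every 1/440 s, but it is no longer a pure
# sine, which is exactly what the recorded waveform above shows.
```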
<p>
The overall set of harmonics and how they are related is called timbre. The timbre is what makes your guitar sound like a guitar.
This is pretty cool on the one hand, but it makes pitch detection a real challenge.
At this point you might already have had an idea for a guitar tuner: create a DFT spectrum, determine the frequency of the highest peak, done.
Well, for the spectrum given above this might work, but there are many cases in which you will get wrong results.
<br>
The first reason is that the fundamental frequency does not always create the highest peak.
Yet even when it is not the highest peak, the pitch is still determined by it.
This is the reason why pitch detection is not just frequency detection!
<br>
The second reason is that the power of the guitar signal is distributed over a large frequency band.
By selecting only the highest peak, the algorithm would be very prone to narrowband noise.
In the example spectrum given above you can see a high peak at 50Hz which is caused by mains hum.
Although the peak is relatively high, it does not determine the overall sound impression of the recording.
Or did you feel like the 50Hz noise was very present?
</p>
<p>
The complexity of this problem has led to a number of different <a href="https://en.wikipedia.org/wiki/Pitch_detection_algorithm" target="_blank">
pitch detection algorithms</a>. In order to choose the right algorithm we have to think about which requirements a guitar tuner needs to fulfill.
The most important requirements surely are:
<ul>
<li>
<b>Accuracy</b>: According to <a href="https://books.google.de/books?id=Slg10ekZBkAC&pg=PA65&redir_esc=y#v=onepage&q&f=false">[4]</a> the just-noticeable difference
for complex tones under 1000Hz is 1Hz. So, our goal should roughly be a frequency resolution of 1Hz in a frequency range of ca. 80-400Hz.
</li>
<li>
<b>Real-time capability</b>: When using the tuner we want live feedback about which note we are playing.
We therefore have to consider things like the complexity of the algorithm and the hardware we are using.
</li>
<li>
<b>Delay:</b> If the results only pop up 5 seconds after we plucked a string, tuning our guitar accurately will be pretty hard.
I cannot provide you with any literature on that, but I guess a delay of less than 500ms sounds fair.
</li>
</ul>
</p>
<p>
In the following we will start by programming a simple maximum-frequency-peak algorithm.
As already mentioned above, this method may not work very well since the fundamental frequency is not guaranteed to always produce the highest peak.
However, this method is quite simple and serves as a gentle introduction.
</p>
<p>
In the second section a more sophisticated algorithm using the <b>Harmonic Product Spectrum (HPS)</b> is implemented.
It is based on the simple tuner, so don't skip the first section ;)
</p>
<h2>Simple DFT tuner</h2>
<p>
Our first approach will be a simple guitar tuner using the DFT peak approach.
Usually the DFT algorithm is applied to the whole duration of a signal.
However, our guitar tuner is a real-time application where there is no concept of a "whole signal".
Furthermore, as we are going to play several different notes, only the last few seconds are relevant for pitch detection.
So, instead we use the so-called discrete Short-Time Fourier Transform (STFT), which is basically just the DFT applied to the most recent samples.
You can imagine it as a kind of window where new samples push out the oldest samples:
<img class="center" width="90%" src="/assets/guitar_tuner/sliding_dft.gif">
Note that the spectrum is now a so-called <b>spectrogram</b> as it varies over time.
</p>
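The sliding window itself can be sketched in a few lines of numpy (toy sizes instead of the tuner's real parameters):

```python
import numpy as np

WINDOW_SIZE = 8  # toy window; the tuner below uses one second of audio
STEP = 2         # samples arriving per update

window = np.zeros(WINDOW_SIZE)
for start in range(1, 7, STEP):
    new_block = np.arange(start, start + STEP, dtype=float)
    # append the newest samples and drop the same number of old ones
    window = np.concatenate((window, new_block))[STEP:]

print(window)  # the most recent WINDOW_SIZE samples: [0. 0. 1. 2. 3. 4. 5. 6.]
```

The real tuner below does exactly this with its incoming microphone blocks before computing each DFT.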
<p>
Before we start programming our tuner, we have to think about some design considerations concerning the DFT algorithm.
Can the DFT even fulfill the requirements we proposed above?
</p>
<p>
Let's begin with the frequency range.
The DFT allows you to analyze frequencies in the range of \( f < f_s / 2 \) with \(f_s\) being the sample frequency.
Typical sound recording devices use a sampling rate of around 40kHz, giving us a frequency range of \(f < 20kHz\).
This is more than enough to even capture all the overtones.<br>
Note that the frequency range is an inherent property of the DFT algorithm, but there is also a close relation to the
<b>Nyquist–Shannon sampling theorem</b>.
The theorem states that you cannot extract all the information from a signal if the highest occurring frequencies
are greater than \(f_s / 2 \). This means the DFT is already working at the theoretical limit.
</p>
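This limit can be double-checked with numpy's FFT helpers (a quick sanity check, not part of the tuner):

```python
import numpy as np

fs = 44100                             # a typical sampling rate in Hz
freqs = np.fft.rfftfreq(1024, d=1/fs)  # frequencies of the DFT bins
print(freqs.max())                     # the highest analyzable frequency, fs / 2
```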
<p>
As a next point we look at the frequency resolution of the DFT which is
(for details see my <a href="/digital/signal/processing/2020/04/13/dft.html" target="_blank">DFT post</a>):
$$ \Delta f = f_s / N = 1 / t_{window} $$
with \(N\) being the window size in samples and \(t_{window}\) the window size in seconds.
The resolution in Hertz is thus the reciprocal of the window size in seconds.
So, if we have a window of 500ms, then our frequency resolution is 2Hz.
This is where things become tricky, as a larger window results in a better frequency resolution but negatively affects the delay.
If we consider frequency resolution, up to a certain extent, more important than delay, a window size of 1s sounds like a good choice.
With this setting we achieve a frequency resolution of 1Hz.
</p>
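The trade-off can be checked numerically; the window lengths below are just the two examples from the text:

```python
fs = 44100  # sample frequency in Hz

# bin spacing delta_f = fs / N = 1 / t_window
resolutions = {t_window: fs / int(fs * t_window) for t_window in (0.5, 1.0)}
print(resolutions)  # {0.5: 2.0, 1.0: 1.0}, i.e. Hz of resolution per window length
```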
<p>
So far so good. If you convert all this knowledge to some code, your result might look like this:
<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">sounddevice</span> <span class="k">as</span> <span class="n">sd</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">scipy.fftpack</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="c1"># General settings
</span><span class="n">SAMPLE_FREQ</span> <span class="o">=</span> <span class="mi">44100</span> <span class="c1"># sample frequency in Hz
</span><span class="n">WINDOW_SIZE</span> <span class="o">=</span> <span class="mi">44100</span> <span class="c1"># window size of the DFT in samples
</span><span class="n">WINDOW_STEP</span> <span class="o">=</span> <span class="mi">21050</span> <span class="c1"># step size of window
</span><span class="n">WINDOW_T_LEN</span> <span class="o">=</span> <span class="n">WINDOW_SIZE</span> <span class="o">/</span> <span class="n">SAMPLE_FREQ</span> <span class="c1"># length of the window in seconds
</span><span class="n">SAMPLE_T_LENGTH</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">SAMPLE_FREQ</span> <span class="c1"># length between two samples in seconds
</span><span class="n">windowSamples</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WINDOW_SIZE</span><span class="p">)]</span>
<span class="c1"># This function finds the closest note for a given pitch
# Returns: note (e.g. A4, G#3, ..), pitch of the tone
</span><span class="n">CONCERT_PITCH</span> <span class="o">=</span> <span class="mi">440</span>
<span class="n">ALL_NOTES</span> <span class="o">=</span> <span class="p">[</span><span class="s">"A"</span><span class="p">,</span><span class="s">"A#"</span><span class="p">,</span><span class="s">"B"</span><span class="p">,</span><span class="s">"C"</span><span class="p">,</span><span class="s">"C#"</span><span class="p">,</span><span class="s">"D"</span><span class="p">,</span><span class="s">"D#"</span><span class="p">,</span><span class="s">"E"</span><span class="p">,</span><span class="s">"F"</span><span class="p">,</span><span class="s">"F#"</span><span class="p">,</span><span class="s">"G"</span><span class="p">,</span><span class="s">"G#"</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">find_closest_note</span><span class="p">(</span><span class="n">pitch</span><span class="p">):</span>
<span class="n">i</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span> <span class="n">np</span><span class="o">.</span><span class="nb">round</span><span class="p">(</span> <span class="n">np</span><span class="o">.</span><span class="n">log2</span><span class="p">(</span> <span class="n">pitch</span><span class="o">/</span><span class="n">CONCERT_PITCH</span> <span class="p">)</span><span class="o">*</span><span class="mi">12</span> <span class="p">)</span> <span class="p">)</span>
<span class="n">closestNote</span> <span class="o">=</span> <span class="n">ALL_NOTES</span><span class="p">[</span><span class="n">i</span><span class="o">%</span><span class="mi">12</span><span class="p">]</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="mi">4</span> <span class="o">+</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">9</span><span class="p">)</span> <span class="o">//</span> <span class="mi">12</span><span class="p">)</span>
<span class="n">closestPitch</span> <span class="o">=</span> <span class="n">CONCERT_PITCH</span><span class="o">*</span><span class="mi">2</span><span class="o">**</span><span class="p">(</span><span class="n">i</span><span class="o">/</span><span class="mi">12</span><span class="p">)</span>
<span class="k">return</span> <span class="n">closestNote</span><span class="p">,</span> <span class="n">closestPitch</span>
<span class="c1"># The sounddevice callback function
# Provides us with new data once WINDOW_STEP samples have been fetched
</span><span class="k">def</span> <span class="nf">callback</span><span class="p">(</span><span class="n">indata</span><span class="p">,</span> <span class="n">frames</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
<span class="k">global</span> <span class="n">windowSamples</span>
<span class="k">if</span> <span class="n">status</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="n">status</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">indata</span><span class="p">):</span>
<span class="n">windowSamples</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">((</span><span class="n">windowSamples</span><span class="p">,</span><span class="n">indata</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]))</span> <span class="c1"># append new samples
</span> <span class="n">windowSamples</span> <span class="o">=</span> <span class="n">windowSamples</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">indata</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]):]</span> <span class="c1"># remove old samples
</span> <span class="n">magnitudeSpec</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span> <span class="n">scipy</span><span class="o">.</span><span class="n">fftpack</span><span class="o">.</span><span class="n">fft</span><span class="p">(</span><span class="n">windowSamples</span><span class="p">)[:</span><span class="nb">len</span><span class="p">(</span><span class="n">windowSamples</span><span class="p">)</span><span class="o">//</span><span class="mi">2</span><span class="p">]</span> <span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="mi">62</span><span class="o">/</span><span class="p">(</span><span class="n">SAMPLE_FREQ</span><span class="o">/</span><span class="n">WINDOW_SIZE</span><span class="p">))):</span>
<span class="n">magnitudeSpec</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span> <span class="c1">#suppress mains hum
</span>
<span class="n">maxInd</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">magnitudeSpec</span><span class="p">)</span>
<span class="n">maxFreq</span> <span class="o">=</span> <span class="n">maxInd</span> <span class="o">*</span> <span class="p">(</span><span class="n">SAMPLE_FREQ</span><span class="o">/</span><span class="n">WINDOW_SIZE</span><span class="p">)</span>
<span class="n">closestNote</span><span class="p">,</span> <span class="n">closestPitch</span> <span class="o">=</span> <span class="n">find_closest_note</span><span class="p">(</span><span class="n">maxFreq</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'cls'</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span><span class="o">==</span><span class="s">'nt'</span> <span class="k">else</span> <span class="s">'clear'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s">"Closest note: {closestNote} {maxFreq:.1f}/{closestPitch:.1f}"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">'no input'</span><span class="p">)</span>
<span class="c1"># Start the microphone input stream
</span><span class="k">try</span><span class="p">:</span>
<span class="k">with</span> <span class="n">sd</span><span class="o">.</span><span class="n">InputStream</span><span class="p">(</span><span class="n">channels</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">callback</span><span class="o">=</span><span class="n">callback</span><span class="p">,</span>
<span class="n">blocksize</span><span class="o">=</span><span class="n">WINDOW_STEP</span><span class="p">,</span>
<span class="n">samplerate</span><span class="o">=</span><span class="n">SAMPLE_FREQ</span><span class="p">):</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></figure>
</p>
<p>
This code should work out of the box, assuming that the corresponding Python libraries are installed.
Here are some out-of-code comments which explain the individual lines in more detail:<br>
</p>
<p>
<i>Line 1-4: Basic imports such as numpy for math stuff and sounddevice for capturing the microphone input</i><br>
<i>Line 7-12: Global variables</i><br>
<i>Line 14-22: The function for finding the nearest note for a given pitch. See section "Guitars & Pitches" for the detailed explanation.</i><br>
<i>Line 24-45: These lines are the heart of our simple guitar tuner, so let's have a closer look.</i><br>
<i>Line 31-32: Here the incoming samples are appended to an array while the old samples are removed.
Thus, a window of WINDOW_SIZE samples is obtained.</i><br>
<i>Line 33: The magnitude spectrum is obtained by using the Fast Fourier Transform.
Note that one half of the spectrum provides only redundant information and is therefore discarded.</i><br>
<i>Line 35-36: Here the mains hum is suppressed by simply setting all frequencies below 62Hz to 0.
This is still sufficient for a drop C tuning (C<sub>2</sub>=65.4Hz).
</i><br>
<i>Line 38-40: First, the highest frequency peak is determined.
As a next step the highest frequency is used to get the closest pitch and note.</i><br>
<i>Line 42-43: Printing the results. Depending on your operating system, a different clear function has to be called.
</i><br>
<i>Line 48-55: The input stream is initialized and runs in an infinite loop.
Once enough data is sampled, the callback function is called.
</i><br>
</p>
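If you want to play with the note mapping in isolation, here is a self-contained sketch of the function. Note that it computes the octave number with floor division, which keeps notes below A4 (such as the low E string) in the right octave as well:

```python
import numpy as np

CONCERT_PITCH = 440
ALL_NOTES = ["A","A#","B","C","C#","D","D#","E","F","F#","G","G#"]

def find_closest_note(pitch):
    # number of semitones between the given pitch and A4 (concert pitch)
    i = int(np.round(np.log2(pitch / CONCERT_PITCH) * 12))
    # the octave number changes at C, which lies 9 semitones above A;
    # floor division handles notes below A4 correctly
    closest_note = ALL_NOTES[i % 12] + str(4 + (i + 9) // 12)
    closest_pitch = CONCERT_PITCH * 2 ** (i / 12)
    return closest_note, closest_pitch

print(find_closest_note(443))   # ('A4', 440.0)
print(find_closest_note(82.4))  # ('E2', ~82.41), the low E string
```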
<p>
I also made a javascript version which works directly from your browser.
Note that it uses slightly different parameters.
The corresponding magnitude spectrum is also visualized:
<canvas class="center" id="canCtxSimple" width="600" height="300"></canvas>
</p>
<p>
If you tried to tune your guitar using this tuner, you probably noticed that it doesn't work very well.
As expected, the main problem is harmonic errors, as the overtones are often more intense than the actual fundamental frequency.
A way to deal with this problem is to use the Harmonic Product Spectrum, as the next section will show.
</p>
<h2>HPS tuner</h2>
<p>
In this section we will refine our simple tuner by using the so-called Harmonic Product Spectrum (HPS), which was introduced by A. M. Noll in 1969.
The idea behind it is quite simple yet clever.
The Harmonic Product Spectrum is a multiplication of \(R\) magnitude spectrums with different frequency scalings:
$$ Y(f) = \prod_{r=1}^{R} |X(fr)| $$
With \(X(f)\) being the magnitude spectrum of the signal.
I think that this is hard to explain in words, so let's take a look at a visualization for \(R=4\):
<img class="center" width="70%" src="/assets/guitar_tuner/hps1.svg">
In the upper half of the visualization you can see the magnitude spectrums for the 440Hz guitar tone example,
each with a different frequency scaling factor \(r\).
These magnitude spectrums are multiplied in a subsequent step resulting in the Harmonic Product Spectrum \(|Y(f)|\).
As the frequency scaling is always an integer number, the product vanishes for non-fundamental frequencies.
Thus, the last step is simply taking the highest peak of the HPS:
$$ f_{max} = \underset{f}{\arg\max}\,{|Y(f)|} $$
For the given example the peak at 440Hz is perfectly determined.<br>
In terms of frequency resolution and delay, the HPS tuner is pretty similar to the simple DFT tuner as the DFT is the basis of the HPS.
However, as the HPS also uses the harmonics to determine the pitch, a higher frequency resolution can be achieved if the spectrum is
interpolated and upsampled before the HPS process is executed.
Note that upsampling and interpolating does not add any information to the spectrum, but it avoids information loss, as the spectrum is effectively downsampled
by the different frequency scalings.
<br>
Let me illustrate this with an intuitive example.
Assume we have a DFT with a frequency resolution of 1Hz and a peak at 1761Hz which we know is the 4th harmonic of a fundamental frequency
which can be found at 440Hz in the spectrum.
With this knowledge, you can calculate \(1761/4=440.25\) and conclude that the fundamental frequency is rather 440.25Hz than 440Hz.
The same principle is used by the HPS algorithm.
</p>
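The effect of the multiplication can be demonstrated on a toy spectrum (made-up amplitudes, 1Hz per bin, and no interpolation for simplicity):

```python
import numpy as np

# toy magnitude spectrum: fundamental at 110 Hz, overtones at
# 220/330/440 Hz, with the 2nd harmonic being the strongest peak
spec = np.zeros(1000)
spec[[110, 220, 330, 440]] = [0.5, 1.0, 0.8, 0.3]

R = 3  # number of frequency-scaled spectra to multiply
hps = spec.copy()
for r in range(2, R + 1):
    downsampled = spec[::r]          # spectrum compressed by the factor r
    hps[:len(downsampled)] *= downsampled

print(np.argmax(spec))  # 220 -> naive peak picking returns an overtone
print(np.argmax(hps))   # 110 -> the HPS recovers the fundamental
```

Only at the fundamental do all compressed copies line up, so the product survives there and vanishes almost everywhere else.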
<p>
A Python version of an HPS guitar tuner may look like this:
</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">sounddevice</span> <span class="k">as</span> <span class="n">sd</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">scipy.fftpack</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">copy</span>
<span class="c1"># General settings
</span><span class="n">SAMPLE_FREQ</span> <span class="o">=</span> <span class="mi">48000</span> <span class="c1"># sample frequency in Hz
</span><span class="n">WINDOW_SIZE</span> <span class="o">=</span> <span class="mi">48000</span> <span class="c1"># window size of the DFT in samples
</span><span class="n">WINDOW_STEP</span> <span class="o">=</span> <span class="mi">12000</span> <span class="c1"># step size of window
</span><span class="n">WINDOW_T_LEN</span> <span class="o">=</span> <span class="n">WINDOW_SIZE</span> <span class="o">/</span> <span class="n">SAMPLE_FREQ</span> <span class="c1"># length of the window in seconds
</span><span class="n">SAMPLE_T_LENGTH</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">SAMPLE_FREQ</span> <span class="c1"># length between two samples in seconds
</span><span class="n">NUM_HPS</span> <span class="o">=</span> <span class="mi">8</span> <span class="c1">#max number of harmonic product spectrums
</span><span class="n">DELTA_FREQ</span> <span class="o">=</span> <span class="p">(</span><span class="n">SAMPLE_FREQ</span><span class="o">/</span><span class="n">WINDOW_SIZE</span><span class="p">)</span> <span class="c1"># frequency step width of the interpolated DFT
</span><span class="n">windowSamples</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WINDOW_SIZE</span><span class="p">)]</span>
<span class="n">noteBuffer</span> <span class="o">=</span> <span class="p">[</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">]</span>
<span class="c1"># This function finds the closest note for a given pitch
# Returns: note (e.g. A4, G#3, ..), pitch of the tone
</span><span class="n">CONCERT_PITCH</span> <span class="o">=</span> <span class="mi">440</span>
<span class="n">ALL_NOTES</span> <span class="o">=</span> <span class="p">[</span><span class="s">"A"</span><span class="p">,</span><span class="s">"A#"</span><span class="p">,</span><span class="s">"B"</span><span class="p">,</span><span class="s">"C"</span><span class="p">,</span><span class="s">"C#"</span><span class="p">,</span><span class="s">"D"</span><span class="p">,</span><span class="s">"D#"</span><span class="p">,</span><span class="s">"E"</span><span class="p">,</span><span class="s">"F"</span><span class="p">,</span><span class="s">"F#"</span><span class="p">,</span><span class="s">"G"</span><span class="p">,</span><span class="s">"G#"</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">find_closest_note</span><span class="p">(</span><span class="n">pitch</span><span class="p">):</span>
<span class="n">i</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span> <span class="n">np</span><span class="o">.</span><span class="nb">round</span><span class="p">(</span> <span class="n">np</span><span class="o">.</span><span class="n">log2</span><span class="p">(</span> <span class="n">pitch</span><span class="o">/</span><span class="n">CONCERT_PITCH</span> <span class="p">)</span><span class="o">*</span><span class="mi">12</span> <span class="p">)</span> <span class="p">)</span>
<span class="n">closestNote</span> <span class="o">=</span> <span class="n">ALL_NOTES</span><span class="p">[</span><span class="n">i</span><span class="o">%</span><span class="mi">12</span><span class="p">]</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="mi">4</span> <span class="o">+</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">9</span><span class="p">)</span> <span class="o">//</span> <span class="mi">12</span><span class="p">)</span>
<span class="n">closestPitch</span> <span class="o">=</span> <span class="n">CONCERT_PITCH</span><span class="o">*</span><span class="mi">2</span><span class="o">**</span><span class="p">(</span><span class="n">i</span><span class="o">/</span><span class="mi">12</span><span class="p">)</span>
<span class="k">return</span> <span class="n">closestNote</span><span class="p">,</span> <span class="n">closestPitch</span>
<span class="n">hannWindow</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">hanning</span><span class="p">(</span><span class="n">WINDOW_SIZE</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">callback</span><span class="p">(</span><span class="n">indata</span><span class="p">,</span> <span class="n">frames</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
<span class="k">global</span> <span class="n">windowSamples</span><span class="p">,</span> <span class="n">lastNote</span>
<span class="k">if</span> <span class="n">status</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="n">status</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">indata</span><span class="p">):</span>
<span class="n">windowSamples</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">((</span><span class="n">windowSamples</span><span class="p">,</span><span class="n">indata</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]))</span> <span class="c1"># append new samples
</span> <span class="n">windowSamples</span> <span class="o">=</span> <span class="n">windowSamples</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">indata</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]):]</span> <span class="c1"># remove old samples
</span>
<span class="n">signalPower</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">windowSamples</span><span class="p">,</span> <span class="nb">ord</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">windowSamples</span><span class="p">)</span>
<span class="k">if</span> <span class="n">signalPower</span> <span class="o"><</span> <span class="mf">5e-7</span><span class="p">:</span>
<span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'cls'</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span><span class="o">==</span><span class="s">'nt'</span> <span class="k">else</span> <span class="s">'clear'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Closest note: ..."</span><span class="p">)</span>
<span class="k">return</span>
<span class="n">hannSamples</span> <span class="o">=</span> <span class="n">windowSamples</span> <span class="o">*</span> <span class="n">hannWindow</span>
<span class="n">magnitudeSpec</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span> <span class="n">scipy</span><span class="o">.</span><span class="n">fftpack</span><span class="o">.</span><span class="n">fft</span><span class="p">(</span><span class="n">hannSamples</span><span class="p">)[:</span><span class="nb">len</span><span class="p">(</span><span class="n">hannSamples</span><span class="p">)</span><span class="o">//</span><span class="mi">2</span><span class="p">]</span> <span class="p">)</span>
<span class="c1">#suppress mains hum
</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="mi">62</span><span class="o">/</span><span class="n">DELTA_FREQ</span><span class="p">)):</span>
<span class="n">magnitudeSpec</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="c1">#Calculate average energy per frequency for the octave bands
</span> <span class="n">octaveBands</span> <span class="o">=</span> <span class="p">[</span><span class="mi">50</span><span class="p">,</span><span class="mi">100</span><span class="p">,</span><span class="mi">200</span><span class="p">,</span><span class="mi">400</span><span class="p">,</span><span class="mi">800</span><span class="p">,</span><span class="mi">1600</span><span class="p">,</span><span class="mi">3200</span><span class="p">,</span><span class="mi">6400</span><span class="p">,</span><span class="mi">12800</span><span class="p">,</span><span class="mi">25600</span><span class="p">]</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">octaveBands</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">indStart</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">octaveBands</span><span class="p">[</span><span class="n">j</span><span class="p">]</span><span class="o">/</span><span class="n">DELTA_FREQ</span><span class="p">)</span>
<span class="n">indEnd</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">octaveBands</span><span class="p">[</span><span class="n">j</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="o">/</span><span class="n">DELTA_FREQ</span><span class="p">)</span>
<span class="n">indEnd</span> <span class="o">=</span> <span class="n">indEnd</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">magnitudeSpec</span><span class="p">)</span> <span class="o">></span> <span class="n">indEnd</span> <span class="k">else</span> <span class="nb">len</span><span class="p">(</span><span class="n">magnitudeSpec</span><span class="p">)</span>
<span class="n">avgEnergPerFreq</span> <span class="o">=</span> <span class="mi">1</span><span class="o">*</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">magnitudeSpec</span><span class="p">[</span><span class="n">indStart</span><span class="p">:</span><span class="n">indEnd</span><span class="p">],</span> <span class="nb">ord</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">indEnd</span><span class="o">-</span><span class="n">indStart</span><span class="p">)</span>
<span class="n">avgEnergPerFreq</span> <span class="o">=</span> <span class="n">avgEnergPerFreq</span><span class="o">**</span><span class="mf">0.5</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">indStart</span><span class="p">,</span> <span class="n">indEnd</span><span class="p">):</span>
<span class="n">magnitudeSpec</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">magnitudeSpec</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">if</span> <span class="n">magnitudeSpec</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">></span> <span class="n">avgEnergPerFreq</span> <span class="k">else</span> <span class="mi">0</span> <span class="c1">#suppress white noise
</span>
<span class="c1">#Interpolate spectrum
</span> <span class="n">magSpecIpol</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">interp</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">magnitudeSpec</span><span class="p">),</span> <span class="mi">1</span><span class="o">/</span><span class="n">NUM_HPS</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">magnitudeSpec</span><span class="p">)),</span> <span class="n">magnitudeSpec</span><span class="p">)</span>
<span class="n">magSpecIpol</span> <span class="o">=</span> <span class="n">magSpecIpol</span> <span class="o">/</span> <span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">magSpecIpol</span><span class="p">,</span> <span class="nb">ord</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="c1">#normalize it
</span>
<span class="n">hpsSpec</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">deepcopy</span><span class="p">(</span><span class="n">magSpecIpol</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUM_HPS</span><span class="p">):</span>
<span class="n">tmpHpsSpec</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">hpsSpec</span><span class="p">[:</span><span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">magSpecIpol</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)))],</span> <span class="n">magSpecIpol</span><span class="p">[::(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)])</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">any</span><span class="p">(</span><span class="n">tmpHpsSpec</span><span class="p">):</span>
<span class="k">break</span>
<span class="n">hpsSpec</span> <span class="o">=</span> <span class="n">tmpHpsSpec</span>
<span class="n">maxInd</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">hpsSpec</span><span class="p">)</span>
<span class="n">maxFreq</span> <span class="o">=</span> <span class="n">maxInd</span> <span class="o">*</span> <span class="p">(</span><span class="n">SAMPLE_FREQ</span><span class="o">/</span><span class="n">WINDOW_SIZE</span><span class="p">)</span> <span class="o">/</span> <span class="n">NUM_HPS</span>
<span class="n">closestNote</span><span class="p">,</span> <span class="n">closestPitch</span> <span class="o">=</span> <span class="n">find_closest_note</span><span class="p">(</span><span class="n">maxFreq</span><span class="p">)</span>
<span class="n">maxFreq</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">maxFreq</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">closestPitch</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">closestPitch</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">noteBuffer</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="n">closestNote</span><span class="p">)</span> <span class="c1">#note that this is a ringbuffer
</span> <span class="n">noteBuffer</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
<span class="n">majorityVote</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">noteBuffer</span><span class="p">),</span> <span class="n">key</span> <span class="o">=</span> <span class="n">noteBuffer</span><span class="o">.</span><span class="n">count</span><span class="p">)</span>
<span class="k">if</span> <span class="n">noteBuffer</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="n">majorityVote</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">detectedNote</span> <span class="o">=</span> <span class="n">majorityVote</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span>
<span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">'cls'</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">name</span><span class="o">==</span><span class="s">'nt'</span> <span class="k">else</span> <span class="s">'clear'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s">"Closest note: {closestNote} {maxFreq}/{closestPitch}"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">'no input'</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Starting HPS guitar tuner..."</span><span class="p">)</span>
<span class="k">with</span> <span class="n">sd</span><span class="o">.</span><span class="n">InputStream</span><span class="p">(</span><span class="n">channels</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">callback</span><span class="o">=</span><span class="n">callback</span><span class="p">,</span>
<span class="n">blocksize</span> <span class="o">=</span> <span class="n">WINDOW_STEP</span><span class="p">,</span>
<span class="n">samplerate</span> <span class="o">=</span> <span class="n">SAMPLE_FREQ</span><span class="p">):</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="n">response</span> <span class="o">=</span> <span class="nb">input</span><span class="p">()</span>
<span class="k">if</span> <span class="n">response</span> <span class="ow">in</span> <span class="p">(</span><span class="s">''</span><span class="p">,</span> <span class="s">'q'</span><span class="p">,</span> <span class="s">'Q'</span><span class="p">):</span>
<span class="k">break</span>
<span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>
The basic code has many things in common with the simple DFT tuner, but of course the algorithmic parts are quite different.
Furthermore, some signal processing methods were added in order to increase the signal quality. These methods could also be applied to the DFT tuner.
In the following I will provide some comments on the code:
</p>
<p>
<i>Line 38-42: Calculate the signal power. If there is no sound, we don't need to do the signal processing part.</i><br>
<i>Line 44-45: The signal is multiplied with a Hann Window to reduce
<a href="https://en.wikipedia.org/wiki/Spectral_leakage" target="_blank">spectral leakage</a></i><br>
<i>Line 47-49: Suppress the mains hum. This is quite an important signal enhancement.</i><br>
<i>Line 51-60: The average energy for a frequency band is calculated.
If the energy of a given frequency is below this average energy, then the energy is set to zero.
With this method we can reduce white noise or noise which is very close to white noise (note that white noise has a flat spectral distribution).
This is necessary as the HPS method does not work so well if there is a lot of white noise.</i><br>
<i>Line 62-64: Here the DFT spectrum is interpolated. We need to do this as we are required to downsample the spectrum in the later steps.
Imagine there is a perfect peak at a given frequency and all the frequencies next to it are zero.
If we now downsample the spectrum, there is a certain risk that this peak is simply ignored.
This can be avoided by having an interpolated spectrum, as the peaks are "smeared" over a larger area.
</i><br>
<i>Line 68-72: The heart of the HPS algorithm. Here the frequency-scaled spectra are multiplied <code>NUM_HPS</code> times.
The loop is stopped early if the spectrum becomes all zeros.
</i><br>
<i>Line 74-...: Basically the same as the DFT algorithm, but with a majority vote filter:
if two or more of the three last notes are the same, then this note is printed.</i><br>
</p>
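If you want to play with the HPS step in isolation, here is a minimal sketch. The sample rate, window size, harmonic count, and the synthetic test signal are my own choices for illustration, not the exact values from the code above:

```python
import numpy as np

SAMPLE_FREQ = 48000   # assumed sample rate
WINDOW_SIZE = 48000   # one-second window -> bin width of exactly 1 Hz
NUM_HPS = 3           # number of harmonics to multiply

# synthetic guitar-like signal: 110 Hz fundamental (A string) plus two harmonics
t = np.arange(WINDOW_SIZE) / SAMPLE_FREQ
signal = (np.sin(2*np.pi*110*t)
          + 0.5*np.sin(2*np.pi*220*t)
          + 0.25*np.sin(2*np.pi*330*t))

mag_spec = np.abs(np.fft.fft(signal))[:WINDOW_SIZE // 2]

# multiply the spectrum with its downsampled versions: the harmonics of the
# fundamental line up and reinforce each other, spurious peaks do not
hps_spec = mag_spec.copy()
for h in range(2, NUM_HPS + 1):
    n = len(mag_spec) // h
    hps_spec[:n] *= mag_spec[::h][:n]

max_freq = np.argmax(hps_spec) * SAMPLE_FREQ / WINDOW_SIZE
print(max_freq)  # 110.0
```

Because the test tone sits exactly on a DFT bin, the HPS peak lands precisely on the fundamental here; with a real microphone signal you would additionally need the windowing and noise suppression steps from the full code.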
<p>
Again, I also made a JavaScript version of this with some reduced signal enhancement, as JavaScript is not really made for realtime signal processing.
</p>
<canvas class="center" id="canCtxHPS" width="600" height="300"></canvas>
If you compare this tuner to the previous simple tuner, you will probably notice that it already works much more accurately.
In fact, when plugging my guitar directly into the computer with an audio interface, it works perfectly.
When using a simple microphone I sometimes notice some harmonic errors, but in general tuning the guitar is possible.
<h2>Summary</h2>
In this post I showed how to write a guitar tuner using Python.
We first started with a simple DFT peak detection algorithm and then refined it using a Harmonic Product Spectrum approach
which already gave us a perfectly working guitar tuner.
In harsh environments the HPS tuner sometimes suffers from harmonic errors,
so in the future I might make more guitar tuners using different pitch detection algorithms based on cepstrums (yes, this is correct, you are not having a stroke)
or correlation.<br>
If you'd like to add or criticize something, please contact me :)
You can do this by writing an e-mail to me (see <a href="/about/">About</a>).
<script>
var acc = document.getElementsByClassName("accordion");
var i;
for (i = 0; i < acc.length; i++) {
acc[i].addEventListener("click", function() {
this.classList.toggle("active");
var panel = this.nextElementSibling;
if (panel.style.display === "block") {
panel.style.display = "none";
} else {
panel.style.display = "block";
}
});
}
</script>
<script src="//cdnjs.cloudflare.com/ajax/libs/ramda/0.25.0/ramda.min.js"></script>
<script src="/assets/guitar_tuner/dsp.js"></script>
<script src="/assets/guitar_tuner/simple_tuner.js"></script>
<script src="/assets/guitar_tuner/hps_tuner.js"></script>
<script>
// var mediaStream;
// var streamRunning = false;
// var canvas = document.querySelector("#canCtxSimple");
// var canvasCtx = canvas.getContext("2d");
// var WIDTH = canvas.width;
// var HEIGHT = canvas.height;
// canvasCtx.fillStyle = '#a8b2bf';
// canvasCtx.fillRect(0, 0, WIDTH, HEIGHT);
// canvasCtx.strokeStyle = 'rgb(0, 0, 0)';
// canvasCtx.fillStyle = 'rgb(0, 0, 0)';
// canvasCtx.font = "30px Arial";
// canvasCtx.textAlign = "center";
// canvasCtx.fillText("Click here to start the simple tuner", WIDTH*0.5, HEIGHT*0.5);
</script>/* Tooltip container */ .tooltip { position: relative; display: inline-block; }Discrete Fourier Transform: Introduction2020-04-13T11:55:44+02:002020-04-13T11:55:44+02:00http://localhost:4000/digital/signal/processing/2020/04/13/dft<style type="text/css">
/* Tooltip container */
.tooltip {
position: relative;
display: inline-block;
}
/* Tooltip text */
.tooltip .tooltiptext {
visibility: hidden;
width: 300px;
background-color: grey;
color: #fff;
text-align: center;
padding: 10px;
border-radius: 6px;
position: absolute;
z-index: 1;
}
/* Show the tooltip text when you mouse over the tooltip container */
.tooltip:hover .tooltiptext {
visibility: visible;
}
.left-align{
text-align: left!important;
}
.center {
display: block;
margin-left: auto;
margin-right: auto;
margin-bottom: 2%;
margin-top: 2%;
}
.accordion {
background-color: #eee;
color: #444;
cursor: pointer;
padding: 18px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 15px;
transition: 0.4s;
margin-top: 20px;
}
.active, .accordion:hover {
background-color: #ccc;
}
.panel {
padding: 0 18px;
display: none;
background-color: #ccc;
overflow: hidden;
}
.slidecontainer {
text-align: center;
margin-top: 20px;
margin-left: auto;
margin-right: auto;
width: 80%;
}
.slider {
-webkit-appearance: none;
width: 50%;
height: 10px;
border-radius: 5px;
background: #d3d3d3;
outline: none;
opacity: 0.7;
-webkit-transition: .2s;
transition: opacity .2s;
}
.slider:hover {
opacity: 1;
}
.slider::-moz-range-thumb {
width: 25px;
height: 25px;
border-radius: 50%;
background: #338CFF;
cursor: pointer;
}
.slider::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: 15px;
height: 15px;
border-radius: 50%;
background: #338CFF;
cursor: pointer;
}
#scene3d {
width: 300px;
height: 300px;
margin-left: auto;
margin-right: auto;
}
canvas {
width: 90%;
display: block;
}
li{
margin: 12px 0;
}
</style>
<script src="/assets/js/three.js">/*WebGl*/</script>
<script src="/assets/js/THREE.MeshLine.js">/*Addon for prettier lines*/</script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script>
<head><link rel="shortcut icon" href="/assets/favicon.ico"></head>
<h2>Introduction</h2>
<p>
In this post we will cover one of the most important transforms of digital signal processing: the <b>Discrete Fourier Transform (DFT)</b>.
If you don't know what a transform is, I recommend reading the introduction of my
<a href="/digital/signal/processing/2020/03/06/z-transform.html" target="_blank">post</a>
about the z-transform. If you don't want to read it, I'll briefly summarize it:
the goal of most transforms is to apply some mathematical operation on data which then reveals
some information we could not see from its original representation.
<br>
In the case of the Fourier transform a discrete signal is transformed allowing us to see which frequencies the signal comprises.
This may sound insignificant, but it actually offers a whole world of new opportunities.
It is the essential mathematical ingredient for many applications such as guitar tuners (in my next post I'll show how to program a guitar tuner
using Python and the DFT), noise filtering, or x-ray image sharpening.
<br>
As the name may suggest, it is the discrete equivalent of the Fourier transform.
Many lectures and courses begin by teaching the "standard" Fourier transform first and then move on to the discrete one.
However, for this post you don't need to know anything about the Fourier transform.
We will start nearly from zero and then try to derive, understand, and use the DFT.
The only prerequisite is some basic knowledge of linear algebra and complex numbers.
If you know what a matrix is and that \(e^{i\varphi} = cos(\varphi) + i \cdot sin(\varphi) \), then you are already good to go ;)
</p>
<h2>A different perspective</h2>
<p>
As previously mentioned, the DFT helps us find out which frequencies a discrete signal is made of.
In order to make things a little bit easier we have to regard discrete signals from a new perspective.
When imagining discrete signals, people (or at least I) often think about a bunch of values spread on a time axis:
<img class="center" src="/assets/dft_intro/discrete_signal_example.svg" alt="Simple discrete signal" width="40%">
However, another way of representing signals is using vectors. So, if we have the signal \((2,1)\) the corresponding vector would look like:
<img class="center" src="/assets/dft_intro/vector_2d.svg" alt="2d vector" width="30%">
Each dimension corresponds to a different value in time.
If the signal has more values, we of course need to increase the number of dimensions.
I have to admit that beyond a certain number of dimensions it is hard to imagine what a vector looks like
(for me this is already the case at the humble number of 4 dimensions).
But I hope the general concept of thinking of signals as vectors is clear.
</p>
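As a small illustration, here is the signal \((2,1)\) from the figure regarded as a NumPy vector, together with its length and its angle to the first (time) axis:

```python
import numpy as np

# the discrete signal (2, 1) regarded as a 2-dimensional vector
x = np.array([2.0, 1.0])

length = np.linalg.norm(x)                  # sqrt(2^2 + 1^2) = sqrt(5) ~ 2.236
angle = np.degrees(np.arctan2(x[1], x[0]))  # ~ 26.57 degrees
print(length, angle)
```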
<p>
The reason why we want to regard signals as vectors is that <b>the DFT can be interpreted as a rotation of a signal/vector!</b>
Using this interpretation has some nice advantages. First, a rotation is something familiar from our everyday lives.
Second, using rotation matrices it is quite simple to derive the inverse discrete Fourier transform.
</p>
<button class="accordion"><b>Click here for a fancy gif</b></button>
<div class="panel">
<img class="center" src="/assets/dft_intro/cat_spin.gif" alt="DFT spin" width="30%">
</div>
<h2> Rotations </h2>
The rotation of vectors can be expressed by multiplying them with a so called <b>rotation matrix</b>. For example,
rotating a 3-dimensional vector around the x-axis is achieved by the following matrix:
$$
R_x =
\begin{pmatrix}
1 & 0 & 0\\
0 & cos(\varphi) & sin(-\varphi) \\
0 & sin(\varphi) & cos(\varphi) \\
\end{pmatrix}
$$
You can also combine multiple rotation matrices to achieve an arbitrary rotation.
Using the sliders you can rotate the chciken box and corresponding vectors.
<div class="slidecontainer">
\(\varphi_x\) :<input type="range" min="0" max="150" value="0" class="slider" id="rotXSlider" onchange="updateMatrix(this.value)"><br>
\(\varphi_y\) :<input type="range" min="0" max="150" value="0" class="slider" id="rotYSlider" onchange="updateMatrix(this.value)"><br>
\(\varphi_z\) :<input type="range" min="0" max="150" value="0" class="slider" id="rotZSlider" onchange="updateMatrix(this.value)">
</div>
<div id="scene3d"></div>
The resulting rotation matrix looks like follows:
<div id="rotMatMath">
$$R_{xyz} = \begin{pmatrix}
1.00 & 0.00 & 0.00\\
0.00 & 1.00 & 0.00 \\
0.00 & 0.00 & 1.00 \\
\end{pmatrix}$$
</div>
<p>
As the DFT is some kind of rotation, we need to explore the characteristics of rotation matrices as a next step.
So which mathematical properties does a rotation matrix have?<br>
The answer is kind of intuitive: a rotation shall not change the length of a vector,
and the angle between two vectors has to remain the same after a rotation. You can also see this in the example above.
No matter how you change the angles, the vectors of the chciken box keep their length and their relative angles.
Expressing this mathematically will lead us to the following relationship:
$$R^{-1} = R^T$$
This means that <b>the inverse of the rotation matrix exists and it's simply the matrix transposed</b> (this is really nice)!
Usually, inverting arbitrary matrices is a really cumbersome job, as not every matrix can be inverted
and special algorithms are required. Transposing, in contrast, is basically one of the simplest things to do:
just mirror the elements at the diagonal and you are done.
Note that this concept is not only valid for 2 or 3, but for an arbitrary number of dimensions.
If you want to see the derivation:
</p>
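We can also verify \(R^{-1}=R^T\) numerically, here with the rotation matrix \(R_x\) from above and an arbitrarily chosen angle:

```python
import numpy as np

phi = 0.7  # arbitrary angle
Rx = np.array([
    [1, 0, 0],
    [0, np.cos(phi), np.sin(-phi)],
    [0, np.sin(phi), np.cos(phi)],
])

# the transpose really is the inverse
assert np.allclose(Rx.T @ Rx, np.eye(3))
assert np.allclose(np.linalg.inv(Rx), Rx.T)

# lengths are preserved by the rotation
v = np.array([1.0, 2.0, 3.0])
assert np.isclose(np.linalg.norm(Rx @ v), np.linalg.norm(v))
print("all checks passed")
```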
<button class="accordion"><b>Derivation for \(R^{-1}=R^T\)</b></button>
<div class="panel">
The statement of not changing lengths and angles can be expressed by the dot product.
Assume we have two vectors \(v_1\) and \(v_2\). Their dot product results in:
$$ \langle v_1 , v_2 \rangle = |v_1| \cdot |v_2| \cdot cos(\varphi) $$
Where \(|v|\) is the length of the vector and \(\varphi\) their relative angle.
If we rotate a vector, the dot product should not change as length and relative angle remain the same.
So:
$$ \langle Rv_1, Rv_2 \rangle = \langle v_1, v_2 \rangle$$
Using basic math we can reformulate this to:
$$ \langle v_1, R^T Rv_2 \rangle = \langle v_1, v_2 \rangle$$
This means that multiplying the rotation matrix with itself but transposed has to result in the identity matrix:
$$R^T R = I_1$$
But this is also the definition of the inverse matrix, so we can write:
$$R^{-1}=R^T$$
</div>
<br><br>
<p>
From this property we can derive a further one: <b>the rows of a rotation matrix are perpendicular/orthogonal</b>.
To be more exact, they are orthonormal (orthogonal and of length 1).
Therefore their dot products are 0, except when you take the dot product of a row with itself.
The derivation can be found here:
</p>
<button class="accordion"><b>Derivation for orthogonality</b></button>
<div class="panel">
We know that (with \(r_i\) being a row):
$$
R R^T =
\begin{pmatrix}
& r_1 & \\
\hline
& r_2 & \\
\hline
& ... & \\
\end{pmatrix}
\cdot
\left(
\begin{array}{c|c|c}
& & \\
r_1^{T} & r_2^{T} & ...\\
& & \\
\end{array}
\right)
=
\begin{pmatrix}
1 & 0 & ...\\
0 & 1 & ...\\
... & ... & ...\\
\end{pmatrix}
= I_1
$$
This means that:
$$
r_i \cdot r_j^T = \langle r_i, r_j \rangle =
\begin{cases}
0 & i \neq j\\
1 & i = j\\
\end{cases}
$$
Different rows are orthogonal as the dot product between them is 0.
Furthermore, as the length of each row is 1 they are also orthonormal.
</div>
</p>
<h2>The DFT</h2>
<p>
As a next step, we want to show that the DFT can be interpreted as a complex rotation.
So, let's take a look at the definition of the DFT which you can find in any textbook or online article:
$$ X(n)=\sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}kn} $$
Basically we have our discrete signal \(x(k)\) with a length \(N\).
This signal is multiplied with some expression and then summed up.
The transformed signal is \(X(n)\) (this domain is also called frequency domain) and has the same number of elements
as its time domain counterpart \(x(k)\). Note, that the common convention is to represent frequency domain by capital letters.
The nice thing about this formula is that we can also represent it by a matrix multiplication:
$$
\begin{pmatrix}
X(0) \\ X(1) \\ ..
\end{pmatrix}
=
\begin{pmatrix}
e^{-i\frac{2\pi}{N}(0 \cdot 0)} & e^{-i\frac{2\pi}{N}(0 \cdot 1)} & ...\\
e^{-i\frac{2\pi}{N}(0 \cdot 1)} & e^{-i\frac{2\pi}{N}(1 \cdot 1)} & ...\\
... & ... & ...\\
\end{pmatrix}
\cdot
\begin{pmatrix}
x(0) \\ x(1) \\ ...
\end{pmatrix}
$$
If that matrix is a rotation matrix, then it should be really easy to also define the inverse transform.
Since \(R^{-1} = R^T\) applies for rotation matrices, we can simply do the following trick:
$$X = R \cdot x $$
$$R^T \cdot X = x$$
So if we are in the frequency domain, we multiply by the transposed matrix to get back to the time domain!
</p>
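To make this concrete, here is the DFT written as a plain matrix multiplication in NumPy. It matches <code>np.fft.fft</code>, which implements the same non-normalized DFT, and its rows are indeed orthogonal:

```python
import numpy as np

N = 8
k = np.arange(N)
# DFT matrix: entry (n, k) is exp(-i * 2*pi/N * k*n)
F = np.exp(-2j * np.pi * np.outer(k, k) / N)

x = np.random.rand(N)
# the matrix product gives the same result as NumPy's FFT
assert np.allclose(F @ x, np.fft.fft(x))

# rows are orthogonal: F F^H = N * I (orthonormal only after scaling by 1/sqrt(N))
assert np.allclose(F @ F.conj().T, N * np.eye(N))
print("matrix DFT matches np.fft.fft")
```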
<p>
To prove that the matrix is a rotation matrix we will check if the rows are orthogonal using the dot product
(see <i>Derivation for orthogonality</i> above to understand why we can use the dot product here):
$$
\langle r_n, r_m \rangle = r_n \cdot r_m^T = \sum_{k=0}^{N-1} e^{-i\frac{2\pi}{N}km} \cdot e^{+i\frac{2\pi}{N}kn}
= \sum_{k=0}^{N-1} e^{i\frac{2\pi}{N}k(n-m)}
$$
Note that for complex matrices the transpose used here is the conjugate (Hermitian) transpose, i.e. the values are also complex conjugated (the sign of the imaginary part switches).
Next we have to distinguish between two cases. <br>
First we have \(m \neq n\) (two different rows).<br>
Using the following formula (this formula is also used for the <a href="https://en.wikipedia.org/wiki/Geometric_series#Sum" target="_blank">
derivation of the geometric series</a>):
$$
\sum_{k=0}^{N-1} x^k = \frac{x^N-1}{x-1}
$$
We get:
$$
\sum_{k=0}^{N-1} e^{i\frac{2\pi}{N}k(n-m)} = \frac{e^{i2\pi(n-m)}-1}{e^{i\frac{2\pi}{N}(n-m)}-1} = 0
$$
As the dot product of two different rows is zero, we know that they are orthogonal. Nice!<br>
Now we have to check the dot product of a row with itself (\(n=m\)) which can be interpreted as the squared length of the row vector:
$$
\sum_{k=0}^{N-1} e^{0} = \sum_{k=0}^{N-1} 1 = N
$$
As we can see, the squared length is not \(1\) but \(N\).
This means that the length of the row vector is \(\sqrt{N}\) times larger
than we need it to be for orthonormality (right now the matrix is only orthogonal).
We can deal with this by simply putting a <b>normalization factor</b> in front of the DFT:
$$ X(n)= \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}kn} $$
</p>
<p>
The length of a row vector is now \(1\) thanks to the normalization factor. In literature this is sometimes called the <b>Normalized DFT (NDFT)</b>.
While this seems more convenient from a geometrical point of view,
the version without a scaling factor is the standard DFT for digital signal processing.<br>
Why?<br>
According to this <a href="https://www.dsprelated.com/freebooks/mdft/Normalized_DFT.html"> site </a>, the main reason for omitting
the scaling factor is to save computations when calculating the DFT.
This makes perfect sense, as digital signal processing is often about optimizing certain mathematical algorithms.
We will use the DFT without the scaling, since following conventions is always a good thing to do.
Furthermore, having only an orthogonal but not orthonormal matrix does not really have to bother us, as the next section will show.
</p>
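A quick numerical check that the normalized DFT really behaves like a rotation, i.e. that the \(1/\sqrt{N}\) factor makes it preserve the length of the signal vector:

```python
import numpy as np

N = 16
x = np.random.rand(N)
X = np.fft.fft(x)  # non-normalized DFT

# with the 1/sqrt(N) normalization the length of the vector is preserved
assert np.isclose(np.linalg.norm(X / np.sqrt(N)), np.linalg.norm(x))
print("length preserved by the normalized DFT")
```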
<h2>The inverse DFT</h2>
<p>
In this section we will derive the <b>inverse DFT</b>. In many applications you want to transform your signal to the frequency domain,
do some processing, and then transform it back to time domain.
This does of course require an inverse transformation. <br>
As already explained above, we can simply take the inverse matrix of the DFT to get back to time domain.
Using the non-normalized DFT (which is the default DFT),
the lack of orthonormality at first looks like an issue, because we cannot simply use \(R^T=R^{-1}\).
But we can use a little trick: factor out a \(\sqrt{N}\) to make the matrix orthonormal.
Starting with the DFT as a matrix representation:
$$
\begin{pmatrix}
X(0) \\ X(1) \\ ..
\end{pmatrix}
=
\begin{pmatrix}
e^{-i\frac{2\pi}{N}(0 \cdot 0)} & e^{-i\frac{2\pi}{N}(0 \cdot 1)} & ...\\
e^{-i\frac{2\pi}{N}(0 \cdot 1)} & e^{-i\frac{2\pi}{N}(1 \cdot 1)} & ...\\
... & ... & ...\\
\end{pmatrix}
\cdot
\begin{pmatrix}
x(0) \\ x(1) \\ ...
\end{pmatrix}
$$
We can factor out a \(\sqrt{N}\):
$$
\begin{pmatrix}
X(0) \\ X(1) \\ ..
\end{pmatrix}
= \sqrt{N}
\begin{pmatrix}
\frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(0 \cdot 0)} & \frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(0 \cdot 1)} & ...\\
\frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(0 \cdot 1)} & \frac{1}{\sqrt{N}} e^{-i\frac{2\pi}{N}(1 \cdot 1)} & ...\\
... & ... & ...\\
\end{pmatrix}
\cdot
\begin{pmatrix}
x(0) \\ x(1) \\ ...
\end{pmatrix}
$$
Now the matrix is orthonormal so we can use the relation \(R^T = R^{-1}\) to obtain:
$$
\frac{1}{N}
\begin{pmatrix}
e^{+i\frac{2\pi}{N}(0 \cdot 0)} & e^{+i\frac{2\pi}{N}(1 \cdot 0)} & ...\\
e^{+i\frac{2\pi}{N}(1 \cdot 0)} & e^{+i\frac{2\pi}{N}(1 \cdot 1)} & ...\\
... & ... & ...\\
\end{pmatrix}
\cdot
\begin{pmatrix}
X(0) \\ X(1) \\ ..
\end{pmatrix}
=
\begin{pmatrix}
x(0) \\ x(1) \\ ...
\end{pmatrix}
$$
This leads to the following
definition of the <b>inverse DFT</b>:
$$ x(k)= \frac{1}{N} \sum_{n=0}^{N-1} X(n) \cdot e^{+i\frac{2\pi}{N}kn} $$
This definition of the inverse DFT is actually quite similar to the DFT:
we just have to change the sign of the exponent and divide by \(N\).
</p>
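The derived inverse (sign-flipped exponent plus a \(1/N\) factor) can be checked numerically against a forward transform:

```python
import numpy as np

N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)           # DFT matrix
F_inv = np.exp(+2j * np.pi * np.outer(n, n) / N) / N   # flipped sign, 1/N factor

x = np.random.rand(N)
# the derived inverse recovers the original time signal
assert np.allclose(F_inv @ (F @ x), x)
# NumPy's ifft uses the same convention
assert np.allclose(np.fft.ifft(np.fft.fft(x)), x)
print("inverse DFT recovers the time signal")
```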
<p>
At this point we covered the definition of the DFT, how it can be interpreted as an n-dimensional complex rotation
and how the inverse DFT can be derived using this interpretation.
But we haven't covered yet what we actually gain from the DFT.
So why do we actually do this?<br>
The next section will hopefully answer this question as we explore why the DFT lets us "see" frequencies in signals.
</p>
<h2>The DFT as a basis</h2>
In this section we show that the complex exponential function, as it is used by the DFT, can be regarded as a basis.
This means that just by adding and scaling complex exponential functions we can reconstruct any signal.
They are therefore some kind of mathematical atoms or lego bricks:
just by combining them in the right way we can build arbitrary structures.
To get an understanding of this, we will now look at the inverse DFT and its components and rearrange it as follows:
$$
\frac{1}{N}
\begin{pmatrix}
\color{red}{e^{+i\frac{2\pi}{N}(0 \cdot 0)}} & \color{blue}{e^{+i\frac{2\pi}{N}(1 \cdot 0)}} & ...\\
\color{red}{e^{+i\frac{2\pi}{N}(1 \cdot 0)}} & \color{blue}{e^{+i\frac{2\pi}{N}(1 \cdot 1)}} & ...\\
\color{red}{...} & \color{blue}{...} & ...\\
\end{pmatrix}
\cdot
\begin{pmatrix}
\color{red}{X(0)} \\ \color{blue}{X(1)} \\ ..
\end{pmatrix}
$$
$$
=
\frac{1}{N} \cdot
\color{red}{
X(0) \cdot
\begin{pmatrix}
e^{+i\frac{2\pi}{N}(0 \cdot 0)} \\
e^{+i\frac{2\pi}{N}(1 \cdot 0)} \\
...
\end{pmatrix}}
+
\frac{1}{N} \cdot
\color{blue}{
X(1) \cdot
\begin{pmatrix}
e^{+i\frac{2\pi}{N}(1 \cdot 0)} \\
e^{+i\frac{2\pi}{N}(1 \cdot 1)} \\
...
\end{pmatrix}}
+
...
=
\begin{pmatrix}
x(0) \\ x(1) \\ ...
\end{pmatrix}
$$
As you can see, the DFT coefficients \(X(0), X(1), ... \) scale the vectors which are the columns of the inverse DFT matrix.<br>
So these columns are the mathematical lego bricks, and the DFT coefficients are the instruction manual telling us
how to combine them to obtain the time-discrete signal.
It's really important to keep this in mind, as this is one of the key points of the DFT.</p>
<p>
In the following we look at the mathematical lego bricks in more detail, as this will reveal some interesting secrets.
Let's assume we have a DFT of size 4 as an example. Then we get the following matrix:
$$
\begin{pmatrix}
e^{+i\frac{2\pi}{4}(0 \cdot 0)} & e^{+i\frac{2\pi}{4}(1 \cdot 0)} & e^{+i\frac{2\pi}{4}(2 \cdot 0)} & e^{+i\frac{2\pi}{4}(3 \cdot 0)} \\
e^{+i\frac{2\pi}{4}(0 \cdot 1)} & e^{+i\frac{2\pi}{4}(1 \cdot 1)} & e^{+i\frac{2\pi}{4}(2 \cdot 1)} & e^{+i\frac{2\pi}{4}(3 \cdot 1)} \\
e^{+i\frac{2\pi}{4}(0 \cdot 2)} & e^{+i\frac{2\pi}{4}(1 \cdot 2)} & e^{+i\frac{2\pi}{4}(2 \cdot 2)} & e^{+i\frac{2\pi}{4}(3 \cdot 2)} \\
e^{+i\frac{2\pi}{4}(0 \cdot 3)} & e^{+i\frac{2\pi}{4}(1 \cdot 3)} & e^{+i\frac{2\pi}{4}(2 \cdot 3)} & e^{+i\frac{2\pi}{4}(3 \cdot 3)} \\
\end{pmatrix}
=
\begin{pmatrix}
\color{#338cff}{1} & \color{#3d72b8}{1} & \color{#3c5f8c}{1} & \color{#374f6e}{1} \\
\color{#338cff}{1} & \color{#3d72b8}{i} & \color{#3c5f8c}{-1} & \color{#374f6e}{-i} \\
\color{#338cff}{1} & \color{#3d72b8}{-1} & \color{#3c5f8c}{1} & \color{#374f6e}{-1} \\
\color{#338cff}{1} & \color{#3d72b8}{-i} & \color{#3c5f8c}{-1} & \color{#374f6e}{i} \\
\end{pmatrix}
$$
If we depict the real part of each column as a function, we get the following graph
(remember: \(Re\{ e^{i \frac{2\pi}{N}kn} \} = cos( \frac{2\pi}{N}kn ) \) ):
<img class="center" src="/assets/dft_intro/dft_cosine.svg" alt="DFT cosine decomposition" width="70%">
Note that \(k\) is used as a real (non-discrete) number in the graph to emphasize the underlying function.
This means that every signal is composed of some cosine waves oscillating at different frequencies
(and a DC component, which is the straight line and basically a cosine with 0 Hz).
Of course there is also an imaginary part \(i \, sin(\frac{2\pi}{N}kn)\), but in many cases this can be omitted, as will be shown later.
As the DFT coefficients scale these cosines, we can tell which frequencies are present in a signal simply by looking at the DFT coefficients!
This is why the DFT lets us see frequencies in signals!
</p>
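The "lego brick" view can be checked directly: scale each column of the inverse DFT matrix with its DFT coefficient, add everything up, and the time signal reappears. The example signal here is an arbitrary choice:

```python
import numpy as np

N = 4
n = np.arange(N)
B = np.exp(+2j * np.pi * np.outer(n, n) / N)  # columns are the basis vectors

x = np.array([3.0, 1.0, 4.0, 1.0])  # arbitrary example signal
X = np.fft.fft(x)

# rebuild x as a weighted sum of the basis columns (with the 1/N factor)
reconstruction = sum(X[k] * B[:, k] for k in range(N)) / N
assert np.allclose(reconstruction, x)
print("signal reconstructed from the basis columns")
```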
<p>
Since nothing explains things better than examples, we will now take a look at some example signals and their corresponding DFTs.
<h2>Example #1: Symmetry of Coefficients</h2>
Let's start with some really basic sampled cosine signal of length 8:
<img class="center" src="/assets/dft_intro/dft_example_signal_1.svg" alt="Simple discrete signal" width="60%">
$$x(k)=(2, 1+2^{-0.5}, 1, 1-2^{-0.5}, 0, 1-2^{-0.5}, 1, 1+2^{-0.5})$$
$$x(k)=1+cos(\frac{2\pi}{8}k) $$
If we now calculate the DFT, we get the following coefficients:
<img class="center" src="/assets/dft_intro/dft_example_signal_1_comp.svg" alt="DFT of the discrete signal" width="30%">
The basis functions which are scaled by the DFT coefficients are referenced by the links.
We can directly confirm the DC component, an oscillation with the frequency \( \frac{2\pi}{8} k \), and one oscillation with the frequency \(\frac{7 \cdot 2\pi}{8} k\).<br>
Wait, one with a frequency of \(\frac{7 \cdot 2\pi}{8} k\)?!<br> Shouldn't there be only one oscillation at \( \frac{2\pi}{8} k \)?
The short answer is: there are in fact two complex oscillations, but the imaginary parts cancel, resulting in only one real oscillation.
This is a pretty important thing to know, so let's dive a little deeper.
If we reconstruct the signal from the DFT coefficients using the inverse DFT, we get the following expression:
$$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4e^{i \frac{2\pi}{8} k} + \frac{1}{8} \cdot 4e^{i\frac{7 \cdot 2\pi}{8} k} $$
We can reformulate this to:
$$x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4e^{i\frac{2\pi}{8} k} + \frac{1}{8} \cdot 4e^{-i\frac{2\pi}{8} k} $$
$$x(k) = \frac{1}{8} \cdot 8 + \frac{8}{8} \cdot cos(\frac{2\pi}{8} k)$$
As you can see, the two complex oscillations merge into a real oscillation as the imaginary parts cancel out.
This is also quite intuitive if you think about it.
Because how could a real signal be composed of complex basis functions if the imaginary parts did not cancel out?
</p>
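<p>
By the way, you can verify all the numbers of this example with a few lines of code. Here is a quick sketch assuming NumPy is available (its FFT routine is just a fast algorithm for computing the DFT):
</p>

```python
import numpy as np

N = 8
k = np.arange(N)
x = 1 + np.cos(2 * np.pi * k / N)  # the example signal from above

X = np.fft.fft(x)  # DFT coefficients X(n)
print(np.round(X.real, 6))
# X(0)=8 is the DC component, X(1)=4 and X(7)=4 scale the two complex oscillations
```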
<p>
As a next step let us look at the mathematical reasons why the imaginary parts actually cancel out.<br>
First, <b>the columns of the inverse DFT matrix are complex conjugated</b>.
So, for every column there is another column where the exponent has a switched sign.<br>
Second, the <b>DFT coefficients are complex conjugated</b> for real signals such that \(X(n)=X^{*}(N-n)\).
You can also see this in the example, where we have two "4"s; there is no way of getting, say, a 4 and a 3.
Note, that this is only the case for real signals, not for complex signals.
However, in most cases our signals are real, and for this whole chapter we will only assume real signals in time domain.
Only for some really fancy stuff would you use complex signals in time domain.
These two reasons lead to many pairs such as (assuming the DFT coefficients also to be real):
$$X(n) \cdot e^{i\frac{2\pi}{N}nk} + X(N-n) \cdot e^{i\frac{2\pi}{N}(N-n)k}$$
$$=X(n) \cdot e^{i\frac{2\pi}{N}nk} + X^{*}(n) \cdot e^{-i\frac{2\pi}{N}nk}$$
$$=2 X(n) \cdot cos(\frac{2\pi}{N}nk)$$
</p>
<p>
At this point let us summarize what we learned from this example.
Basically, there are two important things.
First, for real signals only one half of the DFT coefficients provides us with information. The other half can be reconstructed
by complex conjugation. <br>
Second, for real signals the imaginary parts of the complex exponential function cancel out. Thus, every real signal can
be reconstructed by using cosine waves!
</p>
<button class="accordion"><b>Derivation for complex conjugated DFT coefficients and columns</b></button>
<div class="panel">
First, the proof that the DFT coefficients are complex conjugated
$$
X(N-n)=\sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}k(N-n)} \\
=\sum_{k=0}^{N-1} x(k) \cdot e^{+i\frac{2\pi}{N}kn} \cdot e^{-i\frac{2\pi}{N}kN} \\
=\sum_{k=0}^{N-1} x(k) \cdot e^{+i\frac{2\pi}{N}kn} \\
=X^{*}(n)
$$
Note, that the last step is only valid for real \(x(k)\).<br>
Using the same approach we can also show that the columns of the inverse DFT matrix are complex conjugated,
because in the proof above we also showed that the rows of the DFT matrix are complex conjugated.
As the inverse DFT matrix is just the complex conjugate of the (symmetric) DFT matrix scaled by \(1/N\), its columns have to be complex conjugated as well!
</div>
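<p>
If you don't feel like working through the proof, the symmetry \(X(n)=X^{*}(N-n)\) can also be checked numerically for an arbitrary real signal. A small sketch assuming NumPy:
</p>

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)   # an arbitrary real time domain signal
X = np.fft.fft(x)

N = len(x)
n = np.arange(1, N)       # n=0 is its own conjugate partner
# for real x the coefficients X(N-n) are the complex conjugates of X(n)
assert np.allclose(X[N - n], np.conj(X[n]))
```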
<h2>Example #2: Complex DFT coefficients</h2>
So far we only considered real DFT coefficients.
However, they can also be complex (even for real time domain signals).
But what is the meaning of \(1+i\) as a DFT coefficient?<br>
To understand this we will take a look at another example signal which looks like follows:
<img class="center" src="/assets/dft_intro/dft_example_signal_2.svg" alt="Simple discrete signal" width="60%">
This is basically the same signal as in the previous example just a little bit shifted in time:
$$x(k)=1+cos(\frac{2\pi}{8}k + \frac{\pi}{4})$$
The corresponding DFT coefficients look like this:
$$X(n) = (8, \sqrt{8} + i \sqrt{8}, 0, 0, 0, 0, 0, \sqrt{8} - i \sqrt{8})$$
The DFT coefficients are also pretty similar to the previous example, but now the corresponding values are complex!
From this example we can conclude the intuitive assumption that the complex part of the DFT coefficients is somehow related
to how the cosine waves have to be shifted in the time domain.
In the following part of this example we will explore how this works mathematically.
<p>
So, I guess most of you know this, but complex numbers can be represented in a Cartesian form or in a polar form (thanks to Euler):
<img class="center" src="/assets/dft_intro/complex_cartesian_polar.svg" alt="Cartesian and polar representation of a complex number" width="60%">
Both forms are actually mathematically equal and can be transformed into each other.
For example, the DFT coefficients of this example can also be written as:
$$X(n) = (8, 4 e^{i\frac{\pi}{4}} , 0, 0, 0, 0, 0, 4e^{-i\frac{\pi}{4}} )$$
The advantage of the polar representation is that we can directly see the <b>magnitude</b> part \(r\)
(the length of the vector) and the <b>phase</b> part (the angle of the vector).
Furthermore, from the polar representation we can easily derive that multiplying two complex numbers
results in their magnitudes multiplying and their phases adding:
$$ r_1 \cdot e^{i\varphi_1} \cdot r_2 \cdot e^{i\varphi_2} = r_1 \cdot r_2 \cdot e^{i (\varphi_1 + \varphi_2) }$$
This property of adding phases will be pretty important in the following.<br>
So let's assume that we have our DFT coefficients in a polar form and we use the inverse DFT to reconstruct a signal in time domain.
For the given example this will be:
$$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4e^{i\frac{\pi}{4}} \cdot e^{i \frac{2\pi}{8} k}
+ \frac{1}{8} \cdot 4e^{-i\frac{\pi}{4}} \cdot e^{-i\frac{2\pi}{8} k} $$
$$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 4 \cdot e^{i (\frac{2\pi}{8} k + \frac{\pi}{4})}
+ \frac{1}{8} \cdot 4 \cdot e^{-i(\frac{2\pi}{8} k + \frac{\pi}{4})} $$
$$ x(k) = \frac{1}{8} \cdot 8 + \frac{1}{8} \cdot 8 \cdot cos(\frac{2\pi}{8} k + \frac{\pi}{4}) $$
As you can see, the phase of the DFT coefficient first moves into the complex exponential function and finally shows up as
a time shift of the cosine, while the magnitude of the DFT coefficient tells us how to scale the cosine wave.
These two statements are pretty important, so make sure to understand and remember them.
To emphasize their importance, read them again in bold letters ;)
<br>
<br>
<b>The magnitude of a DFT coefficient tells us how to scale the corresponding frequency.</b>
<br>
<br>
<b>The phase of a DFT coefficient tells us how to shift the corresponding frequency.</b>
</p>
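<p>
Both statements can be confirmed numerically for the example signal. A sketch assuming NumPy (<code>np.angle</code> extracts the phase of a complex number):
</p>

```python
import numpy as np

N = 8
k = np.arange(N)
x = 1 + np.cos(2 * np.pi * k / N + np.pi / 4)  # the shifted example signal

X = np.fft.fft(x)
print(abs(X[1]))        # magnitude 4: how much the cosine is scaled
print(np.angle(X[1]))   # phase 0.7853... = pi/4: how far the cosine is shifted
```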
<h2>Applications: The Spectrum</h2>
<p>
At this point we covered most of the important theoretical aspects of the DFT.
Of course there are still some (important) topics left, but let's leave them for the next post and head over
to some application.
Because what is the purpose of all this stuff if we don't apply it?<br>
Whenever you obtain some results, displaying them in a meaningful way is at least as important as the results themselves.
A great way of displaying DFT coefficients are so-called <b>spectrums</b>.
These spectrums can be further classified as <b>magnitude spectrums</b> or <b>phase spectrums</b>.
</p>
<p>
A magnitude spectrum can be created by plotting the absolute values of the DFT coefficients \(|X(n)|\) on a y-axis
and the coefficient indices \(n\) on an x-axis.
An example can look like this magnitude spectrum:
<img class="center" src="/assets/dft_intro/a440_dft.svg" alt="Playing an 'a' on a guitar for 1s" width="80%">
It is the spectrum of a 1 second sound where I played a note on my guitar.
Note, that as most signals are real, often only one half of the DFT coefficients is displayed.
Furthermore, it is more common to plot the frequencies corresponding to the DFT coefficients instead of the coefficient indices on the x-axis.
You find these frequencies by multiplying \(n\) with \(f_s/N\) where \(f_s\) is the sampling frequency and \(N\) is the number of samples.
In my guitar example the sound was sampled 44100 times per second which results in \(f_s = 44100\).
As the signal is 1 second long, the number of samples is 44100.
This results in 22050 independent DFT coefficients giving us frequencies from 0Hz to 22049Hz in 1Hz steps.
The cool thing about magnitude spectrums is that we only use the absolute value of the DFT coefficients.
Thus, we can easily see which frequencies are present in a time discrete signal!
If we zoom into the guitar example we can see that I played the tone "A" at 440Hz:
<img class="center" src="/assets/dft_intro/a440_dft_zoomed.svg" alt="Playing an 'a' on a guitar for 1s zoomed in" width="80%">
Just by looking at the spectrum you can also tell that I probably played a string instrument as there are many overtones at multiples of 440Hz.
But things get even more interesting: You can tell that I don't live in the United States or Canada and that I own shitty equipment just by looking at the spectrum!
If you zoom even further you can see a peak at 50Hz:
<img class="center" src="/assets/dft_intro/a440_dft_zoomed_more.svg" alt="50Hz" width="80%">
You can see this peak thanks to the German 50Hz AC grid, which induces a 50Hz noise signal called mains hum.
If I lived in the US or Canada, this peak would probably be at 60Hz.<br>
And things get even crazier. Just by using the spectrum you can tell not only where I was but also when!
As I surfed through the endless realms of the Internet I found a Wikipedia article about
<a href="https://en.wikipedia.org/wiki/Electrical_network_frequency_analysis">electrical network frequency analysis</a> which cited this
newspaper <a href="https://www.sueddeutsche.de/muenchen/landeskriminalamt-dem-verbrechen-auf-der-spur-1.1061222-2" target="_blank">article</a>.
According to the article the German AC network frequency varies between 49.95Hz and 50.05Hz, which obviously also affects the frequency of the mains hum.
The trick is to have a quite long audio recording and to estimate the varying mains hum frequency over time.
(FYI: A minimum of 100s is needed for a frequency resolution of 0.01Hz. This can be calculated by using \(f_s/N\) as stated above.)
By comparing the noise signal with a network frequency database you may obtain when the audio signal was recorded.
This is the reason why the Bavarian police have been recording the network frequency 24/7 since 2010.<br>
Of course you have to create multiple spectrums over time and see how the frequency changes. So, if you have a 10 minute audio recording,
you may take several 100s intervals and determine their spectrums.
When doing this we rather refer to a <b>short-time Fourier Transform (STFT)</b> and as our spectrum varies over time we rather call it a <b>spectrogram</b>.
</p>
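<p>
The index-to-frequency mapping \(n \cdot f_s/N\) is easy to reproduce in code. Here is a small sketch assuming NumPy, where the guitar recording is replaced by a synthetic 440Hz tone plus a faint 50Hz mains hum:
</p>

```python
import numpy as np

fs = 44100                      # sampling frequency in Hz
t = np.arange(fs) / fs          # 1 second of samples, so N = fs
x = np.sin(2 * np.pi * 440 * t) + 0.01 * np.sin(2 * np.pi * 50 * t)

X = np.fft.rfft(x)              # DFT of a real signal: one half is enough
freqs = np.arange(len(X)) * fs / len(x)   # n * fs/N, here exactly 1 Hz per bin

print(freqs[np.argmax(np.abs(X))])        # 440.0: the played note
```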
<p>
You may want to create some spectrograms/spectrums yourself, so take a look at this little Javascript application I wrote:
</p>
<canvas class="center" onclick="start_dft()" width="700px" height="300px" id="canvasCtx"></canvas><br>
<script src="/assets/fft.js">/*cool dft demo*/</script>
<p>
If you click on the "Start" button, a DFT is applied on the audio signal captured from your devices microphone.
Note, that everything is processed locally and no data is sent to some fishy servers (I'm cooler than google, lol).
The DFT coefficients are displayed as a magnitude spectrum giving you information about how much of each frequency can be found in the signal.
Try to sing or play some tones if you have an instrument.
You will see peaks in the spectrum for the corresponding pitches which may include some overtones.
This does not work for mayonnaise as it is not an instrument...
</p>
<p>
As mentioned above, there are also so-called phase spectrums.
To create a phase spectrum we use the same principles for the x-axis as explained above.
But the y-axis now represents the phase of the DFT coefficients \(arg\{X(n)\}\).
However, with the topics covered in this article I could not find a good use-case for it :(
If you know some application, please don't hesitate to contact me.
</p>
<h2>Summary</h2>
Congrats, you made it that far :) <br>
You should now have a basic knowledge of how the DFT works and how it can be applied.
Yet there are still a lot of things to discover about the DFT, which will be covered in my next posts.
But for now let us summarize the most important aspects of this post:
<ul>
<li>The Discrete Fourier Transform (DFT) is the discrete counterpart of the Fourier Transform and is described by:
$$ X(n)=\sum_{k=0}^{N-1} x(k) \cdot e^{-i\frac{2\pi}{N}kn} $$ </li>
<li>The DFT can be interpreted as a complex rotation of the time discrete signal, thus the inverse transform can be derived as:
$$ x(k)= \frac{1}{N} \sum_{n=0}^{N-1} X(n) \cdot e^{+i\frac{2\pi}{N}kn} $$
</li>
<li>Every time discrete signal can be reconstructed using shifted and scaled complex oscillations.</li>
<li>The DFT coefficients tell us how these oscillations have to be scaled (magnitude of the coefficients)
and shifted (phase of the coefficients).
</li>
<li>In the common case of real time discrete data only one half of the DFT coefficients provides us with useful information, as the other half
is the complex conjugated counterpart.</li>
<li>For real signals the complex oscillations break down to simple cosine waves.</li>
<li>By using magnitude spectrums we can visualize the DFT coefficients in a meaningful way, telling us at first glance how much of each frequency
is in a time discrete signal.
</li>
<li>There are millions of applications for the DFT, ranging from guitar tuning to criminal investigations.</li>
</ul>
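<p>
The two formulas from the summary translate almost literally into code. A naive \(O(N^2)\) sketch assuming NumPy, which round-trips a signal through the DFT and its inverse:
</p>

```python
import numpy as np

def dft(x):
    # X(n) = sum_k x(k) * exp(-i 2 pi k n / N)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                 # column vector of time indices
    return (x[:, None] * np.exp(-2j * np.pi * k * n / N)).sum(axis=0)

def idft(X):
    # x(k) = 1/N * sum_n X(n) * exp(+i 2 pi k n / N)
    N = len(X)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return (X[None, :] * np.exp(2j * np.pi * k * n / N)).sum(axis=1) / N

x = np.array([2.0, 1.0, 0.0, 1.0])
assert np.allclose(idft(dft(x)), x)        # perfect reconstruction
assert np.allclose(dft(x), np.fft.fft(x))  # matches the (fast) library DFT
```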
Please let me know if there are any mistakes or how this Introduction can be improved.
You can do this by writing an e-mail to me (see <a href="/about/">About</a>).
<script>
function updateMatrix(val) {
var mathString="R_{xyz} = \\begin{pmatrix}";
for (i=0; i<3; i++) {
for (j=0; j<3; j++) {
mathString += parseFloat(String(rotMat.elements[4*i+j])).toFixed(2) + " & ";
}
mathString = mathString.slice(0,-2);
mathString += "\\\\"
}
mathString += "\\end{pmatrix}";
var math = MathJax.Hub.getAllJax("rotMatMath")[0];
MathJax.Hub.Queue(["Text",math,mathString]);
}
</script>
<script>
var acc = document.getElementsByClassName("accordion");
var i;
for (i = 0; i < acc.length; i++) {
acc[i].addEventListener("click", function() {
this.classList.toggle("active");
console.log(this);
var panel = this.nextElementSibling;
if (panel.style.display === "block") {
panel.style.display = "none";
} else {
panel.style.display = "block";
}
});
}
</script>
<script>
var canvas = document.querySelector("#canvasCtx");
var canvasCtx = canvas.getContext("2d");
var WIDTH = canvas.width;
var HEIGHT = canvas.height;
canvasCtx.fillStyle = '#a8b2bf';
canvasCtx.fillRect(0, 0, WIDTH, HEIGHT);
canvasCtx.strokeStyle = 'rgb(0, 0, 0)';
canvasCtx.fillStyle = 'rgb(0, 0, 0)';
canvasCtx.font = "30px Arial";
canvasCtx.textAlign = "center";
canvasCtx.fillText("Click here to start the DFT", WIDTH*0.5, HEIGHT*0.5);
</script>
<script src="/assets/dft_intro/rotate_mat_webgl.js"></script>/* Tooltip container */ .tooltip { position: relative; display: inline-block; }Z-Transform: Stability2020-03-26T10:55:44+01:002020-03-26T10:55:44+01:00http://localhost:4000/digital/signal/processing/2020/03/26/z-stability<style type="text/css">
/* Tooltip container */
.tooltip {
position: relative;
display: inline-block;
}
/* Tooltip text */
.tooltip .tooltiptext {
visibility: hidden;
width: 300px;
background-color: grey;
color: #fff;
text-align: center;
padding: 10px;
border-radius: 6px;
/* Position the tooltip text - see examples below! */
position: absolute;
z-index: 1;
}
/* Show the tooltip text when you mouse over the tooltip container */
.tooltip:hover .tooltiptext {
visibility: visible;
}
.left-align{
text-align: left!important;
}
.center {
display: block;
margin-left: auto;
margin-right: auto;
width: 50%;
}
.accordion {
background-color: #eee;
color: #444;
cursor: pointer;
padding: 18px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 15px;
transition: 0.4s;
}
.active, .accordion:hover {
background-color: #ccc;
}
.panel {
padding: 0 18px;
display: none;
background-color: #ccc;
overflow: hidden;
}
li{
margin: 12px 0;
}
</style>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script>
<head><link rel="shortcut icon" href="/assets/favicon.ico"></head>
<h2>Introduction</h2>
<p><i>Knowledge required for this post:
<a href="/digital/signal/processing/2020/03/06/z-transform.html">z-transform introduction</a> (transfer functions, impulse responses)</i></p>
<p>This is the second post of my series about the z-transform.
While the <a href="/digital/signal/processing/2020/03/06/z-transform.html">first post</a> was more like an introduction,
this post will cover one of the z-transform's applications:
checking if an LTI system is stable.
</p>
<p>
When we talk about stability we want to refer to the so-called <b>BIBO (bounded-input, bounded-output) stability</b>.
As the name already implies, a BIBO stable system should respond with a bounded output to a bounded input.
One crucial thing to keep in mind is that bounded input refers to <b>all</b> possible inputs.
It is not sufficient to take only one input and show that the system is stable for this particular input.
<br>
An example of an unstable system is the well-known audio feedback.
If you don't correctly arrange your microphones and speakers when planning a concert,
only a small sound (bounded input) at the microphone is enough to create a feedback loop and thus an extremely annoying squealing.
Luckily the system stops being linear at some point, saving your equipment from becoming a black hole.
<br>
To prevent such disasters, it would be interesting to know in advance if and when a system becomes unstable.
And you might have guessed it: the z-transform is the right tool to obtain this information.
In the next sections, we will use the z-transform to show if a discrete LTI system is stable just by looking at the transfer function!
</p>
<h2>Definition</h2>
<p>
As always let's look at some definitions to get a better understanding of stability and boundedness.
By reading textbooks or Wikipedia articles you can find out that a function \(f(k)\) is bounded if there exists a constant \(c\)
such that:
$$ |f(k)|\leq c < \infty \quad \text{for all } k $$
So, a bounded input \(x(k)\) means we can find a constant \(c_x\) such that:
$$|x(k)|\leq c_x < \infty \quad \text{for all } k $$
And if our system is stable, the output \(y(k)\) is bounded as well:
$$|y(k)|\leq c_y < \infty \quad \text{for all } k$$
Let's look at an illustrative example since this might be confusing at first:
<img src="/assets/ztrafo_stability/sine_wave.svg" alt="Sine wave" class="center" style="width:80%;margin-top:20px;margin-bottom:20px">
In the figure above, you can see a sine wave \(f(k)=sin(k\cdot 2 \pi)\) for which we can easily find a constant such that:
$$|f(k)| \leq c=1$$
This is illustrated by the grey "tube" which encapsulates the function.
Therefore, this function is bounded.<br>
An unbounded example is \(f(k)=e^{k} \cdot sin(k \cdot 2 \pi)\):
<img src="/assets/ztrafo_stability/exponential.svg" alt="Sine wave" class="center" style="width:80%;margin-top:20px;margin-bottom:20px">
Obviously it is not possible to find a "tube" which encapsulates the function.
</p>
<p>
With this definition in mind, we will now look at discrete LTI systems.
As shown in the first post, the input-output relation can be represented by convolving the input \(x(k)\) with
a system's impulse response \(h(k)\):
$$ y(k) = h(k) * x(k) = \sum_{n=-\infty}^{\infty} h(n) \cdot x(k-n)$$
</p>
<p>
The question we now want to answer is: If we know that the input \(x(k)\) is bounded,
which requirements have to be fulfilled by the impulse response \(h(k)\) to get a bounded output \(y(k)\)? <br>
The simple answer is: if the impulse response is absolutely summable, then the output is bounded as well.<br>
Let's try to show this mathematically starting with the absolute values of the input-output value relations
(you can also find this proof at the corresponding <a href="https://en.wikipedia.org/wiki/BIBO_stability#Proof_of_sufficiency">Wikipedia article</a>):
$$
|y(k)| = |\sum_{n=-\infty}^{\infty} h(n) \cdot x(k-n)|
$$
Using the triangle inequality:
$$|y(k)| \leq \sum_{n=-\infty}^{\infty} |h(n)| \cdot |x(k-n)|$$
Since we assumed that our input is bounded, we can substitute it by a constant:
$$|y(k)| \leq \sum_{n=-\infty}^{\infty} |h(n)| \cdot c_x$$
$$|y(k)| \leq c_x \cdot \sum_{n=-\infty}^{\infty} |h(n)| $$
In order to have a bounded \(y(k)\) we require \(h(k)\) to be absolutely summable:
$$ c_h = \sum_{n=-\infty}^{\infty} |h(n)| < \infty $$
And that's mostly it: if the absolute sum of the impulse response is finite then we can guarantee that the system is stable.
<p>Note, that we used the triangle inequality, which only gives an upper bound.
At this point we can say that a finite sum implies stability, but not yet that stability implies a finite sum.
However, there is a proof that this implication goes both ways (I found one in
<i>Digital Signal Processing: Principles, Algorithms & Applications (3rd Ed.): pages 88-89</i>)
But we don't want to get lost in proofs, so please refer to the book if you want to get every piece of knowledge there is ;)
</p>
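<p>
The derived bound \(|y(k)| \leq c_x \cdot \sum_n |h(n)|\) can also be checked numerically. A sketch assuming NumPy, with the absolutely summable impulse response \(h(k)=0.5^k\) and a random bounded input:
</p>

```python
import numpy as np

h = 0.5 ** np.arange(50)           # absolutely summable: sum |h| tends to 2
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=500)   # bounded input with c_x = 1

y = np.convolve(h, x)              # y(k) = sum_n h(n) x(k-n)
c_x = np.max(np.abs(x))
bound = c_x * np.abs(h).sum()      # the bound from the derivation above

print(np.max(np.abs(y)) <= bound)  # True: the output stays bounded
```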
<h2>Applying the z-transform</h2>
<p>As promised we will use the transfer function (and therefore the z-transform) to determine if a system is stable or not.
But first, we have to introduce a few new terms, starting with the so-called <b>Region Of Convergence (ROC)</b>.
In order to understand this term we will use the following example system:
<img src="/assets/ztrafo_stability/recursive_simple.svg" alt="Sine wave" class="center" style="width:50%;margin-top:20px;margin-bottom:20px">
This system can be described by the following transfer function:
$$Y(z) = Y(z) \cdot 0.5z^{-1} + X(z)$$
$$H(z) = \frac{1}{1-0.5z^{-1}}$$
If we want to be able to get the impulse response from this transfer function, we need to use the geometric series:
$$\sum_{k=0}^{\infty} q^k = \frac{1}{1-q} \quad \text{for } |q|<1$$
Using \(q=0.5z^{-1}\) results in:
$$H(z) = \frac{1}{1-0.5z^{-1}} = \sum_{k=0}^{\infty} (0.5)^k \cdot z^{-k} \quad \text{for } |0.5z^{-1}|<1$$
We can rearrange the condition to:
$$0.5 < |z|$$
So, only for \(0.5 < |z|\) does the geometric series, and therefore the transfer function, converge!
This area is the region of convergence.
If we assume the variable \(z\) to be complex, the region of convergence can be depicted with an Argand diagram:
<img src="/assets/ztrafo_stability/roc_causal.svg" alt="Sine wave" class="center" style="width:30%;margin-top:20px;margin-bottom:20px">
The y-axis represents the imaginary part of \(z\) while the x-axis represents the real part.
The blue area is the region of convergence for \(H(z)\).
Note, for \(0.5 \geq |z|\) the transfer function diverges.
This means that in the grey area \(H(z)\) is either \(\pm \infty\) or it "jumps" around like the series \((..., -1, 1, -1, 1, ...)\) </p>
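<p>
We can cross-check the geometric series result by simply simulating the difference equation \(y(k)=0.5 \cdot y(k-1)+x(k)\) with a unit impulse as input. A sketch assuming NumPy:
</p>

```python
import numpy as np

K = 30
x = np.zeros(K)
x[0] = 1.0                 # unit impulse
y = np.zeros(K)
for k in range(K):
    y[k] = (0.5 * y[k - 1] if k > 0 else 0.0) + x[k]

# the impulse response is h(k) = 0.5^k, matching the geometric series terms
print(y[:4])               # first values: 1, 0.5, 0.25, 0.125
```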
<p>
So, how can we use this knowledge for determining the stability?
Let's do some math with the transfer function and try to find a relation to the stability condition:
$$ |H(z)| = |\sum_{k=-\infty}^{\infty}h(k)z^{-k}| \leq \sum_{k=-\infty}^{\infty}|h(k)z^{-k}| = \sum_{k=-\infty}^{\infty}|h(k)||z^{-k}| $$
Choosing \(|z|=1\) results in:
$$ |H(|z|=1)| \leq \sum_{k=-\infty}^{\infty}|h(k)|$$
As you can see, the right-hand term is exactly the stability condition from above.
This means, if a system is stable then the z-transform converges for \( |z|=1 \).
We can also express this as: </p>
<p><b>If a system is stable, the unit circle is included in the region of convergence.</b></p>
The reverse statement is also true (at least according to textbooks). But I couldn't find a proof yet.
So, if you have a proof, I'd be happy to hear about it.<br>
If we regard our example, we have a ROC of \(0.5 < |z|\) which means that the system is stable!
However, there is one thing I didn't tell you yet...
</p>
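<p>
Since the ROC \(0.5 < |z|\) includes the unit circle, we can safely evaluate the transfer function there. A sketch assuming NumPy:
</p>

```python
import numpy as np

def H(z):
    return 1.0 / (1.0 - 0.5 / z)   # H(z) = 1 / (1 - 0.5 z^-1)

omega = np.linspace(0, 2 * np.pi, 64, endpoint=False)
z = np.exp(1j * omega)             # points on the unit circle |z| = 1

mag = np.abs(H(z))
print(np.all(np.isfinite(mag)))    # True: H converges on the unit circle
print(mag.max())                   # 2.0, reached at omega = 0
```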
<h2>Hol'up: Ambiguity and Causality</h2>
<p>In the previous section we determined the ROC for the following expression:
$$H(z) = \frac{1}{1-0.5z^{-1}}$$
Which converges for \(|z|>0.5\) to:
$$H(z) = \sum_{k=0}^{\infty} 0.5^k \cdot z^{-k}$$
Using basic math or partial fraction decomposition we can rearrange this to:
$$H(z) = 1 - \frac{0.5}{0.5-z}$$
Which converges for \(|z|<0.5\) to:
$$H(z) = 1 - \sum_{k=0}^{\infty} 2^k \cdot z^{k}$$
</p>
<p>This seems to be really confusing at first. The same transfer function seems to have multiple impulse responses and multiple ROCs.
So which one is now correct?!</p>
<p>The sad truth about the z-transform is that it is actually ambiguous...
Multiple impulse responses can be mapped to the same z-transform.
In the above case the following impulse responses are both valid time domain representations of the transfer function:
<img src="/assets/ztrafo_stability/h1_h2.svg" alt="Impulse responses" class="center" style="width:100%;margin-top:20px;margin-bottom:20px">
There are two important differences between them.<br>
First, \(h_1(k)\) is stable while \(h_2(k)\) is not stable.
As we are trying to show stability or instability it is quite important to know the corresponding impulse response.<br>
The second is that \(h_1(k)\) is <b>causal</b> while \(h_2(k)\) is <b>anticausal</b>.
An anticausal system depends solely on future values.
This means that its difference equation only includes terms such as \(y(k+2)\) or \(x(k+7)\).
You can directly see this by looking at the impulse response of \(h_2(k)\).
We get a response even though we haven't fed in the "1" yet.
As such a system is predicting the future, it cannot be used for realtime applications.
In contrast to an anticausal system, a causal system depends only on past values and therefore only includes
terms such as \(y(k-2)\) or \(x(k-3)\) in its difference equation. Such a system is realtime capable.</p>
<p>If we only had the transfer function there would be no way to determine the stability of the underlying system!
Thus, we need more information!
<br>
For our example transfer function, we know that we derived it from the following difference equation:
$$y(k)=0.5 \cdot y(k-1) + x(k)$$
We know that our system has to be causal (there are no future value terms).
Therefore \(h_1(k)\) is the corresponding impulse response and our system is stable!
</p>
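<p>
To see the difference in stability directly, compare the absolute sums of the two impulse responses. A sketch assuming NumPy; the anticausal samples \(h_2(-k)=-2^{k}\) for \(k \geq 1\) follow from the series expansion above:
</p>

```python
import numpy as np

h1 = 0.5 ** np.arange(60)        # causal: h1(k) = 0.5^k for k >= 0
h2 = -(2.0 ** np.arange(1, 60))  # anticausal samples at k = -1, -2, ...

print(np.abs(h1).sum())          # about 2: finite, so h1 is stable
print(np.abs(h2).sum())          # about 1.15e18 and growing: h2 is unstable
```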
<p>Let's do a quick summary of this section:
<ul>
<li>The z-transform is ambiguous. Multiple impulse responses can have the same z-transform.</li>
<li>If we want to relate a z-transform to an impulse response we need more information such as the ROC.</li>
<li>Causal systems seem to converge for \(|z|>a\) while anticausal systems converge for \(|z| < a\).
We will show this in the next section.</li>
</ul>
</p>
<h2>Poles and stability</h2>
<p>
In the previous section it was shown how the stability and ROC for one particular system was determined.
But how can we generalize this concept and directly "see" from the transfer function if a system is stable?<br>
The first step is to rearrange the transfer function.
We use the fact that every transfer function we have for an LTI system is basically just a <b>rational function</b>.
So, a function with a numerator polynomial and a denominator polynomial. For example:
$$H(z) = \frac{z+1}{z^2-5z+6}$$
Using <b>partial fraction decomposition</b> a rational function can be expressed as sum of the following simple terms:
$$\frac{a_i}{(z-d_{i,\infty})^j}$$
Where \(d_{i,\infty}\) is a so called <b>pole</b>. At a pole the denominator is 0.
Using the geometric series, we can now determine for each of the terms if they do converge.
If all the terms converge, our transfer function is stable.
</p>
<p>
As an example, let us try to analyze the stability of the given system. First we try to determine the poles.
You can simply do this by using the pq-formula or by intensively staring and hoping that your brain comes up with an idea:
$$H(z)=\frac{z+1}{z^2-5z+6} = \frac{z+1}{(z-3) \cdot (z-2)}$$
As you can see from the denominator we have two poles, one at \(d_{1,\infty}=3\) and one at \(d_{2,\infty}=2\).
By doing some further calculations the term can be expressed as:
$$H(z)=\frac{-4}{3-z}+\frac{3}{2-z} $$
Note, we can also represent this as:
$$H(z)=\frac{z^{-1}+z^{-2}}{1-5z^{-1}+6z^{-2}} = \frac{z^{-1}+z^{-2}}{(1-3z^{-1}) \cdot (1-2z^{-1})}
=\frac{1}{6}+\frac{-3/2}{1-2z^{-1}}+\frac{4/3}{1-3z^{-1}}$$
If we want to analyze causal systems, this would be the preferred representation.
It is important to note that this representation has the same poles!
</p>
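<p>
You don't have to trust my partial fraction decomposition: all three representations of \(H(z)\) can be compared numerically at a few test points. A sketch assuming NumPy; the points just have to avoid the poles at 2 and 3 and \(z=0\):
</p>

```python
import numpy as np

z = np.exp(1j * np.linspace(0.1, 6.0, 7))   # test points on the unit circle

H_orig = (z + 1) / (z**2 - 5 * z + 6)
H_pf = -4 / (3 - z) + 3 / (2 - z)           # partial fractions in z
H_inv = 1/6 - 1.5 / (1 - 2 / z) + (4/3) / (1 - 3 / z)  # in terms of z^-1

assert np.allclose(H_orig, H_pf)            # all three forms agree
assert np.allclose(H_orig, H_inv)
```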
<p>
As a next step we have to analyze the convergence of the single summands.
Only if all terms converge, the overall transfer function will converge as well.
Assuming we have a causal system the ROC for the summands are \(|z|>3\) and \(|z|>2\).
The area where both of them converge is \(|z|>3\).
If we'd assume an anticausal system, the ROC would be \(|z|<2\).<br>
From this example we can see that <b>the ROC is determined by the poles of the transfer function</b>.
Here is an Argand diagram with the corresponding poles to visualize this very important statement (causality assumed):
<img src="/assets/ztrafo_stability/roc_causal_multipole.svg" alt="Impulse responses" class="center" style="width:30%;margin-top:20px;margin-bottom:20px">
In the diagram the poles are represented by an "x".
You can see that the ROC is determined by the pole with the highest absolute value.
Furthermore, this system is not stable as the unit circle is not included in the ROC.<br>
In general, this leads us to the following two important statements:
<p><b>For causal systems: A system is stable if all the poles are within the unit circle</b></p>
<p><b>For anticausal systems: A system is stable if all the poles are outside the unit circle</b></p>
</p>
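<p>
These two statements give us a compact recipe: find the poles of the denominator and look at their absolute values. A sketch for the causal case assuming NumPy (<code>np.roots</code> takes the polynomial coefficients in decreasing powers of \(z\)):
</p>

```python
import numpy as np

def causal_stable(denominator):
    # a causal system is stable if all poles lie inside the unit circle
    poles = np.roots(denominator)
    return bool(np.all(np.abs(poles) < 1))

# H(z) = (z+1) / (z^2 - 5z + 6): poles at z = 3 and z = 2
print(np.roots([1, -5, 6]))        # 3 and 2, both outside the unit circle
print(causal_stable([1, -5, 6]))   # False

# y(k) = 0.5 y(k-1) + x(k): H(z) = z / (z - 0.5), pole at z = 0.5
print(causal_stable([1, -0.5]))    # True
```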
<h2>Summary</h2>
<p>In this post it was shown how the z-transform can be used to determine if an LTI system is stable.
The most important points are:
<ul>
<li>A system is stable if the absolute sum of its impulse response is finite: $$ c_h = \sum_{n=-\infty}^{\infty} |h(n)| < \infty $$</li>
<li>A system is stable if the Region Of Convergence (ROC) of the transfer function includes the unit circle</li>
<li>Unfortunately the transfer function is ambiguous. Multiple impulse responses can be mapped to the same transfer function</li>
<li>To resolve the ambiguity we need more information about the underlying system (such as causality or directly the ROC)</li>
<li>A causal system depends only on past values while an anticausal system depends only on future values</li>
<li>Causal systems converge for \(|z|>a\) (the ROC includes infinity) while anticausal systems converge for \( |z| < a \) (the ROC includes 0)</li>
<li>With the poles of a transfer function the ROC can easily be determined</li>
<li>For a causal system all poles have to be within the unit circle to ensure stability</li>
<li>For an anticausal system all poles have to be outside the unit circle to ensure stability</li>
</ul>
If you have any feedback please write an e-mail (see <a href="/about/">About</a>) :)
</p>Z-Transform: An Introduction2020-03-06T10:55:44+01:002020-03-06T10:55:44+01:00http://localhost:4000/digital/signal/processing/2020/03/06/z-transform<style type="text/css">
/* Tooltip container */
.tooltip {
position: relative;
display: inline-block;
}
/* Tooltip text */
.tooltip .tooltiptext {
visibility: hidden;
width: 300px;
background-color: grey;
color: #fff;
text-align: center;
padding: 10px;
border-radius: 6px;
/* Position the tooltip text - see examples below! */
position: absolute;
z-index: 1;
}
/* Show the tooltip text when you mouse over the tooltip container */
.tooltip:hover .tooltiptext {
visibility: visible;
}
.left-align{
text-align: left!important;
}
.center {
display: block;
margin-left: auto;
margin-right: auto;
width: 50%;
}
.accordion {
background-color: #eee;
color: #444;
cursor: pointer;
padding: 18px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 15px;
transition: 0.4s;
}
.active, .accordion:hover {
background-color: #ccc;
}
.panel {
padding: 0 18px;
display: none;
background-color: #ccc;
overflow: hidden;
}
li{
margin: 12px 0;
}
</style>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script>
<h2>Introduction</h2>
<p>In this post we will cover the basics of the z transform which plays an important role for digital signal processing.
We will start with a basic mathematical introduction covering not everything but the most important aspects,
so we can directly head towards the application.
<br>
As the name "z-transform" already suggests, we are dealing here with a transform.
A transform applies some kind of function to some kind of data and thereby maps it
to another domain (so basically different values and operators).
This is often done in the hope that some operators become easier to handle or that the transformed
data reveals information which we couldn't see in the original representation.
The z-transform allows us, for example, to analyze the stability or spectral characteristics of digital filters.
</p>
<p>
The concept of a transform may sound abstract at first, so let's start with a very simple example: the logarithmic transform.
A long time ago, when modern pocket calculators had not been invented yet, humans used an ancient technology
called <a href="https://en.wikipedia.org/wiki/Slide_rule" target="_blank">slide rules</a> to deal with multiplication and division.
Just by shifting some plastic bars you can perform some seemingly difficult operations in a few seconds.
When I heard this for the first time I thought: What, how can one multiply with some fancy piece of plastic?
<br>
The mathematical answer is quite simple yet clever.
Imagine we want to calculate the following expression:</p>
<p>$$42 \cdot 23$$</p>
<p>If we apply the logarithm on this equation we can reformulate it as follows:</p>
<p>$$log(42 \cdot 23)=log(42)+log(23)$$</p>
<p>So in the logarithmic domain the multiplication can be represented by an addition! This can easily be implemented by shifting a bar.
Of course we need to transform the result back into the domain where we started. So, most transforms come with an inverse transform.
To invert the logarithm we take the base we used and raise it to the power of the result (e.g. base 10):
$$
42 \cdot 23 = 10^{log_{10}(42)+log_{10}(23)}
$$
If you want to know more about slide rules I highly suggest
<a href="https://www.youtube.com/watch?v=waiprjueVpQ" target="_blank">this</a> video by James Grime.</p>
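The slide-rule trick is easy to reproduce on a computer; here is a minimal sketch of mine using Python's standard math library:

```python
import math

# Multiplication in the log domain becomes addition:
product = 42 * 23
via_logs = 10 ** (math.log10(42) + math.log10(23))
print(product, round(via_logs))  # 966 966
```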
<h2>Definition</h2>
<p>
So, let's get down to business and start with the z-transform.
When working with data on computers or digital systems in general, we have to work with
finite data: so called <b>discrete time data</b>, which is often derived by sampling continuous data. Take the following image as an example.</p>
<img src="/assets/ztrafo_intro/sampled.svg" alt="A sampled signal" class="center">
<p>The continuous (grey) function may be some kind of analog voltage we want to measure. This measurement (often called sampling)
is done by an analog-to-digital converter and yields the blue dots. These dots can be represented by the following function \(x(k)\):</p>
<p>$$x(k)=\{...,0,x(0),x(1),x(2),x(3),x(4),0,...\}=\{...,0,0,1,2,-0.5,1,0,...\}$$</p>
For the sake of simplicity we assume the function to be 0 at points in time where we didn't measure.
<p>
So how do we now apply the z-transform? Let's take a look at the
<c class="tooltip">definition*:
<span class="tooltiptext">The definition refers to the two-sided z-transform. If we'd start with \(k=0\)
it would be called single-sided z-transform instead.</span>
</c>
$$
X(z)=\mathscr{Z}\{x(k)\}=\sum_{k=-\infty}^{+\infty} x(k) \cdot z^{-k}
$$
If we apply this definition to our example signal, we get:
$$
X(z)= 1 \cdot z^{-1} + 2 \cdot z^{-2} - 0.5 \cdot z^{-3} + 1 \cdot z^{-4}
$$
Applying the z-transform is fairly easy. We just take a value of our discrete signal and multiply it with \(z^{-k}\),
where \(k\) depends on the position. Doing this for all values and summing the terms up results in our transformed signal.
If we are talking about z-transformed signals, we will refer to them as being in the <b>z domain</b>.
Similarly, we talk about the <b>time domain</b> if our signals are not transformed.
Note that \(z\) is just a variable. In subsequent posts we will take a look at how the z-transform behaves for different values of \(z\).
Although the transform step is pretty easy, the result may look confusing at first. Also, the usefulness is not quite obvious at this point.
So, don't be scared to read the next section as we will bring light into the darkness step by step.
</p>
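For a quick sanity check (a small sketch of mine, not from the post), the finite sum in the definition can be evaluated numerically for any concrete value of \(z\):

```python
def z_transform_at(x, z):
    """Evaluate X(z) = sum_k x[k] * z^-k for a finite signal x,
    where x[k] is the sample at time index k (starting at k = 0)."""
    return sum(xk * z**(-k) for k, xk in enumerate(x))

# The example signal from the text: x(0)=0, x(1)=1, x(2)=2, x(3)=-0.5, x(4)=1
x = [0, 1, 2, -0.5, 1]
# X(2) = 1/2 + 2/4 - 0.5/8 + 1/16
print(z_transform_at(x, 2))  # 1.0
```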
<p> If you want to get warm with the z-transform, try to transform the following expressions. Difficulty ranges from easy to hard.</p>
<p>1. \( x(k)=
\begin{cases}
1 & k = 0\\
0 & else\\
\end{cases}\)</p>
<p>2. \( x(k)=
\begin{cases}
1 & k \geq 0\\
0 & else\\
\end{cases}\)
<c class="tooltip"> (hint)
<span class="tooltiptext">Use the geometric series which is defined as: $$ \sum_{k=0}^{\infty}aq^{k}=\frac{a}{1-q} \quad for \, |q|<1 $$</span>
</c></p>
<p>3. \( x(k)=
\begin{cases}
1 & k \in [0,3]\\
0 & else\\
\end{cases}\)
<c class="tooltip"> (hint)
<span class="tooltiptext">Use the geometric series and substitution.</span>
</c></p>
<button class="accordion"><b>Click here to see the solution</b></button>
<div class="panel">
<p>1. $$ X(z)=1 \cdot z^0 = 1 $$</p>
<p>2. $$ X(z)=\sum_{k=0}^{\infty} 1 \cdot z^{-k} = \frac{1}{1-z^{-1}} \quad \text{for} \, |z^{-1}| < 1 $$
Note that this term only converges for \(|z^{-1}|<1\), or \(|z|>1\). In the later sections we will use the geometric series and
the convergence criterion to analyze the stability of digital filters.
</p>
<p>3. $$ X(z)= \sum_{k=0}^{3} 1 \cdot z^{-k}$$
$$= \sum_{k=0}^{\infty} 1 \cdot z^{-k} - \sum_{k=4}^{\infty} 1 \cdot z^{-k}$$
$$= \sum_{k=0}^{\infty} 1 \cdot z^{-k} - \sum_{l=0}^{\infty} 1 \cdot z^{-(l+4)}$$
$$= \sum_{k=0}^{\infty} 1 \cdot z^{-k} - z^{-4} \sum_{l=0}^{\infty} 1 \cdot z^{-l}$$
$$= \frac{1}{1-z^{-1}} - \frac{z^{-4}}{1-z^{-1}}$$
$$= \frac{1-z^{-4}}{1-z^{-1}}$$
Here we used the geometric series and the substitution \(k=l+4\). Again, this term only converges for \(|z|>1\).
</p>
</div>
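For a finite geometric sum like the one in exercise 3, the closed form can be verified numerically: \(\sum_{k=0}^{3} z^{-k} = (1-z^{-4})/(1-z^{-1})\). A small check of mine (the helper <code>z_transform_at</code> is not from the post):

```python
def z_transform_at(x, z):
    """Evaluate X(z) = sum_k x[k] * z^-k for a finite signal (k starts at 0)."""
    return sum(xk * z**(-k) for k, xk in enumerate(x))

z = 2.0
# Exercise 3: x(k) = 1 for k in [0, 3], else 0
direct = z_transform_at([1, 1, 1, 1], z)
closed_form = (1 - z**-4) / (1 - z**-1)
print(direct, closed_form)  # 1.875 1.875
```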
<h2>How operators transform</h2>
<p>Similar to the logarithmic transform, some operators change and others do not when they are transformed.
Knowing how operators transform is essential knowledge and one of the most important aspects of the z-transform!
What you should definitely know is:
<ul>
<li>Addition remains addition in z domain: \(\mathscr{Z}\{x(k)+y(k)\} = X(z)+Y(z)\)</li>
<li>Scaling a function remains a scaling in z domain: \(\mathscr{Z}\{a \cdot x(k)\} = a \cdot X(z)\)</li>
<li>Time shifting by \(u\) refers to a multiplication of \(z^{-u}\) in z domain: \(\mathscr{Z}\{x(k-u)\} = z^{-u} \cdot X(z)\)</li>
<li>Convolution becomes a multiplication in z domain: \(\mathscr{Z}\{x(k)*y(k)\} = X(z) \cdot Y(z)\)</li>
</ul>
Of course there are plenty of other operators and properties which would go beyond the scope of this post.
But with the 4 mentioned properties, you have all the knowledge you need for the following sections.
A comprehensive overview can be found in the <a href="https://en.wikipedia.org/wiki/Z-transform#Properties" target="_blank">Wikipedia article</a>
about the z-transform.
In the following we will go a little bit more into detail. Try to prove the relations yourself for a nice challenge.
If you are doing a speed run, you can skip the rest of this section.</p>
<b>Addition</b>
<p>If we just insert the addition in the definition, we get:
$$ \mathscr{Z}\{x(k)+y(k)\} = \sum_{k=-\infty}^{\infty} (x(k)+y(k)) \cdot z^{-k}$$
$$ = \sum_{k=-\infty}^{\infty} x(k) \cdot z^{-k} + \sum_{k=-\infty}^{\infty} y(k) \cdot z^{-k} = X(z) + Y(z) $$
So, addition remains addition.
Let's take some example:
<img src="/assets/ztrafo_intro/addition.svg" alt="Adding two signals" class="center" style="width:90%;margin-top:20px;margin-bottom:20px">
The corresponding z-transforms can be determined as:
$$A(z) = z^{-1}+z^{-3} \quad B(z)=z^{-1}+2z^{-2}+z^{-3}$$
$$C(z) = A(z)+B(z)=(z^{-1}+z^{-3})+(z^{-1}+2z^{-2}+z^{-3}) = 2z^{-1}+2z^{-2}+2z^{-3}$$
</p>
<b>Scaling</b>
<p>Again, we just insert the scaling in the definition:
$$ \mathscr{Z}\{a \cdot x(k)\} = \sum_{k=-\infty}^{\infty} a \cdot x(k) \cdot z^{-k}$$
$$ = a \cdot \sum_{k=-\infty}^{\infty} x(k) \cdot\ z^{-k} = a \cdot X(z)$$
Let's take a scaling example:
<img src="/assets/ztrafo_intro/scaling.svg" alt="Scaling a signal" class="center" style="width:80%;margin-top:20px;margin-bottom:20px">
Again, we can directly see the result in the z-transform:
$$C(z)=2 \cdot A(z) = 2 \cdot (z^{-1}+z^{-3}) = 2z^{-1}+2z^{-3}$$
</p>
<b>Time shift</b>
<p> Now things get a little bit more interesting. Let's assume we have an example function and we shift it one sample to the positive side:
<img src="/assets/ztrafo_intro/timeshift.svg" alt="Scaling a signal" class="center" style="width:80%;margin-top:20px;margin-bottom:20px">
The resulting z-transforms look as follows:
$$
\mathscr{Z}\{a(k)\} = z^{-1} + z^{-3} \quad \mathscr{Z}\{a(k-1)\} = z^{-2} + z^{-4}
$$
If we take a look at the z-transforms, we can see that the transform of \(x(k-1)\) is just the
transform of \(x(k)\) multiplied with \(z^{-1}\).
It seems like shifting a function can be expressed with a multiplication of a corresponding \(z\)!
Here is the proof (using the substitution \(l=k-a\) and therefore \(k=l+a\) ):
$$ \sum_{k=-\infty}^{\infty}x(k-a) \cdot z^{-k} = \sum_{l=-\infty}^{\infty}x(l) \cdot z^{-l-a}
= z^{-a} \cdot \sum_{l=-\infty}^{\infty}x(l) \cdot z^{-l} = z^{-a} \cdot X(z) $$
</p>
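The shift property can also be verified numerically with the example signal from the figure (my own check, not from the post):

```python
def z_transform_at(x, z):
    """Evaluate X(z) = sum_k x[k] * z^-k for a finite signal (k starts at 0)."""
    return sum(xk * z**(-k) for k, xk in enumerate(x))

z = 2.0
a = [0, 1, 0, 1]             # a(k) from the figure: A(z) = z^-1 + z^-3
a_shifted = [0, 0, 1, 0, 1]  # a(k-1), delayed by one sample
lhs = z_transform_at(a_shifted, z)   # Z{a(k-1)}
rhs = z**-1 * z_transform_at(a, z)   # z^-1 * A(z)
print(lhs, rhs)  # 0.3125 0.3125
```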
<b>Convolution</b>
<p>If you don't know what a <a href="https://en.wikipedia.org/wiki/Convolution#Discrete_convolution" target="_blank">convolution</a>
is, I highly recommend reading some literature on it, since it is one of the fundamental operators of digital signal processing.
The discrete convolution is defined as:
$$
f * g = \sum_{m=-\infty}^{\infty}f(m) \cdot g(k-m)
$$
So, basically it maps two functions on a third one.
And the really cool thing about the z-transform is that it maps the quite complex convolution to a very simple multiplication!
Here is the proof:
$$\mathscr{Z}\{f*g\} = \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} f(m) \cdot g(k-m) \cdot z^{-k}$$
As a next step we swap the sum signs (the commutativity and associativity of addition allows us to do this):
$$= \sum_{m=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} f(m) \cdot g(k-m) \cdot z^{-k}$$
Using distributivity we can put \(f(m)\) in front of the sum sign:
$$= \sum_{m=-\infty}^{\infty} f(m) \cdot \sum_{k=-\infty}^{\infty} g(k-m) \cdot z^{-k}$$
Now we substitute \(l=k-m\) and rearrange a little bit:
$$= \sum_{m=-\infty}^{\infty} f(m) \cdot \sum_{l=-\infty}^{\infty} g(l) \cdot z^{-l-m}$$
$$= \sum_{m=-\infty}^{\infty} f(m) \cdot z^{-m} \cdot \sum_{l=-\infty}^{\infty} g(l) \cdot z^{-l}$$
$$= F(z) \cdot G(z) $$
</p>
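The convolution theorem is easy to verify numerically as well (a sketch of mine; the signals are made-up):

```python
def z_transform_at(x, z):
    """Evaluate X(z) = sum_k x[k] * z^-k for a finite signal (k starts at 0)."""
    return sum(xk * z**(-k) for k, xk in enumerate(x))

def convolve(f, g):
    """Discrete convolution of two finite causal signals."""
    out = [0.0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for n, gn in enumerate(g):
            out[m + n] += fm * gn
    return out

z = 2.0
f = [1, 2, 3]
g = [4, 5]
lhs = z_transform_at(convolve(f, g), z)          # Z{f*g}
rhs = z_transform_at(f, z) * z_transform_at(g, z)  # F(z)*G(z)
print(lhs, rhs)  # 17.875 17.875
```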
<h2> Applying the z-transform </h2>
<p>
In this section we will apply the z-transform to discrete-time systems and analyse the system in the z domain.
And trust me, we can easily see some pretty cool things in the z domain which are hiding from our eyes in the time domain.
Discrete-time systems are often described by block diagrams which may look like this one:
<img src="/assets/ztrafo_intro/sma.svg" alt="Simple Moving Average" class="center" style="width:80%;margin-top:20px;margin-bottom:20px">
They can be found in many applications in the area of digital signal processing, ranging from speech codecs to music production.
The filter depicted above is a typical simple moving average filter of order 4 (it sums up the current and the last 3 values).
It can be classified as a lowpass filter, meaning that high frequencies are dampened while low frequencies remain unaffected.
As an example listen to an unfiltered signal (works best with Firefox):</p>
<audio controls class="center"> <source src="/assets/ztrafo_intro/yes_short.wav" type="audio/wav"> Your browser does not support the audio element.</audio>
<p>And then listen to the signal which was filtered with a moving average filter of order 20:</p>
<audio controls class="center"> <source src="/assets/ztrafo_intro/yes_short_sma.wav" type="audio/wav">
Your browser does not support the audio element.</audio>
<p>
From such a block diagram we can derive a so called <b>difference equation</b>. This equation describes how the output \(y(k)\)
behaves depending on the input \(x(k)\). Doing this for the given (and simple) example we observe:
$$
y(k)=0.25 \cdot (x(k)+x(k-1)+x(k-2)+x(k-3))
$$
Note, that "T" means that the signal is delayed by one sample.
Deriving the difference equation is pretty straightforward (at least for most systems).
Basically you follow all the paths and apply the operators.
Try to determine the difference equations for the following systems as an exercise:</p>
<button class="accordion"><b>Click here to see the exercise</b></button>
<div class="panel">
<p>
<b>1.</b><img src="/assets/ztrafo_intro/linear_predictor.svg" alt="Linear predictor" class="center" style="width:60%;margin-top:20px;margin-bottom:20px">
<b>2.</b><img src="/assets/ztrafo_intro/linear_predictor_lattice.svg" alt="Linear predictor lattice" class="center" style="width:60%;margin-top:20px;margin-bottom:20px">
<b>3.</b><img src="/assets/ztrafo_intro/recursive_simple.svg" alt="Recursive example" class="center" style="width:40%;margin-top:20px;margin-bottom:20px"><br>
<b>4.</b><img src="/assets/ztrafo_intro/hard_difference_equation.svg" alt="A hard example" class="center" style="width:40%;margin-top:20px;margin-bottom:20px"><br>
</p>
</div>
<button class="accordion"><b>Click here to see the solution</b></button>
<div class="panel">
<p><b>1.</b> $$y(k)=x(k)-a_1 x(k-1) - a_2 x(k-2) - a_3 x(k-3)$$
Nice to know: The depicted structure is the sender side of a so called linear predictor.
</p>
<p><b>2.</b> $$y(k)=x(k)+x(k-1) \cdot (k_1 + k_1 k_2) + x(k-2) \cdot k_2 $$
Nice to know: Again a linear predictor but this time as a so called lattice structure.
</p>
<p><b>3.</b>$$y(k)=x(k)+0.5 \cdot y(k-1)$$
This system is recursive since we feed back the output.
</p>
<p><b>4.</b>
This example is way harder than the previous ones.
When analysing discrete systems it is often helpful to define some help signals (e.g. \(x_1, \, x_2\)):
<img src="/assets/ztrafo_intro/hard_difference_equation_help.svg" alt="A hard example" class="center" style="width:40%;margin-top:20px;margin-bottom:20px"><br>
We can now easily derive the following relations:
$$y(k)=x_1(k)+x_2(k)$$
$$x_1(k) = x_2(k) + x(k)$$
$$x_2(k) = x_2(k-1) + x(k-1)$$
We plug them together to get:
$$y(k) = x_2(k) + x(k) + x_2(k) = 2 x_2(k) + x(k)$$
$$y(k) = 2 x_2(k-1) + 2 x(k-1) + x(k)$$
Now we use a little trick (output shifted by one sample):
$$y(k-1) = 2 x_2(k-1) + x(k-1)$$
$$2 x_2(k-1) = y(k-1) - x(k-1)$$
Which leads us to:
$$y(k) = y(k-1) + x(k) + x(k-1)$$
</p>
</div><br><br>
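The moving-average difference equation from above translates directly into code; here is a minimal sketch of mine, assuming \(x(k)=0\) for \(k<0\):

```python
def moving_average(x):
    """Difference equation of the 4-tap moving average:
    y(k) = 0.25*(x(k) + x(k-1) + x(k-2) + x(k-3)),
    with x assumed to be 0 for k < 0."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for d in range(4):       # current sample plus the last 3
            if k - d >= 0:
                acc += x[k - d]
        y.append(0.25 * acc)
    return y

# A step up and back down gets smoothed out:
print(moving_average([4, 4, 4, 4, 0, 0]))  # [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
```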
<p>
However, in some cases the derivation of the difference equation can be really cumbersome, especially if feedback loops come into play (see previous exercise 4).
Furthermore, it is really hard to guess how a system will behave only by looking at the difference equation.
Fortunately there are other ways of representing the input output behaviour of a discrete system for which the z-transform plays an important role.
So let's be curious and apply the z-transform to see what happens.
In order to do this, we use the properties which we derived in the previous section and apply them to our moving average example:
$$
Y(z) = X(z) + X(z) \cdot z^{-1} + X(z) \cdot z^{-2} + X(z) \cdot z^{-3} = X(z) \cdot (1+z^{-1}+z^{-2}+z^{-3})
$$
Not very spectacular yet, but let's rearrange it a little bit:
$$
Y(z)/X(z) = (1+z^{-1}+z^{-2}+z^{-3}) = H(z)
$$
In the above equation we derived the so called <b>transfer function</b> \(H(z)\).
Since \(H(z)\) is just the quotient of the z-transformed input and output, multiplying it with the z-transformed input \(X(z)\)
yields the z-transformed output!
$$Y(z) = H(z) \cdot X(z)$$
This may sound simple but is actually a pretty important statement. Especially if we consider what is implied by this in the time domain.
Because if we now go back to the time domain we observe the following equation
(remember that a multiplication in z domain equals a convolution in time domain):
$$y(k) = h(k) * x(k)$$
So, we can take our input \(x(k)\) and just convolute with a function \(h(k)\) to obtain the output.
This is actually way more convenient than a difference equation. But how do we get the function \(h(k)\)?
One way is to determine \(H(z)\) and then just transform it back to the time domain.
However, there is another pretty cool way to do this! If we use \(X(z)=1\), we can observe \(H(z)\) directly at the output:
$$Y(z)=H(z) \cdot X(z) = H(z) \cdot 1 = H(z)$$
In the time domain, \(X(z)=1\) corresponds to the so called <b>unit impulse</b> \(\delta(k)\):
$$
\delta (k)=
\begin{cases}
1 & k=0\\
0 & else
\end{cases}
$$
Which is just a \(1\) at \(k=0\):
<img src="/assets/ztrafo_intro/unit_impulse.svg" alt="Unit impulse" class="center" style="width:40%;margin-top:20px;margin-bottom:20px"><br>
So, in order to get \(h(k)\) we just feed the unit impulse into our system and write down the output. Pretty simple, right?
That's the reason why \(h(k)\) is also called <b>impulse response</b>.
Doing this for our moving average example results in:
$$h(k) = \{ h(0),h(1),h(2),h(3) \} = \{0.25, 0.25, 0.25, 0.25 \}$$
To get familiar with impulse responses try to determine them for the 4 systems of the previous exercise:
</p>
<button class="accordion"><b>Click here to see the solution</b></button>
<div class="panel">
<p><b>1.</b> $$h(k)=\{ 1, -a_1, -a_2, -a_3\} = \{h(0),h(1),h(2),h(3)\}$$.
</p>
<p><b>2.</b> $$h(k)=\{1, (k_1 + k_1 k_2), k_2\} = \{h(0),h(1),h(2)\}$$
</p>
<p><b>3.</b>$$h(k)=\{1, 1/2, 1/4, 1/8, 1/16, ...\} = \{h(0), h(1), h(2), h(3), h(4), ...\}$$
Note that this system's impulse response is infinitely long (a so called <b>Infinite Impulse Response (IIR)</b>).
</p>
<p><b>4.</b>$$h(k)=\{1, 2, 2, 2, ...\} = \{h(0), h(1), h(2), h(3), ...\}$$
Similar to 3. the impulse response is infinitely long.
</p>
</div><br><br>
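Measuring an impulse response by feeding in a unit impulse translates directly into code. Here is a sketch of mine for the recursive system from exercise 3:

```python
def recursive_system(x):
    """Exercise 3: y(k) = x(k) + 0.5*y(k-1), with y(-1) = 0."""
    y = []
    prev = 0.0
    for xk in x:
        prev = xk + 0.5 * prev
        y.append(prev)
    return y

# Feeding the unit impulse yields the impulse response h(k)
unit_impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(recursive_system(unit_impulse))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```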
<p>But does this transfer function/impulse response approach work for all systems?
Well, to obtain \(H(z)\) we need to be able to rearrange the terms such that we get the quotient \(Y(z)/X(z)\).<br>
Let's consider the following example which toggles the sign of \(x(k)\) depending on the time:
<img src="/assets/ztrafo_intro/time_variant.svg" alt="A time variant system" class="center" style="width:50%;margin-top:20px;margin-bottom:20px">
The difference equation is as follows:
$$y(k) = x(k) \cdot (-1)^{k}$$
And transforming this into the z domain yields:
$$Y(z) = \sum_{k=-\infty}^{\infty} x(k) \cdot (-1)^k z^{-k}
= \sum_{k=-\infty}^{\infty} x(k) \cdot (-1)^{-k} z^{-k}
= \sum_{k=-\infty}^{\infty} x(k) \cdot (-z)^{-k} = X(-z) $$
As you can see, we cannot rearrange the equation in such a way that we get \(Y(z)/X(z)\).
Also, feeding a unit impulse would lead us to a wrong result, as the impulse response of the system is
just the unit impulse. Convolving a function with a unit impulse does not change it.
We would therefore come to the following conclusion:
$$y(k)=h(k)*x(k)=x(k)$$
This is obviously wrong.</p>
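We can also expose the time variance numerically (my own sketch, not from the post): shifting the input does not simply shift the output, so this system cannot have a single transfer function:

```python
def time_variant(x):
    """y(k) = x(k) * (-1)^k, the sign-toggling system from the text
    (input assumed to start at k = 0)."""
    return [xk * (-1)**k for k, xk in enumerate(x)]

x = [1, 2, 3]
shifted_input = [0, 1, 2, 3]            # x delayed by one sample
out_then_shift = [0] + time_variant(x)  # shift the original output
shift_then_out = time_variant(shifted_input)
print(out_then_shift, shift_then_out)  # [0, 1, -2, 3] [0, -1, 2, -3]
```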
<p>So, for which systems can we safely determine a transfer function/impulse response?
The answer is <b>Linear Time Invariant (LTI)</b> systems. As the name suggests, they have two important characteristics.
They are linear meaning that if you scale your input the output gets scaled by the same amount.
And they are time-invariant, meaning that if you shift the input by \(n\) samples the output is also shifted by \(n\) samples.
This implies that the system's behaviour does not change over time.
The \((-1)^k\) actually changes over time, which is why we could not determine a transfer function for the most recent example.
If you stick to addition, subtraction, time-shifts and non-changing scaling factors, your resulting digital filter will always be an LTI system.
</p>
<p>
Note that even though a system might be an LTI system, its impulse response may be infinitely long
(a so called <b>Infinite Impulse Response (IIR)</b>, as already seen in the most recent exercise) and may not even converge.
But this will be covered in the next post more in detail.
</p>
<h2>Summary</h2>
<p>
If you made it this far: congratulations, you have acquired the basic knowledge of the z-transform.
So, let's make a quick summary of the most important things which were covered in this post:
<ul>
<li>The z-transform can be used to deal with discrete time data and discrete systems</li>
<li>In the z domain some operators like convolution become a multiplication and are thus easy to handle</li>
<li>By using the z-transform we showed that every LTI system has a transfer function which corresponds to an impulse response in the time domain</li>
<li>By convoluting the impulse response with the input we can obtain the output</li>
<li>The impulse response can be obtained by feeding the LTI system with a unit impulse and observing the output</li>
</ul>
At this point we are now ready to explore the space of all the applications and further properties of the z-transform.
<img src="/assets/ztrafo_intro/rocket.svg" alt="Space exploration" class="center" style="width:50%;margin-top:20px;margin-bottom:20px">
My <a href="/digital/signal/processing/2020/03/26/z-stability.html" target="_blank">next post</a> will
be about the stability of the z-transform, however, there are many ways to continue the journey.
</p>
<p>
If you liked this post, please tell me, and if you didn't, please tell me as well (see <a href="/about/">About</a>).
I do always have an open ear for criticism and grammar nazis ;)
</p>
<script>
var acc = document.getElementsByClassName("accordion");
var i;
for (i = 0; i < acc.length; i++) {
acc[i].addEventListener("click", function() {
this.classList.toggle("active");
var panel = this.nextElementSibling;
if (panel.style.display === "block") {
panel.style.display = "none";
} else {
panel.style.display = "block";
}
});
}
</script>