In this post we will cover the basics of the z-transform, which plays an important role in digital signal processing. We will start with a brief mathematical introduction that does not cover everything but does cover the most important aspects, so that we can head straight to the applications.
As the name "z-transform" already suggests, we are dealing with a transform here. A transform applies some kind of function to some kind of data and thereby maps it to another domain (basically, different values and operators). This is often done in the hope that some operations become easier or that the transformed data reveals information that we couldn't see in the original representation. The z-transform allows us, for example, to analyze the stability or spectral characteristics of digital filters.

The concept of a transform may sound abstract at first, so let's start with a very simple example: the logarithmic transform. A long time ago, before modern pocket calculators were invented, humans used an ancient technology called slide rules to deal with multiplication and division. Just by shifting some plastic bars, you can calculate seemingly difficult operations in a few seconds. When I heard this for the first time, I was like: What, how can one multiply with some fancy piece of plastic?
The mathematical answer is quite simple yet clever. Imagine we want to calculate the following expression:

$$42 \cdot 23$$

If we apply the logarithm to this expression, we can reformulate it as follows:

$$\log(42 \cdot 23)=\log(42)+\log(23)$$

So in the logarithmic domain, the multiplication can be represented by an addition! This can easily be implemented by shifting a bar. Of course, we need to transform the result back into the domain where we started, so most transforms come with an inverse transform. To invert the logarithm, we raise the base of the logarithm we used to the power of the result (e.g. base 10): $$ 42 \cdot 23 = 10^{\log_{10}(42)+\log_{10}(23)} $$ If you want to know more about slide rules, I highly recommend this video by James Grime.
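The same trick can be mimicked in a few lines of Python. This is only a minimal sketch: floating-point logarithms from the standard `math` module stand in for the slide rule's scales, so the result is exact only up to rounding (much like a slide rule is only as precise as you can read its scales).

```python
import math

a, b = 42, 23

# Forward transform: map both factors into the logarithmic domain (base 10),
# where the multiplication turns into an addition.
log_sum = math.log10(a) + math.log10(b)

# Inverse transform: raise the base to the power of the result.
product = 10 ** log_sum

print(product)          # very close to 966, up to floating-point rounding
print(round(product))   # 966
```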


So, let's get down to business and start with the z-transform. When working with data on computers, or on digital systems in general, we have to work with discrete data: so-called discrete-time data, which is often derived by sampling continuous data. Take the following image as an example.

A sampled signal

The continuous (grey) function may be some kind of analog voltage we want to measure. This measurement (often called sampling) is done by an analog-to-digital converter and yields the blue dots. These dots can be represented by the following function \(x(k)\):

$$ x(k)= \begin{cases} 1 & k = 1\\ 2 & k = 2\\ -0.5 & k = 3\\ 1 & k = 4\\ 0 & \text{else} \end{cases} $$

For the sake of simplicity, we assume the function to be 0 at points in time where we didn't measure.

So how do we apply the z-transform? Let's take a look at the definition (strictly speaking, this is the two-sided z-transform; if the sum started at \(k=0\), it would be called the one-sided z-transform): $$ X(z)=\mathscr{Z}\{x(k)\}=\sum_{k=-\infty}^{+\infty} x(k) \cdot z^{-k} $$ If we apply this definition to our example signal, we get: $$ X(z)= 1 \cdot z^{-1} + 2 \cdot z^{-2} - 0.5 \cdot z^{-3} + 1 \cdot z^{-4} $$ Applying the z-transform is fairly easy. We just take each value of our discrete signal and multiply it by \(z^{-k}\), where \(k\) is the position of that value. Doing this for all values and summing up the terms results in our transformed signal. When we talk about z-transformed signals, we will refer to them as being in the z domain. Similarly, we talk about the time domain if our signals are not transformed. Note that \(z\) is just a variable (in general, a complex number). In a subsequent post, we will take a look at how the z-transform behaves for different values of \(z\). Although the transform step is pretty easy, the result may look confusing at first, and its usefulness is not entirely apparent at this point. So don't be scared to read on, as we will bring light into the darkness step by step.
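To make the definition a bit more tangible, here is a minimal Python sketch that evaluates \(X(z)\) for a finite signal at a given value of \(z\). The helper name `z_transform` and the dict representation (sample index mapped to value, missing indices count as 0) are just choices for this illustration.

```python
def z_transform(x, z):
    """Evaluate X(z) = sum_k x(k) * z**(-k) for a finite signal.

    x maps the sample index k to the value x(k); all other samples are 0.
    """
    return sum(value * z ** (-k) for k, value in x.items())

# Example signal: x(1) = 1, x(2) = 2, x(3) = -0.5, x(4) = 1, zero elsewhere
x = {1: 1.0, 2: 2.0, 3: -0.5, 4: 1.0}

print(z_transform(x, 1))   # 1 + 2 - 0.5 + 1 = 3.5
print(z_transform(x, 2))   # 1/2 + 2/4 - 0.5/8 + 1/16 = 1.0
```

Since \(z\) is just a variable, you can also pass a complex number such as `0.5 + 0.5j`.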

If you want to warm up with the z-transform, try to transform the following expressions. The difficulty ranges from easy to hard.

1. \( x(k)= \begin{cases} 1 & k = 0\\ 0 & else\\ \end{cases}\)

2. \( x(k)= \begin{cases} 1 & k \geq 0\\ 0 & else\\ \end{cases}\) Hint: Use the geometric series, which is defined as: $$ \sum_{k=0}^{\infty}aq^{k}=\frac{a}{1-q} \quad \text{for} \, |q|<1 $$

3. \( x(k)= \begin{cases} 1 & k \in [0,3]\\ 0 & else\\ \end{cases}\) Hint: Extend with \( \frac{1-z^{-1}}{1-z^{-1}} \).

Solutions:

1. $$ X(z)=1 \cdot z^0 = 1 $$

2. $$ X(z)=\sum_{k=0}^{\infty} 1 \cdot z^{-k} = \frac{1}{1-z^{-1}} \quad \text{for} \, |z^{-1}| < 1 $$ Note that this sum only converges for \(|z^{-1}|<1\), i.e. for \(|z|>1\). In a later post, we will use the geometric series and this convergence criterion to analyze the stability of digital filters.

3. $$X(z) = z^{-3} + z^{-2} + z^{-1} + 1$$ $$= \frac{1-z^{-1}}{1-z^{-1}} \cdot (z^{-3}+z^{-2}+z^{-1}+1)$$ $$= \frac{z^{-3}+z^{-2}+z^{-1}+1 - z^{-4}-z^{-3}-z^{-2}-z^{-1}}{1-z^{-1}}$$ $$= \frac{1 - z^{-4}}{1-z^{-1}}$$ Since the sum has only finitely many terms, it converges for all \(z \neq 0\).
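If you want to double-check the closed forms numerically, a small Python sketch like the following compares truncated sums with the results of exercises 2 and 3. Truncating the infinite sum at `k_max` is an assumption of this illustration; it is only a good approximation where the series actually converges, which the last line demonstrates by choosing \(|z|<1\) for the unit step.

```python
def partial_sum(x, z, k_max):
    """Truncated z-transform: sum_{k=0}^{k_max} x(k) * z**(-k)."""
    return sum(x(k) * z ** (-k) for k in range(k_max + 1))

def unit_step(k):   # exercise 2
    return 1.0 if k >= 0 else 0.0

def window(k):      # exercise 3: 1 for k in [0, 3]
    return 1.0 if 0 <= k <= 3 else 0.0

z = 2.0   # |z| > 1, so the unit step's series converges
print(partial_sum(unit_step, z, 60), 1 / (1 - z ** -1))           # 2.0 2.0
print(partial_sum(window, z, 60), (1 - z ** -4) / (1 - z ** -1))  # 1.875 1.875

# |z| < 1: the partial sums of the unit step keep growing instead of converging
print([partial_sum(unit_step, 0.5, n) for n in (10, 20, 30)])     # [2047.0, 2097151.0, 2147483647.0]
```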

How operators transform

As with the logarithmic transform, some operators change when they are transformed and others don't. Knowing how operators transform is essential knowledge and one of the most important aspects of the z-transform! What you should definitely know is:

  • Addition remains addition in the z domain: \(\mathscr{Z}\{x(k)+y(k)\} = X(z)+Y(z)\)
  • Scaling a function by a factor \(s\) remains a scaling by \(s\) in the z domain: \(\mathscr{Z}\{s \cdot x(k)\} = s \cdot X(z)\)
  • Time shifting by \(u\) samples corresponds to a multiplication by \(z^{-u}\) in the z domain: \(\mathscr{Z}\{x(k-u)\} = z^{-u} \cdot X(z)\)
  • Convolution becomes a multiplication in the z domain: \(\mathscr{Z}\{x(k)*y(k)\} = X(z) \cdot Y(z)\)
Of course, there are plenty of other operators and properties, but they would go beyond the scope of this post. With the four properties mentioned above, you have all the knowledge you need for the following sections. A comprehensive overview can be found in the Wikipedia article about the z-transform. In the following, we will go into a little more detail. For a nice challenge, try to prove the relations yourself. If you are doing a speed run, you can skip the rest of this section.


Addition

If we just insert the addition into the definition, we get: $$ \mathscr{Z}\{x(k)+y(k)\} = \sum_{k=-\infty}^{\infty} (x(k)+y(k)) \cdot z^{-k}$$ $$ = \sum_{k=-\infty}^{\infty} x(k) \cdot z^{-k} + \sum_{k=-\infty}^{\infty} y(k) \cdot z^{-k} = X(z) + Y(z) $$ So, addition remains addition. Let's look at an example:

Adding two signals

The corresponding z-transforms can be determined as: $$A(z) = z^{-1}+z^{-3} \quad B(z)=z^{-1}+2z^{-2}+z^{-3}$$ $$C(z) = A(z)+B(z)=(z^{-1}+z^{-3})+(z^{-1}+2z^{-2}+z^{-3}) = 2z^{-1}+2z^{-2}+2z^{-3}$$


Scaling

Again, we just insert the scaling into the definition: $$ \mathscr{Z}\{s \cdot x(k)\} = \sum_{k=-\infty}^{\infty} s \cdot x(k) \cdot z^{-k}$$ $$ = s \cdot \sum_{k=-\infty}^{\infty} x(k) \cdot z^{-k} = s \cdot X(z)$$ Let's look at a scaling example:

Scaling a signal

Once more, we can directly see the result in the z-transform: $$C(z)=2 \cdot A(z) = 2 \cdot (z^{-1}+z^{-3}) = 2z^{-1}+2z^{-3}$$
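Both properties, addition and scaling, can be checked numerically with a short Python sketch. It assumes that the signals start at \(k=0\) and are stored as equally long lists \([x(0), x(1), \dots]\), so the list index is the time index; the test point \(z\) is arbitrary.

```python
def z_transform(x, z):
    """X(z) = sum_k x[k] * z**(-k) for a finite signal starting at k = 0."""
    return sum(value * z ** (-k) for k, value in enumerate(x))

a = [0, 1, 0, 1]   # a(k), A(z) = z^-1 + z^-3
b = [0, 1, 2, 1]   # b(k), B(z) = z^-1 + 2z^-2 + z^-3

c_add = [ak + bk for ak, bk in zip(a, b)]   # addition in the time domain
c_scale = [2 * ak for ak in a]              # scaling by s = 2 in the time domain
print(c_add)     # [0, 2, 2, 2], i.e. 2z^-1 + 2z^-2 + 2z^-3
print(c_scale)   # [0, 2, 0, 2], i.e. 2z^-1 + 2z^-3

z = 1.5 + 0.5j   # arbitrary nonzero test point
print(abs(z_transform(c_add, z) - (z_transform(a, z) + z_transform(b, z))))  # ~0
print(abs(z_transform(c_scale, z) - 2 * z_transform(a, z)))                 # ~0
```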

Time shift

Now things get a little more interesting. Let's assume we have an example function and shift it by one sample towards positive \(k\):

Shifting a signal

The resulting z-transforms look as follows: $$ \mathscr{Z}\{a(k)\} = z^{-1} + z^{-3} \quad \mathscr{Z}\{a(k-1)\} = z^{-2} + z^{-4} $$ Looking at the z-transforms, we can see that the transform of \(a(k-1)\) is just the transform of \(a(k)\) multiplied by \(z^{-1}\). It seems like shifting a function can be expressed by multiplying by a corresponding power of \(z\)!
Here is the proof (using the substitution \(l=k-u\) and therefore \(k=l+u\)): $$ \sum_{k=-\infty}^{\infty}x(k-u) \cdot z^{-k} = \sum_{l=-\infty}^{\infty}x(l) \cdot z^{-l-u} = z^{-u} \cdot \sum_{l=-\infty}^{\infty}x(l) \cdot z^{-l} = z^{-u} \cdot X(z) $$ Alternatively, you can shift a function by convolving it with a shifted unit impulse \(\delta(k-u)\) and then transform it: $$\sum_{k=-\infty}^{\infty}(\delta(k-u)*x(k)) \cdot z^{-k} = z^{-u} \cdot X(z)$$ But this requires some concepts that exceed the scope of this post.
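A quick numerical check of the shift property, again as a sketch with causal signals stored as lists starting at \(k=0\). The helper `delay` is hypothetical; it prepends \(u\) zeros, which is exactly \(x(k-u)\) for such signals.

```python
def z_transform(x, z):
    """X(z) = sum_k x[k] * z**(-k) for a finite signal starting at k = 0."""
    return sum(value * z ** (-k) for k, value in enumerate(x))

def delay(x, u):
    """Return the samples of x(k - u) by prepending u zeros to a causal signal."""
    return [0] * u + list(x)

a = [0, 1, 0, 1]          # a(k), A(z) = z^-1 + z^-3
a_shifted = delay(a, 1)   # a(k-1), transform z^-2 + z^-4
print(a_shifted)          # [0, 0, 1, 0, 1]

z = 0.8 - 1.2j            # arbitrary nonzero test point
lhs = z_transform(a_shifted, z)
rhs = z ** -1 * z_transform(a, z)
print(abs(lhs - rhs))     # ~0, rounding error only
```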


Convolution

If you don't know what a convolution is, I highly recommend reading up on it, since it is one of the fundamental operators of digital signal processing. The discrete convolution is defined as: $$ (f * g)(k) = \sum_{m=-\infty}^{\infty}f(m) \cdot g(k-m) $$ So, basically it maps two functions onto a third one. And the really cool thing about the z-transform is that it maps the fairly complex convolution to a very elementary multiplication! Here is the proof: $$\mathscr{Z}\{f*g\} = \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} f(m) \cdot g(k-m) \cdot z^{-k}$$ As a next step, we swap the sums (this is allowed for absolutely convergent series, i.e. for values of \(z\) where both transforms converge): $$= \sum_{m=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} f(m) \cdot g(k-m) \cdot z^{-k}$$ Using distributivity, we can pull \(f(m)\) out of the inner sum: $$= \sum_{m=-\infty}^{\infty} f(m) \cdot \sum_{k=-\infty}^{\infty} g(k-m) \cdot z^{-k}$$ Now we substitute \(l=k-m\) (so \(k=l+m\)) and rearrange a little: $$= \sum_{m=-\infty}^{\infty} f(m) \cdot \sum_{l=-\infty}^{\infty} g(l) \cdot z^{-l-m}$$ $$= \sum_{m=-\infty}^{\infty} f(m) \cdot z^{-m} \cdot \sum_{l=-\infty}^{\infty} g(l) \cdot z^{-l}$$ $$= F(z) \cdot G(z) $$
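The convolution property can also be checked numerically. The following Python sketch implements the convolution sum directly for finite signals that start at \(k=0\) (every product \(f(m) \cdot g(n)\) contributes to the output sample \(k=m+n\)) and compares \(\mathscr{Z}\{f*g\}\) with \(F(z) \cdot G(z)\) at an arbitrary test point. The signals `f` and `g` are made up for this illustration.

```python
def convolve(f, g):
    """(f*g)(k) = sum_m f(m) * g(k - m) for finite signals starting at k = 0."""
    out = [0.0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for n, gn in enumerate(g):
            out[m + n] += fm * gn   # n = k - m, hence k = m + n
    return out

def z_transform(x, z):
    """X(z) = sum_k x[k] * z**(-k) for a finite signal starting at k = 0."""
    return sum(value * z ** (-k) for k, value in enumerate(x))

f = [1.0, 2.0, -0.5]
g = [0.0, 1.0, 1.0]
fg = convolve(f, g)
print(fg)   # [0.0, 1.0, 3.0, 1.5, -0.5]

z = 1.1 + 0.3j   # arbitrary nonzero test point
print(abs(z_transform(fg, z) - z_transform(f, z) * z_transform(g, z)))   # ~0, rounding only
```

Note that `convolve` is exactly the rule for multiplying two polynomials in \(z^{-1}\), which is another way to see why convolution becomes multiplication.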

Applying the z-transform

In this section, we apply the z-transform to discrete-time systems and analyze them in the z domain. And trust me, in the z domain we can easily see some pretty cool things that are hidden from our eyes in the time domain. Discrete-time systems are often described by block diagrams, which may look like this one:

Simple Moving Average

They can be found in many applications in the area of digital signal processing, ranging from speech codecs to music production. The filter depicted above is a typical simple moving average filter of order 4 (it averages the current value and the last 3 values). It can be classified as a lowpass filter, meaning that high frequencies are attenuated while low frequencies pass through largely unaffected. As an example, listen to an unfiltered signal:

And then listen to the signal which was filtered with a moving average filter of order 20:

From this block diagram we can derive a so-called difference equation. This equation describes how the output \(y(k)\) behaves depending on the input \(x(k)\). Note that "T" means that the signal is delayed by one sample. Doing this for the given (and simple) example, we obtain: $$ y(k)=0.25 \cdot (x(k)+x(k-1)+x(k-2)+x(k-3)) $$ Deriving the difference equation is pretty straightforward (at least for most systems): basically, you follow all the paths and apply the operators. As an exercise, try to determine the difference equations for the following systems:

1. Linear predictor
2. Linear predictor lattice
3. Recursive example
4. A hard example

1. $$y(k)=x(k)-a_1 x(k-1) - a_2 x(k-2) - a_3 x(k-3)$$ Nice to know: The depicted structure is the sender side of a so-called linear predictor.

2. $$y(k)=x(k)+x(k-1) \cdot (k_1 + k_1 k_2) + x(k-2) \cdot k_2 $$ Nice to know: Again a linear predictor, but this time as a so-called lattice structure.

3. $$y(k)=x(k)+0.5 \cdot y(k-1)$$ This system is recursive since the output is fed back.

4. This example is way harder than the previous ones. When analysing discrete systems, it is often helpful to define some auxiliary signals (e.g. \(x_1, \, x_2\)):

A hard example

We can now easily derive the following relations: $$y(k)=x_1(k)+x_2(k)$$ $$x_1(k) = x_2(k) + x(k)$$ $$x_2(k) = x_1(k-1) = x_2(k-1) + x(k-1)$$ Plugging them together, we get: $$y(k) = x_2(k) + x(k) + x_2(k) = 2 x_2(k) + x(k)$$ $$y(k) = 2 x_2(k-1) + 2 x(k-1) + x(k)$$ Now we use a little trick and look at the output shifted by one sample: $$y(k-1) = 2 x_2(k-1) + x(k-1)$$ $$2 x_2(k-1) = y(k-1) - x(k-1)$$ Substituting this into the previous expression for \(y(k)\) leads us to: $$y(k) = y(k-1) + x(k) + x(k-1)$$
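If you want to convince yourself that the derivation of system 4 is correct, you can simulate both descriptions and compare their outputs. This Python sketch assumes the relations read off the block diagram (\(x_2(k)=x_1(k-1)\), \(x_1(k)=x_2(k)+x(k)\), \(y(k)=x_1(k)+x_2(k)\)) and a system at rest before \(k=0\); the function names and input values are arbitrary choices for this illustration.

```python
def hard_example_structure(x):
    """Simulate the block diagram via the auxiliary signals x1 and x2
    (assumes the delay element starts at rest, i.e. x1(-1) = 0)."""
    x1_prev = 0.0
    y = []
    for xk in x:
        x2 = x1_prev        # x2(k) = x1(k-1)
        x1 = x2 + xk        # x1(k) = x2(k) + x(k)
        y.append(x1 + x2)   # y(k)  = x1(k) + x2(k)
        x1_prev = x1
    return y

def hard_example_difference_eq(x):
    """Derived difference equation y(k) = y(k-1) + x(k) + x(k-1), with y(-1) = x(-1) = 0."""
    y_prev, x_prev = 0.0, 0.0
    y = []
    for xk in x:
        yk = y_prev + xk + x_prev
        y.append(yk)
        y_prev, x_prev = yk, xk
    return y

x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0]
print(hard_example_structure(x))       # [1.0, 0.0, -1.5, 2.0, 5.0, 4.0]
print(hard_example_difference_eq(x))   # [1.0, 0.0, -1.5, 2.0, 5.0, 4.0]
```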

However, in some cases the derivation of the difference equation can be really cumbersome, especially if feedback loops come into play (see exercise 4 above). Furthermore, it is hard to guess how a system will behave just by looking at the difference equation. Fortunately, there are other ways of representing the input-output behaviour of a discrete system, and the z-transform plays an important role in them. So let's be curious and apply the z-transform to see what happens. To do this, we take the properties derived in the previous section and apply them to our moving average example: $$ Y(z) = 0.25 \cdot \left( X(z) + X(z) \cdot z^{-1} + X(z) \cdot z^{-2} + X(z) \cdot z^{-3} \right) = X(z) \cdot 0.25 \cdot (1+z^{-1}+z^{-2}+z^{-3}) $$ Not very spectacular yet, but let's rearrange it a little: $$ \frac{Y(z)}{X(z)} = 0.25 \cdot (1+z^{-1}+z^{-2}+z^{-3}) = H(z) $$ In the above equation, we derived the so-called transfer function \(H(z)\). Since \(H(z)\) is just the quotient of the z-transformed output and input, multiplying it by the z-transformed input \(X(z)\) yields the z-transformed output! $$Y(z) = H(z) \cdot X(z)$$ This may sound simple, but it is actually a pretty important statement, especially if we consider what it implies in the time domain. If we now go back to the time domain, we observe the following equation (remember that a multiplication in the z domain corresponds to a convolution in the time domain): $$y(k) = h(k) * x(k)$$ So, we can take our input \(x(k)\) and just convolve it with a function \(h(k)\) to obtain the output! But how do we get the function \(h(k)\)? One way is to determine \(H(z)\) and then transform it back to the time domain. However, there is another pretty cool way to do this!
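Before taking that route, here is a numerical sanity check of \(Y(z) = H(z) \cdot X(z)\) for the moving average, written as a Python sketch. It assumes \(H(z) = 0.25 \cdot (1+z^{-1}+z^{-2}+z^{-3})\), i.e. it keeps the factor 0.25 from the difference equation, and it pads the example signal from the beginning of the post with zeros so that the finite output list contains every nonzero output sample.

```python
def z_transform(x, z):
    """X(z) = sum_k x[k] * z**(-k) for a finite signal starting at k = 0."""
    return sum(value * z ** (-k) for k, value in enumerate(x))

def moving_average(x):
    """Difference equation y(k) = 0.25*(x(k)+x(k-1)+x(k-2)+x(k-3)), x(k) = 0 for k < 0."""
    return [0.25 * sum(x[k - i] for i in range(4) if k - i >= 0) for k in range(len(x))]

def H(z):
    """Transfer function H(z) = 0.25 * (1 + z^-1 + z^-2 + z^-3)."""
    return 0.25 * (1 + z ** -1 + z ** -2 + z ** -3)

# Example signal x(1)=1, x(2)=2, x(3)=-0.5, x(4)=1, padded with 3 zeros
# so that the complete output (input length + 3 samples) is captured.
x = [0.0, 1.0, 2.0, -0.5, 1.0] + [0.0] * 3
y = moving_average(x)
print(y)   # [0.0, 0.25, 0.75, 0.625, 0.875, 0.625, 0.125, 0.25]

z = 0.9 + 0.7j   # arbitrary nonzero test point
print(abs(z_transform(y, z) - H(z) * z_transform(x, z)))   # ~0, rounding only
```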
If we use \(X(z)=1\), we can observe \(H(z)\) directly at the output: $$Y(z)=H(z) \cdot X(z) = H(z) \cdot 1 = H(z)$$ In the time domain, \(X(z)=1\) corresponds to the so-called unit impulse \(\delta(k)\) (see exercise 1 above): $$ \delta (k)= \begin{cases} 1 & k=0\\ 0 & else \end{cases} $$ which is just a \(1\) at \(k=0\):

Unit impulse
So, in order to get \(h(k)\), we just feed the unit impulse into our system and write down the output. Actually pretty simple, right? That's why \(h(k)\) is also called the impulse response. Doing this for our moving average example results in: $$h(k) = \{ h(0),h(1),h(2),h(3) \} = \{0.25, 0.25, 0.25, 0.25 \}$$ To get familiar with impulse responses, try to determine them for the four systems of the previous exercise:

1. $$h(k)=\{ 1, -a_1, -a_2, -a_3\} = \{h(0),h(1),h(2),h(3)\}$$

2. $$h(k)=\{1, (k_1 + k_1 k_2), k_2\} = \{h(0),h(1),h(2)\}$$

3. $$h(k)=\{1, 1/2, 1/4, 1/8, 1/16, ...\} = \{h(0), h(1), h(2), h(3), h(4), ...\}$$ Note that this system's impulse response is infinitely long (a so-called Infinite Impulse Response (IIR)).

4. $$h(k)=\{1, 2, 2, 2, ...\} = \{h(0), h(1), h(2), h(3), ...\}$$ As in 3., the impulse response is infinitely long.
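The "feed in a unit impulse and write down the output" recipe translates directly into code. This Python sketch implements the moving average and the numeric systems 3 and 4 from their difference equations (assuming all systems are at rest before \(k=0\)) and feeds them a unit impulse. Since the IIR responses never end, only the first six samples are computed; the function names are made up for this illustration.

```python
def unit_impulse(n):
    """First n samples of delta(k): a 1 at k = 0, zeros afterwards."""
    return [1.0] + [0.0] * (n - 1)

def moving_average(x):
    # y(k) = 0.25 * (x(k) + x(k-1) + x(k-2) + x(k-3)), with x(k) = 0 for k < 0
    return [0.25 * sum(x[k - i] for i in range(4) if k - i >= 0) for k in range(len(x))]

def recursive_example(x):
    # System 3: y(k) = x(k) + 0.5 * y(k-1), with y(-1) = 0
    y, y_prev = [], 0.0
    for xk in x:
        y_prev = xk + 0.5 * y_prev
        y.append(y_prev)
    return y

def hard_example(x):
    # System 4: y(k) = y(k-1) + x(k) + x(k-1), with y(-1) = x(-1) = 0
    y, y_prev, x_prev = [], 0.0, 0.0
    for xk in x:
        y_prev = y_prev + xk + x_prev
        y.append(y_prev)
        x_prev = xk
    return y

delta = unit_impulse(6)   # only 6 samples: the IIR responses never end
print(moving_average(delta))      # [0.25, 0.25, 0.25, 0.25, 0.0, 0.0]
print(recursive_example(delta))   # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
print(hard_example(delta))        # [1.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```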

But does this transfer function/impulse response approach work for all systems? Well, to obtain \(H(z)\) we need to be able to rearrange the terms such that we get the quotient \(Y(z)/X(z)\).
Let's consider the following example, which toggles the sign of \(x(k)\) depending on the time:

A time variant system

The difference equation is as follows: $$y(k) = x(k) \cdot (-1)^{k}$$ Transforming this into the z domain yields (using \((-1)^k = (-1)^{-k}\)): $$Y(z) = \sum_{k=-\infty}^{\infty} x(k) \cdot (-1)^k z^{-k} = \sum_{k=-\infty}^{\infty} x(k) \cdot (-1)^{-k} z^{-k} = \sum_{k=-\infty}^{\infty} x(k) \cdot (-z)^{-k} = X(-z) $$ As you can see, we cannot formulate the equation in such a way that we get \(Y(z)/X(z)\): the output depends on \(X(-z)\) rather than on \(X(z)\). Feeding a unit impulse into the given system would also lead us to a wrong result: since \((-1)^0 = 1\), the impulse response of the system is just the unit impulse itself, and convolving a function with a unit impulse does not change it. We would therefore come to the following conclusion: $$y(k)=h(k)*x(k)=x(k)$$ This is obviously wrong, because the actual output \(x(k) \cdot (-1)^k\) has every odd sample negated.
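The failure of the impulse-response shortcut is easy to reproduce. This Python sketch assumes the signal starts at \(k=0\) and compares the true output of the sign-toggling system with the prediction \(h(k)*x(k)\) for a constant input; `sign_toggle` and `convolve` are hypothetical helper names.

```python
def sign_toggle(x):
    """Time-variant system y(k) = x(k) * (-1)**k, for a signal starting at k = 0."""
    return [xk * (-1) ** k for k, xk in enumerate(x)]

def convolve(f, g):
    """Convolution sum for finite signals starting at k = 0."""
    out = [0.0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for n, gn in enumerate(g):
            out[m + n] += fm * gn
    return out

delta = [1.0, 0.0, 0.0, 0.0]
h = sign_toggle(delta)   # "impulse response": just the unit impulse again
x = [1.0, 1.0, 1.0, 1.0]

print(sign_toggle(x))            # true output:      [1.0, -1.0, 1.0, -1.0]
print(convolve(h, x)[:len(x)])   # prediction h * x: [1.0, 1.0, 1.0, 1.0]
```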

So, for which systems can we safely determine a transfer function/impulse response? The answer is Linear Time-Invariant (LTI) systems. As the name suggests, they have two important characteristics. They are linear, meaning that if you scale your input, the output gets scaled by the same factor, and that the response to a sum of inputs is the sum of the individual responses. And they are time-invariant, meaning that if you shift the input by \(n\) samples, the output is shifted by \(n\) samples as well. In other words, the system's behavior does not change over time. The factor \((-1)^k\) does change over time, which is why we could not determine a transfer function for the most recent example. If you stick to addition, subtraction, time shifts and constant scaling factors, your resulting digital filter will always be an LTI system.
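Time invariance can be probed empirically: delay the input and check whether the output is merely delayed as well. The Python sketch here does this for one input and one shift, so a `True` is only evidence, not a proof (a `False`, on the other hand, is a genuine counterexample). The helper names are hypothetical, and `delay` keeps the list length fixed by dropping samples at the end, which is harmless here because the input ends in zeros.

```python
def moving_average(x):
    # y(k) = 0.25 * (x(k) + x(k-1) + x(k-2) + x(k-3)), with x(k) = 0 for k < 0
    return [0.25 * sum(x[k - i] for i in range(4) if k - i >= 0) for k in range(len(x))]

def sign_toggle(x):
    # y(k) = x(k) * (-1)**k, for a signal starting at k = 0
    return [xk * (-1) ** k for k, xk in enumerate(x)]

def delay(x, n):
    """x(k - n) with the list length kept fixed (the last n samples are dropped)."""
    return [0.0] * n + x[:len(x) - n]

def is_time_invariant(system, x, n):
    """Empirical check for one input and one shift: does delaying the input
    by n samples just delay the output by n samples?"""
    return system(delay(x, n)) == delay(system(x), n)

x = [1.0, 2.0, -0.5, 1.0, 0.0, 0.0, 0.0, 0.0]
print(is_time_invariant(moving_average, x, 1))   # True
print(is_time_invariant(sign_toggle, x, 1))      # False
```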

Note that even though a system might be an LTI system, its impulse response may be infinitely long (a so-called Infinite Impulse Response (IIR), as already seen in the most recent exercise) and may not even converge. This will be covered in more detail in the next post.


If you made it this far: congratulations, you have acquired the basic knowledge of the z-transform! Let's make a quick summary of the most important things covered in this post:

  • The z-transform can be used to deal with discrete-time data and discrete systems
  • In the z domain, some operators like convolution become a multiplication and are thus easy to handle
  • Using the z-transform, we showed that LTI systems can be described by a transfer function, which corresponds to an impulse response in the time domain
  • By convolving the impulse response with the input, we can obtain the output
  • The impulse response can be obtained by feeding the LTI system with a unit impulse and observing the output
At this point, we are ready to explore the space of all the applications and further properties of the z-transform.

Space exploration

My next post will be about analyzing the stability of digital filters with the z-transform; however, there are many ways to continue the journey.

If you liked this post, please tell me, and if you didn't, please tell me as well (see About). I do always have an open ear for criticism and grammar nazis ;)