A Visual Introduction to Function Kernels
In the last few posts we’ve focused heavily on matrices and their applications. In this post we’re going to use matrices to learn about Kernels.
The Kernel of a function is the set of points that the function sends to $0$. Amazingly, once we know this set, we can immediately characterize how the matrix (or linear function) maps its inputs to its outputs.
I hope that by the end of this post you will:
 Understand what a Kernel of a function is and how it helps us understand a function better.
 Realize that the inverses of output points are always some translation of the Kernel (for linear functions).
 See that there are many pretty patterns and coincidences that flow out of the properties of linear functions.
Functions Across Spaces
In previous posts, we noticed how matrices are just linear functions. We found that the matrices we studied just rotate or stretch a vector in some way.
But we only studied square matrices (i.e. 2x2 or 3x3 matrices). What happens when our matrices aren’t square?
Let’s start with a 2x3 matrix $F$ that represents a linear function $f$:
$F = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix}$
What happens if I apply this matrix on a vector $v = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$?
Let’s find out:
$\begin{aligned} F v &= \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \\ F v &= \begin{bmatrix} 8 \\ 5 \end{bmatrix} \end{aligned}$
In other words, $f(\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}) = \begin{bmatrix} 8 \\ 5 \end{bmatrix}$.
So $f$ effectively takes a point in 3 Dimensions, ($\begin{bmatrix}3 \\ 2 \\ 1\end{bmatrix}$) and sends it to a point in 2 Dimensions ($\begin{bmatrix} 8 \\ 5 \end{bmatrix}$). We can see this below:
We interact with functions that take points from 3D to 2D all the time. For instance, every time you take a picture with a camera you are taking a 3D space (the world you see) and collapsing it onto a 2D space (the camera sensor).
The input space for $f$ is $R^3$ and the output space is $R^2$. More formally, we write this as:
$f: R^3 \rightarrow R^2$
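If you want to check the arithmetic yourself, here is a minimal NumPy sketch of the same computation. The helper name `f` is just my own wrapper around the matrix product; it isn't anything special.

```python
import numpy as np

# The 2x3 matrix F from the text; it represents f: R^3 -> R^2.
F = np.array([[2, 1, 0],
              [0, 1, 3]])

def f(v):
    """Apply the linear function f by multiplying with F."""
    return F @ np.asarray(v)

v = np.array([3, 2, 1])
out = f(v)

print(F.shape)   # (2, 3): 2 output coordinates, 3 input coordinates
print(out)       # [8 5]
```

The shape `(2, 3)` is the code version of $f: R^3 \rightarrow R^2$: three numbers go in, two come out.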
Losing Information
Returning to the example of cameras, when we take pictures, we squash a 3D world onto a 2D sensor. In the process, we lose some amount of information primarily related to depth.
Specifically, points that are far from the camera will appear close to each other in the picture, even though they may be quite far apart in the world.
Eventually, points infinitely far away on the horizon all collapse onto the same point. We can see this in the example image below.
So just like a camera, will our function also “lose” some information when it moves points from 3D to 2D? Will it collapse multiple points from the input to the same point in the output?
The Kernel: Set of Points that Map to 0
To answer this, let’s start by finding all the points $v = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ that map onto the origin of the output space, $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$. This gives us a good starting point for understanding which points from our input hit the same point in the output.
We want to solve:
$\begin{aligned} Fv &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{aligned}$
Carrying out this multiplication, we see this is satisfied when:
$2x + y = 0$ and $y + 3z = 0$.
Together, these two equations define a line, shown below. Some points on this line are: $v_1 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, $v_2 = \begin{bmatrix} -3 \\ 6 \\ -2 \end{bmatrix}$, $v_3 = \begin{bmatrix} -4.5 \\ 9 \\ -3 \end{bmatrix}$.
In our specific case, the Kernel is a line. When $f$ maps this line to the point $0$, we lose information about the line: in the output space, the points on the line are no longer distinguishable.
Returning to our camera analogy, this is similar to how all points on the horizon are no longer distinguishable after the conversion from 3D to 2D. Thus, you can think of the Kernel as a quick way to see how the function compresses or loses information.
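Here is a small sketch of one way to find this line numerically. It leans on a shortcut that works for this particular $F$, not for matrices in general: a kernel vector must be perpendicular to both rows of $F$, and in 3D the cross product of the two rows gives exactly such a vector. The name `direction` is my own.

```python
import numpy as np

F = np.array([[2, 1, 0],
              [0, 1, 3]])

# Every kernel vector is orthogonal to both rows of F, so the cross product
# of the rows gives a direction for the kernel line (valid here because the
# two rows are not parallel, which makes the kernel one-dimensional).
direction = np.cross(F[0], F[1])
print(direction)   # [ 3 -6  2]

# Every multiple of the direction is sent to the origin.
checks = [np.allclose(F @ (s * direction), [0, 0]) for s in (-1.5, 0.0, 1.0, 2.5)]
print(all(checks))   # True
```

Multiplying `direction` by $-1$ or $-1.5$ gives other points on the same line, so the whole Kernel is just every multiple of this one vector.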
Terminology Aside
Let’s get some more quick terminology out of the way before proceeding. We’re going to use the following terms:

Image: the set of outputs of the function (i.e. everything in $f(x)$). The image of a point $x$ is just $f(x)$.

Preimage: the set of inputs for the function (i.e. the $x$ in $f(x)$). The preimage of a point $y$ is just $f^{-1}(y)$.
Translations of the Kernel: Mapping to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$
We found the set of points that map to $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$ (i.e. the preimage of the origin). We call this set the Kernel or $K$ for short.
Can we now similarly find the set of points that map to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$?
 Once you know the preimage of $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, it's super simple to find the preimage of $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ or any other point for that matter.
Finding the preimage
Let’s start by finding the points that map to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ as before.
$\begin{aligned} F v &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{aligned}$
Solving for each variable, we find that this is just the line defined by:
$\begin{aligned} x &= \frac{1-t}{2} \\ y &= t \\ z &= \frac{1-t}{3} \end{aligned}$
for some $t \in R$.
Some valid points are:
$v = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $v = \begin{bmatrix} -3 \\ 7 \\ -2 \end{bmatrix}$.
This line looks awfully similar to the line for $K$, doesn’t it?
Let’s see them both on the same graph. Notice that they’re parallel to each other!
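A quick sketch to double check both claims: that every point on the parametrized line really maps to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and that the line is parallel to the Kernel. The function name `point_on_preimage` is my own, and I'm again getting the Kernel's direction from the cross product of the rows of $F$, which is a shortcut specific to this matrix.

```python
import numpy as np

F = np.array([[2, 1, 0],
              [0, 1, 3]])

def point_on_preimage(t):
    """Point with y = t on the line of solutions to F v = [1, 1]."""
    return np.array([(1 - t) / 2, t, (1 - t) / 3])

# Every sampled point maps to [1, 1].
maps_to_target = all(
    np.allclose(F @ point_on_preimage(t), [1, 1]) for t in (-2.0, 0.0, 1.0, 7.0)
)
print(maps_to_target)   # True

# Direction of this line, and the kernel direction (cross product of F's rows).
line_direction = point_on_preimage(1.0) - point_on_preimage(0.0)
kernel_direction = np.cross(F[0], F[1])

# Two 3D vectors are parallel exactly when their cross product is zero.
parallel = np.allclose(np.cross(line_direction, kernel_direction), 0)
print(parallel)         # True
```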
Translating the Kernel
So what’s the relation between the two lines we plotted above, $f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix})$ and $K = f^{-1}(\begin{bmatrix} 0 \\ 0 \end{bmatrix})$?
$f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix})$ is just a translation of $K$.
It is a translation by any vector $v \in f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix})$.
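Or said another way, $f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) = v + K$ for any $v$ that maps to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.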
This seems kind of too good to be true. Is it? Let’s test it out!

Let’s take a point $k \in K$. For instance, $k = \begin{bmatrix} -3 \\ 6 \\ -2 \end{bmatrix}$.

Let’s take a $v \in f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix})$ like $v = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$.
What’s $f(k + v)$?
$\begin{aligned} f(k + v) &= f(\begin{bmatrix} -3 \\ 6 \\ -2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}) \\ &= f(\begin{bmatrix} -3 \\ 7 \\ -2 \end{bmatrix}) \\ &= F \begin{bmatrix} -3 \\ 7 \\ -2 \end{bmatrix} \\ &= \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} -3 \\ 7 \\ -2 \end{bmatrix} \\ &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{aligned}$
So it is indeed the case here that $f(k+v)$ is $\begin{bmatrix}1 \\1 \end{bmatrix}$!
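The same test is easy to run in NumPy. This is only a sketch of the check above, using the kernel point $k$ with components $-3, 6, -2$ and the point $v$ with components $0, 1, 0$; it also tries a few other multiples of $k$, since every multiple of a kernel point is still in the Kernel.

```python
import numpy as np

F = np.array([[2, 1, 0],
              [0, 1, 3]])

k = np.array([-3, 6, -2])   # a kernel point: F @ k = [0, 0]
v = np.array([0, 1, 0])     # a point with F @ v = [1, 1]

print(F @ k)         # [0 0]
print(F @ (k + v))   # [1 1]

# Any multiple of k is still in the kernel, so v + s*k also maps to [1, 1].
all_hit_target = all(
    np.array_equal(F @ (v + s * k), [1, 1]) for s in (-2, 0, 3, 10)
)
print(all_hit_target)   # True
```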
All Translations of the Kernel are Preimages
Ok there’s something kind of mind blowing going on here:
 We took one point in $f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix})$.
 We added $K$ to it.
 And suddenly we got ALL of $f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix})$!
In fact this is true more generally: for any input $v$, the set $v+K$ is exactly the set of points that map to $f(v)$.
Breaking Down Why
Let’s break the above statement down into two parts.
 First, we’re saying that given some $v$, all points in the set $v+K$ will map to the same place as $v$ (i.e. $f(v) = f(v+K)$).
 Next, these are ALL the points that map to $f(v)$. Or, every point that maps to $f(v)$ must be in the set $v+K$.
Let’s prove each of the above statements more formally, starting with the first.
1. all points in the set $v+K$ will map to the same place as $v$
A more formal way of saying this is:
$f(v+K) = f(v) \text{, for any } v \in R^3$
Let’s break down why this is true. Take any $k \in K$ (in the kernel). Then,
$\begin{aligned} f(v+k) &= f(v) + f(k) & \text{Since $f$ is a linear function} \\ f(v+k) &= f(v) + 0 & \text{Since } k \in K\\ f(v+k) &= f(v) \end{aligned}$
The below video shows this visually.
Additionally, since this holds for any $k \in K$, it holds for every point on the line $v+K$. Each $k$ contributes $f(k) = 0$ to the output, so only the value of $v$ matters to $f$. This is shown below:
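If you'd rather poke at this numerically, here is a rough sketch that tries the first statement on a handful of random inputs. The seed and the number of trials are arbitrary choices of mine, and `kernel_direction` again comes from the cross product of the rows of $F$ (a shortcut that works for this matrix). A few random trials are of course not a proof; the algebra above is.

```python
import numpy as np

F = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])
kernel_direction = np.cross(F[0], F[1])   # spans K for this F

rng = np.random.default_rng(0)            # fixed seed, chosen arbitrarily
results = []
for _ in range(5):
    v = rng.normal(size=3)                # any input point
    k = rng.normal() * kernel_direction   # any point of K
    # f(v + k) = f(v) + f(k) = f(v) + 0
    results.append(np.allclose(F @ (v + k), F @ v))

print(all(results))   # True
```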
Let’s now move to the next statement.
2. Every point that maps to $f(v)$ must be in the set $v+K$
Essentially, this is saying that there can be no point $w$ such that $w$ maps to $f(v)$ but is not in $v+K$.
Let’s prove this.
Choose any $v, w$ such that $f(v) = v'$ and $f(w) = v'$. We wish to show that $w \in v+K$.
 Let $w' = w - v$.
 Then $f(w') = f(w - v) = f(w) - f(v) = v' - v' = 0$.
 Hence $w' \in K$ (as all points that map to $0$ are in $K$).
 Thus, $v + w' \in v + K$.
 Since $v + w' = v + w - v = w$, $w \in v + K$.
So we’ve successfully proved our two points!
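To make the second proof concrete, here is a small sketch that walks through its steps with two points we already know map to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. The variable names mirror the proof; testing that `w_prime` lies on the Kernel by checking it is parallel to the Kernel's direction is my own choice of test.

```python
import numpy as np

F = np.array([[2, 1, 0],
              [0, 1, 3]])
kernel_direction = np.cross(F[0], F[1])   # spans K for this F

v = np.array([0, 1, 0])      # f(v) = [1, 1]
w = np.array([-3, 7, -2])    # f(w) = [1, 1] as well

w_prime = w - v              # the difference used in the proof
print(F @ w_prime)           # [0 0]

# w' lies on the kernel line: it is parallel to the kernel direction.
in_kernel = np.allclose(np.cross(w_prime, kernel_direction), 0)
print(in_kernel)             # True
```

So $w = v + w'$ with $w' \in K$, which is exactly what $w \in v+K$ means.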
The Relation Between Translations of $K$ and Points in the Image
We’ve already seen something really cool: every translation of $K$ is the full preimage of a point in the image of $f$.
Now is there any relation between how far apart two translations of $K$ are (say $v+K$ and $w+K$) and how far apart their images are ($f(v)$, $f(w)$)?
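Yes! If the translations $v+K$ and $w+K$ are $v'$ apart, then their images $f(v)$ and $f(w)$ are $f(v')$ apart.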
Why is this the case? It again follows pretty simply:
$\begin{aligned} w+K &= v+K + v'+K & \text{Since $w+K$ and $v+K$ are $v'$ apart} \\ w+K &= v+v' + K & \text{Since $K + K = K$} \\ f(w+K) &= f(v+v'+K) & \text{Apply $f$} \\ f(w) &= f(v+v') & \text{Since $f(u+K) = f(u)$ for any $u$} \\ f(w) &= f(v) + f(v') & \text{Since $f$ is linear, so the images are $f(v')$ apart} \end{aligned}$
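Here is a sketch of the same relation with concrete numbers. The offset `v_prime` is a made-up vector I picked for illustration; any vector would do. The last check slides each representative along its own translate of $K$ to show that the gap between the images doesn't depend on which representatives we pick.

```python
import numpy as np

F = np.array([[2, 1, 0],
              [0, 1, 3]])
kernel_direction = np.cross(F[0], F[1])   # spans K for this F

v = np.array([0, 1, 0])          # representative of the translate v + K
v_prime = np.array([1, 2, -1])   # hypothetical offset between the two translates
w = v + v_prime                  # representative of the translate w + K

print(F @ w - F @ v)   # [ 4 -1]
print(F @ v_prime)     # [ 4 -1]

# Other representatives of the same two translates give the same gap.
v_other = v + 2 * kernel_direction
w_other = w - 5 * kernel_direction
same_gap = np.array_equal(F @ w_other - F @ v_other, F @ v_prime)
print(same_gap)        # True
```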
Overall Space
Let’s now take a step back and view what’s happening in the overall space.
Every point in the image can be seen as the image of some translation of $K$. As we move $K$ around, we get new points in the image!
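As a rough sketch of this picture, the function below builds the preimage of any output point as "one particular solution plus the Kernel". The name `preimage` and the use of NumPy's pseudo-inverse to pick the particular solution are my own choices; the pseudo-inverse gives an exact solution here only because the rows of this $F$ are independent, so every point of $R^2$ is in the image.

```python
import numpy as np

F = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])
kernel_direction = np.cross(F[0], F[1])   # spans K for this F

def preimage(y):
    """Return (p, d) so that the preimage of y is the line p + s*d."""
    # pinv yields one particular solution because F has full row rank here.
    p = np.linalg.pinv(F) @ np.asarray(y, dtype=float)
    return p, kernel_direction

all_ok = True
for y in ([0, 0], [1, 1], [8, 5], [-2.5, 4]):
    p, d = preimage(y)
    for s in (-1.0, 0.0, 2.0):
        # Sliding along the kernel direction never changes the output.
        all_ok = all_ok and np.allclose(F @ (p + s * d), y)

print(all_ok)   # True
```

Changing `y` only changes the particular solution `p`; the direction `d` stays the same, which is the "moving $K$ around" from the text.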
Conclusion
We’ve now seen some really cool things that you may not have noticed before:
 Every matrix is a linear function and that linear function will have some kernel $K$ that maps to $0$.
 All preimages of output points are just going to be translations of $K$.
 If $v'$ is the distance between the translations of $K$, $f(v')$ is the distance between their images.
The last point actually leads us to the first isomorphism theorem of group theory. This broadly states that the relation between the sets of preimages of a special type of function known as a homomorphism (in our case $f$) is the exact same as the relation between the set of output points (we’ll go into this in the next blog post!).
There are many practical uses of this knowledge but I wanted to share it for simpler reasons. Sometimes math is just pretty: it has all these cool properties that fit together so nicely that you can’t help but enjoy seeing them.
For example:

Who would have thought that all the preimage sets are just translations of each other?

Or that the relation between these preimage sets mirrors the relation between the points in the image?
I hope you enjoyed getting a taste of some abstract algebra and I’ll see you in the next post!