A Visual Introduction to Function Kernels
The Kernel of a function is the set of points that the function sends to $0$. Amazingly, once we know this set, we can immediately characterize how the matrix (or linear function) maps its inputs to its outputs.
I hope that by the end of this post you will:
- Understand what a Kernel of a function is and how it helps us understand a function better.
- Realize that the pre-images of output points are always some translation of the Kernel (for linear functions).
- See that there are many pretty patterns and coincidences that flow out of the properties of linear functions.
Functions Across Spaces
In previous posts, we noticed how matrices are just linear functions. We found that the matrices we studied just rotate or stretch a vector in some way.
But we only studied square matrices (i.e. 2x2 or 3x3 matrices). What happens when our matrices aren’t square?
Let’s start with a 2x3 matrix that represents a linear function . Let’s study a function defined by the matrix :
What happens if I apply this matrix on a vector ?
Let’s find out:
In other words, .
So $f$ effectively takes a point in 3 dimensions, $(x, y, z)$, and sends it to a point in 2 dimensions. We can see this below:
We interact with functions that take points from 3D to 2D all the time. For instance, every time you take a picture with a camera, you are taking a 3D space (the world you see) and collapsing it onto a 2D space (the camera sensor).
The input space for $f$ is $\mathbb{R}^3$ and the output space is $\mathbb{R}^2$. More formally, we write this as:
$$f : \mathbb{R}^3 \to \mathbb{R}^2$$
Returning to the example of cameras, when we take pictures, we squash a 3D world onto a 2D sensor. In the process, we lose some amount of information primarily related to depth.
Specifically, points that are far away will appear close to each other even though they may be quite distant.
Eventually, points infinitely far away on the horizon all collapse onto the same point. We can see this in the example image below.
So just like a camera, will our function also “lose” some information when it moves points from 3D to 2D? Will it collapse multiple points from the input to the same point in the output?
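To make the 3D-to-2D idea concrete, here is a minimal sketch in plain Python. The matrix `M` is a hypothetical example chosen only for illustration (it is not necessarily the matrix used in this post), and `apply_matrix` is a helper name introduced here.

```python
def apply_matrix(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

# Hypothetical 2x3 matrix: 2 rows (output coordinates), 3 columns (input coordinates).
M = [(1, 0, -1),
     (0, 1, -1)]

p = (1, 2, 3)
q = (2, 3, 4)  # a different 3D point

image_p = apply_matrix(M, p)
image_q = apply_matrix(M, q)
print(image_p, image_q)  # both are 2D points
```

With this particular matrix, the two different inputs land on the same 2D point, which is exactly the kind of collapse the camera analogy hints at.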
The Kernel - Set of Points that Map to 0
To answer this, let’s start by seeing all the points that map onto the origin of the output space, $(0, 0)$. This gives us a good starting point for understanding which points from our input hit the same point in the output.
We want to solve:
$$f(x, y, z) = (0, 0)$$
Carrying out this multiplication, we see this is satisfied when:
This line is shown below. Some points on this line are: , , .
In our specific case, the Kernel is a line. When $f$ maps this line to the point $(0, 0)$, we lose information about the line: in the output space, the points on the line are no longer distinguishable.
Returning to our camera analogy, this is similar to how all points on the horizon are no longer distinguishable after the conversion from 3D to 2D. Thus, you can think of the Kernel as a quick way to see how the function compresses or loses information.
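As a rough sketch of how a kernel can be found in code, the snippet below uses a hypothetical 2x3 matrix `M` (an assumption, not the post’s matrix). For a 2x3 matrix whose two rows are not parallel, the cross product of the rows is perpendicular to both rows, so it spans the kernel line. The helper names `apply_matrix` and `cross` are introduced here for illustration.

```python
def apply_matrix(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Hypothetical matrix; its rows are not parallel, so the kernel is a line.
M = [(1, 0, -1),
     (0, 1, -1)]

# A vector perpendicular to both rows is sent to (0, 0).
direction = cross(M[0], M[1])

# Every multiple of that direction is also in the kernel.
kernel_points = [tuple(t * d for d in direction) for t in (-2, -1, 0, 1, 2)]
images = [apply_matrix(M, k) for k in kernel_points]
print(direction, images)
```

All five sampled points on the line map to the origin, so in the output space they can no longer be told apart.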
Let’s get some more quick terminology out of the way before proceeding. We’re going to use the following terms:
Image - the set of outputs of the function (i.e. every value $f(x)$). The image of a point $x$ is just $f(x)$.
Pre-Image - the set of inputs for the function (i.e. the $x$ in $f(x)$). The pre-image of a point $y$ is just $f^{-1}(y)$, the set of all inputs that map to $y$.
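One way to get a feel for these two terms is a brute-force search over a small integer grid. This is only a sketch: the matrix `M` is a hypothetical example, `preimage_on_grid` is a helper name made up here, and a real pre-image contains infinitely many points, not just the grid points found below.

```python
from itertools import product

def apply_matrix(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

def preimage_on_grid(M, target, lo=-2, hi=2):
    """Return the integer grid points in [lo, hi]^3 that M sends to target."""
    grid = product(range(lo, hi + 1), repeat=3)
    return [v for v in grid if apply_matrix(M, v) == target]

# Hypothetical 2x3 matrix used only for illustration.
M = [(1, 0, -1),
     (0, 1, -1)]

# Image of a single point: just apply the function.
image_of_point = apply_matrix(M, (1, 2, 3))

# Pre-image of the origin, restricted to the grid.
origin_preimage = preimage_on_grid(M, (0, 0))
print(image_of_point, origin_preimage)
```

The image of one input is a single output, while the pre-image of one output is a whole set of inputs.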
Translations of the Kernel - Mapping to
We found the set of points that map to $(0, 0)$ (i.e. the pre-image of the origin). We call this set the Kernel, or $K$ for short.
Can we now similarly find the set of points that map to ?
- Once you know the pre-image of , it's super simple to find the pre-image of or any other point for that matter.
Finding the pre-image
Let’s start by finding the points that maps to as before.
Solving for each variable, we find that this is just the line defined by:
for some .
Some valid points are:
This line looks awfully similar to the line for $(0, 0)$, doesn’t it?
Let’s see them both on the same graph. Notice that they’re parallel to each other!
Translating the Kernel
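So what’s the relation between the two lines we plotted above: the Kernel $K$ and the pre-image of our second point?
The second pre-image is just a translation of $K$.
It is a translation by any vector $v$ in that pre-image.
Or said another way,
$$f^{-1}(f(v)) = v + K = \{\, v + k : k \in K \,\}$$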
This seems kind of too good to be true. Is it? Let’s test it out!
Let’s take a point . For instance, .
Let’s take a like .
So it is indeed the case here that is !
All Translations of the Kernel are Pre-Images
Ok there’s something kind of mind blowing going on here:
- We took one point $v$ in the pre-image.
- We added $K$ to it.
- And suddenly we got ALL of the pre-image!
In fact this is true more generally! For any input $v$:
$$f^{-1}(f(v)) = v + K$$
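The translation idea can be sanity-checked numerically. The sketch below assumes a hypothetical matrix `M` whose kernel is the line through `(1, 1, 1)`; neither is taken from this post, and the variable names are illustrative.

```python
def apply_matrix(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

# Hypothetical matrix and a direction spanning its kernel.
M = [(1, 0, -1),
     (0, 1, -1)]
kernel_direction = (1, 1, 1)
kernel_image = apply_matrix(M, kernel_direction)  # should be the origin

# Pick one input v and see where it lands.
v = (2, 3, 0)
target = apply_matrix(M, v)

# Translate the kernel by v: points of the form v + t * kernel_direction.
translated = [tuple(vi + t * ki for vi, ki in zip(v, kernel_direction))
              for t in range(-3, 4)]

# Collect the distinct outputs of all translated points.
images = {apply_matrix(M, w) for w in translated}
print(target, images)
```

Every point on the translated line lands on the same output as `v`, which matches the claim that a translation of the Kernel sits inside a single pre-image.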
Breaking Down Why
Let’s break the above statement down into two parts.
- First, we’re saying that given some $v$, all points in the set $v + K$ will map to the same place as $v$ (i.e. $f(v)$).
- Next, these are ALL the points that map to $f(v)$. Or, every point that maps to $f(v)$ must be in the set $v + K$.
Let’s prove each of the above statements more formally, starting with the first.
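1. All points in the set $v + K$ will map to the same place as $v$
A more formal way of saying this is:
$$f(v + k) = f(v) \quad \text{for all } k \in K$$
Let’s break down why this is true. Take any $k \in K$ (in the kernel). Then, by linearity,
$$f(v + k) = f(v) + f(k) = f(v) + 0 = f(v)$$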
The below video shows this visually.
Additionally, given this is true for some $k \in K$, this is true for all points on the line $v + K$. The reason is that the different amounts of $k$ all contribute nothing different and it’s only the value of $v$ that matters to $f(v + k)$. This is shown below:
Let’s now move to the next statement.
2. Every point that maps to $f(v)$ must be in the set $v + K$
Essentially, this is saying that there can be no point $w$ such that $w$ maps to $f(v)$ but $w$ is not in $v + K$.
Let’s prove this.
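Choose any $v$ and $w$ such that $f(v) = y$ and $f(w) = y$. We wish to show that $w \in v + K$.
- Let $k = w - v$.
- Then $f(k) = f(w) - f(v) = y - y = 0$.
- Hence $k \in K$ (as all points that map to $0$ are in $K$).
- Thus, $w = v + k$.
- Since $k \in K$, $w \in v + K$.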
So we’ve successfully proved our two points!
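The second direction of the argument can also be checked on numbers. This is a sketch under the assumption of a hypothetical matrix `M` (not the one from this post); `v` and `w` are two inputs picked so that they share an image.

```python
def apply_matrix(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

# Hypothetical 2x3 matrix used only for illustration.
M = [(1, 0, -1),
     (0, 1, -1)]

v = (2, 3, 0)
w = (5, 6, 3)  # chosen so that it maps to the same output as v

same_image = apply_matrix(M, v) == apply_matrix(M, w)

# Step 1 of the proof: k = w - v.
k = tuple(wi - vi for wi, vi in zip(w, v))

# Step 2: k maps to the origin, so k is in the kernel.
k_image = apply_matrix(M, k)

# Steps 3 and 4: w is v translated by a kernel element.
rebuilt_w = tuple(vi + ki for vi, ki in zip(v, k))
print(same_image, k, k_image, rebuilt_w)
```

The difference between two inputs with the same image always falls in the kernel, which is why no point mapping to $f(v)$ can sit outside $v + K$.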
The Relation Between Translations of $K$ and Points in the Image
We’ve already seen something really cool - every translation of $K$ is the full pre-image of a point in the image.
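Now is there any relation between how far apart two translations of $K$ are (say $v_1 + K$ and $v_2 + K$) and how far apart their images are ($f(v_1)$, $f(v_2)$)?
Yes: if the translations differ by $d = v_2 - v_1$, their images differ by $f(d)$.
Why is this the case? It again follows pretty simply:
$$f(v_2) - f(v_1) = f(v_2 - v_1) = f(d)$$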
Let’s now take a step back and view what’s happening in the overall space.
Every point in the image can be seen as the image of some translation of $K$. As we move $K$ around, we get new points in the image!
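Here is a small numeric sketch of moving a translation of the Kernel around. It assumes a hypothetical matrix `M` with kernel direction `(1, 1, 1)` and an arbitrary shift `d`; none of these values come from this post.

```python
def apply_matrix(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

def add(a, b):
    """Component-wise sum of two vectors."""
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    """Component-wise difference of two vectors."""
    return tuple(x - y for x, y in zip(a, b))

# Hypothetical 2x3 matrix used only for illustration.
M = [(1, 0, -1),
     (0, 1, -1)]

v1 = (2, 3, 0)
d = (1, -1, 2)   # how far we shift the translated kernel
v2 = add(v1, d)

# The gap between the two images equals the image of the shift.
image_gap = sub(apply_matrix(M, v2), apply_matrix(M, v1))
shift_image = apply_matrix(M, d)

# Any other point on v1 + K, shifted by d, lands on the same output as v2.
other_point = add(add(v1, (1, 1, 1)), d)
other_image = apply_matrix(M, other_point)
print(image_gap, shift_image, other_image)
```

Shifting a whole translation of the Kernel by `d` moves its single image point by the image of `d`, so the input-side picture and the output-side picture move in step.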
We’ve now seen some really cool things that you may not have noticed before:
- Every matrix is a linear function and that linear function will have some kernel $K$ that maps to $0$.
- All pre-images of output points are just going to be translations of $K$.
- If $d$ is the distance (the difference vector) between two translations of $K$, $f(d)$ is the distance between their images.
The last point actually leads us to the first isomorphism theorem of group theory. This broadly states that the relation between the sets of pre-images of a special type of function known as a homomorphism (in our case $f$) is the exact same as the relation between the set of output points (we’ll go into this in the next blog post!).
There are many practical uses of this knowledge but I wanted to share it for simpler reasons. Sometimes math is just pretty - it has all these cool properties that fit together so nicely that you can’t help but enjoy seeing them.
Who would have thought that all the pre-image sets are just translations of each other?
Or that the relation between these pre-image sets mirrors the relation between the points in the image?
I hope you enjoyed getting a taste of some abstract algebra and I’ll see you in the next post!