Many of us have probably encountered matrices at some point in math. They're these tables of numbers with seemingly byzantine rules for combining them. Take the first element of the first row and the first element of the first column, multiply them together, then add, then spin three times fast… And this is to say nothing of the rules for inverting.
How did someone come up with this?
We all probably memorized these rules at some point, but what's really going on underneath it all? In this post, we'll answer these questions by deriving matrices ourselves and building up the intuition behind them.
Before we jump into matrices, we need to start by understanding functions and how we represent them. Functions can be thought of as rules that take in an input and return an output.
One simple way to represent a function is to have a table that lists every input alongside the function's output. For instance, for a function $f$ we could write:
This is a perfectly fine representation, although very verbose (the table would need a row for every possible input). Can we create a simpler representation?
Yes: we can write $f$ as a polynomial expression in $x$. This representation is not only concise, but powerful.
How? Let's say I have another function $g$, and someone asks, "What happens if I apply $f$, followed by $g$?"
Before I had this representation, I would have had to build a full table: write out $f(x)$ for every input, then apply $g$ to each result, like below:
Yikes. This is tedious.
But now that we have the polynomial representation, we can just substitute the expression for $f(x)$ into $g$ and simplify, giving a single new polynomial for $g(f(x))$.
Using simple algebraic steps, we can figure out and describe the composed function for any arbitrary input. The same is true of $f(g(x))$, $f(f(x))$, and so on.
In short, the right representation allows us to quickly understand, combine, and process functions.
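To make this concrete, here is a small sketch of the idea. The post's actual $f$ and $g$ aren't specified, so the two functions below are hypothetical stand-ins chosen purely for illustration:

```python
# Hypothetical stand-ins for f and g (illustrative choices, not the post's originals)
def f(x):
    return x + 2

def g(x):
    return 3 * x

# With a rule-based representation, composing is just substitution:
# g(f(x)) = 3 * (x + 2) is itself a single new rule we can apply directly.
def g_of_f(x):
    return 3 * (x + 2)

# The substituted rule agrees with applying f then g step by step:
assert g_of_f(4) == g(f(4))  # both give 18
```

No table needed: the composed rule works for every input at once.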
Let's now focus on a specific type of function that forms the foundation of matrices: linear maps. Linear maps are functions with a few special properties. Here's a simple example of a linear map:
- $f(x) = 3x$ for real numbers $x$.
The special properties of these functions are:
- You can multiply by a scalar before or after applying the function and get the same result: $f(c \cdot x) = c \cdot f(x)$.
- Our example function: $f(5 \cdot 2) = 30 = 5 \cdot f(2)$.
- You can add before or after applying the function and get the same result: $f(x + y) = f(x) + f(y)$.
- Our example function: $f(2 + 4) = 18 = f(2) + f(4)$.
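These two properties are easy to check numerically. A quick sketch, using $f(x) = 3x$ as a concrete linear map:

```python
# A concrete 1-dimensional linear map
def f(x):
    return 3 * x

# Property 1: scaling commutes with f, i.e. f(c * x) == c * f(x)
assert f(5 * 2) == 5 * f(2)      # both are 30

# Property 2: addition commutes with f, i.e. f(x + y) == f(x) + f(y)
assert f(2 + 4) == f(2) + f(4)   # both are 18

# A non-linear function fails these checks: for h(x) = x**2,
# h(2 + 4) is 36 but h(2) + h(4) is 20.
```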
Early mathematicians realized that such functions can model a lot of the phenomena that happen in the real world, from physics to statistics. In fact, geometrically, all linear maps can be thought of as combinations of rotations and scalings:
- Functions that rotate shapes around a point.
- Functions that take a 2D image and scale its width and/or height.
Representing Linear Maps
Now that we've understood linear maps, how can we represent them succinctly, in a way that makes them easy to combine and use?
Let’s start with linear maps that take in real numbers and return real numbers (all in 1 dimension - the real line).
These linear maps can always be written as $f(x) = a \cdot x$ for some real number $a$. Pretty simple!
But what about in 2 dimensions (i.e. the input is a vector such as $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and the output is also a vector in 2D)?
Let's say $f$ is a scaling function that stretches its input's width by 3x and its height by 5x. You can see it visually below:
The linear map scales the purple rectangle into the black rectangle.
Let's just start by describing what the function does to two very simple vectors: $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Since $f$ stretches width by 3x and height by 5x, we get $f\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}$ and $f\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 5 \end{pmatrix}$.
Based on this, can we figure out what $f\begin{pmatrix} x \\ y \end{pmatrix}$ is for any vector? Amazingly, yes!
The below diagram is going to show our approach visually:
We will decompose $\begin{pmatrix} x \\ y \end{pmatrix}$ into a combination of $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ (the two vectors for which we know the value of $f$).
More precisely, here's how we find the final value:

$$f\begin{pmatrix} x \\ y \end{pmatrix} = f\left(x \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} + y \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = x \cdot f\begin{pmatrix} 1 \\ 0 \end{pmatrix} + y \cdot f\begin{pmatrix} 0 \\ 1 \end{pmatrix} = x \cdot \begin{pmatrix} 3 \\ 0 \end{pmatrix} + y \cdot \begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 3x \\ 5y \end{pmatrix}$$
So we're able to find $f\begin{pmatrix} x \\ y \end{pmatrix}$ by just knowing $f\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $f\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
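This decomposition trick means two sampled outputs determine the entire map. A minimal sketch, using the scaling example above:

```python
# What f does to the two simple basis vectors (from the scaling example)
f_e1 = (3, 0)  # f(1, 0)
f_e2 = (0, 5)  # f(0, 1)

def apply_f(x, y):
    # Decompose (x, y) = x*(1,0) + y*(0,1), then use linearity:
    #   f(x, y) = x * f(1,0) + y * f(0,1)
    return (x * f_e1[0] + y * f_e2[0],
            x * f_e1[1] + y * f_e2[1])

assert apply_f(2, 1) == (6, 5)  # width scaled by 3, height by 5
```

Two stored vectors are enough to evaluate $f$ anywhere.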
So is there a way I can quickly say what a linear map does to $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$?
Yes! A simple 2x2 matrix!
$f$ (as we defined it in the previous section) can be represented by the notation $\begin{pmatrix} 3 & 0 \\ 0 & 5 \end{pmatrix}$, where the first column is $f\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and the second column is $f\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
This is extremely cool: we can describe the entire function, and how it operates on an infinite number of points, with a little table of just 4 values.
Example of Creating a Matrix Representation
Let's quickly see an example of how we find the matrix representation for a linear map. Imagine we have the linear map $g$ that rotates vectors 90º counterclockwise. What would the matrix for this function be?
As earlier, let's start by just understanding what $g$ does to our two simple vectors:
$\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ turned 90º counterclockwise is $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
$\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ turned 90º counterclockwise is $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$.
Putting it all together, we see that $G$, the matrix representation of $g$, is $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$.
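The same recipe works for any linear map: apply it to the two basis vectors and record the outputs as the matrix's columns. A sketch for the rotation example:

```python
def rotate_90_ccw(v):
    # Rotating (x, y) by 90º counterclockwise gives (-y, x)
    x, y = v
    return (-y, x)

# The matrix columns are the images of the two basis vectors
col1 = rotate_90_ccw((1, 0))   # (0, 1)
col2 = rotate_90_ccw((0, 1))   # (-1, 0)

G = [[col1[0], col2[0]],
     [col1[1], col2[1]]]

assert G == [[0, -1], [1, 0]]
```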
Applying the Function
Ok so we know we can represent a linear map as a matrix - but how do we apply this function on an input?
Specifically, say I have the same function $f$, represented by the matrix $\begin{pmatrix} 3 & 0 \\ 0 & 5 \end{pmatrix}$. I want to apply this function to the vector $\begin{pmatrix} x \\ y \end{pmatrix}$. How do we do it?
Let’s start with what we know:
- The first column of the matrix tells us $f\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}$.
- The second column tells us $f\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 5 \end{pmatrix}$.
To make use of that, let's break $\begin{pmatrix} x \\ y \end{pmatrix}$ into a combination of $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$: $f\begin{pmatrix} x \\ y \end{pmatrix} = f\left(x \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} + y \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right)$.
Thanks to the amazing properties of linear maps ($f(u + v) = f(u) + f(v)$), we can now simplify this to: $f\left(x \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) + f\left(y \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right)$.
And then using another property of linear maps ($f(c \cdot v) = c \cdot f(v)$): $x \cdot f\begin{pmatrix} 1 \\ 0 \end{pmatrix} + y \cdot f\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
And now we apply the knowledge we have from the columns of the matrix: $x \cdot \begin{pmatrix} 3 \\ 0 \end{pmatrix} + y \cdot \begin{pmatrix} 0 \\ 5 \end{pmatrix} = \begin{pmatrix} 3x \\ 5y \end{pmatrix}$.
Does this look familiar? It is exactly what you get by carrying out the matrix multiplication $\begin{pmatrix} 3 & 0 \\ 0 & 5 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$!
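This derivation is exactly what a 2x2 matrix-vector multiply implements. A minimal sketch:

```python
def mat_vec(M, v):
    # x * (first column of M) + y * (second column of M),
    # which works out to dotting each row of M with v.
    x, y = v
    return (x * M[0][0] + y * M[0][1],
            x * M[1][0] + y * M[1][1])

F = [[3, 0], [0, 5]]   # the scaling map
assert mat_vec(F, (2, 1)) == (6, 5)
```

The "byzantine" row-times-column rule is just the basis-decomposition argument above, written out once and for all.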
Matrix Multiplication as Function Composition
Ok, so we've seen that multiplying a 2x2 matrix by a vector is just applying the function to the vector. But what does matrix multiplication mean when we multiply two 2x2 matrices?
To answer this, we're going to take a small detour. We know how to apply a function $f$ to an input vector $v$. What if I ask you to find the composition $g(f(v))$, where we know the matrices for both $f$ and $g$?
Let’s work through this now:
- Let $F$ be the matrix for the function $f$.
- Let $G$ be the matrix for the function $g$.
We want to find the matrix that represents the function $g \circ f$ (apply $f$ first, then $g$).
This is going to be the matrix whose columns tell us what $g \circ f$ does to $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

Let $f_1$ be the first column of $F$ and let $f_2$ be the second column. Recall that these columns are exactly $f\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $f\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

Then we can simplify: the first column of our new matrix is $g\left(f\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) = g(f_1)$, and the second column is $g(f_2)$.

So what's $g(f_1)$?

Well, we just saw earlier that applying $g$ to a vector is just multiplying by its matrix, so $g(f_1) = G \cdot f_1$. We can now write our new matrix as $\begin{pmatrix} G f_1 & G f_2 \end{pmatrix}$, where each column is $G$ times the corresponding column of $F$.

Seem familiar? It's exactly the formula for the 2x2 matrix multiplication $G \cdot F$!

So $G \cdot F$, the matrix representing $g \circ f$, is just the product of the two matrices.
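We can check this column-by-column recipe directly: multiplying two 2x2 matrices is the same as applying $g$ to each column of $F$. A sketch:

```python
def mat_vec(M, v):
    x, y = v
    return (x * M[0][0] + y * M[0][1],
            x * M[1][0] + y * M[1][1])

def mat_mul(G, F):
    # The columns of G·F are G applied to the columns of F
    c1 = mat_vec(G, (F[0][0], F[1][0]))  # G applied to F's first column
    c2 = mat_vec(G, (F[0][1], F[1][1]))  # G applied to F's second column
    return [[c1[0], c2[0]],
            [c1[1], c2[1]]]

F = [[3, 0], [0, 5]]    # scale x by 3, y by 5
G = [[0, -1], [1, 0]]   # rotate 90º counterclockwise
assert mat_mul(G, F) == [[0, -5], [3, 0]]
```

Note the order: $G \cdot F$ means "$f$ first, then $g$", because the inner function is applied first.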
An Example of Function Composition
Let’s work through a full example to see this linear map composition in action.
- Let $f$ be our scaling function from earlier (represented by the matrix $F = \begin{pmatrix} 3 & 0 \\ 0 & 5 \end{pmatrix}$).
- Let $g$ be the function that rotates a vector 90º counterclockwise, as before (represented by $G = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$).
Finding $g$ Composed With $f$
Let's work through $g \circ f$ applied to the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ (chosen for illustration). We can use just the properties of $f$ and $g$ to figure out this result (see diagram below).
- We start with $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
- We know $f$ stretches its input by 3x in the x direction and by 5x in the y direction, leading to $\begin{pmatrix} 3 \\ 5 \end{pmatrix}$.
- We then just rotate the resulting vector 90º counterclockwise to get $\begin{pmatrix} -5 \\ 3 \end{pmatrix}$.
$g$ Composed With $f$ Using Matrices
Do we get the same result when using function composition through matrices? Let’s find out:
- The matrix for $g \circ f$ is $G \cdot F = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 0 & -5 \\ 3 & 0 \end{pmatrix}$.
- To apply $g \circ f$ to the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, we just do $\begin{pmatrix} 0 & -5 \\ 3 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -5 \\ 3 \end{pmatrix}$.
This is exactly what we got when visually applying $g \circ f$ one step at a time. So we were able to get the same result without applying each function separately: matrix multiplication gave us the composed function directly, and we applied it in one shot. While this already saves us time, imagine how much it would save if we had to compose 10 or 20 different functions!
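The two routes, applying each function in turn versus multiplying the matrices once and applying the product, can be compared directly (the test vector is again chosen for illustration):

```python
def mat_vec(M, v):
    x, y = v
    return (x * M[0][0] + y * M[0][1],
            x * M[1][0] + y * M[1][1])

F = [[3, 0], [0, 5]]    # scale
G = [[0, -1], [1, 0]]   # rotate
GF = [[0, -5], [3, 0]]  # the product G·F, computed once up front

v = (1, 1)
one_at_a_time = mat_vec(G, mat_vec(F, v))  # apply f, then g
composed = mat_vec(GF, v)                  # apply the pre-composed map
assert one_at_a_time == composed == (-5, 3)
```

With 10 or 20 maps, you would multiply the matrices once and then apply a single matrix per input.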
Big Picture: Matrices Give Us Power
Remember how much power we gained by being able to represent a function as a polynomial? We were able to combine polynomials, compose them, multiply them - we now have all that same power for linear maps!
We can compose them (matrix multiplication), add them (matrix addition), and invert them (matrix inversion) all by following some simple rules.
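For instance, inverting the function (undoing its effect) corresponds to inverting the matrix. A sketch with the scaling map, whose inverse simply divides each axis back down:

```python
def mat_vec(M, v):
    x, y = v
    return (x * M[0][0] + y * M[0][1],
            x * M[1][0] + y * M[1][1])

F = [[3, 0], [0, 5]]          # stretch x by 3, y by 5
F_inv = [[1/3, 0], [0, 1/5]]  # undo it: shrink x by 3, y by 5

v = (2.0, 1.0)
round_trip = mat_vec(F_inv, mat_vec(F, v))  # stretch, then shrink back
# Allow for floating-point rounding in the comparison
assert all(abs(a - b) < 1e-9 for a, b in zip(round_trip, v))
```

Seen this way, the matrix-inversion rules are just the recipe for finding the function that undoes another function.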
A more sophisticated way of saying this is we have an algebra on these functions (just like we have an algebra on integers etc.).
So we’ve seen pretty clearly that matrices are just functions and that linear algebra is the study of combining these functions.
Understanding this helps us perceive some of the key ideas of linear algebra in a new way. In the next post, we’re going to focus on eigenvectors and see how they provide valuable insights into these linear maps.