Relearning Matrices as Linear Functions
Many of us have probably encountered matrices at some point in math. They’re these tables with seemingly byzantine rules for combining them. Take the first element of the first row, first element of the first column, multiply them together, then add, then spin three times fast… And this is to say nothing of the rules for inverting them.
How did someone come up with this?
We all probably memorized these rules at some point, but what’s really going on underneath it all? How did someone actually come up with these rules and the very idea of matrices? In this post, we’ll aim to answer these questions by deriving matrices ourselves and seeing the intuition underneath them.
Representing Functions
Before we jump into matrices, we need to start by understanding functions and how we represent them. Functions can be thought of as rules that take in an input and return an output.
One simple way to represent a function is to have a table that takes every input and displays the function’s output. For instance, for a function $f$ we could write:
$x$  $f(x)$ 

1  2 
2  4 
3  6 
This is a perfectly fine representation, although very verbose (the table would include every input and output combination). Can we create a simpler representation?
Yes: $f(x) = 2x$. This representation, a polynomial, is not only concise, but powerful.
How? Let’s say I have another function $g(x) = 10x$, and someone asks, “what happens if I apply $f$, followed by $g$?”.
Before I had this representation, I would have had to make a full table where I write out $f(x)$ and then apply $g$ to it for each input, like below:
$x$  $f(x)$  $g(f(x))$ 

1  2  20 
2  4  40 
3  6  60 
Yikes. This is tedious.
But now that we have the polynomial representation, we can just do:
$g(x) = 10x$
$g(f(x)) = 10\cdot f(x)$
$g(f(x)) = 10 \cdot 2x = 20x$
Using simple steps, we can figure out $g(f(x))$ and describe the composition function for any arbitrary input. The same is true of $g(x) + f(x)$ etc.
In short, the right representation allows us to quickly understand, combine, and process functions.
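As a minimal sketch of this idea in Python (the names `f_table`, `g_table`, and `compose` are illustrative and not part of the original discussion), compare composing two functions through lookup tables with composing them through their formulas:

```python
# Table representation: every input has to be listed explicitly.
f_table = {1: 2, 2: 4, 3: 6}
g_table = {2: 20, 4: 40, 6: 60}

# Composing tables means walking through every row.
gf_table = {x: g_table[fx] for x, fx in f_table.items()}

# Formula representation: one rule covers every input.
def f(x):
    return 2 * x

def g(x):
    return 10 * x

def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

g_after_f = compose(g, f)

print(gf_table)      # {1: 20, 2: 40, 3: 60}
print(g_after_f(7))  # 140, an input the tables never listed
```

The table version only answers questions about the rows someone wrote down, while the formula version works for any input.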
Linear Maps
Let’s now focus on a specific type of functions that form the foundations of matrices: Linear Maps. Linear Maps are functions that have a few special properties. Here’s a simple example of a linear map:
 $f(x) = 3x$ for real numbers $x$.
The special properties of these functions are:
 $f(c \cdot x) = c \cdot f(x)$
 You can multiply by a scalar before or after applying the function and get the same result.
 Our example function:
 $f(5\cdot x) = 3 \cdot 5x = 15x = 5 \cdot f(x)$.
 $f(x+y) = f(x) + f(y)$
 You can add before or after applying the function and get the same result.
 Our example function:
 $f(x + y) = 3 \cdot (x + y) = 3x + 3y = f(x) + f(y)$.
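These two properties can be spot-checked numerically. The sketch below is only a sanity check on a handful of sample values, not a proof, and the helper `looks_linear` and the counterexample `shifted` are names made up for illustration:

```python
def f(x):
    return 3 * x

def shifted(x):
    # Not a linear map: the "+ 1" breaks both properties.
    return 3 * x + 1

def looks_linear(func, samples):
    """Spot-check both properties on a few sample values (not a proof)."""
    for x in samples:
        for y in samples:
            # Adding before or after applying func must agree.
            if func(x + y) != func(x) + func(y):
                return False
        for c in samples:
            # Scaling before or after applying func must agree.
            if func(c * x) != c * func(x):
                return False
    return True

samples = [-2, 0, 1, 5]
print(looks_linear(f, samples))        # True
print(looks_linear(shifted, samples))  # False
```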
Early mathematicians realized that such functions can model a lot of the phenomena that happen in the real world, from physics to statistics. In fact, geometrically, every linear map can be thought of as a combination of rotations (possibly with a reflection) and scalings:
 Rotations
 Functions that rotate shapes around a point.
 Scalings
 Functions that take a 2D image and scale its width and/or height.
Representing Linear Maps
Now that we’ve understood linear maps, how can we represent them succinctly in a way that makes them easy to combine and use?
1 Dimension
Let’s start with linear maps that take in real numbers and return real numbers (all in 1 dimension: the real line).
These linear transforms can always be written as $f(x) = mx$ for some $m$. Pretty simple!
2 Dimensions
But what about in 2 dimensions (i.e. the input is a vector such as $\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}$ and the output is also a vector in 2D)?
Let’s say $f(x)$ is a scaling function that stretches its input width by 3x and its height by 5x. You can see it visually below:
The linear map $f$ scales the purple rectangle into the black rectangle.
Let’s just start by describing what the function does to two very simple vectors $\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}$ and $\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}$.
We define:
$f(\textcolor{blue}{\begin{bmatrix}1 \\0\end{bmatrix}}) = \begin{bmatrix} 3 \\ 0\end{bmatrix}$ $f(\textcolor{#228B22}{\begin{bmatrix}0 \\1\end{bmatrix}}) = \begin{bmatrix}0 \\5\end{bmatrix}$
Based on this, can we figure out what $f(\begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9}\end{bmatrix})$ is? Amazingly, yes!
The diagram below shows our approach visually:
We will decompose $\begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9}\end{bmatrix}$ into a combination of $\textcolor{blue}{\begin{bmatrix} 1 \\ 0\end{bmatrix}}$ and $\textcolor{#228B22}{\begin{bmatrix} 0 \\1\end{bmatrix}}$ (the two vectors for which we know the value of $f$).
More precisely, here’s how we find the final value:
$f(\begin{bmatrix} \textcolor{blue}{10} \\ \textcolor{#228B22}{9}\end{bmatrix}) = f(\begin{bmatrix} \textcolor{blue}{10} \\ 0\end{bmatrix}) + f(\begin{bmatrix} 0 \\ \textcolor{#228B22}{9}\end{bmatrix})$
$= 10 \cdot f(\textcolor{blue}{\begin{bmatrix}1 \\0\end{bmatrix}}) + 9 \cdot f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}})$
$= 10 \cdot \textcolor{blue}{\begin{bmatrix} 3 \\ 0 \end{bmatrix}} + 9\cdot \textcolor{#228B22}{\begin{bmatrix} 0 \\ 5 \end{bmatrix}}$
$= \textcolor{blue}{\begin{bmatrix} 30 \\ 0 \end{bmatrix}} + \textcolor{#228B22}{\begin{bmatrix} 0 \\ 45 \end{bmatrix}}$
$= \begin{bmatrix} 30 \\ 45 \end{bmatrix}$
So we’re able to find $f(\begin{bmatrix}\textcolor{blue}{10} \\ \textcolor{#228B22}{9}\end{bmatrix})$ by just knowing $f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0\end{bmatrix})}$ and $f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1\end{bmatrix}})$.
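The same computation can be sketched in a few lines of Python. Vectors are plain tuples here, and the helper names (`scale`, `add`, `apply_f`) are illustrative choices rather than anything from the original discussion:

```python
# All we know about f: what it does to the two simple vectors.
f_e1 = (3, 0)  # f([1, 0])
f_e2 = (0, 5)  # f([0, 1])

def scale(c, v):
    """Multiply a 2D vector by a scalar."""
    return (c * v[0], c * v[1])

def add(u, v):
    """Add two 2D vectors."""
    return (u[0] + v[0], u[1] + v[1])

def apply_f(v):
    """f([x, y]) = x * f([1, 0]) + y * f([0, 1]), by linearity."""
    x, y = v
    return add(scale(x, f_e1), scale(y, f_e2))

print(apply_f((10, 9)))  # (30, 45)
```

Nothing in `apply_f` knows that $f$ is a stretch; it only uses the two known outputs and the two linearity properties.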
Matrices
So is there a way I can quickly say what a linear map does to $\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}$ and $\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}$?
Yes! A simple 2x2 matrix!
$f(x)$ (as we defined it in the previous section) can be represented by the notation $\begin{bmatrix} f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) & f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}) \end{bmatrix}$ $= \begin{bmatrix} \textcolor{blue}{3} & \textcolor{#228B22}{0} \\ \textcolor{blue}{0} & \textcolor{#228B22}{5} \end{bmatrix}$
This is extremely cool: we can describe the entire function, and how it operates on an infinite number of points, with a little table of 4 values.
Example of Creating a Matrix Representation
Let’s quickly see an example of how we find the matrix representation for a linear map. Let’s imagine we have the linear map $g$ that rotates vectors 90º counterclockwise. What would the matrix $G$ for this function $g$ be?
As before, let’s start by understanding what $g$ does to our two simple vectors:
$\textcolor{blue}{\begin{bmatrix}1 \\ 0\end{bmatrix}}$ and $\textcolor{#228B22}{\begin{bmatrix}0 \\ 1 \end{bmatrix}}$.

$\textcolor{blue}{\begin{bmatrix} 1 \\ 0\end{bmatrix}}$ turned 90º counterclockwise is $\begin{bmatrix}0 \\ 1\end{bmatrix}$.

$\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1\end{bmatrix}}$ turned 90º counterclockwise is $\begin{bmatrix}-1 \\ 0\end{bmatrix}$.
Putting it all together, we see that $G$, the matrix representation of $g$, is $G = \begin{bmatrix} \textcolor{blue}{0} & \textcolor{#228B22}{-1} \\ \textcolor{blue}{1} & \textcolor{#228B22}{0} \end{bmatrix}$.
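The recipe of applying the function to the two simple vectors and writing the results down as columns can be sketched as follows. The helper `matrix_of` and the function names `stretch` and `rotate90` are hypothetical, and a matrix is stored as a list of rows:

```python
def matrix_of(func):
    """Build the 2x2 matrix (a list of rows) whose columns are
    func([1, 0]) and func([0, 1])."""
    col1 = func((1, 0))
    col2 = func((0, 1))
    return [[col1[0], col2[0]],
            [col1[1], col2[1]]]

def stretch(v):
    # The scaling map f from earlier: 3x in width, 5x in height.
    return (3 * v[0], 5 * v[1])

def rotate90(v):
    # Rotate 90 degrees counterclockwise: (x, y) -> (-y, x).
    return (-v[1], v[0])

print(matrix_of(stretch))   # [[3, 0], [0, 5]]
print(matrix_of(rotate90))  # [[0, -1], [1, 0]]
```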
Applying the Function
Ok, so we know we can represent a linear map as a matrix, but how do we apply this function to an input?
Specifically, say I have the same function $g$. I want to apply this function on the vector $\begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}$. How do we do it?
Let’s start with what we know:
 The first column of the matrix tells us $g(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}})$.
 The second column tells us $g(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}})$.
To make use of that, let’s break $\begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}$ into a combination of $\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}$ and $\textcolor{#228B22}{\begin{bmatrix}0 \\ 1 \end{bmatrix}}$.
$\begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix} = x \cdot \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} + y \cdot \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}$
$g(\begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}) = g(x \cdot \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}} + y \cdot \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}})$
Thanks to the amazing properties of linear maps ($g(a+b) = g(a) + g(b)$), we can now simplify this to:
$= g(x \cdot \textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) + g(y \cdot \textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}})$
And then using another property of linear maps ($g(c \cdot a) = c\cdot g(a)$),
$= x\cdot g(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}) + y \cdot g(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}})$
And now we apply the knowledge we have (from the columns of the matrix)!
$= x \cdot \textcolor{blue}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}} + y\cdot \textcolor{#228B22}{\begin{bmatrix} -1 \\ 0 \end{bmatrix}}$
$g(\begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}) = \begin{bmatrix} 0x - 1y \\ 1x + 0y \end{bmatrix}$
Does this look familiar? It is exactly what you would get by carrying out the matrix multiplication $\begin{bmatrix} \textcolor{blue}{0} & \textcolor{#228B22}{-1} \\ \textcolor{blue}{1} & \textcolor{#228B22}{0} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{blue}{x} \\ \textcolor{#228B22}{y} \end{bmatrix}$!
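To see that the "combination of columns" view and the memorized row-by-row rule agree, here is a small sketch with both written out. The function names are illustrative, and `G` is the matrix of the 90º counterclockwise rotation, stored as a list of rows:

```python
def apply_by_columns(M, v):
    """x * (first column of M) + y * (second column of M)."""
    x, y = v
    col1 = (M[0][0], M[1][0])
    col2 = (M[0][1], M[1][1])
    return (x * col1[0] + y * col2[0],
            x * col1[1] + y * col2[1])

def apply_by_rows(M, v):
    """The memorized rule: each output entry is a row of M dotted with v."""
    return tuple(row[0] * v[0] + row[1] * v[1] for row in M)

# Matrix of the 90-degree counterclockwise rotation.
G = [[0, -1],
     [1,  0]]

v = (4, 7)
print(apply_by_columns(G, v))  # (-7, 4)
print(apply_by_rows(G, v))     # (-7, 4)
```

Both functions compute the same sums, just grouped differently: one groups the terms by column, the other by row.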
Matrix Multiplication as Function Composition
Ok, so we’ve seen that multiplying a 2x2 matrix by a vector is just applying the function to the vector. But what does matrix multiplication mean when we are multiplying two 2x2 matrices?
To answer this, we’re going to take a small detour. We know how to apply a function $f$ to an input vector $v$. What if I ask you to find the composition $h(v) = g(f(v))$, where we know $f$ and $g$?
Let’s work through this now:
 Let $G$ be the matrix for the function $g$.
 Let $F$ be the matrix for function $f$.
We want to find the matrix $H$ that represents the function $h(v) = g(f(v))$.
This is going to be:
$H = \begin{bmatrix} g(f(\textcolor{blue}{\begin{bmatrix} 1 \\ 0 \end{bmatrix}})) & g(f(\textcolor{#228B22}{\begin{bmatrix} 0 \\ 1 \end{bmatrix}})) \end{bmatrix}$
Let $\textcolor{blue}{F_{col1}}$ be the first column of $F$ and let $\textcolor{#228B22}{F_{col2}}$ be the second column.
Then we can simplify to:
$H = \begin{bmatrix} g(\textcolor{blue}{F_{col1}}) & g(\textcolor{#228B22}{F_{col2}}) \end{bmatrix}$
So what’s $g(\textcolor{blue}{F_{col1}})$?
Well we just saw earlier that this is just $G \cdot \textcolor{blue}{F_{col1}}$. So we can now write:
$H = \begin{bmatrix} G \cdot \textcolor{blue}{F_{col1}} & G \cdot \textcolor{#228B22}{F_{col2}} \end{bmatrix}$
Seem familiar? It’s exactly the formula for the 2x2 matrix multiplication $H = G\cdot F$!
So $H$, the matrix representing $g(f(x))$, is just $G\cdot F$.
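The derivation above can be checked with a short sketch: build $H$ column by column by applying $G$ to each column of $F$, and compare against the textbook entry-by-entry rule. All function names here are illustrative, and matrices are lists of rows:

```python
def mat_vec(M, v):
    """Apply the 2x2 matrix M to the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def column(M, j):
    """Return column j of the 2x2 matrix M as a tuple."""
    return (M[0][j], M[1][j])

def compose_matrices(G, F):
    """Matrix of g(f(.)): its columns are G applied to the columns of F."""
    h1 = mat_vec(G, column(F, 0))
    h2 = mat_vec(G, column(F, 1))
    return [[h1[0], h2[0]],
            [h1[1], h2[1]]]

def mat_mul(A, B):
    """Textbook rule: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

F = [[3, 0], [0, 5]]   # stretch: 3x in width, 5x in height
G = [[0, -1], [1, 0]]  # rotate 90 degrees counterclockwise

print(compose_matrices(G, F))  # [[0, -5], [3, 0]]
print(mat_mul(G, F))           # [[0, -5], [3, 0]]
```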
An Example of Function Composition
Let’s work through a full example to see this linear map composition in action.
 Let $f$ be our function as earlier (represented by the matrix $F = \begin{bmatrix}3 & 0 \\ 0 & 5\end{bmatrix}$).
 Let $g$ be the function that rotates a vector 90º counterclockwise as before (represented by $G = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$).
Finding $g$ Composed With $f$
Let’s work through $h = g \circ f$ applied on the vector $v = \begin{bmatrix} 2 \\ 1\end{bmatrix}$. We can use just the properties of $f$ and $g$ to figure out this result (see diagram below).
 We start with $v$.
 We know $f$ stretches its input by $3x$ in the x direction and by $5x$ in the y direction, leading to $f(v) = \begin{bmatrix} 6 \\ 5 \end{bmatrix}$.
 We then just rotate the resulting vector 90º counterclockwise to get $g(f(v)) = \begin{bmatrix} -5 \\ 6 \end{bmatrix}$.
$g$ Composed With $f$ Using Matrices
Do we get the same result when using function composition through matrices? Let’s find out:
 The matrix for $g \circ f$ is $H = G \cdot F$.
 $H = G \cdot F = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 0 & -5 \\ 3 & 0 \end{bmatrix}$
 To apply $H$ on the vector $v$ we just do $H \cdot v$.
 $H \cdot v = \begin{bmatrix} 0 & -5 \\ 3 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -5 \\ 6 \end{bmatrix}$.
This is exactly what we got when visually applying $g(f(v))$. So we were able to get the same result without having to apply each function one at a time: we used matrix multiplication to find the final function and applied it directly. While this already saves us time, imagine how much time this would save if we had to compose 10 or 20 different functions!
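Here is a rough sketch of what composing a longer chain might look like. The chain chosen (stretch, then two quarter turns) is just an illustrative example, and `compose_all` is a made-up helper that folds a list of matrices into one using `functools.reduce`:

```python
from functools import reduce

def mat_mul(A, B):
    """2x2 matrix product: entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(M, v):
    """Apply the 2x2 matrix M to the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

IDENTITY = [[1, 0], [0, 1]]

def compose_all(matrices):
    """Combine maps applied in list order (first element is applied first).
    Each later map multiplies on the left of what has been built so far."""
    return reduce(lambda acc, M: mat_mul(M, acc), matrices, IDENTITY)

F = [[3, 0], [0, 5]]   # stretch: 3x in width, 5x in height
G = [[0, -1], [1, 0]]  # rotate 90 degrees counterclockwise

chain = [F, G, G]      # stretch, then rotate twice
H = compose_all(chain)
v = (2, 1)

# Applying the maps one at a time...
step_by_step = v
for M in chain:
    step_by_step = mat_vec(M, step_by_step)

print(H)               # [[-3, 0], [0, -5]]
print(mat_vec(H, v))   # (-6, -5)
print(step_by_step)    # (-6, -5)
```

Once `H` is computed, applying the whole chain to any new vector costs a single matrix-vector product, no matter how long the chain was.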
Big Picture: Matrices Give Us Power
Remember how much power we gained by being able to represent a function as a polynomial? We were able to combine polynomials, compose them, and multiply them. We now have all that same power for linear maps!
We can compose them (matrix multiplication), add them (matrix addition), and invert them (matrix inversion) all by following some simple rules.
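As a small illustration of the other two operations, the sketch below adds two matrices entry by entry and inverts one with the standard 2x2 inverse formula (swap the diagonal, negate the off-diagonal, divide by the determinant). That formula is not derived in this post, so treat it as an assumption brought in from outside; the function names are illustrative, and `fractions.Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction

def mat_add(A, B):
    """Entry-by-entry sum: represents the map v -> a(v) + b(v)."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_inverse(M):
    """Standard 2x2 inverse; raises if the determinant is zero."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(M, v):
    """Apply the 2x2 matrix M to the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

F = [[3, 0], [0, 5]]   # stretch
G = [[0, -1], [1, 0]]  # rotate 90 degrees counterclockwise
H = [[0, -5], [3, 0]]  # stretch, then rotate (G times F)

print(mat_add(F, G))   # [[3, -1], [1, 5]]

# Inverting H undoes "stretch, then rotate": it maps (-5, 6) back to (2, 1).
recovered = mat_vec(mat_inverse(H), (-5, 6))
print(recovered == (2, 1))  # True
```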
A more sophisticated way of saying this is we have an algebra on these functions (just like we have an algebra on integers etc.).
Looking Forward
So we’ve seen pretty clearly that matrices are just functions and that linear algebra is the study of combining these functions.
Understanding this helps us perceive some of the key ideas of linear algebra in a new way. In the next post, we’re going to focus on eigenvectors and see how they provide valuable insights into these linear maps.