Dhruv's Blog     About     Archive     Feed

PageRank - How Eigenvectors Power the Algorithm Behind Google Search.

Welcome back! In the last post we derived Eigenvectors. In particular, we saw how useful they are in analyzing matrices we need to apply again and again.

In this post, we’re going to dive into one of the most famous applications of Eigenvectors - the original PageRank algorithm that allowed Google to create the world’s best search engine. PageRank was Larry Page’s phD thesis while at Stanford. He spun out this amazing algorithm to redefine search and create one of the most iconic companies in the modern era. We’re going to re-derive it here and see that it’s just a simple application of Eigenvectors!

By the end of this post, you will:

  1. Feel like you could have come up with PageRank yourself.
  2. Understand how eigenvectors are central to PageRank.
  3. See how central eigenvectors are to dynamic processes in general.

Making Friends

Ok let’s build out our understanding of pagerank using a hypothetical scenario.

Let’s say you just moved to a new city and you’re trying to make some new friends. You need to figure out which of your new friends to hang out with in your free time. But unfortunately you’re super busy and have very little free time!

So you need to rank all these potential friends to prioritize which ones to chill with.

This could be you! Just need to find the friends...

In a moment of clarity you realize the following:

  1. Friends you’re close with are probably going to hang out with people you’d like.
  2. Someone everyone is friends with is probably someone you are going to be friends with.

You decide to use these two insights to create a scheme you call FriendRank.

FriendRank gives a potential friend a score (FriendScore) based on:
  1. How many people are friends with this person weighted by,
  2. How close you are to the people who are friends with them.
In the above picture, how close you are to a person is shown by the size of the circle. To find your friend score for potential friend FF, you just add up how close you are to AA, CC, and DD.

Let’s write this down as a formula. Let’s say the strength of the friendship between you and some person pp is FS(p)FS(p). Let Friends(p)Friends(p) be the set of friends of pp.

The FS (FriendScore) for a potential friend Fred is: FS(Fred)=pFriends(Fred)FS(p) FS(Fred) = \sum_{p \in Friends(Fred)} FS(p)

Calculating Our First Friend Score

Let’s work through an example. Say you have 4 potential friends:

  1. Jimmy (closeness score - 5)
  2. Maya (closeness score - 9)
  3. Lisa (closeness score - 8)
  4. Fred (?)

In this world, you’re really close to Maya (with a score of 9) and Lisa (8) but only kinda close to Jimmy (5).

Only Jimmy and Maya are friends with Fred.

Graphically, this is:

Only Jimmy and Maya claim to be friends with Fred. Lisa seems to be tearing him apart.

So: FS(Fred)=FS(Jimmy)+FS(Maya) FS(Fred) = FS(Jimmy) + FS(Maya) FS(Fred)=5+9=14FS(Fred) = 5 + 9 = 14


Accounting For People who are Friends With Everyone

Not such a crazy idea right? You start this process and you realize one thing. Your really close friend Maya claims to be friends with EVERYONE. You start wondering, “Hmm maybe I need to downweight Maya’s opinion because she doesn’t really seem to be that picky”.

This is easy - we just divide Maya’s weighting score by how many people she claims to be friends with. That way if Maya claims 100 friends, his weighting for each friend is divided by 100. So our formula is now:

FS(Fred)=pFriends(Fred)FS(p)1NumFriends(p) FS(Fred) = \sum_{p \in Friends(Fred)} FS(p) \cdot \frac{1}{NumFriends(p)}

Going back to our example, our score becomes: FS(Fred)=FS(Jimmy)2+FS(Maya)4FS(Fred) = \frac{FS(Jimmy)}{2} + \frac{FS(Maya)}{4} FS(Fred)=52+94=4.75FS(Fred) = \frac{5}{2} + \frac{9}{4} = 4.75

Awesome!

Let’s now simplify this formula a little.

  1. Let’s call the set of all friends FF.
  2. Let IsFriends(p,Fred)={1if p calls Fred a friend0otherwiseIsFriends(p, Fred) = \begin{cases} 1 & \text{if}\ \textrm{p calls Fred a friend} \\ 0 & \text{otherwise} \end{cases}

    Then we can simplify our earlier formula as shown below:

FS(Fred)=pFriends(Fred)FS(p)1NumFriends(p)FS(Fred) = \sum_{p \in Friends(Fred)} FS(p) \cdot \frac{1}{NumFriends(p)}

FS(Fred)=pFFS(p)IsFriends(p,Fred)NumFriends(p)FS(Fred) = \sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Fred)}{NumFriends(p)}
All the people who aren’t friends with Fred just get 00 terms which gives us the same thing.

So Who’s Friends With Whom?

Ok this is looking great! You decide to go ahead and implement this scheme with your friends Bobby, Lauren, Daniel, Rony, and Saba. You start by asking all your friends to write down who they’re friends with! So you send them all the following email:

From: parry.lage@boogle.com
To: Bobby, Lauren, Daniel, Rony, Saba
Subject: Who are your friends?

My dearest "friends",

I’m trying to rank you and all my other potential friends and need your help to do so. Below is a list of people. Please put a 1 next to their name if you call them your friend.

Person Friends?
Bobby ___
Lauren ___
Daniel ___
Rony ___
Saba ___

Yours truly,
Parry Lage

P.S. Just because I'm sending you this email doesn't mean we're friends.

So Saba sends you something like this:

Person IsFriends
Bobby 11
Lauren 11
Daniel 00
Rony 00
Saba 00

Looking good so far! You then remember you need to downweight Saba’s score by how many friends she claims to have. She’s claiming 2 friends here so we divide by 2 giving us:

Person IsFriends
Bobby 12\frac{1}{2}
Lauren 12\frac{1}{2}
Daniel 00
Rony 00
Saba 00

Sweet.

You now decide to organize all your friends responses into a little table so that it’s easy to see. Each column of the table is a different friend. It looks like this (notice Saba’s answers above are the last column in the table below):

Bobby Lauren Daniel Rony Saba
Bobby 00 11 13\frac{1}{3} 00 12\frac{1}{2}
Lauren 14\frac{1}{4} 00 13\frac{1}{3} 00 12\frac{1}{2}
Daniel 14\frac{1}{4} 00 00 00 00
Rony 14\frac{1}{4} 00 00 00 00
Saba 14\frac{1}{4} 00 13\frac{1}{3} 11 00

This table is just a matrix, which we’ll call AA. The element i,ji,j in the matrix is

A[i,j]=IsFriends(j,i)NumFriends(j)A[i,j] = \frac{IsFriends(j, i)}{NumFriends(j)}

For example, let’s examine A[1,5]A[1,5]:

  • Person 11 is Bobby and Person 55 is Saba.
  • IsFriends(5,1)IsFriends(5,1) is whether Saba claims to be friends with Bobby. We saw in her response she did, so IsFriends(5,1)=1IsFriends(5,1) = 1.
  • NumFriends(5)NumFriends(5) is the number of friends Saba has (2). Hence NumFriends(5)=2NumFriends(5) = 2.
  • So A[1,5]=12A[1, 5] = \frac{1}{2} just as it is in the table.

Notice additionally that IsFriendsIsFriends is not reflexive! For instance, Rony may claim to be friends with Saba, but in the above example, Saba doesn’t call Ron a friend.

Hence, IsFriends(Ron,Saba)=1IsFriends(Ron, Saba) = 1, but IsFriends(Saba,Ron)=0IsFriends(Saba, Ron) = 0.


Adding Your Own Scores

You then decide you want to come up with an initial guess for everyone’s FriendScore!

There are some people you know better than others so this is really a guess for what your final FriendScore is for each of them. You write down the following and call it FS0FS_{0} - your initial FriendScore:

Person FriendScore
Bobby 11
Lauren 11
Daniel 11
Rony 11
Saba 11

Combining Everyone’s Opinion

Omg this is a lot more work than I expected.

Ok we now have the following:

  1. AA - a Matrix who’s elements are A[i,j]=IsFriends(j,i)NumFriends(j)A[i,j] = \frac{IsFriends(j, i)}{NumFriends(j)}.
  2. FS0FS_0 - An initial guess of everyone’s FriendScore.

Now that we know who’s friends with who, we can figure out who’s really likely to be our friend.

We’d like to come up with a new FriendScore FS1FS_{1} based on this information.

How do we do that?

Recall from earlier our formula for Fred’s FriendScore:

FS(Fred)=pFFS(p)IsFriends(p,Fred)NumFriends(p)FS(Fred) = \sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Fred)}{NumFriends(p)}

So similarly, we want to update our rankings, FS1FS_{1}, for our friends to be:

Person FriendScore
Bobby pFFS(p)IsFriends(p,Bobby)NumFriends(p)\sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Bobby)}{NumFriends(p)}
Lauren pFFS(p)IsFriends(p,Lauren)NumFriends(p)\sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Lauren)}{NumFriends(p)}
Daniel pFFS(p)IsFriends(p,Daniel)NumFriends(p)\sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Daniel)}{NumFriends(p)}
Rony pFFS(p)IsFriends(p,Rony)NumFriends(p)\sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Rony)}{NumFriends(p)}
Saba pFFS(p)IsFriends(p,Saba)NumFriends(p)\sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Saba)}{NumFriends(p)}

Let’s shorten our notation:

  1. Let IFIF be IsFriendsIsFriends
  2. Let NFNF be NumFriendsNumFriends

Then Bobby’s ideal new ranking is:

pFIF(p,Bobby)NF(p)FS(p)\sum_{p \in F} \frac{IF(p, Bobby)}{NF(p)} \cdot FS(p)

This looks an awful lot like this dot product (go ahead and compute it!):

[IF(Bobby,Bobby)NF(Bobby)IF(Lauren,Bobby)NF(Lauren)IF(Daniel,Bobby)NF(Daniel)IF(Rony,Bobby)NF(Rony)IF(Saba,Bobby)NF(Saba)][FS(Bobby)FS(Lauren)FS(Daniel)FS(Rony)FS(Saba)]\begin{bmatrix} \frac{IF(Bobby, Bobby)}{NF(Bobby)} & \frac{IF(Lauren, Bobby)}{NF(Lauren)} & \frac{IF(Daniel, Bobby)}{NF(Daniel)} & \frac{IF(Rony, Bobby)}{NF(Rony)} & \frac{IF(Saba, Bobby)}{NF(Saba)} \end{bmatrix} \cdot \begin{bmatrix} FS(Bobby) \\ FS(Lauren) \\ FS(Daniel) \\ FS(Rony) \\ FS(Saba) \end{bmatrix}

Breaking Down the Dot Product - First Vector

Let’s try and see if we have the two vectors in this dot product. Let’s start with the first vector:

[IF(Bobby,Bobby)NF(Bobby)IF(Lauren,Bobby)NF(Lauren)IF(Daniel,Bobby)NF(Daniel)IF(Rony,Bobby)NF(Rony)IF(Saba,Bobby)NF(Saba)]\begin{bmatrix} \frac{IF(Bobby, Bobby)}{NF(Bobby)} & \frac{IF(Lauren, Bobby)}{NF(Lauren)} & \frac{IF(Daniel, Bobby)}{NF(Daniel)} & \frac{IF(Rony, Bobby)}{NF(Rony)} & \frac{IF(Saba, Bobby)}{NF(Saba)} \end{bmatrix}

Where can we find this vector?

Remember the elements of AA are just A[i,j]=IF(j,i)NF(j)A[i,j] = \frac{IF(j, i)}{NF(j)}. Let’s look at AA again, substituting the first row of the matrix with the formula representation - we’ll see that it’s the same as this vector!

Bobby Lauren Daniel Rony Saba
Bobby IF(Bobby,Bobby)NF(Bobby)\frac{IF(Bobby, Bobby)}{NF(Bobby)} IF(Lauren,Bobby)NF(Lauren)\frac{IF(Lauren, Bobby)}{NF(Lauren)} IF(Daniel,Bobby)NF(Daniel)\frac{IF(Daniel, Bobby)}{NF(Daniel)} IF(Rony,Bobby)NF(Rony)\frac{IF(Rony, Bobby)}{NF(Rony)} IF(Saba,Bobby)NF(Saba)\frac{IF(Saba, Bobby)}{NF(Saba)}
Lauren - - - -
Daniel - - - - -
Rony - - - - -
Saba - - - - -

Great so we have the first vector in the dot product.

Breaking Down the Dot Product - Second Vector

Let’s now see if we can get the second one:

[FS(Bobby)FS(Lauren)FS(Daniel)FS(Rony)FS(Saba)]\begin{bmatrix} FS(Bobby) \\ FS(Lauren) \\ FS(Daniel) \\ FS(Rony) \\ FS(Saba) \end{bmatrix}

You’ve already created this vector when you came up with your initial guesses FS0FS_{0} in the following table:

Person FriendScore
Bobby 11
Lauren 11
Daniel 11
Rony 11
Saba 11
So we have both vectors of the dot product needed for finding FS(Bobby)FS(Bobby).
  1. The first vector is just the first row of AA.
  2. The second vector is FS0FS_{0}.

Matrix Multiplication To Find FriendScores

If we just multiply AFS0A \cdot FS_{0} we compute this dot product and get FS(Bobby)FS(Bobby) in the first row of the resulting vector.

AFS0=[pFFS(p)IsFriends(p,Bobby)NumFriends(p)]=[FS(Bobby)] A \cdot FS_{0} = \begin{bmatrix} \sum_{p \in F} FS(p) \cdot \frac{IsFriends(p, Bobby)}{NumFriends(p)} \\ - \\ - \\ - \\ - \end{bmatrix} = \begin{bmatrix} FS(Bobby) \\ - \\ - \\ - \\ - \end{bmatrix}

More generally, you’ll find that the second row of this vector is exactly FS(Lauren)FS(Lauren) etc, giving us:

AFS0=[FS(Bobby)FS(Lauren)FS(Daniel)FS(Rony)FS(Saba)] A \cdot FS_{0} = \begin{bmatrix} FS(Bobby) \\ FS(Lauren) \\ FS(Daniel) \\ FS(Rony) \\ FS(Saba) \end{bmatrix}

So, if we call FS1FS_{1} the result of folding in the results of AA into your initial predictions, we get: AFS0=FS1 A \cdot FS_{0} = FS_{1}

When Do We Stop?

Ok so we now have a super quick way of finding our next FriendScore. Just take our current guess, which we’ll call FSiFS_i, multiply it by AA, and obtain FSi+1FS_{i+1}:

FSi+1=AFSiFS_{i+1} = A \cdot FS_{i}

Everytime we compute AFSiA \cdot FS_{i}, we are updating our guess of our Friend Score with the rest of our friends opinions.

When do we stop this process? We stop when FSi+1FS_{i+1} stops changing:

FSi+1=FSi=FSFS_{i+1} = FS_{i} = FS^*

At this point, which we call FSFS^*, AFS=FSA \cdot FS^* = FS^* Does this look like the following? Ax=λxAx = \lambda x Yes! This is exactly the formula for an eigenvector!
So the final solution FSFS^*, is an eigenvector of AA!

Seeing This Visually

How does FSiFS_i move to FSFS^*? In many ways, the matrix AA pulls FSiFS_i towards its eigenvector FSFS^*.

There’s an awesome visualization of this here by setosa.io that I highly recommend you check out. Here’s an extract directly from their writeup. In the below image, s1s1 is an eigenvector of AA. Notice how as we keep applying AA on vv, we move closer to s1s1. In particular, the sequence vv, AvAv, A2vA^2v, … eventually ends up directly on s1s1.

As you keep applying AA on vv, vv gets pulled towards the eigenvector s1s_1.

Similarly in FriendRank, as we keep multiplying FS0FS_{0} by AA, we get the sequence AFS0A \cdot FS_0, A2FS0A^2 \cdot FS_0, … which eventually ends up on FSFS^* - the eigenvector of AA.


From Friends to Pages

You’ve probably guessed at this point that you can switch the word friends to pages in the above and get PageRank! In the world of webpages:

  1. One webpage claims to be “friends” with another if it links out to the webpage.

  2. Google finds out who’s friends with who (our initial survey) by just crawling the web and finding out who links to who.

  3. The number of “friends” a webpage claims to have is just the number of links present on the webpage.

  4. We create the matrix AA by crawling the web. Once we have it, we find its eigenvector to get an estimate of the PageRanks.

To get PageRank, just replace people with pages.

Putting this together gives us a way of ranking all webpages on the internet for relevance!

I find it really cool how PageRank models the abstract concept of webpages using simple insights into how we interact as humans.

A few quick asides:

  1. The formula we’ve derived for PageRank is a simplified version of the actual one. To see the full formula, read the original paper here.

  2. Notice that PageRank is actually a next level pun. PageRank was written by Larry Page about ranking webpages - well done Larry.


Conclusion

So we’ve seen that eigenvectors are at the heart of one of the most important algorithms of all time! More generally, whenever you see a linear function being applied again and again and again (Like AA), you are without a doubt going to be looking for the eigenvectors of that linear function/matrix. The eigenvectors will tell you where that process ends!

I hope you enjoyed this unique and powerful application of eigenvectors. In the next post, we’re going to step back into Matrix Algebra. We’ll learn about Kernels (or Nullspaces) and visually see how they help us immediately grasp what matrices do to their inputs.