Strassen's Fast Matrix Multiplication Algorithm

This semester I’m TAing the course CSC373: Algorithm Design, Analysis, and Complexity at UofT. Before I give a tutorial, I’ll usually write up a set of tutorial notes. Whenever I can, I’ll post a (slightly) cleaned-up version of my notes here.

Today’s topic is algorithms for fast matrix multiplication. Given two \(n \times n\) matrices \(A\) and \(B\) with entries \(a_{ij}, b_{ij} \in \mathbb{R}\), the matrix \(AB = C\) has entries \(c_{ij}\) defined by

\[c_{ij} = \sum_{k=1}^n a_{ik}b_{kj}.\]

It is straightforward to compute the product \(AB\) by simply applying this formula to compute each entry of \(C\) individually.
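To make this concrete, here’s a rough Python sketch of the iterative algorithm (the function name and the plain list-of-lists representation are just my choices for illustration; real code would use something like NumPy):

```python
def naive_multiply(A, B):
    """Multiply two n x n matrices, given as lists of lists, straight from the definition."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # c_ij = sum over k of a_ik * b_kj
            total = 0
            for k in range(n):
                total += A[i][k] * B[k][j]
            C[i][j] = total
    return C
```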

How quickly does this simple algorithm run? We’ll measure complexity in terms of the number of arithmetic operations performed. To compute a single entry of \(C\), we perform \(n\) multiplications and \(n\) additions, so \(2n\) arithmetic operations. Since \(C\) is an \(n \times n\) matrix, there are \(n^2\) entries to compute, so the total running time is \(2n \cdot n^2 = 2n^3 = \Theta(n^3)\).

Simple Divide and Conquer Algorithm

Based on the definition of matrix multiplication (from the formula), it might seem that \(\Theta(n^3)\) is the best running time we can hope for from any algorithm that multiplies two square matrices. It turns out that we can, in fact, do better by applying the divide and conquer strategy.

For simplicity, we’ll assume that the inputs \(A\) and \(B\) are both \(n \times n\) matrices, and that \(n\) is always a power of 2. This will make things easier, since it ensures that we can always divide an \(n \times n\) matrix neatly into four equally-sized \(\frac{n}{2} \times \frac{n}{2}\) matrices. It’s okay to assume this, since we can always pad our matrices with zeros until \(n\) is a power of 2; doing so at most doubles \(n\), so it increases the number of entries in each matrix by a factor of at most 4 and changes the running time by at most a constant factor.
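For example, the padding step might look like this in Python (a minimal sketch using the same list-of-lists representation as above):

```python
def pad_to_power_of_two(M):
    """Pad an n x n matrix with zeros so that its dimension becomes the next power of 2."""
    n = len(M)
    m = 1
    while m < n:  # smallest power of 2 with m >= n
        m *= 2
    padded = [row + [0] * (m - n) for row in M]  # extend each existing row with zeros
    padded += [[0] * m for _ in range(m - n)]    # append all-zero rows
    return padded
```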

The main idea is that we view matrix multiplication in terms of block matrices:

\[ \begin{pmatrix}A_{11} & A_{12} \\ A_{21} & A_{22}\end{pmatrix} \begin{pmatrix}B_{11} & B_{12} \\ B_{21} & B_{22}\end{pmatrix} = \begin{pmatrix}C_{11} & C_{12} \\ C_{21} & C_{22}\end{pmatrix}, \]

where each submatrix \(A_{ij}\) is an \(\frac{n}{2} \times \frac{n}{2}\) matrix. It turns out that we can compute a product of block matrices by applying the usual formula, but multiplying and adding the smaller matrices instead of numbers, so that \(C_{11} = A_{11}B_{11} + A_{12}B_{21}\) and so on. This leads to a natural recursive algorithm.

For the base case, the product of two \(1 \times 1\) matrices is just \(\begin{pmatrix}a\end{pmatrix}\begin{pmatrix}b\end{pmatrix} = \begin{pmatrix}ab\end{pmatrix}\).

Otherwise, given two \(n \times n\) matrices, partition each of them into four \(\frac{n}{2} \times \frac{n}{2}\) matrices as above. Perform 8 matrix multiplications recursively, and combine the results:

\[ \begin{pmatrix}A_{11} & A_{12} \\ A_{21} & A_{22}\end{pmatrix} \begin{pmatrix}B_{11} & B_{12} \\ B_{21} & B_{22}\end{pmatrix} = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}. \]
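Here’s a rough Python sketch of this recursive algorithm (again assuming \(n\) is a power of 2 and using lists of lists; the helper functions are my own, not part of the course material):

```python
def dc_multiply(A, B):
    """Divide-and-conquer multiplication of two n x n matrices (n a power of 2),
    using 8 recursive multiplications of (n/2) x (n/2) blocks."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def split(M):
        # quadrants M11, M12, M21, M22
        return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
                [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # each quadrant of C needs 2 recursive multiplications and 1 matrix addition
    C11 = add(dc_multiply(A11, B11), dc_multiply(A12, B21))
    C12 = add(dc_multiply(A11, B12), dc_multiply(A12, B22))
    C21 = add(dc_multiply(A21, B11), dc_multiply(A22, B21))
    C22 = add(dc_multiply(A21, B12), dc_multiply(A22, B22))
    # glue the quadrants back together
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```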

Running Time

Let \(T(n)\) be the number of steps that this algorithm takes to multiply two \(n \times n\) matrices. The base case is simple; it’s a single arithmetic operation, so \(T(1) = \Theta(1)\).

For the recursive case, we perform 8 recursive calls on \(\frac{n}{2} \times \frac{n}{2}\) matrices. We also do \(\Theta(n^2)\) extra work to combine (via matrix addition) the results of these subproblems. This gives us the recurrence relation

\[ T(n) = \begin{cases} \Theta(1) & \text{ if } n = 1, \\ 8T(n/2) + \Theta(n^2) & \text{ if } n > 1. \end{cases} \]

However, if we solve this recurrence using the master theorem, we get \(T(n) = \Theta(n^{\log_2{8}}) = \Theta(n^3)\): we have \(a = 8\) subproblems of size \(n/2\) and \(f(n) = \Theta(n^2)\), and since \(n^{\log_2{8}} = n^3\) grows polynomially faster than \(n^2\), the first case of the master theorem applies. This is the same running time as the simple iterative algorithm, so why even bother with any of this divide and conquer business?
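We can also see where the \(n^3\) comes from by unrolling the recurrence: level \(i\) of the recursion tree has \(8^i\) subproblems, each of size \(\frac{n}{2^i} \times \frac{n}{2^i}\), so (ignoring constants) the total work is

\[ \sum_{i=0}^{\log_2 n} 8^i \left(\frac{n}{2^i}\right)^2 = n^2 \sum_{i=0}^{\log_2 n} 2^i = n^2(2n - 1) = \Theta(n^3). \]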

Strassen’s Algorithm (CLRS 4.2)

Key idea behind Strassen’s algorithm: Using a bit (a lot) of matrix algebra, we can reduce the number of recursive calls from 8 down to 7. This will reduce our running time to \(\Theta(n^{\log_2{7}}) \approx \Theta(n^{2.8074})\).

Strassen’s algorithm involves four high-level steps, given \(n \times n\) matrices \(A\) and \(B\):

  1. Partition \(A\) and \(B\) into \(\frac{n}{2} \times \frac{n}{2}\) submatrices. This takes \(\Theta(1)\) time using indices (or \(\Theta(n^2)\) by copying entries).
  2. Create 10 \(\frac{n}{2} \times \frac{n}{2}\) matrices \(S_1,\dots,S_{10}\) by adding and subtracting the submatrices from step 1. This requires \(\Theta(n^2)\) time.
  3. Using the submatrices in step 1 and the 10 matrices from step 2, recursively compute 7 matrix products \(P_1,\dots,P_7\).
  4. Compute \(C_{11}, C_{12}, C_{21}, C_{22}\) by adding and subtracting combinations of the \(P_i\) matrices. \(\Theta(n^2)\) time.
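Putting these costs together: step 3 makes 7 recursive calls on \(\frac{n}{2} \times \frac{n}{2}\) matrices, and steps 1, 2, and 4 contribute \(\Theta(n^2)\) extra work, so the recurrence becomes

\[ T(n) = \begin{cases} \Theta(1) & \text{ if } n = 1, \\ 7T(n/2) + \Theta(n^2) & \text{ if } n > 1, \end{cases} \]

which the master theorem solves to \(T(n) = \Theta(n^{\log_2{7}})\).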

Now for the details, here are the matrices that we compute in step 2: \[\begin{align*} S_1 = B_{12} - B_{22} \qquad& S_2 = A_{11} + A_{12} \\ S_3 = A_{21} + A_{22} \qquad& S_4 = B_{21} - B_{11} \\ S_5 = A_{11} + A_{22} \qquad& S_6 = B_{11} + B_{22} \\ S_7 = A_{12} - A_{22} \qquad& S_8 = B_{21} + B_{22} \\ S_9 = A_{11} - A_{21} \qquad& S_{10} = B_{11} + B_{12}. \end{align*}\]

Then in step 3, we compute the following matrices: \[\begin{align*} P_1 = A_{11}S_1 \qquad& P_2 = S_2B_{22} \\ P_3 = S_3B_{11} \qquad& P_4 = A_{22}S_4 \\ P_5 = S_5S_6 \qquad& P_6 = S_7S_8 \\ P_7 = S_9S_{10}. \qquad& \end{align*}\]

In step 4, we combine the results of previous computations in the following way: \[\begin{align*} C_{11} = P_5 + P_4 - P_2 + P_6 \qquad& C_{12} = P_1 + P_2 \\ C_{21} = P_3 + P_4 \qquad& C_{22} = P_5 + P_1 - P_3 - P_7. \end{align*}\]

Checking that this works out\(\require{cancel}\): \[\begin{align*} C_{21} &= P_3 + P_4 \\ &= S_3B_{11} + A_{22}S_4 \\ &= (A_{21} + A_{22})B_{11} + A_{22}(B_{21} - B_{11}) \\ &= A_{21}B_{11} + \cancel{A_{22}B_{11}} + A_{22}B_{21} - \cancel{A_{22}B_{11}} \\ &= A_{21}B_{11} + A_{22}B_{21}. \end{align*}\]

The rest of the entries of \(C\) can be checked similarly.
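Here’s a rough Python sketch of the whole algorithm, following the four steps above (same caveats as before: \(n\) a power of 2, lists of lists, and the helper names are mine):

```python
def strassen_multiply(A, B):
    """Strassen's algorithm for two n x n matrices (n a power of 2):
    7 recursive multiplications instead of 8."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def split(M):
        return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
                [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    # Step 1: partition into quadrants
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Step 2: the 10 sums and differences
    S1, S2 = sub(B12, B22), add(A11, A12)
    S3, S4 = add(A21, A22), sub(B21, B11)
    S5, S6 = add(A11, A22), add(B11, B22)
    S7, S8 = sub(A12, A22), add(B21, B22)
    S9, S10 = sub(A11, A21), add(B11, B12)
    # Step 3: the 7 recursive products
    P1 = strassen_multiply(A11, S1)
    P2 = strassen_multiply(S2, B22)
    P3 = strassen_multiply(S3, B11)
    P4 = strassen_multiply(A22, S4)
    P5 = strassen_multiply(S5, S6)
    P6 = strassen_multiply(S7, S8)
    P7 = strassen_multiply(S9, S10)
    # Step 4: combine into the quadrants of C
    C11 = add(sub(add(P5, P4), P2), P6)   # P5 + P4 - P2 + P6
    C12 = add(P1, P2)
    C21 = add(P3, P4)
    C22 = sub(sub(add(P5, P1), P3), P7)   # P5 + P1 - P3 - P7
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

As a quick sanity check, strassen_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]) returns [[19, 22], [43, 50]], matching the usual product.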

Improvements

Strassen’s algorithm gives us an asymptotic improvement over the \(\Theta(n^3)\) iterative matrix multiplication algorithm. The bound \(\Theta(n^{2.8074})\) seems a bit unnatural, so one might ask “can we do better?” What’s the best we could possibly hope to do? Since the output is an \(n \times n\) matrix, it takes at least \(n^2\) steps just to write it down, so any algorithm for multiplying two \(n \times n\) matrices has a running time of \(\Omega(n^2)\).

In the years since Strassen’s algorithm was first published, computer scientists have developed faster and faster matrix multiplication algorithms, gradually bringing down the exponent. The following table gives an idea of how upper bounds on the complexity of matrix multiplication have evolved over the years.

Increasingly better upper bounds on the complexity of matrix multiplication.
Author(s) Year Upper bound
Binet (?) 1812 \(O(n^3)\)
Strassen 1969 \(O(n^{2.8074})\)
Coppersmith–Winograd 1990 \(O(n^{2.376})\)
Andrew Stothers 2010 \(O(n^{2.374})\)
Virginia Vassilevska Williams 2012 \(O(n^{2.3728642})\)
François Le Gall 2014 \(O(n^{2.3728639})\)

As you can see, although the bounds have improved, progress has been gradual, and we are still far from the trivial lower bound of \(\Omega(n^2)\). In fact, this leads us to one of the major open problems in theoretical computer science: Can we prove an upper bound that matches this lower bound (i.e., by finding an algorithm that multiplies matrices in \(O(n^2)\) time)? If not, can we prove a stronger lower bound?

In practice though (as far as I can tell; I’m not an expert on scientific computing), we do not typically use algorithms that are faster than Strassen’s algorithm to actually multiply matrices. This is because the constants hidden in the big O notation are so large that they only provide a noticeable speedup for extremely huge matrices. There are other factors that need to be considered when choosing a practical matrix multiplication algorithm as well, such as numerical stability (due to the finite precision of floating point numbers), and cache performance.

There also exist other ways of obtaining faster matrix multiplication algorithms, such as exploiting shared-memory parallel processing and distributed computation to multiply matrices; however, that’s a topic for another day.