Memoization in Depth

Published 2015-03-04. Last modified 2015-08-09.
Time to read: 7 minutes.

Memoization is a well-known technique that can provide a dramatic performance boost.

The material in this course relies heavily upon the material presented in these preceding lectures:

This lecture shows you progressively more sophisticated memoization techniques, starting with a baby memoization that generates output to show how it works, to a Memoize trait, a Memoizer class and two techniques for memoizing methods and functions that accept more than one parameter.

This lecture also shows how a memoized lambda could have unit tests written for it, thereby addressing one of the primary complaints about lambdas.

The sample code for this lecture can be found in courseNotes/src/main/scala/cache/Memoize.scala.

Review

The Recursion and Functional Programming lecture of the Introduction to Scala course showed the following code that memoized the results of the fn method. It uses an immutable.HashMap (discussed in the Immutable Collections lecture) to store values that needed to be permanently stored, and a mutable.WeakhashMap (discussed in the Mutable Collections lecture) to store a potentially large number of computed values for quick access.

Scala code

object FibMem extends App {
  val defaultMap = immutable.HashMap(0 -> BigInt(0), 1 -> BigInt(1))
  val cache = mutable.WeakHashMap[Int, BigInt]().withDefault(defaultMap)

  def fn(n: Int): BigInt = {
    @tailrec
    def fibIter(count: Int, fibN1: BigInt, fibN2: BigInt): BigInt =
      if (count == n) {
        cache += n -> fibN1
        fibN1
      }
      else fibIter(count + 1, fibN2, fibN1 + fibN2)

    fibIter(0, 0L, 1L)
  }

  def fibMem(n: Int): BigInt = cache.getOrElse(n, fn(n))

  Time.fibTest(fibMem)
}

The memoization code is intertwined with the other logic; this was necessary because we had not yet discussed the material presented in the Higher-Order Functions and Parametric Types lectures. This lecture demonstrates techniques that factors out memoization from the rest of the program.

The code examples shown in the remainder of this lecture assume that all values are equally important, so only a WeakHashMap is employed throughout. If you need to memoize a recursive method which starts from fixed values then you should use an immutable.HashMap to store the initial value(s), as shown in the above example.

Baby Steps

Normally only Function1s are memoized, which is to say that methods and functions that accept only one parameter are commonly memoized. We’ll explore two techniques later in this lecture to memoize methods and FunctionNs that accept more than one parameter.

Here is an example of a MemoizeBaby trait that can be mixed into any object or class that contains a method or FunctionN that you would like to memoize. It just consists of one method, memoize, that accepts two parameters: a Function1 to be memoized, and a name for the method that will be output to ease your understanding while learning.

Scala code

trait MemoizeBaby {
  /** Version just for learning.
    * Transform given Function1 instance into another Function1 instance backed by a WeakHashMap for memoization of parameter/result pairs.
    * @param f Must be Function1 (single argument)
    * @param name displayed for this memoized Function, only present for learning purposes; remove for production code */
  def memoize[Key, Value](f: Key => Value, name: String="") = {
    /** Each memoized Function has its own cache */
    val cache = collection.mutable.WeakHashMap.empty[Key, Value]
    (key: Key) => {
      if (cache.keySet.contains(key)) println(s"Retrieving $name($key)=${cache(key)}")
      cache.getOrElseUpdate(key, f(key))
    }
  }
}

An empty cache is created, then a Function1[Key, Value] is defined that references the cache, and because the Function1 is the last expression in the method, it is returned as the value of the method. Key and Value are parametric types, and could have been written as single letters like [T, U], but I wrote it them using full words because it made their purposes more clear.

The Function1 returned by the memoize method has a reference to the cache created within the method, so the cache will remain in memory after memoize returns. This style of programming can lead to memory leaks, but this is not a mistake, it was done on purpose. Just beware that the cache will remain until all references to it are eventually destroyed.

Let’s say we want to memoize the method1 method, defined within a console app that has the MemoizeBaby trait mixed in.

Scala code

object Memo extends App with MemoizeBaby {
  def method1(i: Int): Int = {
    val result = i * 2 + 1
    println(s"Computing f1($i)=$result")
    result
  }
}

We learned about method lifting in the Functions are First Class lecture of the Introduction to Scala course, so we know that method1 can be lifted from the object, trait or class that it was defined in and embedded in a new instance of a Function1[Int, Int]. Scala automatically lifts methods as required, so when we write the following, method1 is lifted into a Function1 called f1.

Scala REPL

scala> val f1 = Memo.memoize(method1, "f1")
f1: Int => Int = <function1>

The type returned by memoize is Int => Int, or written in long form, Function1[Int, Int]. If this seems unfamiliar to you, please review the Functions are First Class and More Fun With Functions lectures of the Introduction to Scala course. If you just need practice on recognizing these forms, please work through the Lambda Review & Drill lecture from that course.

The f1 instance of Function1 is a memoized version of method1. We can invoke a memoized version just the same as the original method or Function1.

Scala REPL

scala> f1(1)
Computing f1(1)=3
res0: Int = 3

scala> f1(1)
Retrieving f1(1)=3
res1: Int = 3

The first time f1 was invoked with an argument of 1, the returned value was computed. All subsequent invocations with that argument are satisfied by returning the cached value, instead of recomputing it.

The memoization cache is created by the memoize method each time it is invoked, which means that any number of methods in the given class or object can be memoized by mixing in the MemoizeBaby trait. If we add the method2 method to the Memo object, we can memoize it also.

Scala code

def method2(i: Int): Int = {
  val result = i * 6 - 21
  println(s"Computing f2($i)=$result")
  result
}

We can see that the caches for each memoized method are separate.

Scala REPL

scala> val f2 = Memo.memoize(method2, "f2")
f2: Int => Int = <function1>

scala> f2(1)
Computing f2(1)=-15
res0: Int = -15

scala> f1(1)
Retrieving f1(1)=3
res1: Int = 3

You can run the above code by typing:

Shell

$ sbt "runMain Memo"

Production Version of Memoize Trait

This version does away with the training wheels, in that the second parameter to the previous version of memoize is not present any more.

Scala code

trait Memoize {
  /** Transform given `Function1` instance into another `Function1` instance backed by a `WeakHashMap` for memoization of parameter/result pairs.
    * @param f Must be `Function1` (single argument) */
  def memoize[Key, Value](f: Key => Value) = {
    /** Each memoized Function has its own cache */
    val cache = collection.mutable.WeakHashMap.empty[Key, Value]
    (key: Key) => {
      cache.getOrElseUpdate(key, f(key))
    }
  }
}

Exercise – Mixing in the Memoize Trait

Memoize method1 and method2 above using the production version of the Memoize trait. You can reference these methods in your code as cache.Memo.method1 and cache.Memo.method2. Test the resulting memoized methods with a variety of input values.

Solution

Scala code

package solutions

object MemoProd extends App with cache.Memoize {
  import cache.Memo.{method1, method2}

  val f1: Int => Int = memoize(method1)
  val f2: Int => Int = memoize(method2)

  // Compute values; side effect: memoizes results into caches
  1 to 3 foreach { i =>
    println(s"f1(i)=${f1(i)}")
    println(s"f2(i)=${f2(i)}")
  }

  // Fetch first 3 results from memo caches and compute the other values
  1 to 6 foreach { i =>
    println(s"f1(i)=${f1(i)}")
    println(s"f2(i)=${f2(i)}")
  }
}

You can run this solution by typing:

Shell

$ sbt "runMain solutions.MemoProd"

The output of f1 and f2 shows when a value is computed for the first time.

Computingf1(1)=3
f1(i)=3
Computing f2(1)=-15
f2(i)=-15
Computing f1(2)=5
f1(i)=5
Computing f2(2)=-9
f2(i)=-9
Computing f1(3)=7
f1(i)=7
Computing f2(3)=-3
f2(i)=-3
f1(i)=3
f2(i)=-15
f1(i)=5
f2(i)=-9
f1(i)=7
f2(i)=-3
Computing f1(4)=9
f1(i)=9
Computing f2(4)=3
f2(i)=3
Computing f1(5)=11
f1(i)=11
Computing f2(5)=9
f2(i)=9
Computing f1(6)=13
f1(i)=13
Computing f2(6)=15
f2(i)=15

Memoizing Arbitrary Methods

If you don’t want to mix in the Memoize trait into the enclosing class or trait that contains the method that you wish to memoize, or you can’t because you don’t have access to the source code or because it is a lambda, fear not! You can use the technique shown in this section to memoize any Function1, including anonymous Function1s. Recall that another name for an anonymous FunctionN is lambda function, or simply lambda (discussed in the Functions are First Class lecture of the Introduction to Scala course). The private[this] access modifier was discussed in the Deceptively Familiar: Access Modifiers lecture of the same course.

Scala code

class Memoizer[Key, Value](f: Key => Value) {
  private[this] val vals = collection.mutable.Map.empty[Key, Value]

  def apply(x: Key): Value =
    if (vals.keySet.contains(x)) {
      vals(x)
    } else {
      val y = f(x)
      vals += x -> y
      y
    }
}

object Memoizer {
  def apply[Key, Value](function1: Key => Value) = new Memoizer(function1)
}

The Memoizer object has an apply method, which we have seen used as a factory method to create class instances many times before.The Memoizer class also has an apply method, which means that an instance of the class can invoke that apply default method. For example, let’s create an instance of the Memoizer class called memoizedDoubler which memoizes a lambda and then implicitly invoke the Memoizer class’s apply method.

Scala REPL

scala> val memoizedDoubler = Memoizer((_:Int) * 2)
memoize: Memoizer[Int,Int] = Memoizer@28e19366

scala> memoizedDoubler(3)
res2: Int = 6

Because the memoized lambda is stored in a variable, we could write unit tests for it, which removes one of the primary complaints about lambdas.

Exercise – Wrapping a Method in the Memoizer Class

The following code uses the same calculatePiFor and time methods we’ve seen before in order to compute Pi to arbitrary precision and to measure the length of time it takes to perform the computation. Your task is to use the Memoizer class to compute piMemoized.

Scala code

object MemoDemo extends App {
  val calculatePiFor: Int => Double = (decimals: Int) => {
      var acc = 0.0
      for (i <- 0 until decimals)
        acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)
      acc
    }

  /** Measure execution time of the given block of code */
  def time[T](msg: String)(block: => T): T = {
    val t0 = System.nanoTime()
    val result: T = block
    val elapsedMs = (System.nanoTime() - t0) / 1000000
    println(s"  Elapsed time for $msg: " + elapsedMs + "ms")
    result
  }

  val piMemoized = ???

  def doIt(n: Int): Double = time(s"pi to $n decimals")(piMemoized(n))

  doIt(2000000)
  doIt(2000000)
  doIt(3000000)
  doIt(3000000)
}

Solution

The solution may seem almost trivial, but do you understand it?

Scala code

val piMemoized = Memoizer(calculatePiFor)

You can run this solution by typing:

Shell

$ sbt "runMain solutions.MemoDemo"

Memoizing Higher-Arity FunctionNs

If you have a method or FunctionN that requires multiple arguments, you have two choices:

Use a tuple of the arguments as the key for one of the memoize functions we used earlier. This approach effectively folds an extra dimension’s worth of requests and responses into a 2D space. Lookups are a bit slower than for scalar keys, and memory usage could explode geometrically, but the results are still obtained much faster than without memoization.
Create a partially applied function from the FunctionN to be memoized, and only memoize the partially applied function. This means that the remaining parameters must be provided If you need a review of partially applied functions and how they relate to FunctionNs, see the Partially Applied Functions lecture before going any further.

Let’s explore each of these approaches. Each of them is fancy in its own special way.

If you need to memoize a method or FunctionN with arity > 2 you could combine both of these approaches.

Memoizing Using a Tuple as a Key

The type parameter Key could just as well be a TupleN[Type1, Type2... TypeN] instead of a scalar type like Int or String. For example, let’s memoize a method that accepts three parameters.

def dupChop(string: String, times: Int, maxLength: Int): String = {
  val result = string * times
  result.substring(0, math.min(result.length, maxLength))
}

The dupChop method can be converted to a Function1 that accepts a tuple quite easily. As we learned in the Functions are First Class lecture, we can lift the method to a Function3 by following it with a space, then an underscore, like this.

Scala REPL

scala> dupChop _
res3: (String, Int, Int) => String = <function3>

Note that a Function3[String, Int, Int, String] was returned, but we wanted a Function1[Tuple3[String, Int, Int], String], which can be written in shorthand as ((String, Int, Int)) => String. To do this, simply call the tupled method on the Function3.

Scala REPL

scala>scala>  (dupChop _).tupled
res4: ((String, Int, Int)) => String = <function3>

The syntax is easier to read when written using postfix notation, so I’ll enable that syntax as discussed in the Scala Imports and Packages lecture of the Introduction to Scala course.

Scala REPL

scala> import scala.language.postfixOps
import scala.language.postfixOps

scala> val tupledDupChop = dupChop _ tupled
tupledDupChop: ((String, Int, Int)) => String = <function1>

Now that we have a Function1 called tupledDupChop that accepts a Tuple3[String, Int, Int] and returns a String. Lets create an instance of the necessary type of tuple and then pass it to tupledDupChop.

Scala REPL

scala> val tuple = ("hi ", 3, 10)
tuple: (String, Int, Int) = ("hi ",3,10)

scala> tupledDupChop(tuple)
res5: String = "hi hi hi "

As you can see, we got the expected result from invoking tupledDupChop: "hi hi hi". We can memoize this tupled Function1 in two ways.

By mixing the Memoize trait into the enclosing object, trait or class
By wrapping tupledDupChop in a Memoizer instance.

Here is an example of the latter approach:

Scala REPL

scala> val tdc = Memoizer(tupledDupChop)
tdc: Memoizer[(String, Int, Int),String] = Memoizer@75bd28d

scala> tdc(tuple)
res6: String = "hi hi hi "

scala> tdc(("bye ", 5, 20))
res7: String = "bye bye bye bye bye "

You can run this code by typing:

Shell

$ sbt "runMain MemoizeTuple"

Memoizing Partially Applied Functions

By binding a FunctionN parameter we can create a partially applied function. Let’s define a method called method that accepts two parameters.

Scala REPL

scala> def method(i: Int, string: String): String = string * i
method1: (i: Int, string: String)String

As you will see next, there is no difference in how memoization works regarding how the parameters are bound.

Binding a Parameter to a Variable

Now we’ll bind the first parameter to the value of key1 and lift it into a partially applied Function1 using placeholder syntax.

Scala REPL

scala> val key1 = 1
key1: Int = 1

scala> val pf1 = method(key1, _: String)
pf1: String => String = <function1>

scala> pf1("hi ")
res8: String = "hi "

When we invoked the partially applied function pf1 we just supplied the value of the second parameter, and the result was returned.

Now lets memoize the partially bound Function1 and invoke it.

Scala REPL

scala> val memoizedF1 = memoize(pf1)
memoizedF1: String => String = <function1>

scala> memoizedF1("hi ")
res10: String = "hi "

Binding a Parameter to a Constant

Let’s try that again, but instead of providing the value to be bound via a variable, we’ll provide a constant.

Scala REPL

scala> val pf2 = method(2, _: String)
pf2: String => String = <function1>

scala> pf2("bye ")
res8: String = "bye bye "

Now lets memoize the partially bound Function1 and invoke it.

Scala REPL

scala> val memoizedF2 = memoize(pf2)
memoizedF2: String => String = <function1>

scala> memoizedF2("bye ")
res11: String = "bye bye "

Binding a Parameter to the Result of a Computation

Let’s try that again, but this time provide a computation that will provide the value for the bound variable.

Scala REPL

scala> val pf42 = method(key1*42, _: String)
pf42: String => String = <function1>

scala> pf42("42 ")
res9: String = "42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 "

Now lets memoize the partially bound Function1 and invoke it.

Scala REPL

scala> val memoizedF42 = memoize(pf42)
memoizedF42: String => String = <function1>

scala> memoizedF1("42 ")
res12: String = "42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 "

Previous lecture: Using Partially Applied Functions

Next lecture: MultiThreading

© Copyright 1994-2024 Michael Slinn. All rights reserved.
If you would like to request to use this copyright-protected work in any manner, please send an email.

This website was made using Jekyll and Mike Slinn’s Jekyll Plugins.