Mike Slinn

For-Loops and For-Comprehensions

— Draft —

Published 2014-02-08. Last modified 2016-10-24.
Time to read: 11 minutes.

This lecture focuses on for-loops and for-comprehensions. Both flavors of for statements are explained, demonstrated and contrasted.

The sample code for this lecture can be found in courseNotes/src/main/scala/ForFun.scala.

Scala’s for keyword is used to write for-expressions. There are two types of for-expressions: for-loops and for-comprehensions. The only difference in how they are written is the presence or absence of the yield keyword. There are huge differences in how the two flavors of for-expressions work, however.

For-Loops

For-loops return Unit; that means they are only useful for their side effects. They use a generator, which defines a temporary, locally scoped immutable variable.

Let’s start off by printing out something 3 times.

Scala REPL
scala> for ( i <- 1 to 3 ) println("Hello, world!")
Hello, world!
Hello, world!
Hello, world! 

We saw Ranges in the Learning Scala Using The REPL lecture of the Introduction to Scala course. In this case, the Range 1 to 3 is called a generator, and it is written to the right of the <- symbol. The Scala compiler creates a locally scoped, temporary immutable variable called i to be instantiated for each value that the generator emits, and the body of the for expression is executed for each instantiation of i. The println expression returns Unit, but that is not why the for loop returns Unit; Scala for-loops always return Unit, regardless of what might be in the body.
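The claim that a for-loop always returns Unit can be checked with a minimal sketch. It assumes that a for-loop is translated into a call to foreach, as the Guards section goes on to show; the method names doubledViaForLoop and doubledViaForeach are made up for this illustration.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: runs a for-loop whose body evaluates to a non-Unit value.
def doubledViaForLoop(): List[Int] = {
  val seen = ListBuffer.empty[Int]
  // The body (seen += ...) evaluates to the ListBuffer, but the for-loop
  // discards it: the type of the whole for-loop is Unit.
  val loopResult: Unit = for (i <- 1 to 3) seen += i * 2
  seen.toList
}

// Hypothetical helper: the same loop written directly with foreach.
def doubledViaForeach(): List[Int] = {
  val seen = ListBuffer.empty[Int]
  (1 to 3).foreach(i => seen += i * 2)
  seen.toList
}
```

Both methods collect the same values; the only way to get anything out of the loop is through the side effect on the buffer.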

Guards

Generators in for-loops can incorporate a guard, which is actually a filter. The syntax for a guard is the if keyword, followed by a boolean expression. The boolean expression is not enclosed in parentheses.

Here is how we could use a generator with a guard to print out a list of even numbers up to 10:

Scala REPL
scala> for ( i <- 1 to 10 if i % 2 == 0 ) println(i)
2
4
6
8
10 

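Compare the above statement to the following, which is similar to what the compiler actually generates. (The compiler calls withFilter rather than filter, as the desugared output near the end of this lecture shows.)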

Scala REPL
scala> 1 to 10 filter( _ % 2 == 0 ) foreach { println }
2
4
6
8
10 

In the above, foreach is a higher-order function that we passed a lambda function to. We discussed lambda functions and the associated shorthand in the Higher-Order Functions lecture.

BTW, the curly braces around the println are optional, since there is only one statement in the body of the foreach construct. We can rewrite the expression without curly braces like this.

Scala REPL
scala> 1 to 10 filter( _ % 2 == 0 ) foreach println
2
4
6
8
10 

Of course, if we wanted to write this in a real program, we might have written the expression without using a filter this way:

Scala REPL
scala> ( 2 to 10 by 2 ) foreach println
2
4
6
8
10 

Here is an equivalent for-loop:

Scala REPL
scala> for ( i <- 2 to 10 by 2 ) println(i)
2
4
6
8
10 
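As a rough check of the guard translation described in this section, the following sketch compares a guarded for-loop with a hand-written withFilter / foreach chain. The method names and the limit parameter are assumptions made for this illustration; they do not come from the course code.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: collects the even numbers with a guarded for-loop.
def evensWithGuard(limit: Int): List[Int] = {
  val out = ListBuffer.empty[Int]
  for (i <- 1 to limit if i % 2 == 0) out += i
  out.toList
}

// Hypothetical helper: roughly what the guarded for-loop is translated into.
def evensWithFilterCall(limit: Int): List[Int] = {
  val out = ListBuffer.empty[Int]
  (1 to limit).withFilter(_ % 2 == 0).foreach(i => out += i)
  out.toList
}
```

Calling either method with 10 should collect 2, 4, 6, 8 and 10, matching the REPL output above.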

Parentheses or Curly Braces?

Either parentheses or curly braces can be used to surround the for-loop generator(s) or the for-loop body. As always with Scala, you may use either parentheses or curly braces when enclosing a single expression, but curly braces must be used to enclose a block of code consisting of more than one expression.

Generators

You might want to get into the habit of always using curly braces around the generators. For example, this does exactly the same thing as the expression we saw a moment ago, written with parentheses.

Scala REPL
scala> for { i <- 2 to 10 by 2 } println(i)
2
4
6
8
10 

As is normal with Scala, you could use parentheses to contain multiple expressions if semicolons are used to separate the expressions. This is helpful when writing a one-line expression for the REPL.

Scala REPL
scala> for (
   i      <- List(1, 2, 3);
   string <- List("a", "b", "c") if i % 2 == 0
 ) println(string * i)
aa
bb
cc 

Here is an example of using curly braces to contain two generators; note that no semicolons are required:

Scala REPL
scala> for {
   i      <- List(1, 2, 3)
   string <- List("a", "b", "c") if i % 2 == 0
 } println(string * i)
aa
bb
cc 

Body

The body of the for-expression can consist of a block of code contained within parentheses or curly braces. As we have already seen, if the body consists of a single expression then neither parentheses nor curly braces are required. Again, curly braces are required if the body of the for-expression contains two or more expressions. I have adopted the convention of not enclosing bodies that consist of only a single expression; otherwise, I use curly braces to enclose the body.

Scala REPL
scala> for ( i <- 1 to 10 if i % 2 == 0 ) {
     |   val str = "abcdefghijk".substring(0, i)
     |   println(str)
     | }
ab
abcd
abcdef
abcdefgh
abcdefghij 

Definition (Assignment)

Variable names that appear on the left-hand side of a generator cause a new temporary and locally scoped immutable variable to be defined.

A for-loop can use generators or assignment to a locally defined variable in order to create locally scoped temporary immutable variables. In this code example, j is a local variable defined within the scope of the for loop, whose value is accessible from the point where it is defined until the end of the for-loop (including the body of the for-loop).

Scala REPL
scala> for {
     |   i      <- List(1, 2, 3)
     |   j      =  i * 2
     |   string <- List("a", "b", "c") if i == j / 2
     | } println(string * j)
aa
bb
cc
aaaa
bbbb
cccc
aaaaaa
bbbbbb
cccccc 
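To make the scope of a definition such as j more concrete, here is a sketch of roughly how a generator followed by a definition can be rewritten by hand: the definition is computed inside a map and carried along in a tuple. This is an approximation for illustration, not the exact compiler output, and the method names are invented.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: a for-loop with a definition between two generators.
def withDefinition(): List[String] = {
  val out = ListBuffer.empty[String]
  for {
    i      <- List(1, 2, 3)
    j      =  i * 2
    string <- List("a", "b")
  } out += string * j
  out.toList
}

// Hypothetical helper: an approximate hand translation of the loop above.
def handTranslated(): List[String] = {
  val out = ListBuffer.empty[String]
  List(1, 2, 3)
    .map { i =>
      val j = i * 2 // the definition is evaluated once per value of i
      (i, j)        // and travels with i to the following generators
    }
    .foreach { case (_, j) =>
      List("a", "b").foreach(string => out += string * j)
    }
  out.toList
}
```

Both versions collect the same strings, which suggests why j is visible in every generator that follows it and in the body.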

Shadowed Variables

As we have just seen, assigning a value in a for-loop or for-comprehension defines a new local variable. If this variable has the same name as a variable in an outer scope, the outer variable will be shadowed, which means it will be inaccessible from the inner scope. This can result in hard-to-find bugs. Here is an example:

Scala REPL
scala> val outerVariable = 0
outerVariable: Int = 0
scala> for {
     |   i <- List(1, 2, 3)
     |   outerVariable = i // error, silently defines a shadow variable
     |   string <- List("a", "b", "c") if i % 2 == 0
     | } println(string * i)
aa
bb
cc
scala> outerVariable
res8: Int = 0

As you can see, you can place any code you want on the right-hand side of the <- and = operators of a for-comprehension without any ill effect. However, all variable names that appear on the left-hand side of those operators cause a new temporary and locally scoped immutable variable to be defined with that name.

Short-Circuiting

We discussed Option in the Option, Some and None lecture of the Introduction to Scala course, discussed Try in the Try and try/catch/finally lecture of the same course, and we will discuss Future over several lectures starting with the Futures & Promises lecture later in this course.

Option and Future are both monads, and while Try is not a true monad, it behaves like one when used in a for-expression. We discussed monads in the Combinators lecture; in summary, you can think of monads as containers that provide standard operations such as map, flatMap and filter. While Option, Try and Future can only assume one value, other monads, such as List and Vector, contain collections of values, and Map contains many name/value pairs.

For-expressions consider many monads as having a failure indication; None is the failure indication for Option, and both Try and Future use Exception as a failure indication. Collections do not carry a failure indication; instead they merely have zero or more values or name/value pairs.

For-expressions that have multiple generators use short-circuiting to terminate inner loops. When a generator fails, or a generator runs out of data, or a guard on a generator returns false, the for-expression short-circuits. When a for-loop short-circuits, any remaining generators are not evaluated and the body of the for-loop is not evaluated. We will discuss for-comprehension short-circuiting separately because it is subtly different.

Some examples should help make this clear. First, let’s create some variables of type Option[Int].

Scala REPL
scala> val a = Some(1)
a: Some[Int] = Some(1)
scala> val b = None
b: None.type = None
scala> val c = None
c: None.type = None
scala> val d = Some(4)
d: Some[Int] = Some(4)

This for-expression loops over the values returned by the four generators:

Scala REPL
scala> for {
   x <- a
   y <- b
   z <- c
   w <- d
} println(w)

Because b is None, the for-loop short-circuits when evaluating the generators, which causes the remaining generators (c and d) to be ignored, and the body of the for-loop is not evaluated, so nothing is printed.
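One way to observe this short-circuiting without relying on printed output is to count how often the body runs. The sketch below is illustrative only; bodyRuns is a made-up name, and the parameters are typed as Option[Int] so that the same method can be called with different combinations of Some and None.

```scala
// Hypothetical helper: returns how many times the body of the for-loop ran.
def bodyRuns(a: Option[Int], b: Option[Int], c: Option[Int], d: Option[Int]): Int = {
  var runs = 0
  // Roughly equivalent to:
  //   a.foreach(x => b.foreach(y => c.foreach(z => d.foreach(w => runs += 1))))
  // so a None anywhere in the chain means the innermost function is never called.
  for {
    x <- a
    y <- b
    z <- c
    w <- d
  } runs += 1
  runs
}
```

With the values used above, bodyRuns(Some(1), None, None, Some(4)) should report zero runs, while four Some values should give exactly one run.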

A guard can also be used to short-circuit a for-expression. Collections such as List are different from Option, Try and Future, in that they can contain multiple values. When a generator that returns a collection monad short-circuits, only the one loop is affected and the generator fetches the next value, if any, from the collection. In this example, the first generator short-circuits when count has odd values.

Scala REPL
scala> for {
   count  <- List(1, 2, 3, 4) if count%2==0
   string <- List("a", "b", "c")
 } println(string * count)
aa
bb
cc
aaaa
bbbb
cccc 

Extraction

Multiple assignment is possible using extractors. This example shows how two temporary immutable variables (x and y) are assigned values from the tuples stored in the List.

Scala REPL
scala> for { (x, y) <- List((2, 3), (4, 5), (6, 7)) } yield x*y
res1: List[Int] = List(6, 20, 42) 
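Extraction is not limited to tuples held in a List. As a sketch, the same syntax can take apart case class instances and the key/value pairs of a Map; the Point class and both method names below are hypothetical and exist only for this example.

```scala
// Hypothetical case class used only for this illustration.
final case class Point(x: Int, y: Int)

// The constructor pattern Point(x, y) extracts both fields of each element.
def areas(points: List[Point]): List[Int] =
  for { Point(x, y) <- points } yield x * y

// Each entry of a Map is a tuple, so a tuple pattern extracts key and value.
def describe(ages: Map[String, Int]): List[String] =
  (for { (name, age) <- ages.toList } yield s"$name is $age").sorted
```

The result of describe is sorted so that it does not depend on the iteration order of the Map.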

Exercise – Indexing Into a 2D Array

You can index into a 2D array easily using multiple generators. Here is how to create an array containing only zero values:

Scala REPL
scala> val array = Array.ofDim[Int](3, 4)
array: Array[Array[Int]] = Array(Array(0, 0, 0, 0), Array(0, 0, 0, 0), Array(0, 0, 0, 0)) 

Your task is to assign these values to the array and print them out.

Desired output
2, 4, 6, 8
4, 8, 12, 16
6, 12, 18, 24

Hints:

  • A row of a two-dimensional Array is a one-dimensional Array.
  • The range of row indices can be generated from array.indices.
  • The range of column indices can be generated from array.head.indices.
  • Generate indices called row and column.
  • Access elements of the array as array(row)(column).
  • Compute the value of each element of the array as (row+1) * 2*(column+1).

Solution

Scala REPL
scala> for {
     |   row    <- array.indices
     |   column <- array.head.indices
     | } array(row)(column) = (row+1) * 2*(column+1)

You can print out the array using a for-loop:

Scala REPL
scala> for { row <- array } println(row.mkString(", "))
2, 4, 6, 8
4, 8, 12, 16
6, 12, 18, 24 

We first saw Iterator.mkString in the Collections Overview lecture.

... or you can print out the array using foreach:

Scala REPL
scala> array.foreach(row => println(row.mkString(", ")))
2, 4, 6, 8
4, 8, 12, 16
6, 12, 18, 24 

You can run this solution by typing:

Shell
$ sbt "runMain solutions.TwoD"

Idioms

Default Value Idiom

This is my name for the Scala idiom; I am unaware of a generally accepted name for this idiom. The Default Value Idiom is useful when writing for-loops and for-comprehensions where the generator must provide a default value for values that would otherwise be filtered out. An example, using Option, should help make this clear.

Scala REPL
scala> def shout(maybeName: Option[String]): Unit =
   for {
     name <- maybeName.orElse(Some("UNKNOWN NAME"))
   } println(name.toUpperCase)
shout: (maybeName: Option[String])Unit
scala> shout(Some("Chloe"))
CHLOE
scala> shout(None)
UNKNOWN NAME

You can use this idiom with many types of monads and monadic-like containers, such as Try and Future.
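Here is a minimal sketch of the same idiom applied to Try, using the orElse method that Try provides. The method name portOrDefault and the fallback value 8080 are assumptions chosen for illustration.

```scala
import scala.util.{Success, Try}

// Hypothetical helper: parses a port number, falling back to a default
// value when the text is not a valid integer.
def portOrDefault(text: String): Try[Int] =
  for {
    port <- Try(text.trim.toInt).orElse(Success(8080))
  } yield port
```

Because the generator supplies the default, a failed parse no longer short-circuits the for-comprehension.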

Discarded Value Idiom

This is my name for this Scala idiom; I am unaware of a generally accepted name for this idiom. The discarded value idiom is useful for debugging. It has a dark side, in that it could be used for changing state in another scope as a side effect. We’ll look at both the good and evil sides of this idiom.

Good Usage: Simplifies Code

For example, how would you print out the value of i in this code example, prior to the assignment to string?

Scala code
for {
  i <- List(1, 2, 3)
  string <- List("a", "b", "c")
} println(string * i)

One way would be to create two nested for-loops; this is clumsy, however:

Scala code
for {
  i <- List(1, 2, 3)
} {
  println(s"i=$i")
  for {
    string <- List("a", "b", "c")
  } println(string * i)
}

Output is:

Output
i=1
a
b
c
i=2
aa
bb
cc
i=3
aaa
bbb
ccc

The discarded value idiom cleans up the syntax quite a bit. It takes advantage of the fact that all Scala expressions return a value. Both println and assignment return Unit. You can wrap any value, including Unit, into a List, Option or Try.

This means we can use side-effect code in a for-loop generator. Because we do not actually want the Unit value returned from the generator, we discard the value by assigning to a variable called underscore (_).

Scala code
for {
  i <- List(1, 2, 3)
  _ <- List(println(s"i=$i"))
  string <- List("a", "b", "c")
} println(string * i)

Output is the same as before, and we did not need to write an inner loop so our code is much cleaner. The type of the container used must match the container types used in the remainder of the for-comprehension.
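To confirm the order in which the discarded generator runs, the following sketch records events in a buffer instead of printing them. The method name trace and the shortened lists are assumptions made to keep the example small.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: returns the sequence of events instead of printing them.
def trace(): List[String] = {
  val log = ListBuffer.empty[String]
  for {
    i      <- List(1, 2)
    _      <- List(log += s"i=$i") // side effect wrapped in a List; the value is discarded
    string <- List("a", "b")
  } log += string * i
  log.toList
}
```

The side effect is recorded once per value of i, just before the inner generator runs for that value, which matches the interleaved output shown above.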

Evil Usage: Warning!

If you wrote this code,
I would fire you.

It would be a misuse of the discarded value idiom to alter state in another scope as a side effect. As an example, a misguided programmer might try to set the variable called outerVariable to the highest even value stored in the List provided in the first generator this way.

Scala code
var outerVariable = 0
for {
  i <- List(1, 2, 3)
  _ <- List(outerVariable = i)
  string <- List("a", "b", "c") if i % 2 == 0
} println(string * i)
println(s"outerVariable=$outerVariable")

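Output is shown below. Note that outerVariable ends up as 3 rather than 2, because the assignment runs for every value of i, before the guard on the last generator is evaluated: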

Output
aa
bb
cc
outerVariable=3

For-Comprehensions

In set theory, set-builder notation is used to describe a set by stating the properties that its members must satisfy. Forming sets in this manner is also known as set comprehension. For-comprehensions are similar, in that they generate or filter data according to properties of the data.

For-comprehensions can be distinguished from for-loops by the addition of one keyword: yield. For-comprehensions provide a more convenient way to express combinations of map, flatMap and filter. Unlike for-loops, for-comprehensions return a non-Unit value.

In particular, for-comprehensions return the same type of monadic container that was used in the generator. If multiple generators are used, the result takes the container type of the first generator, and the remaining generators must be compatible with it, as the Can’t Mix Monads section explains.
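As a small check of which container type comes back when the generators use different collection types, consider the sketch below. The explicit type annotations make the compiler confirm the result type; the value names are invented for this example.

```scala
// The first generator is a List, so the result is a List.
val listFirst: List[Int] =
  for {
    i <- List(1, 2)
    j <- Vector(10, 20)
  } yield i * j

// The first generator is a Vector, so the result is a Vector.
val vectorFirst: Vector[Int] =
  for {
    i <- Vector(1, 2)
    j <- List(10, 20)
  } yield i * j
```

In both cases the result takes the container type of the first generator, while the elements are the same.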

The Partial Functions lecture showed this code example.

Scala REPL
scala> val vector2 = Vector(Some(1), None, Some(3), Some(4))
vector2: scala.collection.immutable.Vector[Option[Int]] = Vector(Some(1), None, Some(3), Some(4))
scala> vector2.flatMap { _.map(_*2) }
res0: scala.collection.immutable.Vector[Int] = Vector(2, 6, 8)

Here is equivalent code, written using a for-comprehension:

Scala REPL
scala> for {
     |   maybeItem <- vector2
     |   item      <- maybeItem
     | } yield item*2
res2: scala.collection.immutable.Vector[Int] = Vector(2, 6, 8) 

The first generator loops through all of the values of vector2 and assigns them one by one to maybeItem. Because vector2 has type Vector[Option[Int]], maybeItem has type Option[Int]. The second generator takes advantage of Option being a collection of zero or one items, and loops through all of the values contained in maybeItem. This automatically filters out empty values such as None (or Nil, if the elements were Lists). The yield statement builds the result value, adding an item each time the inner loop executes. Empty values of maybeItem do not loop, so they are not included in the final result. The returned value of the for-comprehension is of the same monadic type as vector2, which is Vector.

You could write this for-comprehension to show the types of all of the temporary variables if you like:

Scala REPL
scala> for {
     |   maybeItem: Option[Int] <- vector2
     |   item: Int              <- maybeItem
     | } yield item*2
res3: scala.collection.immutable.Vector[Int] = Vector(2, 6, 8) 

Of course we could have used flatten and achieved simpler code:

Scala REPL
scala> vector2.flatten.map { _ * 2 }
res4: scala.collection.immutable.Vector[Int] = Vector(2, 6, 8)

A for-expression could also be used:

Scala REPL
scala> for { item <- vector2.flatten } yield item * 2
res5: scala.collection.immutable.Vector[Int] = Vector(2, 6, 8) 

Can’t Mix Monads

As we learned in the Combinators lecture, applying map or flatMap to a monad returns a transformed instance of the same monad. For example, apply a map to an Option, and you’ll get another Option. Apply a flatMap to a List and you’ll get another List. Since for-comprehensions are merely syntactic sugar for flatMap / map combinations, this rule also applies to for-comprehensions, and so the following two rules apply to for-comprehensions.

  1. A for-comprehension with only one generator will return the same monadic type as the generator.
  2. All generators in a for-comprehension must be of the same monadic type, or there needs to be an implicit conversion in scope.

Some examples will help you understand. The following will not compile.

Scala REPL
scala> def reps(list: List[Int], maybeString: Option[String]): List[String] = for {
     |   j <- maybeString
     |   i <- list
     | } yield j*i
Error:(152, 7) type mismatch;
 found   : List[String]
 required: Option[?]
    i <- list
      ^
Error:(151, 7) type mismatch;
 found   : Option[Nothing]
 required: List[String]
    j <- maybeString
      ^ 

There is no implicit conversion from List to Option, so the compiler generates an error. If you reversed the order of the generators, no error would be generated because the Option companion object defines an implicit conversion (option2Iterable) from Option to Iterable.

Scala REPL
scala> def reps(list: List[Int], maybeString: Option[String]): List[String] = for {
   i <- list
   j <- maybeString
 } yield j*i
reps: (list: List[Int], maybeString: Option[String])List[String]
scala> reps(List(1, 2, 3), Some("a"))
res29: List[String] = List(a, aa, aaa)

Often you require a specific order of generators, so in that case you should explicitly convert Options to Lists with the toList method. This is possible because Option is a collection with zero or one elements.

Scala REPL
scala> def reps(list: List[Int], maybeString: Option[String]): List[String] = for {
   j <- maybeString.toList
   i <- list
 } yield j*i
reps: (list: List[Int], maybeString: Option[String])List[String]
scala> reps(List(1, 2, 3), Some("a"))
res30: List[String] = List(a, aa, aaa)

You can run this program by typing:

Shell
$ sbt "runMain ForFunMonads"

We will see these two rules again in the Working With Collections of Futures lecture later in this course.
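The same explicit-conversion approach works for other pairs of containers. As a sketch, a Try can be converted with its toOption method so that every generator is an Option; the method name halveIfEven is hypothetical.

```scala
import scala.util.Try

// Hypothetical helper: every generator is an Option, so the rules above hold.
def halveIfEven(maybeText: Option[String]): Option[Int] =
  for {
    text   <- maybeText
    number <- Try(text.trim.toInt).toOption // convert Try to Option explicitly
    if number % 2 == 0
  } yield number / 2
```

A missing value, unparsable text, or an odd number each produce None, while halveIfEven(Some("42")) should produce Some(21).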

Short-Circuiting a For-Comprehension

For-loops and for-comprehensions use short-circuiting slightly differently, and this subtle difference can introduce hard-to-find bugs in a program. As you know, when a for-loop short-circuits, its body is not evaluated. Under special conditions, however, the value of a for-comprehension might be evaluated even when a generator short-circuits.

For example, let’s modify the code example we examined when discussing short-circuiting for-loops. Once again we have the same variables.

Scala REPL
scala> val a = Some(1)
a: Some[Int] = Some(1)
scala> val b = None
b: None.type = None
scala> val c = None
c: None.type = None
scala> val d = Some(4)
d: Some[Int] = Some(4)

This for-comprehension loops over the values returned by the four generators and yields the value of the innermost generator.

Scala REPL
scala> for {
     |   x <- a
     |   y <- b
     |   z <- c
     |   w <- d
     | } yield w
res4: Option[Int] = None 

What actually happened was that the second generator failed because b is None, so the remaining generators were short-circuited and the for-comprehension as a whole evaluated to None. Notice, however, that the result has type Option[Int]: the compiler derived that type from w in the last generator, even though that generator was never evaluated. Surprised? This is one of the reasons why generators in for-comprehensions must all have the same monadic type. When the second generator fails, its failure value (None) becomes the value of the entire for-comprehension.
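To see which generators are actually evaluated, the following sketch wraps each generator expression in a counting helper. The names lastOf and touch are invented for this illustration, and the parameters are typed as Option[Int] rather than None.type so the method can be called with any combination of values.

```scala
// Hypothetical helper: returns the value of the for-comprehension together
// with the number of generator expressions that were evaluated.
def lastOf(a: Option[Int], b: Option[Int], c: Option[Int], d: Option[Int]): (Option[Int], Int) = {
  var evaluated = 0
  def touch(o: Option[Int]): Option[Int] = { evaluated += 1; o }
  // Roughly equivalent to:
  //   touch(a).flatMap(x => touch(b).flatMap(y => touch(c).flatMap(z => touch(d).map(w => w))))
  val result = for {
    x <- touch(a)
    y <- touch(b)
    z <- touch(c)
    w <- touch(d)
  } yield w
  (result, evaluated)
}
```

When the second argument is None, only the first two generator expressions are evaluated and the overall result is None; the last generator is never reached.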

The compiler usually detects when a value bound by a generator that can never succeed is used. Here is an example of the compiler catching this type of error: because b and c have type None.type, y and z have type Nothing, which has no * method.

Scala REPL
scala> for {
     |    x <- a
     |    y <- b
     |    z <- c
     |    w <- d
     |  } yield x + y*10 + z*100 + w*1000
<console>:21: error: value * is not a member of Nothing
        } yield x + y*10 + z*100 + w*1000 

The For-Loops and For-Comprehensions Examples lecture shows an example of how the Scala 2.12 compiler fails to catch this type of error when using right-biased Either.

Desugaring with Quasiquotes (Supplemental)

This section discusses a Scala 2.11 compatible way of displaying desugared output. Scala 2.12 introduced a better way of showing desugaring, which is described in the Show desugarings performed by the compiler section of the Learning Scala Using The REPL lecture of the Introduction to Scala course.

Scala 2.11 introduced quasiquotes, which is a feature related to Scala compiler macros. This course will not describe macros beyond simply stating that they allow the Scala language to be extended. A quasiquoted String is a form of interpolation, rather like the s"" interpolation that was discussed in the Learning Scala Using The REPL lecture of the Introduction to Scala course. Quasiquoted strings are denoted by a leading q, and are enabled by importing scala.reflect.runtime.universe in a special way, like this (apologies for the magic):

Scala code
val universe: scala.reflect.runtime.universe.type = scala.reflect.runtime.universe
import universe._

We can use quasiquotes to completely desugar an expression. For example, let’s desugar this method.

Scala code
def foo(n: Int, v: Int) =
  for (i <- 0 until n;
    j <- i until n if i + j == v) yield (i, j)

Here is sample output from invoking the foo method:

Scala REPL
scala> foo(10, 9)
res11: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((0,9), (1,8), (2,7), (3,6), (4,5)) 

Before you can use quasiquotes in SBT console, the project build.sbt must declare this dependency. Note that instead of explicitly declaring the version of the Scala compiler, I merely referenced the value for this project.

build.sbt fragment
"org.scala-lang" %  "scala-reflect" % scalaVersion.value

Here is how it would work in the Scala REPL:

Shell
$ sbt console
...lots of output...
scala> val universe: scala.reflect.runtime.universe.type = scala.reflect.runtime.universe
universe: reflect.runtime.universe.type = scala.reflect.runtime.JavaUniverse@518f0556
scala> import universe._
import universe._
scala> q"""
     | def foo(n: Int, v: Int) =
     |   for {
     |     i <- 0 until n
     |     j <- i until n if i + j == v
     |   } yield (i, j)
     | """
res6: universe.DefDef = def foo(n: Int, v: Int) = 0.until(n).flatMap(((i) => i.until(n).withFilter(((j) => i.$plus(j).$eq$eq(v))).map(((j) => scala.Tuple2(i, j)))))

The last line of output is hard to read. This is what it looks like when formatted:

Formatted output
def foo(n: Int, v: Int) =
  0.until(n).flatMap(((i) =>
    i.until(n).withFilter(((j) =>
      i.$plus(j).$eq$eq(v))).map(((j) =>
        scala.Tuple2(i, j)))))

The Scala compiler converted the + operator to $plus, the == operator to $eq$eq, added some extra parentheses and converted the tuple shorthand to scala.Tuple2(). We can rewrite this as:

Formatted output
def foo(n: Int, v: Int) =
  0 until n flatMap { i =>
    i until n withFilter { j =>
      i + j == v
    } map { j =>
      (i, j)
    }
  }

Output is the same as for the version written using syntactic sugar.

You can run this code example by typing:

Shell
$ sbt "runMain ForFunQuasi"
[info] Loading global plugins from C:\Users\mslinn\.sbt\0.13\plugins
[info] Loading project definition from C:\work\training\projects\public_code\group_ScalaCore\course_scala_intermediate_code\courseNotes\project
[info] Set current project to IntermediateScalaCourse (in build file:/C:/work/training/projects/public_code/group_ScalaCore/course_scala_intermediate_code/courseNotes/)
[info] Running ForFunQuasi
def foo(n: Int, v: Int) = 0.until(n).flatMap(((i) => i.until(n).withFilter(((j) => i.$plus(j).$eq$eq(v))).map(((j) => scala.Tuple2(i, j)))))
[success] Total time: 2 s, completed Jan 31, 2016 4:57:04 AM 
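As a quick check that the hand-rewritten version of foo behaves like the sugared one, the two can be placed side by side and compared. The names fooSugared and fooDesugared are invented for this sketch, and the explicit return types are added only for clarity.

```scala
// The version written with syntactic sugar, as in the lecture.
def fooSugared(n: Int, v: Int): IndexedSeq[(Int, Int)] =
  for {
    i <- 0 until n
    j <- i until n if i + j == v
  } yield (i, j)

// The hand-written flatMap / withFilter / map chain.
def fooDesugared(n: Int, v: Int): IndexedSeq[(Int, Int)] =
  (0 until n).flatMap { i =>
    (i until n).withFilter(j => i + j == v).map(j => (i, j))
  }
```

Calling both with the arguments 10 and 9 should give the five pairs shown in the REPL output for foo(10, 9).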
