Mike Slinn

Pattern Matching on Collections

— Draft —

Published 2019-07-02.
Time to read: 8 minutes.

We covered pattern matching and extractors in the Pattern Matching and Sealed Classes and Extractors lectures of the Introduction to Scala course. This lecture expands on that material and applies it to collections like Array, Seq, List, IndexedSeq and subtypes. Although pattern matching is powerful and cool, it is not always the most efficient or expressive way to write code. Alternative expressions are shown where appropriate.

This lecture discusses pattern matching operations on Arrays and Seq and its subtypes such as List and Vector. However, Array does not support all of the techniques shown in this lecture, so in order to use those techniques you will need to convert the Array to a List. The Array.toList method can do the job; this conversion will allow you to use all of the techniques shown in this lecture on Arrays.

As an aside, when the collection classes were reworked for Scala 2.13, they became a horrible mess. The Array class inheritance diagram is no help in determining where Array.toList is defined because it is incomplete. Array.toList is provided by an implicit conversion from Array[T] to toIndexedSeq[T] performed by the copyArrayToImmutableIndexedSeq method in scala.LowPriorityImplicits2; Predef is responsible for LowPriorityImplicits and LowPriorityImplicits2.

The sample code for this lecture can be found in courseNotes/src/main/scala/collections/PatMatch.scala.

These examples work with Arrays and any type of collection that extends Seq.

Match an Empty Collection

Use the SeqOps.isEmpty method which is available for Array and subclasses of Seq:

Scala REPL
scala> Nil.isEmpty
val res0: Boolean = true
scala>
Array().isEmpty val res1: Boolean = true
scala>
List().isEmpty val res2: Boolean = true
scala>
Seq().isEmpty val res3: Boolean = true
scala>
Vector().isEmpty val res4: Boolean = true
scala>
List.empty.isEmpty val res5: Boolean = true
scala>
Seq.empty.isEmpty val res6: Boolean = true
scala>
Vector.empty.isEmpty val res7: Boolean = true
scala>
Array(1, 2).isEmpty val res8: Boolean = false
scala>
List(1, 2, 3).isEmpty val res9: Boolean = false
scala>
Seq(1, 2, 3).isEmpty val res10: Boolean = false
scala>
Vector().isEmpty val res11: Boolean = false

To test for a non-empty collection, use SeqOps.nonEmpty.

Scala REPL
scala> Nil.nonEmpty
val res0: Boolean = false
scala>
Array().nonEmpty val res1: Boolean = false
scala>
List().nonEmpty val res2: Boolean = false
scala>
Seq().nonEmpty val res3: Boolean = false
scala>
Vector().nonEmpty val res4: Boolean = false
scala>
List.empty.nonEmpty val res5: Boolean = false
scala>
Seq.empty.nonEmpty val res6: Boolean = false
scala>
Vector.empty.nonEmpty val res7: Boolean = false
scala>
Array(1, 2).nonEmpty val res8: Boolean = true
scala>
List(1, 2, 3).nonEmpty val res9: Boolean = true
scala>
Seq(1, 2, 3).nonEmpty val res10: Boolean = true
scala>
Vector().nonEmpty val res11: Boolean = true

Match Specified Elements

This section shows you several ways of ensuring that a collection starts with a zero element. This test is not generally useful, but the techniques shown are.

The first two ways show interesting ways of working with Scala, but the third way is actually the best practice because it uses standard combinators.

(1) Use Placeholder Splat Syntax

The Scala notation for matching all remaining elements of a Seq or Array is _*. This is a new usage of the placeholder splat syntax introduced in Splatting a Seq section of the Classes Part 2 lecture of the Introduction to Scala course.

When used in a case statement as shown below the notation _* matches zero or more remaining elements in the collection. Placeholders used in match statements do not capture the matched values.

Scala code
def hasLeadingZero(seq: Seq[Int]): Boolean =
  seq match {
    case Seq(0, _*) => true
    case _ => false
  }

The hasLeadingZero method above checks if a collection begins with a zero, and uses _* to match all elements that might follow the leading zero. Placeholder splat syntax will match an empty collection as well as a non-empty collection.

We can write similar code for an Array:

Scala code
def hasLeadingZero(array: Array[Int]): Boolean =
  array match {
    case Array(0, _*) => true
    case _ => false
  }

Both methods have the same name, but accept different parameter types. If both methods are defined in the same scope, Scala’s polymorphism mechanism will invoke the appropriate method based on the argument type passed.

The REPLs for Scala 2 and Scala 3 differ slightly. Scala 3’s REPL allows both methods to be pasted together:

Shell
$ scala
Welcome to Scala 3.4.2 (11.0.23, Java OpenJDK 64-Bit Server VM).
Type in expressions for evaluation. Or try :help.
scala>
def hasLeadingZero(seq: Seq[Int]): Boolean = | seq match { | case Seq(0, _*) => true | case _ => false | } | | def hasLeadingZero(array: Array[Int]): Boolean = | array match { | case Array(0, _*) => true | case _ => false | } | def hasLeadingZero(seq: Seq[Int]): Boolean def hasLeadingZero(array: Array[Int]): Boolean

You cannot paste in these methods one at a time into a Scala 2 REPL; instead, use the REPL’s :paste command to define them together. We discussed the REPL’s :paste command in the Scala Imports and Packages lecture of the Introduction to Scala course.

Let’s test these methods:

Scala REPL
scala> hasLeadingZero(List(0, 1, 2))
res9: Boolean = true
scala>
hasLeadingZero(List(1, 2, 3)) res10: Boolean = false
scala>
hasLeadingZero(List(0)) res10: Boolean = true
scala>
hasLeadingZero(List(1)) res11: Boolean = false

Now let’s test Arrays:

Scala REPL
scala> hasLeadingZero(Array(0, 1, 2))
res12: Boolean = true
scala>
hasLeadingZero(Array(1, 2, 3)) res13: Boolean = false
scala>
hasLeadingZero(Array(0)) res14: Boolean = true
scala>
hasLeadingZero(Array(1)) res15: Boolean = false

(2) Use a Variable Name to Capture the Remaining Sequence

If you wanted to capture the value matched by the placeholder you could use a variable name instead of _*. The variable will match whatever is found, which will be a collection of the same type as the original input. This means that the variable might contain an empty collection. Here I use a variable called remainder to hold the remaining part of seq that is ignored.

Scala code
def hasLeadingZero2(seq: Seq[Int]): Boolean =
  seq match {
    case Seq(0, remainder) => true
    case _ => false
  }

We can write similar code for an Array:

Scala code
def hasLeadingZero2(array: Array[Int]): Boolean =
  array match {
    case Array(0, remainder) => true
    case _ => false
  }

Again, you must use :paste to define them together in the REPL. Let’s test these methods:

Scala REPL
scala> hasLeadingZero2(List 1, 2))
res16: Boolean = true
scala>
hasLeadingZero2(List(1, 2, 3)) res17: Boolean = false
scala>
hasLeadingZero2(Array(0, 1, 2)) res18: Boolean = true
scala>
hasLeadingZero2(Array(1, 2, 3)) res19: Boolean = false

(3) The Best Way

Here is a better way to write hasLeadingZero. This expression uses headOption to ensure that the expression does not fail if the collection is empty. If the resulting Option[Int] contains Some(value), the combinator contains tests that the Option contains a zero. If either the predicate fails or headOption returned None, the resulting value is false.

Scala code
def hasLeading0(seq: Seq[Int]): Boolean = seq.headOption.contains(0)

This improved version can be invoked with both Lists and Arrays:

Scala REPL
scala> hasLeading0(List(0, 1, 2))
res20: Boolean = true
scala>
hasLeading0(List(1, 2, 3)) res21: Boolean = false
scala>
hasLeading0(Array(0, 1, 2)) res22: Boolean = true
scala>
hasLeading0(Array(1, 2, 3)) res23: Boolean = false

You can run this code by typing:

Shell
$ sbt "runMain collections.HasLeadingZero"

Extra Goodness with List and Array

Moving on from pattern matching with Seq, here are some examples of pattern matching that can only be done with List or Array.

Extracting Members from a Collection of Unknown Size

(1) Use cons

Here is one way to print out the first three tokens of a string, or "Nope" if the string had less than three tokens. This example uses the List constructor method :: (also known as cons), which was discussed in the Immutable Collections lecture earlier in this course.

Scala code
def extract[T](list: List[T]): String = list match {
  case x1 :: x2 :: x3 :: rest => s"x1=$x1, x2=$x2, x3=$x3"
  case _ => "Nope"
}
def extract[T](array: Array[T]): String = extract(array.toList)

We can invoke extract with an Array[String] resulting from splitting a String into tokens delimited by spaces:

Scala code
scala> extract("one two three blah blah".split(" "))
res25: String = x1=one, x2=two, x3=three
scala>
extract("one two three".split(" ")) res26: String = x1=one, x2=two, x3=three
scala>
extract("one two".split(" ")) res27: String = Nope

Inside the extract method, rest contains a List containing the rest of the tokens, so for the first invocation of extract it has the value List("blah", "blah"); for the second invocation it has the value Nil.

(2) Use Placeholder Splat

Scala code
def extract2[T](list: List[T]): String = list match {
    case List(x1, x2, x3, _*) => s"x1=$x1, x2=$x2, x3=$x3"
    case _ => "Nope"
  }
def extract2[T](array: Array[T]): String = array match { case Array(x1, x2, x3, _*) => s"x1=$x1, x2=$x2, x3=$x3" case _ => "Nope" }

Again both methods are defined in the same scope, but because they accept different parameter types, Scala’s support for polymorphism invokes the appropriate method. As before, you must use the REPL’s :paste command in order to define both polymorphic methods at once:

Scala code
scala> extract2("one two three blah blah".split(" "))
res28: String = x1=one, x2=two, x3=three
scala>
extract2("one two three blah blah".split(" ").toList) res29: String = x1=one, x2=two, x3=three
scala>
extract2("one two three".split(" ").toList) res30: String = x1=one, x2=two, x3=three
scala>
extract2("one two".split(" ").toList) res31: String = Nope

(3) Use a Variable Name to Capture the Remaining Sequence

Instead of using placeholder splat syntax, which causes the matched remainder of the collection to be inaccessible, you can capture it by using a variable, in this case in a variable called remainder. Although remainder’s type is not declared, it has the same type as the type of the variable that was used to match:

Scala code
def extract3[T](list: List[T]): String = list match {
    case List(x1, x2, x3, remainder) => s"x1=$x1, x2=$x2, x3=$x3"
    case _ => "Nope"
  }
def extract3[T](array: Array[T]): String = extract3(array.toList)

Output is as you would expect, plus you could inspect the value of remainder within the code for the first case expression.

You can run this program by typing:

Shell
$ sbt "runMain collections.ListExtractAnyLen"

Output is as shown above.

Extracting Members While Asserting Collection Size

This code matches against a String that has exactly three tokens, and either returns a formatted String containing the tokens or or "Nope" otherwise:

Scala code
"one two three".split(" ").toList match {
  case x1 :: x2 :: x3 :: Nil => s"x1=$x1, x2=$x2, x3=$x3"
  case _ => "Nope"
}

Here is an alternative implementation that uses Array’s unapply method:

Scala code
"one two three".split(" ") match {
  case Array(x1, x2, x3) => s"x1=$x1, x2=$x2, x3=$x3"
  case _ => "Nope"
}

You can run this code by typing:

Shell
$ sbt "runMain collections.ListExtractAssertLen"
x1=one, x2=two, x3=three 

Output for all of these implementations is the same.

Extra Goodness with IndexedSeq / Vector

Recall from the Collections Overview lecture that the IndexedSeq trait is optimized for random access, and Vector is an implementation of IndexedSeq. All of the following examples with work with Array, List and other Seq trait implementations, but they will take a long time for large collections compared to IndexedSeq trait implementations such as Vector. For that reason I do not show versions of these examples that accept Array.

Match head and tail

To get the head and tail, use the Seq method called +: (prepend), which was discussed in the Immutable Collections lecture earlier in this course.

Scala code
def maybeHeadTail[A](vector: Vector[A]): Option[(A, Vector[A])] = vector match {
  case head +: tail => Some(head -> tail)
  case _ => None
}

We can call maybeHeadTail with a nonEmpty Vector. Notice how the REPL shows the matching type, which is Option[Tuple2[Int, Vector[Int]]] in this case:

Scala REPL
scala> maybeHeadTail(Vector(1, 2, 3, 4))
res32: Option[(Int, Vector[Int])] = Some((1,Vector(2, 3, 4))) 

Let’s call maybeHeadTail with a Vector that only has one element, which yields a tail with no elements:

Scala REPL
scala> maybeHeadTail(Vector(1))
res33: Option[(Int, Vector[Int])] = Some((1,Vector())) 

Let’s call maybeHeadTail with a Vector that has no elements, which yields None instead of a head and a tail. Notice that the matching type is Option[Tuple2[Nothing, Vector[Nothing]]], which does not match the type that the compiler expects for the first case statement.

Scala REPL
scala> maybeHeadTail(Vector())
res34: Option[(Nothing, Vector[Nothing])] = None 

You can run this code by typing:

Shell
$ sbt "runMain collections.VectorHeadTail"

Match init and last

The Immutable Collections lecture discussed init and last. Notice how similar this method is to the preceding. It uses the :+ (append) operator instead of the prepend operator.

Scala code
def maybeInitLast[A](vector: Vector[A]): Option[(Vector[A], A)] = vector match {
  case init :+ last => Some(init -> last)
  case _ => None
}

We can call maybeInitLast with a nonEmpty Vector. Notice how the REPL shows the matching type, which is Option[Tuple2[Vector[Int], Int]] in this case:

Scala REPL
scala> maybeInitLast(Vector(1, 2, 3, 4))
res35: Option[(Vector[Int], Int)] = Some((Vector(1, 2, 3),4)) 

Let’s call maybeInitLast with a Vector that only has one element, which yields an init with no elements:

Scala REPL
scala> maybeInitLast(Vector(1))
res36: Option[(Vector[Int], Int)] = Some((Vector(),1)) 

Let’s call maybeInitLast with a Vector that has no elements, which yields None instead of a head and a tail. Again, notice that the matching type is Option[Tuple2[Vector[Nothing], Nothing]], which does not match the type that the compiler expects for the first case statement.

Scala REPL
scala> maybeInitLast(Vector())
res37: Option[(Vector[Nothing], Nothing)] = None 

You can run this code by typing:

Shell
$ sbt "runMain collections.VectorInitLast"

Scala 2.13’s Pattern Matching on String

This is very cool! Scala 2.13 introduced pattern matching on String using scala.StringContext.unapplySeq(). The unapplySeq method is a version of unapply for collections, and Scala treats String like a collection of Char. Here are some simple examples, inspired by the code examples in the Scaladoc, and pasted into a REPL. The matching substring allow the non-matching substring to be captured into a new variable called name:

Scala REPL
scala> val s"Hello, $name" = "Hello, James"
name: String = James 

A MatchError results if the strings don’t match at all:

Scala REPL
scala> val s"Goodbye, $name" = "Hello, James"
scala.MatchError: Hello, James (of class java.lang.String)
  ... 28 elided 

You might want to use this with a match or partial function so MatchErrors are handled invisibly.

Scala REPL
scala> "Hello, James" match {
  case s"Hello, $name" => println(s"Greeted $name")
  case x => println(s"Could not parse ’$x’")
}
Greeted James 

Sadly, a compiler bug means this capability is not yet ready for production code. The following blows up the compiler:

Scala REPL
scala> "Goodbye, James" match {
   case s"Hello, $name" => println(s"Greeted $name")
   case x => println(s"Could not parse ’$x’")
}

Scala 2.13.1 will fix this bug.

Multiple matches are supported:

Scala REPL
scala> val s"$greeting, $name" = "Hello, James"
greeting: String = Hello
name: String = James 

List.unapplySeq also in concert with String’s pattern matching:

Scala REPL
scala> List("Not a match", "name: suffix", "Who goosed the moose?").map {
  case s"name: $value" => value
  case s"Who goosed the $animal?" => animal
  case _ => ""
}
res4: List[String] = List("", suffix, moose) 

Regular expressions also work:

Scala REPL
scala> val TimeSplitter = "([0-9]+)[.:]([0-9]+)".r
TimeSplitter: scala.util.matching.Regex = ([0-9]+)[.:]([0-9]+)
scala>
val s"The time is ${ TimeSplitter(hours, mins) }" = "The time is 10.50" hours: String = 10 mins: String = 50

Again, using match provides a way to handle parse errors gracefully:

Scala REPL
scala> "The time is 10.50" match {
  case s"The time is ${ TimeSplitter(hours, mins) }" => Some((hours, mins))
  case _ => None
}
res6: Option[(String, String)] = Some((10,50)) 

Aliases and Sequence Element Matching

A partial function can express complex logic. Aliases and sequence matching can make the logic much easier to understand. The Sealed Classes and Extractors lecture of the Introduction to Scala course introduced match aliases. Let’s have a little fun while exploring how to use aliases with sequence matching.

Your mission is to eat dessert, but you are on a diet. Meals with only 1 course do not have dessert, so you can only eat dessert as part of a 3 course meal. You can feed the first course to the dog if you don’t like it, but the dog cannot eat spinach since that is poisonous to dogs. You need to be able to determine which meals you can order that either are low calorie, or have a tasty enough item that is worth breaking the diet for. Let’s start by defining a Food case class:

Scala code
case class Food(name: String, calories: Int, yumminess: Int) {
  def dogCanEat = name!="Spinach"
}

And let’s define a Menu case class:

Scala code
case class Menu(foods: Food*) {
  def totalCalories = foods.map(_.calories).sum
def mostYucky = foods.sortBy(_.yumminess).head
def mostTasty = foods.sortBy(_.yumminess).last
def isWorthOrdering: Boolean = { //println(s"totalCalories=$totalCalories; mostYucky.yumminess=${mostYucky.yumminess}; mostTasty.yumminess=${mostTasty.yumminess}") totalCalories<200 || mostTasty.yumminess>=9 }
override def toString = s"${foods.map(_.name).mkString("_")}: totalCalories=$totalCalories; mostYucky=${mostYucky.name}; mostTasty=${mostTasty.name}" }

Now we can define the menu:

Scala code
val course1a = Food("Spinach",  10,  2)
val course2a = Food("Turnips",  10,  1)
val course1b = Food("Peas",     50,  5)
val course2b = Food("Potatoes", 110, 6)
val dessert  = Food("BelgianChocolate", 135, 9)
val menu1 = Menu(course1a) val menu2 = Menu(course1a, course2a) val menu3 = Menu(course1a, course2a, dessert) val menu4 = Menu(course1b, course2a, dessert) val menu5 = Menu(course1b, course2b, dessert)

Now we can get to the point of this example. Notice how the alias allows the entire matched item to be given a name (menu), while still ensuring that the matched list has exactly three elements, and that the remaining constraints are satisfied. This complex logic is easy to understand and maintain. We start by creating a List[Menu] containing all available Menu instances and filter the List. Filter is a higher-order FunctionN that accepts a lambda or named FunctionN as discussed in the Higher-Order Functions lecture earlier in this course.

As we learned in the Partial Functions lecture, you can supply a partial function whenever a regular FunctionN is expected. This partial function has two cases: one returns true and the other returns false.

The first case is the interesting one. It features an alias called menu of type Menu. The Menu constructor is a varargs Array[Menu], and this case will only match for menus that have exactly three courses, named c1, c2 and c3. Then the menu item that is currently being matched is worth ordering, or if we can feed the first course to the dog. If Menu.unapply matches because the Menu has three courses and the guard passes then this case returns true.

Scala code
val result = List(menu1, menu2, menu3, menu4, menu5) filter {
  case menu @ Menu(c1, c2, c3) if menu.isWorthOrdering || c1.dogCanEat => true
  case _ => false
}
println(result.mkString("\n"))

You can run this example by typing:

Shell
$ sbt "runMain collections.MatchAlias"

Output is as follows. Notice that menu1 and menu2 were filtered out because they did not contain 3 courses.

Scala code
Spinach_Turnips_BelgianChocolate: totalCalories=155; mostYucky=Turnips; mostTasty=BelgianChocolate
Peas_Turnips_BelgianChocolate: totalCalories=195; mostYucky=Turnips; mostTasty=BelgianChocolate
Peas_Potatoes_BelgianChocolate: totalCalories=295; mostYucky=Peas; mostTasty=BelgianChocolate

Exercise – Extract File Type from Path

Write a method that accepts a String containing an arbitrary file name, which may or may not include a path (subdirectories). Use list pattern matching to parse out the file type. Because the file name might just consist of a dot, the method should return an Option[String]. For example:

  • Given prefix.page.html, return Some(html).
  • Given blah.png, return Some(png).
  • Given dir/path/to/a.b.c.d, return Some(d).

Hints

  1. The StringOps class is an implicit class that enriches String and adds a split method. You can specify this regular expression to match on a period: "\\."
  2. StringOps also defines the nonEmpty and reverse methods. Take a look at the other methods that StringOps provides, they are most useful.

Solution

This solution works for any number of periods in a file name:

Scala code
def findExtension(s: String): Option[String] = s.reverse.split("\\.") match {
  case Array(ext, _*) if ext.nonEmpty && !s.endsWith(".") && s.contains(".") => Some(ext.reverse)
  case _ => None
}

The string s is first reversed, and then split wherever a period occurred into an Array[String], which is matched against a case using placeholder splat syntax. The pattern requires at least two strings in the Array[String], and the first String must not be empty; this had to be tested for because of how split works for various types of input. The guard also ensures that the original string did not end with a period and that there was at least one period . Try it in the REPL to see:

Scala REPL
scala> "prefix.page.html".split("\\.")
res0: Array[String] = Array(prefix, page, html)
scala>
".page.html".split("\\.") res1: Array[String] = Array("", page, html)
scala>
"page.html".split("\\.") res2: Array[String] = Array(page, html)
scala>
".html".split("\\.") res3: Array[String] = Array("", html)
scala>
"page.".split("\\.") res4: Array[String] = Array(page)
scala>
"page".split("\\.") res5: Array[String] = Array(page)

The extracted file name extension is then reversed back to its original direction. Let’s try it in the REPL:

Scala REPL
scala> findExtension("html")
res6: Option[String] = None
scala>
findExtension(".html") res7: Option[String] = Some(html)
scala>
findExtension("page.html") res8: Option[String] = Some(html)
scala>
findExtension("path/prefix.page.html") res9: Option[String] = Some(html)

To run this program, type:

Shell
$ sbt "runMain solutions.ExtractFileType"

* indicates a required field.

Please select the following to receive Mike Slinn’s newsletter:

You can unsubscribe at any time by clicking the link in the footer of emails.

Mike Slinn uses Mailchimp as his marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp’s privacy practices.