Mike Slinn

Immutable Collections

— Draft —

Published 2014-03-10. Last modified 2019-07-09.
Time to read: 11 minutes.

Scala's immutable collections are explored. This information is vital for large-scale computation.

The sample code for this lecture can be found in courseNotes/src/main/scala/collections/CollectionImmutableFun.scala.

This lecture discusses many of the non-parallel versions of the following collection types.

    Collection Traits and Subclasses
  Special Foundation Set Sequence Random Access Sequence Map Stack & Queue
Immutable Option
ParRange
Range
Stream
Iterable
ParIterable
Traversable
BitSet
HashSet
ListSet
ParHashSet
ParSet
Set
SortedSet
TreeSet
List
LinearSeq
PagedSeq
ParSeq
Seq
IndexedSeq
Vector
ParVector
HashMap
ListMap
Map
ParMap
ParHashMap
SortedMap
TreeMap
Queue

List

Scala’s List is an immutable singly-linked list, which means that List is an ordered collection. List is designed for linear access, also known as sequential access.

You can easily make a new List:

Scala REPL
scala> val list1 = List(1, 2, 3)
list1: List[Int] = List(1, 2, 3) 

The compiler examined all of the elements of the list and noticed that they were all Ints, \ so list1 has type List[Int].

Nil refers to the empty list.

Scala REPL
scala> Nil
res0: scala.collection.immutable.Nil.type = List() 

Nil is not parametric; it is a singleton and represents any and all empty lists.

You can use the cons operator (::) or the prepend operator(+:) to prepend items to a list and thereby create a new list. While :: is specific to List, and +: can be used for any type of sequence, both of these operators are equivalent for List instances. List items are concatenated with either of these operators, and the end of the concatenation is denoted by Nil.

Scala REPL
scala> val list2 = 4 :: 5 :: 6 :: Nil
list2: List[Int] = List(4, 5, 6)
scala>
7 +: 8 +: 9 +: Nil res0: List[Int] = List(7, 8, 9)

You can concatenate two lists with the ::: and ++ operators. While ::: is specific to List and ++ can be used with any sequence, they both do the same thing for List instances.

Scala REPL
scala> list1 ::: list2
res1: List[Int] = List(1, 2, 3, 4, 5, 6)
scala>
list1 ++ list2 res2: List[Int] = List(1, 2, 3, 4, 5, 6)

Appending a element to an immutable.List is inefficient because the whole collection must be copied. For this reason Scala does not provide an append-single-element-to-list operator for immutable Lists. However, prepending an element (with :: or +:) to an immutable.List is not computationally expensive. Because an instance of immutable.List cannot be changed, the prepend operators (:: and +:) and the concatenation operators (::: and ++) return new instances of immutable.List. Also, the concatenation operators have linear complexity (O(n)), which is fine if used once in a while, but does not perform well in a loop.

You can have a List that contains any type of object. For example, let’s say we have a Thing case class.

Scala REPL
scala> case class Thing(i: Int, s: String)
defined class Thing 

We could make a collection of Things:

Scala REPL
scala> val thing1 = Thing(1, "z")
thing1: Thing = Thing(1,a)
scala>
val thing2 = Thing(2, "y") thing2: Thing = Thing(2,b)
scala>
val thing3 = Thing(3, "x") thing3: Thing = Thing(3,c)
scala>
val things = List(thing3, thing1, thing2) things: List[Thing] = List(Thing(3,x),Thing(1,z),Thing(2,y))
Be sure to specify the type of items in empty collections

The Scala compiler recognized that all of the elements in the things list were of type Thing, so things has type List[Thing]. Notice what happens if I add a String to the list.

Scala REPL
scala> "Hi" :: things
res3: List[java.io.Serializable] = List(Hi, Thing(3,x), Thing(1,z), Thing(2,y))

The type of the new collection is List[Serializable], which is rather generic. I could have specified an even more generic type.

Scala REPL
scala> val x: List[Any] = "Hi" :: things
x: List[Any] = List(Hi, Thing(3,x), Thing(1,z), Thing(2,y))

To guard against inadvertently adding something of the wrong type into a collection, be sure to specify the type that you require. The compiler will then be able to notify you of elements with mismatched types.

Scala REPL
scala> val x: List[Thing] = "Hi" :: things
<console>:14: error: type mismatch;
found : String
required: Thing
val x: List[Thing] = "Hi" :: things 

Like we saw for Array, there are several ways to create empty collections like List. The compiler needs to know the type of item contained in the collection, and there are two ways you can provide that type hint.

Scala REPL
scala> List.empty[Double]
res3: List[Double] = Seq()
scala>
val emptyList: List[Double] = List.empty emptyList: List[Double] = List()
scala>
val emptyList: List[Double] = List() emptyList: List[Double] = List()

Notice what happens if you do not provide a type hint to the compiler to indicate the element type in the List.

Scala REPL
scala> val badList = List.empty
badList: List[Nothing] = Seq() 

As you can see, the element type in the collection is inferred to be Nothing, defeating type safety. Be sure to specify a type hint to the right or the left of the equals sign.

More About Nil

The empty list is represented by the Nil object. Nil is implemented as a singleton, and therefore cannot be created with a type specifier.

Scala REPL
scala> Nil[Int]
<console>:8: error: object Nil does not take type parameters.
Nil[Int]
^ 

Look what happens when you apply a combinator to Nil and no type hints are provided to the compiler – the type of element in the collection is inferred to be Nothing.

Scala REPL
scala> Nil.headOption
res1: Option[Nothing] = None 

You should always provide a type hint with Nil. Here is an example of providing a type hint on the left side of the equals sign, so the Scala compiler infers the element type to be Int.

Scala REPL
scala> val x: Option[Int] = Nil.headOption
x: Option[Int] = None 

Here is another example, where the type hint is provided on the right side of the equals sign. The Scala compiler knows that List("hi") is of type List[String], so it coerces the type of the empty list denoted by Nil to match.

Scala REPL
scala> val y = Nil ::: List("hi")
y: List[String] = List(hi) 

Right-Associative Operators

You may have noticed that the :: and +: operators act unusually, in that the item to be prepended precedes the List instance. This is possible because Scala treats all operators that end in a colon character as right-associative when written using infix notation. In other words, this expression.

Scala REPL
scala> 5 :: Nil
res4: List[Int] = List(5) 

... is equivalent to:

Scala REPL
scala> Nil.::(5)
res5: List[Int] = List(5) 

In this expression, the :: operator is a method of the empty List denoted by Nil. Because :: is right-associative, the binding is backwards from what you may be used to. The argument 5 is passed to List.:: method.

These expressions are also equivalent.

Scala REPL
scala> 5 +: Nil
res6: List[Int] = List(5)
scala>
Nil.+:(5) res7: List[Int] = List(5)

Here is another example of building a List[Int].

Scala REPL
scala> 7 +: 8 +: 9 +: Nil
res1: List[Int] = List(7, 8, 9) 

Vector

The immutable.Vector class is also an immutable ordered collection that provides a compromise between indexed and linear access. It has both effectively constant time indexing overhead and constant time linear access overhead. Because if this, Vector is a good foundation for mixed access patterns where both indexed and linear accesses are used. The :: and ::: List operations are not supported by Vector, but you can use the +: and ++: equivalents.

Scala REPL
scala> val vector1 = Vector(1, 2, 3)
res8: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3)
scala>
0 +: vector1 res9: scala.collection.immutable.Vector[Int] = Vector(0, 1, 2, 3)
scala>
Vector(thing1, thing2, thing3) res10: scala.collection.immutable.Vector[Thing] = Vector(Thing(1,z), Thing(2,y), Thing(3,z))

If you cannot decide if you should use List or Vector, use Vector if possible.

Scala REPL
scala> val vector2 = Vector(4, 5, 6)
res11: scala.collection.immutable.Vector[Int] = Vector(4, 5, 6)
scala>
vector1 ++ vector2 res12: Vector[Int] = Vector(1, 2, 3, 4, 5, 6)

Seq

Seq is the base trait for ordered sequences and lists such as List and Vector. Unlike the Iterator trait, the Seq trait enables collections with a definite order. Seq is a generalization of sequences, so you should use Seq to define interfaces and use List or Vector for defining variables, like this.

Scala REPL
scala> def doSomething(seq: Seq[Int]) = seq.foreach(println)
doSomething: (seq: Seq[Int])Unit
scala>
doSomething(List(1, 2, 3)) 1 2 3
scala>
doSomething(Vector(4, 5, 6)) 4 5 6

In other words, Scala’s Seq is similar to Java’s List interface, and Scala’s List is similar to an immutable version of Java’s LinkedList concrete class.

BTW, the Seq constructor actually returns a List, which implements the Seq trait. It is not good practice to use this constructor, even though many Scala programmers write code like this.

Scala REPL
scala> val seq = Seq(1, 2, 3)
seq: Seq[Int] = List(1, 2, 3) 

The old Scala Collections documentation has more useful information on Seq.

Sorting on Demand

You can sort on demand using one or more properties of an ordered collection’s elements. Of course, for this to work all elements in the collection must have the required properties. This means that the declared type of the items in the collection will define sortable properties, and you would not sort collection types such as HashMap or HashSet.

Scala REPL
scala> things.sortBy(_.i)
res13: List[Thing] = List(Thing(1,z), Thing(2,y), Thing(3,x))
scala>
things.sortBy(_.s) res14: List[Thing] = List(Thing(3,x), Thing(2,y), Thing(1,z))

In order to sort by more than one property, specify them as a tuple. The order of the sort elements defines the sort key order.

Scala REPL
scala> things.sortBy(t => (t.i, t.s))
res15: List[Thing] = List(Thing(1,z), Thing(2,y), Thing(3,x)) 

You will often want to define a natural ordering for your collection. We will discuss that topic in the lecture on Sorting and Ordered Collections.

You can sort tuples in a similar manner:

Scala REPL
scala> val tuples = Vector((4, "z"), (2, "q"), (5, "b"))
tuples: scala.collection.immutable.Vector[(Int, String)] = Vector((4,z), (2,q), (5,b))
scala>
tuples.sortBy(_._1) res16: scala.collection.immutable.Vector[(Int, String)] = Vector((2,q), (4,z), (5,b))
scala>
tuples.sortBy(_._2) res17: scala.collection.immutable.Vector[(Int, String)] = Vector((5,b), (2,q), (4,z))
scala>
tuples.sortBy(t => (t._1, t._2)) res18: scala.collection.immutable.Vector[(Int, String)] = Vector((2,q), (4,z), (5,b))
Avoid calling head, tail and last because they throw an Exception when used on an empty collection

head, tail and the Remainder of a Collection

head, init, tail, last, headOption and lastOption are defined in IterableOps, which means there are available throughout the Scala collection classes. They can be applied to ordered and unordered collections, and also to mutable and immutable collections.

head returns the first item in a collection, or throws a NoSuchElementException if the collection is empty.

Scala REPL
scala> val seq: Seq[Int] = Vector(1, 2, 3)
seq: Seq[Int] = Vector(1, 2, 3)
scala>
seq.head res19: Int = 1
scala>
Vector[Int].empty.head java.util.NoSuchElementException: head of empty list
scala>
Nil.head java.util.NoSuchElementException: head of empty list
scala>
List.empty[Int].head java.util.NoSuchElementException: head of empty list at scala.collection.immutable.Nil$.head(List.scala:417) at scala.collection.immutable.Nil$.head(List.scala:414) ... 27 elided

headOption returns an Option (Some or None) and never throws an exception.

Scala REPL
scala> seq.headOption
res20: Option[Int] = Some(1)
scala>
Vector[Int].empty.headOption res21: Option[Int] = None
scala>
Nil.headOption res22: Option[Nothing] = None
scala>
List.empty[Int].headOption res23: Option[Int] = None

tail returns the remainder of the collection after the head is removed. If the collection is empty an exception is thrown.

Scala REPL
scala> seq.tail
res24: Seq[Int] = Vector(2, 3)
scala>
Nil.tail java.lang.UnsupportedOperationException: tail of empty list

last returns the last element of the collection, or throws an exception if the collection is empt.

Scala REPL
scala> seq.last
res24: Int = 3
scala>
Nil.last java.util.NoSuchElementException

lastOption returns an Option and never throws an exception.

Scala REPL
scala> seq.lastOption
res25: Option[Int] = Some(3)
scala>
Nil.lastOption res26: Option[Nothing] = None
scala>
List.empty[Int].lastOption res27: Option[Int] = None

init returns all but the last element, or throws an exception if the collection is empty. If there is only one element in the collection, the empty collection is returned.

Scala REPL
scala> seq.init
res28: Seq[Int] = Vector(1, 2)
scala>
Vector(1).init res29: Vector[Int] = List()
scala>
Nil.init java.lang.UnsupportedOperationException: empty.init

There is no initOption combinator. You should avoid using init. Instead, consider using take, like this, or wrap this code into a method (which you will do shortly!).

Scala REPL
scala> seq.take(seq.size-1)
res30: Vector[Int] = Vector(1, 2) 

The Create a Failure-Proof init Method exercise in the Parametric Variance and Bounds lecture explores how to write this method in a generic manner for all collections.

take, drop and takeWhile

take returns up to the specified number of elements from the beginning of the collection. drop returns the remainder of the collection after the specified number of elements have been removed. If you try to take or drop more elements from the collection than actually exist, only the number of elements requested are processed and no exception is raised.

Scala REPL
scala> seq.take(2)
res31: Seq[Int] = Vector(1, 2)
scala>
seq.take(0) res32: Seq[Int] = Vector()
scala>
seq.take(4) res33: Seq[Int] = Vector(1, 2, 3)
scala>
seq.drop(2) res34: Seq[Int] = Vector(3)
scala>
seq.drop(0) res35: Seq[Int] = Vector(1, 2, 3)
scala>
seq.drop(-4) res36: Seq[Int] = Vector(1, 2, 3)
scala>
seq.drop(4) res37: Seq[Int] = Vector()

takeWhile is a version of take that accepts a predicate instead of a constant value.

Maps

Immutable.Map is very similar to mutable.Map, with the obvious difference being that immutable.Maps cannot be modified. However, two immutable.Maps can have binary operations performed, such as concatenation and difference, however the result is a new immutable.Map.

SeqMap

Scala 2.13 introduced a new type of immutable.Map trait, SeqMap, and two concrete implementations: TreeSeqMap and VectorMap.

TreeSeqMap

TreeSeqMap is similar in behavior to a ListMap, but it performs better for larger collections and uses more memory.

VectorMap

VectorMap is similar in behavior to ListMap, but provides constant lookup time while using more memory and generally running slower.

Iterator

Iterator defines some powerful methods that are inherited by all Scala collections.

Iterator.iterate

The iterate method starts by initializing its accumulator with the initial value, then repeatedly applies the accumulated value back into the input of the given function. Here I invoke iterate with a zero initial value, and then generate values that are computed by adding 1 to the accumulated value:

Scala REPL
scala> Iterator.iterate(0)(_ + 1) take(10) foreach println
0
1
2
3
4
5
6
7
8
9 

This is similar, with the computation doubling the accumulated value then adding one:

Scala REPL
scala> Iterator.iterate(0)(_ * 2 + 1) take(10) foreach println
0
1
3
7
15
31
63
127
255
511 

Warning! This never stops.

Scala REPL
scala> Iterator.iterate(0)(_ + 1) foreach println

Iterator.fill

Iterator.fill produces a collection containing the results of a computation a specified number of times. Variants of fill are defined for 1-dimensional data through 5-dimensional data. This example uses fill to retrieve the elements of a PriorityQueue and sort them in ascending order.

Scala REPL
scala> val q = PriorityQueue(1289, 12, 123, 894, 1)(Ordering.Int.reverse)
q: scala.collection.mutable.PriorityQueue[Int] = PriorityQueue(1, 12, 123, 894, 1289)
scala>
Iterator.fill(q.size)(q.dequeue) foreach println 1 12 123 894 1289

Iterator.tabulate

Iterator.tabulate produces a collection containing values of a given function over a range of integer values starting from 0. Variants of tabluate are defined for 1-dimensional data through 5-dimensional data. Here is an example.

Scala REPL
scala> Array.tabulate(10)(_.toDouble)
res17: Array[Double] = Array(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0) 

Here is a 2-dimensional example:

Scala REPL
scala> Array.tabulate(3, 3)((a: Int, b: Int) => a.toDouble * b)
res20: Array[Array[Double]] = Array(Array(0.0, 0.0, 0.0), Array(0.0, 1.0, 2.0), Array(0.0, 2.0, 4.0)) 

Iterator.continually

We can use takeWhile in conjunction with Iterator.continually to obtain unique values until a user indicates they have enough.

Scala REPL
scala> import collection.Iterator
import collection.Iterator
scala>
Iterator. continually(System.nanoTime). takeWhile(_ => !io.StdIn.readLine("\nMore? <Y/n>: "). toLowerCase. startsWith("n") ). foreach(println) More? <Y/n>: 23773991672470 More? <Y/n>: 23775269319438 More? <Y/n>:

Iterator.unfold

Iterator.unfold, introduced in Scala 2.13, is to producing collections as fold is to consuming collections. unfold provides a purely functional way of producing a collection of which we don’t know the size and elements a priori. When using unfold, the elements are computed one after the other, until a None termination signal is computed, otherwise the elements continue infinitely.

This example defines a method that uses unfold to build a list of lists containing similar values.

Scala REPL
scala> def segments[T](xs: List[T]): List[List[T]] =
  List.unfold(xs)(xs =>
    if (xs.isEmpty) None
    else Some(xs.span(_ == xs.head))
  )
segments: [T](xs: List[T])List[List[T]]
scala>
segments(List(1,1,2,3,3,3,4,5,5)) res0: List[List[Int]] = List(List(1, 1), List(2), List(3, 3, 3), List(4), List(5, 5))

Another example is the Fibonacci series, implemented with LazyList.unfold.

Scala REPL
scala> val fib =
  LazyList.unfold((1, 1)) { case (a, b) => Some((a, (b, a + b))) }
fib: scala.collection.immutable.LazyList[Int] = LazyList(<not computed>)
scala>
fib.take(10).force res6: scala.collection.immutable.LazyList[Int] = LazyList(1, 1, 2, 3, 5, 8, 13, 21, 34, 55)

unfold is more powerful than Iterator.apply (for example, the default apply method is invoked when writing List(1, 1, 2, 3)). apply constrains the user to enumerate all the elements of the collection when invoked. unfold is more powerful than tabulate, which constrains the user to provide the collection length and is limited to collections whose elements can be computed from their index. unfold is also more powerful than iterate, which is limited to collections whose nth element can be computed from the n-1th element.

We can prove that by implementing iterate, tabulate, and apply in terms of unfold.

Scala code
def apply[A](as: A*): List[A] =
  List.unfold(as) { case h +: t => Some((h, t)) case _ => None }
def tabulate[A](length: Int)(f: Int => A): List[A] = List.unfold(length)(n => if (n > 0) Some((f(n), n - 1)) else None)
def iterate[A](start: A, length: Int)(f: A => A): List[A] = List.unfold((length, start)) { case (0, _) => None case (n, a) => Some((a, (n - 1, f(a)))) }

Use unfold when the iteration starts from some hidden state.

Scala 2.13 defines Iterator.unfold. The version for List is essentially the same method, is much simpler than the general case, and works for all Scala versions.

Scala code
def unfold[A, S](start: S)
                (op: S => Option[(A, S)]): List[A] =
  Iterator.
    iterate(op(start))(_.flatMap{ case (_, s) => op(s) }).
    map(_.map(_._1)).
    takeWhile(_.isDefined).
    flatten.
    toList

Exercise – Compute SHA of a file

This exercise replicates the functionality of sha1sum.

Git uses the SHA-1 algorithm to obtain the digest of each file that it checks in. SHA-1 is not secure but is inexpensive to compute. Write a console application that reports the SHA-1 digest of the contents of /etc/passwd. Do not write this application for maximum efficiency; instead, use it as practice for processing a java.io.InputStream of data. Even though the source of the data is a file, this algorithm works on a java.io.InputStream so it is appropriate for an input of unknown length.

Implement a method with this signature as part of the solution; you can provide a default value for the algorithm if you like. For this exercise, the value of algorithm must be "SHA-1".

def digest(inStream: java.io.InputStream, algorithm: String): Array[Byte]

Hints.

  • In case you have encountered Scala Streams before, they have no relationship to Java Streams. Scala Streams are not required for this solution, only Java Streams and Scala’s Iterator.
  • You can create a buffer to hold the accumulated result by calling new Array[Byte](1024).
  • You can get the SHA-1 message digest algorithm by calling java.security.MessageDigest.getInstance("SHA-1")
  • java.io.FileInputStream.read returns the number of bytes read, or -1 to signal no more data.
  • Use LazyList.continually and takeWhile to read successive byte arrays from an InputFileStream until FileInputStream.read signals no more data.
  • Iterate using foreach over MessageDigest.update to calculate an updated digest in the MessageDigest, using the data in the buffer as input. The third argument must be set to the number of bytes read for this iteration. One way of expressing this is:
    Scala code, or whatever
    (whatever).foreach { b => md.update(buffer, 0, b) }
    Another way, using shorthand, is:
    Scala code, or whatever
    (whatever).foreach { md.update(buffer, 0, _) }

Output should be something like:

Output
SHA-1 of /etc/passwd is 2112088146319-7-11555-1126870-43-105-63-21-5028-43-98

Solution

Shell
package solutions
import java.security.MessageDigest import java.io.{FileInputStream, InputStream}
@main object SHA1 { def digest(fileName: String, algorithm: String): Array[Byte] = { val inStream = new FileInputStream(fileName) digest(inStream, algorithm) }
def digest(inStream: InputStream, algorithm: String="SHA-256"): Array[Byte] = { val md = MessageDigest.getInstance(algorithm) val buffer = new Array[Byte](1024) LazyList.continually { inStream.read(buffer) } .takeWhile(_ != -1) .foreach { md.update(buffer, 0, _) } md.digest }
println(s"""SHA-1 of /etc/passwd is ${digest("/etc/passwd", "SHA-1").mkString}""") }

To run this solution, type:

Shell
$ sbt "runMain solutions.SHA1"

Views

Most Scala collections are strict – that is, all values of the collection are computed and stored, even if they are not retrieved. In contrast, the elements of a lazy collection are computed only when accessed. Ranges, which we first discussed in the Learning Scala Using The REPL 1/3 lecture of the Introduction to Scala course, are examples of lazy collections, specifically lazy ordered sequences.

Scala REPL
scala> val range: Range.Inclusive = 3 to 8
range: Range.Inclusive = Range 3 to 8
scala>
range(0) res0: Int = 3

Making Lazy Collections from Strict Collections

Lazy collections require less computation than strict collections for many or most circumstances because collection elements are only computed when required.

The SeqOps trait is available to all collections. The SeqOps.view method can convert an otherwise strict collection to a lazy collection. The view method returns a subclass of SeqOps. Even though Ranges are lazy and not strict, the view method is defined because it implements IndexedSeqOps, which implements SeqOps, which implements IterableOps, which declares view.

Scala REPL
scala> val view = range.view
view: scala.collection.IndexedSeqView[Int] = IndexedSeqView(<not computed>)
scala>
view(0) res1: Int = 3

LazyList

Prior to Scala 2.13, Stream was used to model a lazily evaluated list.

An evaluation strategy which delays the evaluation of an expression until its value is needed (non-strict evaluation) and which also avoids repeated evaluations.

Scala 2.13 deprecated Stream in favor of LazyList. As its name suggests, a LazyList is a linked list whose elements are lazily evaluated. An important semantic difference with Stream is that in LazyList both the head and the tail are lazy, whereas in Stream only the tail is lazy.

Scala lazy lists are unrelated to Java streams. Instead, Scala lazy lists are subtypes of immutable.Seq. LazyList is commonly used to handle subsets of infinite sequences, and for sequences that are expensive to compute. Predef imports collection.immutable.LazyList so it is always available to any program.

Lazy lists are parametric, or generic, which means elements of a lazy list conform to a specific type. LazyList[T] is a lazy collection whose elements of type T are memoized, or cached.

Because Stream is memoized, given a stream with sufficient unique values, the memoization cache will eventually fill all available memory. We will discuss memoization in the Memoization in Depth lecture.

BTW, Scala 2.13 renamed Stream.append to LazyList.lazyAppendedAll.

LazyList.continually creates a Scala LazyList, which can be considered as a lazy Seq which contains elements that are only evaluated when they are needed. For example, here is how one might compute a LazyList of unique Longs.

Scala REPL
scala> import collection.immutable.LazyList
import collection.immutable.LazyList
scala>
val ids = LazyList.continually(System.nanoTime) ids: scala.collection.immutable.LazyList[Long] = Stream(86653342711375, ?)

We can examine the first 5 elements of the stream this way. Note that toVector forces evaluation.

Scala REPL
scala> ids.take(5)
res37: scala.collection.immutable.LazyList[Long] = LazyList(86549694623903, ?)
scala>
ids.take(5).toVector res38: Vector[Long] = Vector(86549694623903, 86592186252578, 86592186259051, 86592186261332, 86592186263348)

This demonstrates that LazyList memoizes values that have been generated.

Scala REPL
scala> ids.head
res39: Long = 72091815908327 

LazyList.continually can be used to write a similar program as the one I showed you a moment ago that used Iterator.continually, but should only be used if the previous values need to be accessed as the program executes. Notice that I put dots at the end of each line of the REPL input, except for the last line. This is so the REPL knows to expect a multi-line expression. If I had put the dots at the start of the last two lines instead, the REPL would not have read all three lines as one expression.

Scala REPL
scala> LazyList.continually(System.nanoTime).
  takeWhile(_ => !io.StdIn.readLine("\nMore? <Y/n>: ")
                              .toLowerCase
                              .startsWith("n")
  ).foreach(println)
More? <Y/n>: 24282594984003
More? <Y/n>: 24283649919676
More? <Y/n>: 

Iterator.continually provides a similar mechanism to LazyList.continually that also supports lazy sequences, but without the memory overhead of memoization. You should use Iterator.continually when you do not need to memoize the values.

About foreach

Here is how you can discover the acceptable expressions that can be passed into the foreach higher-order method above.

The collection.Iterator Scaladoc shows the method signature for continually. We see that it returns an Iterator[A] where A is defined by the type streamed from continually. In the above example, System.nanoTime returns a Long, so A is of type Long.

Scala code
def continually[A](elem: => A): Iterator[A]

The Scaladoc for Iterator.foreach shows the method signature.

Scala code
def foreach( f: (A) => Unit): Unit

Thus Iterator.foreach accepts a Function1[A, Unit], which can also be written as A => Unit. In other words, this foreach accepts any Function1 that accepts an instance of A, and does not return a value (returns Unit).

Acceptable examples include:

  1. Scala code
    x => println(x) // println returns Unit
  2. Scala code
    println(_) // shorthand for above
  3. Scala code
    println // shorter shorthand for above
  4. Scala code
    x =>
    val y = callSomeMethod(x)
    doSomethingWithY(y)
    () // explicitly return Unit
  5. Scala code
    def myMethod[T]: Unit = {
      doSomethingWithT(t)
      () // explicitly return Unit
    }
    Iterator .continually(System.nanoTime) .takeWhile(_ => !io.StdIn.readLine("\nMore? <Y/n>: ") .toLowerCase .startsWith("n") ).foreach(myMethod) // myMethod is automatically lifted to a Function1

Streaming Binary I/O

Here is a more efficient way to read in a binary file into a byte array, than the sample code shown in the the Higher-Order Functions lecture. This version does not do pointless character encoding.

Scala REPL
scala> import scala.language.postfixOps
import scala.language.postfixOps
scala>
import sys.process._ import sys.process._
scala>
val fileName = ("which scalac".!!).trim fileName: String = /usr/local/bin/scalac
scala>
val bis = new java.io.BufferedInputStream(new java.io.FileInputStream(fileName)) bis: java.io.BufferedInputStream = java.io.BufferedInputStream@3c04ddda
scala>
val bArray = Stream.continually(bis.read). takeWhile(-1 !=). map(_.toByte). toArray bArray: Array[Byte] = Array(35, 33, 47, 117, 115, 114, 47, 98, 105, 110, 47, 101, 110, 118, 32, 98, 97, 115, 104, 10, 35, 10, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 10, 35, 32, 67, 111, 112, 121, 114, 105, 103, 104, 116, 32, 50, 48, 48, 50, 45, 50, 48, 49, 51, 32, 76, 65, 77, 80, 47, 69, 80, 70, 76, 10, 35, 10, 35, 32, 84, 104, 105, 115, 32, 105, 115, 32, 102, 114, 101, 101, 32, 115, 111, 102, 116, 119, 97, 114, 101, 59, 32, 115, 101, 101, 32, 116, 104, 101, 32, 100, 105, 115, 116, 114, 105, 9...) scala>
scala> bis.close()

Several variations of this code are provided in HigherFun.scala. You can run this program by typing:

Shell
$ sbt "runMain BinaryIO"

Here is an explanation.

  1. BufferedInputStream.read returns the next byte of data, or -1 if the end of the input stream is reached.
  2. LazyList.continually creates a Scala Stream, which is a lazy Seq where elements are only evaluated when they are needed. Stream memoizes elements as they are computed.
  3. takeWhile returns the beginning elements of a Seq that satisfy a predicate.
  4. Stream.continually(bis.read).takeWhile(_.!=(-1)) returns a Stream[Int]
  5. Stream.continually(bis.read).takeWhile(-1 !=).map(_.toByte) returns a Stream[Byte]
  6. The function value -1 != is extremely terse because shorthand is combined with infix notation. It is equivalent to each of these incantations:
    1. item => item != -1
    2. item => item.!=(-1)
    3. _.!=(-1)
    4. _ != -1 This is my preferred style because it is reasonably terse while remaining understandable to most programmers.
    5. -1 != _

* indicates a required field.

Please select the following to receive Mike Slinn’s newsletter:

You can unsubscribe at any time by clicking the link in the footer of emails.

Mike Slinn uses Mailchimp as his marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp’s privacy practices.