Mike Slinn

Hash Tables

— Draft —

Published 2014-03-10. Last modified 2019-07-29.
Time to read: 13 minutes.

Hash tables are concrete types that provide quick access to specific items in a collection. Scala supports two broad classes of hash tables: hash sets and hash maps. Mutable and immutable versions of each exist, and they sometimes behave differently. Sometimes the class names of similar classes in the mutable and immutable packages are the same, and sometimes the corresponding class name for each package is quite different.

This table shows the hash table traits and concrete implementations that this lecture will discuss:

Trait Implementations
Set

HashSet (mutable & immutable.

LinkedHashSet (mutable.

ListSet (immutable.

BitSet (mutable & immutable.

Map

HashMap (mutable & immutable.

ListMap (mutable & immutable.

SeqMap

LinkedHashMap (mutable.

TreeSeqMap (immutable.

SortedSet TreeSet (mutable & immutable)
SortedMap TreeMap (mutable & immutable)

Paraphrasing the Scala documentation for mutable hash tables, this passage applies to both mutable and immutable hash tables.

Hash tables are fast so long as the objects placed in them have a good distribution of hash codes. Iteration over a hash table is not guaranteed to occur in any particular order. Iteration simply proceeds through the underlying array in whichever order it happens to be in. To get a guaranteed iteration order, use a linked hash map or set instead of a regular one.

Further to the last sentence, Scala 2.13 introduced LinkedHashSet, and LinkedHashMap. We will discuss both of these later in this lecture.

ListSet and ListMap have been around a long time, yet I have not found any decent information on when to use them beyond this statement:

ListMap and ListSet incur O(n) complexity on operations such as head, inserting or removing entries. Merely last and init are constant time operations.

I included these collections in the benchmark program described in the Collections Overview lecture. When I’ve got data to report I’ll update this lecture with the facts. Until then, just know that these collections have different performance characteristics from the default Set and Map implementations, however exactly what that means is speculative at this time.

Be Specific! (mutable or immutable)

It is best to request the type of collection you want (mutable or immutable). As I’ve previously mentioned, you could just import mutable and immutable.

Scala REPL
scala> import scala.collection.{mutable, immutable}
import scala.collection.{mutable,immutable} 

Or you could import everything from the scala.collection package:

Scala REPL
scala> import scala.collection._
import scala.collection._ 

The rest of this lecture assumes that we’ve typed one of the previous two import statements into the Scala REPL; it does not matter which import is used.

Set

The sample code for these sections is provided in the SetFun object of HashFun.scala.

A set is a collection without duplicate elements. scala.collection.Set is the base trait that defines the common behavior for the mutable.Set and immutable.Set traits, which are themselves used as the basis for concrete implementations such as BitSet, HashSet, ListSet and LinkedHashSet. If you just specify the type Set, the Scala compiler will understand you want the type to be scala.collection.Set, which is neither mutable nor immutable. This base trait might not have all the methods you need or expect.

Scala REPL
scala> val set = Set(0, 2, 4, 6)
immSet: scala.collection.Set[Int] = HashSet(0, 10, 6, 2, 8, 4) 

The current default implementation of mutable.Set is mutable.HashSet, which we will get to know during this lecture.

The default implementation of an immutable.Set depends on the number of elements in the set, however you should not normally care how immutable.Set is implemented. For the curious: an empty immutable.Set is represented by private singleton object called EmptySet; sets with up to four items are represented by singleton objects that store elements as fields (Set1, Set2, Set3 and Set4); larger immutable sets are implemented as a private class called immutable.HashTrieSet.

Now let’s make two Sets with the same data, one mutable and one immutable.

Scala REPL
scala> val immSet = immutable.Set(0, 2, 4, 6, 8, 10)
immSet: scala.collection.immutable.Set[Int] = HashSet(0, 10, 6, 2, 8, 4)
scala>
val mutSet = mutable.Set(2, 3, 5, 7, 11, 13) mutSet: scala.collection.mutable.Set[Int] = HashSet(2, 3, 5, 7, 11, 13)

Add An Element

Adding an element to a mutable hash table (sets and maps) can normally be done in constant time. We can add single elements to a mutable.Set with the += method.

Scala REPL
scala> mutSet += 123
res0: mutSet.type = HashSet(2, 3, 5, 7, 11, 123, 13)

We can also add single elements to a mutable.Set with the addOne method; actually the same method gets called whether you invoke the += or the addOne methods:

Scala REPL
scala> mutSet.addOne(23).addOne(29)
res1: mutSet.type = HashSet(2, 3, 5, 7, 23, 11, 123, 13, 29) 

Conversely, create a new immutable.Set by adding single elements to an immutable.Set with the incl (include) method.

Scala REPL
scala> val largerImmSet = immSet.incl(8).incl(10)
largerImmSet: scala.collection.immutable.Set[Int] = HashSet(0, 10, 6, 2, 8, 4) 

Remove An Element

We can remove single elements from a mutable.Set with the -= method:

Scala REPL
scala> mutSet -= 123
res0: mutSet.type = HashSet(2, 3, 5, 7, 23, 11, 13, 29) 

We can also remove single elements from a mutable.Set with the subtractOne method; actually the same method gets called whether you invoke the -= or the subtractOne methods:

Scala REPL
scala> mutSet.subtractOne(23).subtractOne(29)
res1: mutSet.type = HashSet(2, 3, 5, 7, 11, 13) 

Conversely, we can create a new immutable.Set by removing single elements from an immutable.Set with the excl (exclude) method.

Scala REPL
scala> val smallerImmSet = immSet.excl(8).excl(10)
smallerImmSet: scala.collection.immutable.Set[Int] = HashSet(0, 6, 2, 4) 

Intersection

Compute the intersection (common elements) between any two Sets using the & (intersection) operator. The type of the Set created from the intersection is determined from the first (left-most) operand. The operation is associative in terms of the data values stored in the result set, which means the order of the operands does not matter – the same result set data is computed regardless of the order of the sets, however the type of the result set (mutable or immutable) is not associative.

Scala REPL
scala> mutSet & immSet
res0: scala.collection.mutable.Set[Int] = HashSet(2)
scala>
immSet & mutSet res1: scala.collection.immutable.Set[Int] = HashSet(2)

Difference

Compute the difference (unique elements) between any two Sets using the &~ (difference) operator. Again, the type of the Set created from the difference is determined from the first (left-most) operand. Also notice that the difference operator is not associative.

Scala REPL
scala>  mutSet &~ immSet
res2: scala.collection.mutable.Set[Int] = HashSet(17, 3, 19, 5, 7, 23, 11, 13)
scala>
immSet &~ mutSet res3: scala.collection.immutable.Set[Int] = HashSet(0, 10, 6, 8, 4)

Working with Other Seq Implementations

Add elements to a mutable.Set from any Seq implementations using the ++ (concatenate) operator.

Scala REPL
scala> mutSet ++ List(123)
res4: scala.collection.mutable.Set[Int] = HashSet(2, 3, 5, 7, 11, 13, 17, 19, 23, 123, 29) 

immutable.Set also supports the ++ operator, however this creates a new immutable.Set:

Scala REPL
scala> immSet ++ List(123)
res5: scala.collection.immutable.Set[Int] = HashSet(0, 10, 6, 2, 123, 8, 4) 

Create a new immutable.Set by remove all elements in mutSet from immSet using the removedAll operator.

Scala REPL
scala> immSet.removedAll(mutSet)
res6: scala.collection.immutable.Set[Int] = HashSet(0, 10, 6, 8, 4)
HashSet 

I often find when working with sets that the code reads better without punctuation. We discussed infix notation in the Learning Scala Using The REPL lecture of the Introduction to Scala course.

Scala REPL
scala> immSet removedAll mutSet
res55: scala.collection.immutable.Set[Int] = HashSet(0, 10, 6, 8, 4) 

HashSet

We can define mutable and immutable instances of HashSet. An immutable.HashSet is like a List without duplicates, and a mutable.HashSet is like a mutable.Buffer without duplicates.

Scala REPL
scala> val immHashSet = immutable.HashSet(2, 5, 1)
immHashSet: scala.collection.immutable.HashSet[Int] = HashSet(5, 1, 2)
scala>
val mutHashSet = mutable.HashSet(2, 5, 1) mutHashSet: scala.collection.mutable.HashSet[Int] = HashSet(1, 2, 5)

All the operators we just discussed for mutable.Set and immutable.Set work with the respective HashSet implementations, of course.

Say What You Mean

If you want a Set, ask for a HashSet because that’s what you’ll get unless you specify some other type of Set, and you might as well communicate the allocated type to the programmers who must read your code in the future. To be perfectly clear, instead of writing this:

Scala code
import scala.collection.immutable._
val x = Set(2, 4)

I suggest you get in the habit of writing:

Scala code
import scala.collection.immutable
val x = immutable.HashSet(2, 4)

... or:

Scala code
import scala.collection.mutable
val x = mutable.HashSet(2, 4)

LinkedHashSet

Both HashSet and LinkedHashSet are like List without duplicates. LinkedHashSet maintains FIFO ordering, while HashSet is unordered. LinkedHashSet is only available as a mutable collection. You can see the difference here.

Scala REPL
scala> import scala.collection.mutable
scala> mutable.HashSet(4, 2, 4) res0: scala.collection.mutable.HashSet[Int] = HashSet(2, 4)

Multiple instances of any number passed to the constructor are ignored, which is why the second 4 was not included in the HashSet. Also notice that the order of the elements is not specified.

Scala REPL
scala> import scala.collection.mutable
scala> mutable.LinkedHashSet(4, 2, 4) res1: scala.collection.mutable.LinkedHashSet[Int] = Set(4, 2)

The same rules for constructing HashSets also apply when constructing LinkedHashSets, however the LinkedHashSet preserves the ordering of the original data.

BitSet

The Scaladoc for BitSet is almost completely devoid of useful information. Wikipedia to the rescue!

A bit array (also known as bit map, bit set, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores k*w bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation.

A bit array is a mapping from some domain (almost always a range of integers) to values in the set {0, 1}. The values can be interpreted as dark/light, absent/present, locked/unlocked, valid/invalid, et cetera. The point is that there are only two possible values, so they can be stored in one bit. As with other arrays, the access to a single bit can be managed by applying an index to the array. Assuming its size (or length) to be n bits, the array can be used to specify a subset of the domain (e.g. {0, 1, 2, ..., n−1}), where a 1-bit indicates the presence and a 0-bit the absence of a number in the set. This set data structure uses about n/w words of space, where w is the number of bits in each machine word. Whether the least significant bit (of the word) or the most significant bit indicates the smallest-index number is largely irrelevant, but the former tends to be preferred (on little-endian machines).
 – Wikipedia

BitSets are not efficient for storing large ranges of integer values. Scala provides mutable and immutable versions of BitSet.

Scala REPL
scala> val emptyMutableBits = mutable.BitSet.empty
emptyMutableBits: scala.collection.mutable.BitSet = BitSet()
scala>
val emptyImmBits = immutable.BitSet.empty emptyImmBits: scala.collection.immutable.BitSet = BitSet()

You can create BitSets easily; we will use these definitions in the next section on Sets. The data contains integers, but these integers are not the actual data, they are the indices for the 1 values; all other values are set to zero.

Scala REPL
scala> val immBits = immutable.BitSet(2, 3, 5, 7, 11)
immBits: scala.collection.immutable.BitSet = BitSet(2, 3, 5, 7, 11)
scala>
val mutBits = mutable.BitSet(0, 2, 4, 6, 8, 10) mutBits: scala.collection.mutable.BitSet = BitSet(0, 2, 4, 6, 8, 10)

Here is an example of concatenating two immutable BitSets together to form a new BitSet. As we saw before, the type of result collection is determined from the type of the left-most collection.

Scala REPL
scala> val funnyBits = immBits ++ mutBits
funnyBits: scala.collection.immutable.BitSet = BitSet(0, 2, 3, 4, 5, 6, 7, 8, 10, 11) 

Map

The sample code for these sections is provided in the MapFun object of HashFun.scala.

A map is an collection of key-value pairs, and Scala Maps are parametric in two types: the key type and the value type. The Scala Map trait is subclassed by a number of concrete classes, most of which we will discuss in this lecture. The most commonly used concrete Map classes are mutable.HashMap and immutable.HashMap. All Scala Maps can be constructed from tuples of arity 2. For the examples which follow for most of the rest of this lecture I will use the data stored in listOfTuples, and the placeholder splat syntax introduced in the Classes Part 2 lecture of the Introduction to Scala course.

Scala REPL
scala> val listOfTuples = List(2 -> "b", 1 -> "a", 3 -> "c")
listOfTuples: List[(Int, String)] = List((2,b), (1,a), (3,c)) 

For example, the following defines an immutable.HashMap called immMap with Int keys and String values, using the data stored in listOfTuples.

Scala REPL
scala> val immMap: Map[Int, String] =
  immutable.HashMap(listOfTuples: _*)
immMap: scala.collection.Map[Int,String] = HashMap(1 -> a, 2 -> b, 3 -> c) 

We just defined a map whose declared type is neither mutable nor immutable, even though it was constructed as an instance of immutable.HashMap. I don’t recommend you do this. Instead, explicitly specify whether the Map is mutable or immutable, like this.

Scala REPL
scala> val immMap: immutable.HashMap[Int, String] =
  immutable.HashMap(listOfTuples: _*)
immMap: scala.collection.immutable.HashMap[Int,String] = HashMap(1 -> a, 2 -> b, 3 -> c) 

If you do not declare the type, the compiler will be able to figure out the type signature of the map from the constructor used and the types of the key/value tuples.

Creating an Empty Map

The compiler completely relaxes the types of the keys and values (by setting their types to Nothing) if they are not specified when creating an empty HashMap. You almost never want this behavior. Recall from the Type Hierarchy and Equality lecture of the Introduction to Scala course that Nothing is a subtype of all other types.

Scala REPL
scala> val immEmpty = immutable.HashMap.empty
immEmpty: scala.collection.immutable.Map[Nothing,Nothing] = Map()
scala>
val mutEmpty = mutable.HashMap.empty mutEmpty: scala.collection.immutable.Map[Nothing,Nothing] = Map()

The type of the Map keys and values should either be specified on the left hand side of the equals sign or on the right hand side.

Scala REPL
scala> val immEmpty: immutable.HashMap[Int, String] =
  immutable.HashMap.empty
immEmpty: scala.collection.immutable.HashMap[Int,String] = HashMap()
scala>
val immEmpty = immutable.HashMap.empty[Int, String] immEmpty: scala.collection.mutable.Map[Nothing,Nothing] = Map()

Map.get vs Map.apply

Notice the difference between Map.get and the default method Map.apply:

Scala REPL
scala> immMap.get(1)
res0: Option[String] = Some(a )
scala>
immMap(1) res1: String = eh

Map.get returns an Option of the map element type, so if there is no value for the specified key, None is returned. In contrast, Map.apply returns the value for the given key, or throws a NoSuchElementException if the specified key has no value.

Scala REPL
scala> immMap.get(0)
res2: Option[String] = None
scala>
immMap(0) java.util.NoSuchElementException: key not found: 0

Default Values

Depending on your application, you might want to provide a default value, which will be returned if a key is not found. Here is a mutable map created with a permanent default value.

Scala REPL
scala> val mutMap = mutable.HashMap(listOfTuples: _*).withDefaultValue("eh")
map: scala.collection.mutable.Map[Int,String] = Map(1 -> a, 2 -> b, 3 -> c)
scala>
mutMap(0) res3: String = eh

If the map already exists, or you want to specify a different default value, you can provide a one-time default value.

Scala REPL
scala> mutMap.getOrElse(0, "defaultValue")
res4: String = defaultValue 

Note that the default value is returned verbatim, and is not wrapped in Some. This makes Map.getOrElse similar to Map.apply with a one-time default value.

Adding Key/Value Pair(s) to a Map

You can create a new immutable.Map that includes the data from an existing immutable.Map, with a new key/value tuple appended, by using the + operator.

Scala REPL
scala> immMap + (3 -> "c")
res3: scala.collection.immutable.HashMap[Int,String] = HashMap(1 -> a, 2 -> b, 3 -> c) 

For mutable.Map, use the += method, which is an alias for addOne.

Scala REPL
scala> mutMap += (3 -> "c")
res5: mutMap.type = Map(1 -> a, 2 -> b, 3 -> c)
scala>
mutMap.addOne(4 -> "d") res6: mutMap.type = Map(1 -> a, 2 -> b, 3 -> c, 4 -> d)

You can also append multiple tuples at once by first converting them to a Map. Specifying a tuple that has a key that is already present in the map causes the old value to be replaced with the new value.

Scala REPL
scala> mutMap ++= Map(3 -> "sea", 5 -> "e")
res7: mutMap.type = Map(1 -> a, 2 -> b, 3 -> sea, 4 -> d, 5 -> e) 

I prefer to use concat instead of ++=.

Scala REPL
scala> mutMap concat Map(6 -> "f")
res8: scala.collection.mutable.Map[Int,String] = Map(1 -> a, 2 -> b, 3 -> c, 5 -> e, 6 -> f) 

Maps can be combined with the ++ method, which is an alias for addAll.

Scala REPL
scala> mutMap ++ immMap
map9: scala.collection.mutable.Map[Int,String] = Map(1 -> a, 2 -> b, 3 -> c, 4 -> d, 5 -> e)
scala>
immMap ++ mutMap res10: scala.collection.immutable.HashMap[Int,String] = HashMap(1 -> q, 2 -> b, 3 -> c)
scala>
mutMap.addAll(Map(6 -> "f", 7 -> "g")) res11: mutMap.type = Map(1 -> a, 2 -> b, 3 -> sea, 4 -> d, 5 -> e, 6 -> f, 7 -> g)

Keys and Values

You can obtain all of the keys from a Map.

Scala REPL
scala> mutMap.keys
res10: Iterable[Int] = Set(1, 2, 3, 4, 5, 6, 7) 

Notice that the keys property is defined as conforming to the Iterable trait, which is implemented as a Set.

Scala REPL
scala> mutMap.keySet
res11: scala.collection.Set[Int] = Set(1, 2, 3, 4, 5, 6, 7) 

The keySet property is defined as being an immutable.Set[Int], which makes for more stable code; no worries about a future revision of the collection classes being implemented such that the keys property returning something other than Set. I suggest you use the keySet property instead of the keys property for that reason.

You can get the collection of values as a lazy collection; in fact this is a view. We discussed views in the preceding Collections Overview lecture.

Scala REPL
scala> mutMap.values
res12: Iterable[String] = View(<not computed>) 

You can convert the lazy collection to a strict collection with a converter of your choice, here I used toList. See the upcoming To Converters lecture for more information about converters.

Scala REPL
scala> mutMap.values.toList
res13: List[String] = List(a, b, c, d, e, f, g) 

You can test to see if a key exists for a map.

Scala REPL
scala> immMap.isDefinedAt(42)
res14: Boolean = false
scala>
immMap isDefinedAt 42 // infix notation res15: Boolean = false
scala>
mutMap.isDefinedAt(2) res16: Boolean = true

Transforming a Map

The mapValues transforms a Map by applying a function to every value in the collection. We need a collection to work with for demonstration purposes. Lets pretend that we built this collection earlier in the program, and now we need to modify it if some condition occurs.

Scala REPL
scala> val fruit = List("apricot", "banana", "clementine", "durian",
"fig", "guava", "jackfruit", "kiwi", "lime", "mango")
fruit: List[String] = List(apricot, banana, clementine, durian, fig, guava, jackfruit, kiwi, lime, mango)
scala>
val fruitTuples = fruit.zipWithIndex.map(x => x._2 -> x._1) fruitTuples: List[(Int, String)] = List((0,apricot), (1,banana), (2,clementine), (3,durian), (4,fig), (5,guava), (6,jackfruit), (7,kiwi), (8,lime),(9,mango))

We’ll discuss how combinators like zipWithIndex and map work in the Combinators lecture.

Scala REPL
scala> val zippedFruit = immutable.HashMap(fruitTuples: _*)
zippedFruit: scala.collection.immutable.HashMap[Int,String] = HashMap(0 -> apricot, 5 -> guava, 1 -> banana, 6 -> jackfruit, 9 ->mango, 2 -> clementine, 7 -> kiwi, 3 -> durian, 8 -> lime, 4 -> fig) 

Let’s say for some reason we now want to capitalize the first letter of every value in the zippedFruit immutable.HashMap[Int, String]. This means we need to capitalize the names of each fruit. Prior to Scala 2.13 this was how you might use mapValues, and it returned a strict collection. Since Scala 2.13 it returns a lazy collection and generates a warning.

Scala REPL
scala> zippedFruit.mapValues(_.capitalize)
^
       warning: method mapValues in trait MapOps is deprecated (since 2.13.0): Use .view.mapValues(f).
A future version will include a strict version of this method (for now, .view.mapValues(f).toMap).
res15: scala.collection.MapView[Int,String] = MapView(<not computed>) 

This is the incantation for Scala 2.13+. The lambda function capitalizes every value. We discussed lambda functions in the More Fun With Functions and the Lambda Review & Drill lecture of the Introduction to Scala course.

Scala REPL
scala> zippedFruit.view.mapValues(_.capitalize).toMap
res16: scala.collection.immutable.Map[Int,String] = HashMap(5 -> Fig, 10 -> Mango, 1 -> Apricot, 6 -> Guava, 9 -> Lime, 2 -> Banana, 7 -> Jackfruit, 3 -> Clementine, 8 -> Kiwi, 4 -> Durian) 

The collection returned by mapValues is converted to a strict collection by toMap.

Updating a Map

You can update a specific key. For an immutable map, this means that a modified copy of the original map is returned.

Scala REPL
scala> immMap.updated(1, "q")
res17: scala.collection.immutable.HashMap[Int,String] = HashMap(1 -> q, 2 -> b, 3 -> c) 

Mutable maps want to be treated slightly differently.

Scala REPL
scala> mutMap.updated(1, "q")
^
warning: method updated in trait MapOps is deprecated (since 2.13.0): Use m.clone().addOne((k,v)) instead of m.updated(k, v)
res60: scala.collection.mutable.HashMap[Int,String] = HashMap(1 -> q, 2 -> b, 3 -> c) 

We’ve seen this before:

Scala REPL
scala> mutMap.addOne(1, "q")
res61: mutMap.type = HashMap(1 -> q, 2 -> b, 3 -> c) 

But Wait! There is More!

There are lots more methods defined for Map. The Combinators lecture has a section on Map. The Scala Collections documentation has even more information.

HashMap

HashMap is an implementation of the Map trait. HashMap is very similar to its Java counterpart, with the behavior of IterableOnce mixed in. We have already discussed how HashMap is provided in mutable and immutable versions. Once again, here is how to create empty HashMaps.

Scala REPL
scala> val immMap = immutable.HashMap.empty[Int, String]
immMap: scala.collection.immutable.HashMap[Int,String] = Map()
scala>
val mutMap = mutable.HashMap.empty[Int, String] mutMap: scala.collection.mutable.HashMap[Int,String] = Map()

Constructing mutable and immutable HashMaps is done the same way:

Scala REPL
scala> val immMap = immutable.HashMap(listOfTuples: _*)
immMap: scala.collection.immutable.HashMap[Int,String] = HashMap(1 -> a, 2 -> b, 3 -> c)
scala>
val mutMap = mutable.HashMap(listOfTuples: _*) mutMap: scala.collection.mutable.HashMap[Int,String] = HashMap(1 -> a, 2 -> b, 3 -> c)

The Scala documentation has a good discussion on mutable HashMaps.

Immutable HashMaps are implemented differently, as described in the Scala documentation. For a detailed explanation of how they were implemented for Scala 2.13 please see Optimizing Hash-Array Mapped Tries forFast and Lean Immutable JVM Collections.

SeqMap

scala.collection.SeqMap is a generic trait for ordered maps. LinkedHashMap is the default mutable.SeqMap implementation, and TreeSeqMap is the default immutable.SeqMap implementation.

The sample code for these sections is provided in the SeqMapFun object of HashFun.scala.

LinkedHashMap

Both HashMap and LinkedHashMap implement AbstractMaps using hash tables. While HashMap is available in mutable and immutable versions, LinkedHashMap is only available as a mutable collection. LinkedHashMap maintains FIFO ordering, while HashMap is unordered.

Scala REPL
scala> mutable.LinkedHashMap(listOfTuples: _*)
res1: scala.collection.mutable.LinkedHashMap[Int,String] = LinkedHashMap(2 -> b, 1 -> a, 3 -> c) 

Adding an element to a mutable hash table (sets and maps) can normally be done in constant time.

Scala REPL
scala> val mutLhm = mutable.LinkedHashMap.empty[Int, String]
mutLhm: scala.collection.mutable.LinkedHashMap[Int,String] = LinkedHashMap()
scala>
mutLhm += 13 -> "Buy a dog" res2: map.type = LinkedHashMap(13 -> Buy a dog)
scala>
mutLhm += 31 -> "Sell the cat" res3: map.type = LinkedHashMap(13 -> Buy a dog, 31 -> Sell the cat)

TreeSeqMap

[TreeSeqMap] implements an immutable map that preserves order using a hash map for the key to value mapping to provide efficient lookup, and a tree for the ordering of the keys to provide efficient insertion/modification order traversal and destructuring.
Scala REPL
scala> val tsm = immutable.TreeSeqMap(listOfTuples: _*)
tsm: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> a, 3 -> c) 

As we already know adding a new element or modifying an existing element in an immutable collection creates a new copy of the collection with the desired modifications.

Scala REPL
scala> tsm.updated(13, "m")
res5: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> a, 3 -> c, 13 -> m)
scala>
tsm ++ Map(14 -> "n") res6: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> a, 3 -> c, 13 -> m)

TreeSeqMap.OrderBy

The TreeSeqMap’s Scaladoc goes on to say:

By default insertion order (TreeSeqMap.OrderBy.Insertion) is used, but modification order (TreeSeqMap.OrderBy.Modification) can be used instead if so specified at creation.

The orderingBy(orderBy: TreeSeqMap.OrderBy): TreeSeqMap[K, V] method can be used to switch to the specified ordering for the returned map.

A key can be manually refreshed (i.e. placed at the end) via the refresh(key: K): TreeSeqMap[K, V] method (regardless of the ordering in use).

To summarize, the default is to update items in place, but specifying OrderBy.Modification causes modified items to be placed at the end. Let’s try this out. First we will create two instances of immutable.TreeSeqMap. We’ll call the instance with the default immutable.TreeSeqMap.OrderBy.Insertion ordering insertionOrdered, while we’ll call the one with immutable.TreeSeqMap.OrderBy.Modification ordering modificationOrdered.

Scala REPL
scala> import immutable.TreeSeqMap.OrderBy
import immutable.TreeSeqMap.OrderBy
scala>
val insertionOrdered: immutable.TreeSeqMap[Int, String] = immutable.TreeSeqMap. newBuilder(OrderBy.Insertion). // OrderBy.Insertion is default addAll(listOfTuples). result insertionOrdered: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> a, 3 -> c)

Note that the incantation to create insertionOrdered yields the same result as the previous, and more simply constructed version called tsm, which is simpler to construct because it uses default values and the apply method instead of the more flexible newBuilder method. Here it is again:

Scala REPL
scala> val tsm = immutable.TreeSeqMap(listOfTuples: _*)
tsm: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> a, 3 -> c) 

Now lets create modificationOrdered:

Scala REPL
scala> val modificationOrdered: immutable.TreeSeqMap[Int, String] =
  immutable.TreeSeqMap.
  newBuilder(OrderBy.Modification).
  addAll(listOfTuples).
  result
modificationOrdered: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> a, 3 -> c) 

Now we can modify a value in each TreeSeqMap and see where the new values appear. Because TreeSeqMap is immutable, a new collection is returned when adding new items to the collection.

Scala REPL
scala> insertionOrdered + (1 -> "eh")
res7: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> eh, 3 -> c)
scala>
modificationOrdered + (1 -> "eh") res8: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 3 -> c, 1 -> eh)

This confirms that OrderBy.Insertion preserves the order across updates, while OrderBy.Modification appends modified values to the end of the TreeSeqMap.

Finally, we can user orderingBy to specify a one-time OrderBy:

Scala REPL
scala> insertionOrdered.orderingBy(immutable.TreeSeqMap.OrderBy.Modification) + (1 -> "q")
res9: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 3 -> c, 1 -> q)
scala>
modificationOrdered.orderingBy(immutable.TreeSeqMap.OrderBy.Insertion) + (1 -> "q") res10: scala.collection.immutable.TreeSeqMap[Int,String] = TreeSeqMap(2 -> b, 1 -> q, 3 -> c)

Sorted Sets and Maps

The sample code for these sections is provided in the SortFun object of HashFun.scala.

Scala collection traits exist for sorted sets and maps. The traits start with the prefix Sort (for example, immutable.SortedSet, mutable.SortedSet, immutable.SortedMap and mutable.SortedMap), but the concrete implementations start with the Tree prefix (for example immutable.TreeSet, mutable.TreeSet, immutable.TreeMap and mutable.TreeMap).

TreeSet

A TreeSet represents a sorted set. Mutable and immutable versions of TreeSet exist in order to represent sorted sets. You can call the default apply method for the desired trait, which acts as a factory for a concrete subclass of the trait.

Scala REPL
scala> immutable.TreeSet(3, 1, 2)
res0: scala.collection.immutable.SortedSet[Int] = TreeSet(1, 2, 3)
scala>
mutable.TreeSet(3, 1, 2) res1: scala.collection.mutable.SortedSet[Int] = TreeSet(1, 2, 3)

The default SortedSet is a TreeSet; this is true for mutable and immutable sets.

You can create a sorted set from an existing unsorted set by using the to method, which accepts a type parameter. The Collection Converters lecture talks more about the to method.

Scala REPL
scala> immutable.HashSet(1, 2, 3).to(immutable.TreeSet)
res2: scala.collection.immutable.SortedSet[Int] = TreeSet(1, 2, 3)
scala>
immutable.HashSet(1, 2, 3).to(immutable.SortedSet) // I do not like using this syntax res3: scala.collection.immutable.SortedSet[Int] = TreeSet(1, 2, 3)
scala>
mutable.HashSet(1, 2, 3).to(mutable.TreeSet) res4: scala.collection.mutable.SortedSet[Int] = TreeSet(1, 2, 3)
scala>
mutable.HashSet(1, 2, 3).to(mutable.SortedSet) // I do not like using this syntax res5: scala.collection.mutable.SortedSet[Int] = TreeSet(1, 2, 3)

TreeMap

A TreeMap represents a map sorted by key. Mutable and immutable versions of TreeMap exist in order to represent sorted maps.

Here we call the default apply method for an immutable.TreeMap and a mutable.TreeMap.

Scala REPL
scala> immutable.TreeMap(tuples: _*)
res7: scala.collection.immutable.TreeMap[Int,String] = TreeMap(1 -> a, 2 -> b, 3 -> c)
scala>
mutable.TreeMap(tuples: scala> _*) res8: scala.collection.mutable.TreeMap[Int,String] = TreeMap(1 -> a, 2 -> b, 3 -> c)

The default SortedMap is a TreeMap; this is true for mutable and immutable maps.

You can create a sorted map from an existing unsorted map by using the to method, which accepts a type parameter. The Collection Converters lecture talks more about the to method.

Scala REPL
scala> immutable.HashMap(listOfTuples: _*).to(immutable.TreeMap)
res2: scala.collection.immutable.TreeMap[Int,String] = TreeMap(1 -> a, 2 -> b, 3 -> c)
scala>
mutable.HashMap(listOfTuples: _*).to(mutable.TreeMap) res3: scala.collection.mutable.TreeMap[Int,String] = TreeMap(1 -> a, 2 -> b, 3 -> c)
scala>
immutable.HashMap(listOfTuples: _*).to(immutable.SortedMap) // I do not like using this syntax res4: scala.collection.immutable.SortedMap[Int,String] = TreeMap(1 -> a, 2 -> b, 3 -> c)
scala>
mutable.HashMap(listOfTuples: _*).to(mutable.SortedMap) // I do not like using this syntax res5: scala.collection.mutable.SortedMap[Int,String] = TreeMap(1 -> a, 2 -> b, 3 -> c)

Defining Custom String Interpolation to Access a Map

Given a Map, would it not be nice to have a shorthand way of looking up a value from a key, and to provide a default value? If the key is a String then we can use custom string interpolation to define our own syntax for the lookup. We first encountered String interpolation in the Learning Scala Using the REPL lecture of the Introduction to Scala course.

Scala 2.10 introduced processed strings by enhancing the Scala compiler such that when it encounters a string prefix it looks for a method with a matching name in the StringContext class. StringContext defines the s, f and raw prefixes. Click on those prefixes to learn more about those methods.

Here are examples of the s and f prefixes in action:

Scala REPL
scala> val x = "blah"
x: String = blah
scala>
s"Value of x is $x" res0: String = Value of x is blah
scala>
val y = 123.45 y: Double = 123.45
scala>
f"Value of y is $y%.4f" res1: String = Value of y is 123.4500

You can define an implicit class to augment the StringContext class methods as discussed in the Implicit Classes lecture earlier in this course. This will cause the Scala compiler to recognize additional string prefixes for new types of String interpolation. Assuming that a custom String interpolator triggered by a dollar sign ($) is stored in a file called StrInterp.scala, and that it also defines an internal Map[String, Int], here is an example of how we could use it. The Map defines the a key but not the z key.

Scala REPL
$ scala> scala -i src/main/scala/StrInterp.scala
Loading src/main/scala/StrInterp.scala...
defined object StrInterp
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67). Type in expressions to have them evaluated. Type :help for more information.
scala>
import StrInterp._ import StrInterp._
scala>
$"a" res2: Int = 1
scala>
$"z" res3: Int = 0

The code to implement this is quite short. The StrInterp object merely exists to hold the MapLookup implicit class because implicit classes are not allowed to be top-level definitions. Notice that the MapLookup implicit class accepts a StringContext, which is the basis of Scala’s string interpolation feature; you can define your own string interpolator by extending the StringContext method. The MapLookup implicit class contains a Map[String, Int] that can be looked up by invoking the $ method. If the lookup key is not defined by the Map, the $ method returns 0.

Scala code
object StrInterp {
  implicit class MapLookup(val sc: StringContext) {
    val map = Map(("a", 1), ("b", 2), ("c", 3)).withDefaultValue(0)
def $(args: Any*): Int = { val orig = sc.s(args: _*) map(orig) } } } }

Two points about the $ method:

  1. The method accepts a variable number of arguments of Any type, denoted by the Any* type notation. These arguments are then supplied to the StringContext.s method as they were received by notation args: _*.
  2. The call to StringContext.s is not necessary unless you want to be able to support variable substitution in the key, like this:
    Scala REPL
    scala> val variable = "b"
    variable: String = b
    scala>
    $"$variable" res4: Int = 2
    or:
    Scala REPL
    scala> $"prefix${variable}suffix"
    res5: Int = 0 

    or even this nightmarish recursive application:
    Scala REPL
    scala>  $"""${"abc".substring($"a", 2)}"""
    res6: Int = 2 

* indicates a required field.

Please select the following to receive Mike Slinn’s newsletter:

You can unsubscribe at any time by clicking the link in the footer of emails.

Mike Slinn uses Mailchimp as his marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp’s privacy practices.