Published 2019-08-06.
Last modified 2019-08-07.
Time to read: 4 minutes.
Scala supports UTF-8 data and UTF-8 characters in source code. This lecture discusses how to work with UTF-8 and Scala for Mac, Linux and Windows users.
Scala programs are written in Unicode, according to the rules defined in the Lexical Syntax section of the Scala Language Specification; the Scala compiler reads source files as UTF-8 by default:
Scala programs are written using the Unicode Basic Multilingual Plane (BMP) character set; Unicode supplementary characters are not presently supported.
This chapter defines the two modes of Scala’s lexical syntax, the Scala mode and the XML mode.
If not otherwise mentioned, the following descriptions of Scala tokens refer to Scala mode,
and literal characters ‘c’ refer to the ASCII fragment \u0000 – \u007F.
In Scala mode, Unicode escapes are replaced by the corresponding Unicode character with the given hexadecimal code.
UnicodeEscape ::= ‘\’ ‘u’ {‘u’} hexDigit hexDigit hexDigit hexDigit
hexDigit      ::= ‘0’ | … | ‘9’ | ‘A’ | … | ‘F’ | ‘a’ | … | ‘f’
To construct tokens, characters are distinguished according to the following classes (Unicode general category given in parentheses):
Whitespace characters. \u0020 | \u0009 | \u000D | \u000A.
Letters, which include lower case letters (Ll), upper case letters (Lu), titlecase letters (Lt),
other letters (Lo), modifier letters (Ml), letter numerals (Nl) and the two characters \u0024 ‘$’ and \u005F ‘_’.
Digits ‘0’ | … | ‘9’.
Parentheses ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’.
Delimiter characters ‘`’ | ‘’’ | ‘"’ | ‘.’ | ‘;’ | ‘,’.
Operator characters. These consist of all printable ASCII characters (\u0020 - \u007E) that are
in none of the sets above, mathematical symbols (Sm) and other symbols (So).
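To see which of these classes a particular character falls into, you can look up its Unicode general category with java.lang.Character.getType. The sketch below is a simplified classifier that follows the classes quoted above for BMP characters; the object name LexicalClass and its string labels are hypothetical, and it is an illustration rather than the compiler's actual scanner logic.

```scala
object LexicalClass {
  // Unicode general categories the quoted spec counts as letters:
  // Ll, Lu, Lt, Lo, modifier letters (Lm) and letter numerals (Nl)
  private val letterCategories: Set[Int] = Set(
    Character.LOWERCASE_LETTER, Character.UPPERCASE_LETTER, Character.TITLECASE_LETTER,
    Character.OTHER_LETTER, Character.MODIFIER_LETTER, Character.LETTER_NUMBER
  ).map(_.toInt)

  // Categories the spec counts as operator characters beyond printable ASCII: Sm and So
  private val symbolCategories: Set[Int] =
    Set(Character.MATH_SYMBOL, Character.OTHER_SYMBOL).map(_.toInt)

  // Classify one character, checking the classes in the order the spec lists them
  def classify(c: Char): String = {
    val category = Character.getType(c)
    if (" \t\r\n".contains(c)) "whitespace"
    else if (c == '$' || c == '_' || letterCategories(category)) "letter"
    else if (c >= '0' && c <= '9') "digit"
    else if ("()[]{}".contains(c)) "parenthesis"
    else if ("`'\".;,".contains(c)) "delimiter"
    else if ((c >= ' ' && c <= '~') || symbolCategories(category)) "operator"
    else "other" // e.g. modifier symbols (Sk), which belong to none of the classes
  }
}
```

For example, LexicalClass.classify('ツ') yields "letter", while classify('⧵') yields "operator".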
Arrows
Scala 2.13 deprecated the use of Unicode arrow characters such as the left arrow (←), the right arrow (→), and the right double arrow (⇒), also known as the rocket operator.
Their multicharacter ASCII equivalents are now encouraged: <-, -> and =>.
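As a quick illustration, the sketch below uses each ASCII arrow in the position where the deprecated Unicode arrow used to appear. The object name AsciiArrows and the sample data are made up for the example.

```scala
object AsciiArrows {
  // `->` builds a tuple (via Predef's ArrowAssoc) in place of the deprecated `→`
  val capitals: Map[String, String] = Map("France" -> "Paris", "Japan" -> "Tokyo")

  // `<-` binds each element in a for-comprehension in place of the deprecated `←`
  val sentences: List[String] = for {
    (country, city) <- capitals.toList.sortBy(_._1)
  } yield s"$city is the capital of $country"

  // `=>` separates a function's parameter from its body in place of the deprecated `⇒`
  val shout: String => String = s => s.toUpperCase

  def main(args: Array[String]): Unit =
    sentences.map(shout).foreach(println)
}
```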
UTF-8 Characters
Scala supports UTF-8 source code, including non-ASCII characters in variable and method names, within the limits of the lexical rules quoted above. You may find it desirable to use such characters in your code; they are especially useful when writing domain-specific languages (DSLs) in Scala. A detailed list of Unicode hex codes is shown in the third column of this page.
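Here is a small sketch of what that can look like. π is a letter, so it forms an ordinary identifier, while √ and ∙ are mathematical symbols (Sm), so they form operator identifiers. The object UnicodeNames and the Vec type are hypothetical, chosen only to show the idea of a tiny math DSL.

```scala
object UnicodeNames {
  // π (U+03C0) is a lowercase letter (Ll), so it is an ordinary alphanumeric identifier
  val π: Double = math.Pi

  // √ (U+221A) is a math symbol (Sm), so it forms an operator identifier
  def √(x: Double): Double = math.sqrt(x)

  // Hypothetical 2-D vector for a tiny DSL; ∙ (U+2219, Sm) is the dot product
  final case class Vec(x: Double, y: Double) {
    def ∙(that: Vec): Double = x * that.x + y * that.y
  }
}
```

After import UnicodeNames._, expressions such as √(16.0) and Vec(1, 2) ∙ Vec(3, 4) read much like the mathematics they model.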
The SBT configuration suggested in a later lecture automatically enables UTF-8 characters in the Scala REPL, which is discussed in the next lecture.
If you want to paste UTF-8 characters into a Scala REPL without using SBT, set the file.encoding Java system property by passing -Dfile.encoding=utf8 when invoking Scala.
One way of doing that is to set up an alias in ~/.bash_aliases:
alias scala="scala -Dfile.encoding=utf8"
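To check what the JVM actually picked up, a sketch like the following prints the file.encoding property and the default charset. It also confirms that Java accepts the spelling utf8 as an alias for UTF-8. The object name ShowEncoding is hypothetical; note that on JDK 18 and later the default charset is UTF-8 even without the flag.

```scala
import java.nio.charset.Charset

object ShowEncoding {
  // "utf8" (case-insensitive) is a registered alias of the UTF-8 charset
  val requested: Charset = Charset.forName("utf8")

  // Reports the system property (null if unset) and the JVM's default charset
  def report(): String =
    s"file.encoding=${System.getProperty("file.encoding")}, default charset=${Charset.defaultCharset()}"

  def main(args: Array[String]): Unit = println(report())
}
```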
Linux
Press the Ctrl+Shift+u keys, release, type the four hex digits, and press Enter or Spacebar. On some systems, press and release Shift or Ctrl. For example:
- To enter a left arrow (←), press the Ctrl+Shift+u keys, release, type 2190 and then press Spacebar or Enter.
- To enter a right arrow (→), press the Ctrl+Shift+u keys, release, type 2192 and then press Spacebar or Enter.
- To enter a right double arrow (⇒), press the Ctrl+Shift+u keys, release, type 21D2 and then press Spacebar or Enter.
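The four hex digits are Unicode code points. If you are unsure which code point you need, a short Scala sketch can print each character alongside its official Unicode name using java.lang.Character.getName (available since Java 7). The object ArrowNames and its describe helper are hypothetical names for this example.

```scala
object ArrowNames {
  // Code points typed in the examples above
  val codePoints: Seq[Int] = Seq(0x2190, 0x2192, 0x21D2)

  // Formats a code point as "U+XXXX <character> <Unicode name>"
  def describe(cp: Int): String = {
    val ch = new String(Character.toChars(cp))
    f"U+$cp%04X $ch ${Character.getName(cp)}"
  }

  def main(args: Array[String]): Unit =
    codePoints.map(describe).foreach(println)
}
```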
Mac
To enable this for Mac:
- System Preferences / Keyboard / Input Sources
- Ensure Show Input menu in menu bar is enabled
- Click +
- Browse to the bottom language ("Others")
- Select Unicode Hex Input
- Click Add.
- On the menu bar, find the icon for the language input and change it to Unicode Hex Input.
For example:
- To enter a left arrow (←), type Option–2190 (hold down Option while typing the four hex digits).
- To enter a right arrow (→), type Option–2192.
- To enter a right double arrow (⇒), type Option–21D2.
Windows
BabelMap is much better than the standard character map that comes with Windows, and offers many more features.
BabelMap also correctly finds all Unicode characters (unlike the Windows Character Map). Once you have installed BabelMap:
- Open the BabelMap Application (you may want to pin it to the Task Bar and Start menu, and assign it a keyboard shortcut).
- Enter the Unicode code in the Go to Code Point box and click Go.
- Click on the selected character.
- Copy the character to the clipboard with the Copy button, and paste it into the program text that you are editing.
UTF-8 and Backquotes
You can use a name that breaks Scala's identifier rules by enclosing the rebel string in backquotes.
In this example, I define a method whose name (¯⧵_(ツ)_/¯) must be enclosed in backquotes because the characters would not otherwise be parsed as a single token.
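The method calls sys.error, which is provided by the Scala standard library. sys.error throws a java.lang.RuntimeException containing the message; if nothing catches the exception, the program terminates and the stack trace is printed to stderr.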
scala> def `¯⧵_(ツ)_/¯`(msg: String) = sys.error(s"¯⧵_(ツ)_/¯ $msg")
$u00AF$u29F5_$u0028ツ$u0029_$div$u00AF: (msg: String)Nothing
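Notice that the REPL shows the encoded form of the name: characters that are not legal in JVM identifiers are replaced by $u followed by their hex code point, and the slash is replaced by $div.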
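The encoded name comes from Scala's name mangling, which is exposed in the standard library as scala.reflect.NameTransformer. The sketch below encodes the method name and decodes it again; I expect encode to reproduce the name the REPL displayed above, but treat that as an assumption to verify on your Scala version. The object name MangledNames is hypothetical.

```scala
import scala.reflect.NameTransformer

object MangledNames {
  val shrugName: String = "¯⧵_(ツ)_/¯"

  def main(args: Array[String]): Unit = {
    // Source-level name to JVM-level name, as shown by the REPL
    val encoded = NameTransformer.encode(shrugName)
    println(encoded)
    // decode reverses the transformation, recovering the original name
    println(NameTransformer.decode(encoded))
  }
}
```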
- The first and last characters in the method name are the Macron (¯), whose hex value is \u00AF. Macrons are normally used as a diacritic overlay on another character, for example ā, ē, ī, ō, ū and ӯ.
- The second character is the Reverse Solidus Operator (⧵), hex value \u29F5, which is a mathematical symbol.
- The third and seventh characters are underscores, a character Scala programmers know well for its many uses. Together with the katakana letter ツ, these are the only characters in the name that are normally permissible in an alphanumeric identifier.
- The fourth and sixth characters are open and close parentheses, which are not normally permissible within the name of a Scala method or variable.
- The fifth character is the katakana letter tu (ツ). It is often used to depict a happy face.
- The eighth character is an ordinary slash (/), an operator character, which the REPL displays as $div.
To invoke this awkwardly named method, you again need to enclose its name in backquotes.
scala> `¯⧵_(ツ)_/¯`("Oops, I did it again!")
java.lang.RuntimeException: ¯⧵_(ツ)_/¯ Oops, I did it again!
  at scala.sys.package$.error(package.scala:27)
  at .$u00AF$u29F5_$u0028ツ$u0029_$div$u00AF(<console>:15)
  ... 43 elided
It would be better to give the method a name that does not require special handling, such as shrug:
scala> def shrug(msg: String) = sys.error(s"¯⧵_(ツ)_/¯ $msg")
shrug: (msg: String)Nothing

scala> shrug("Oops, I did it again!")
java.lang.RuntimeException: ¯⧵_(ツ)_/¯ Oops, I did it again!
  at scala.sys.package$.error(package.scala:27)
  at .shrug(<console>:15)
  ... 43 elided
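Because sys.error throws an exception rather than exiting directly, a caller can recover from it. This sketch wraps the call in scala.util.Try and reports the message instead of letting the program terminate; the object name ShrugDemo is hypothetical.

```scala
import scala.util.{Failure, Success, Try}

object ShrugDemo {
  // Same definition as in the REPL session above
  def shrug(msg: String): Nothing = sys.error(s"¯⧵_(ツ)_/¯ $msg")

  def main(args: Array[String]): Unit =
    Try(shrug("Oops, I did it again!")) match {
      case Failure(e: RuntimeException) => println(s"Caught: ${e.getMessage}")
      case Failure(other)               => println(s"Unexpected failure: $other")
      case Success(_)                   => println("unreachable: shrug never returns")
    }
}
```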
© Copyright 1994-2024 Michael Slinn. All rights reserved.