Almond - a Scala kernel for Jupyter

Published 2013-12-01. Last modified 2019-08-12.

Jupyter is a novel way to combine documentation with live code, which might run on powerful distributed systems like Apache Spark, Flink and Scalding. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. This open source project supports interactive data science and scientific computing with over 40 programming languages. Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer. Although Jupyter has its roots in big data, Jupyter is generally useful for all computing needs.

This lecture starts with instructions for installing Jupyter on Mac and Ubuntu. This lecture then demonstrates how to install and work with the Almond Scala kernel for Jupyter, so students can use Scala with the Jupyter Notebook as well as the more traditional console REPL. The information provided is also largely applicable to JupyterHub, a multi-user server for Jupyter notebooks.

The Almond Scala kernel is useful for experimenting with code examples, documenting them and sharing them. This kernel allows support for various big data frameworks to be added to Jupyter on demand. According to the docs, Almond provides:

all the Ammonite niceties
an API that libraries can rely on to interact with Jupyter front-ends
extensible plotting support
extensible support for big data libraries
Spark support, relying on ammonite-spark, extended to get progress bars among others.

It also provides libraries allowing one to write custom Jupyter kernels in Scala.

Almond is not tied to a specific version of big data frameworks. On 2019-08-13 Almond was available for Scala 2.11, 2.12 and 2.13; Scala 2.11 support will be dropped soon. Several versions of Almond can be installed side-by-side, so one notebook can support a certain version of Spark, and another notebook running simultaneously can support another version of Spark.

In addition to supporting Jupyter’s browser-based Scala REPL, Almond also supports the more traditional Scala console REPL Ammonite, which we discussed in the Ammonite lecture.

There are several other notebook user interfaces and Jupyter kernels that support Scala. This lecture does not discuss them beyond just mentioning them here and providing links.

Apache Zeppelin, a JVM-based alternative to Jupyter, with some support for Spark, Flink, and Scalding.
Apache Toree (incubating), a Jupyter kernel with Spark support.
IScala and ISpark, which added some Spark support. These projects have not been updated in several years and should be considered dead.
Bridgewater scala-notebook (this project also seems dead) and spark-notebook, which updated portions and added Spark support.

BTW, nbviewer.jupyter.org is a popular sharing mechanism for Jupyter code examples.

October 25, 2019: Polynote was just released. If you like Jupyter, you’ll love Polynote.

Installation

Jupyter requires Python. Once Python and Jupyter are installed, we will install Scala support for Jupyter.

Mac and Linux computers normally have Python installed, and Python is now available from the Windows store for free.

If for some reason you find pip is not installed, here are the official instructions to install pip. For WSL, type:

Shell

$ sudo apt install python3-pip
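Before installing anything, it can help to confirm which pip launcher is actually on the PATH. The following is a minimal sketch; the function name first_available is hypothetical and is not part of pip or Jupyter.

```shell
# Print the first command from the argument list that exists on the PATH.
# Returns a non-zero status if none of them is found.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Typical use: prefer pip3, fall back to pip.
# PIP=$(first_available pip3 pip) || echo "pip is not installed" >&2
```

If neither command is found, install pip as described above before continuing.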

Windows Subsystem for Linux (WSL)

Jupyter needs the gcc compiler, the ZeroMQ development package and the Python development package. Install all of these dependencies by typing:

Shell

$ yes | sudo apt install build-essential python3-dev python3-pip libzmq3-dev

Installing Jupyter

This is easy.

Mac

For Mac with Python 2.7, type:

Shell

$ pip install jupyter

For Python 3.x on Mac, type:

Shell

$ pip3 install jupyter

Ubuntu and Windows Subsystem for Linux

If you installed Python 2.7, type:

Shell

$ yes | sudo -H pip install jupyter

If you installed Python 3.x, type:

Shell

$ yes | sudo -H pip3 install jupyter

All OSes

Check the version of Jupyter that was just installed:

Shell

$ jupyter --version
jupyter core     : 4.5.0
jupyter-notebook : 6.0.0
qtconsole        : 4.5.2
ipython          : 7.7.0
ipykernel        : 5.1.2
jupyter client   : 5.3.1
jupyter lab      : not installed
nbconvert        : 5.6.0
ipywidgets       : 7.5.1
nbformat         : 4.4.0
traitlets        : 4.3.2

If the jupyter core version is less than 4.0, upgrade Jupyter.
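The version check can be scripted. This is only a sketch that assumes the tabular output format shown above; the function names core_version and needs_upgrade are hypothetical.

```shell
# Read the output of `jupyter --version` on stdin and print the jupyter core version.
# Assumes the tabular format shown above ("jupyter core     : 4.5.0").
core_version() {
  sed -n 's/^jupyter core[[:space:]]*:[[:space:]]*//p'
}

# Succeed if the major version number of $1 is less than 4.
needs_upgrade() {
  major=${1%%.*}
  [ "$major" -lt 4 ]
}
```

For example, jupyter --version | core_version prints 4.5.0 for the output shown above, and needs_upgrade 4.5.0 fails, which means no upgrade is required.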

Upgrading Jupyter

For Mac, upgrade by typing:

Shell

$ pip install --upgrade jupyter

For Ubuntu and Windows Subsystem for Linux, upgrade by typing:

Shell

$ yes | sudo -H pip install --upgrade jupyter

For any OS, if you need to rerun the installation, use pip's -I (--ignore-installed) switch:

Shell

$ sudo -H pip install -I jupyter

Installing Coursier

The Almond documentation says that Coursier must be installed. I suspect that Coursier gets installed automatically along with Almond, but have not been able to verify that yet. Coursier was discussed in the SBT Global Setup lecture. To install onto Linux and Windows Subsystem for Linux, follow the instructions in that lecture’s transcript.

For Mac, type:

Shell

$ brew install --HEAD paulp/extras/coursier

Installing Almond

It is simple to build the Almond kernel using a bash script called buildAlmond, which is provided in the git repository for this course. I was able to install this on Mac and Ubuntu without any problem. For Windows 10 I used the Windows Subsystem for Linux to run the script.

Let’s look at the buildAlmond help.

Shell

$ ./buildAlmond -h
buildAlmond - Build Almond installer and execute it, then deletes installer and lists the Jupyter kernels.
almond is a Scala kernel for Jupyter.
See https://almond.sh/docs
Options:
  -a  Specify Almond version (default is Almond 0.4.0)
  -d  Debug mode
  -f  Force overwrite of previously built kernel of the same name
  -h  Show help message
  -s  Specify Scala version (default is Scala 2.12.9)
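The script itself is not reproduced on this page. As a rough illustration of the option handling that the help text implies, here is a sketch built on the POSIX getopts builtin. It is not the actual buildAlmond source: the function name parse_build_options and the output format are hypothetical, and only the option letters and defaults are taken from the help text.

```shell
# Parse the options listed in the help text and print the resulting settings.
# The defaults are the ones the help text reports.
parse_build_options() {
  almond_version=0.4.0
  scala_version=2.12.9
  debug=no
  force=no
  OPTIND=1   # reset so the function can be called more than once
  while getopts "a:dfhs:" opt; do
    case "$opt" in
      a) almond_version=$OPTARG ;;
      d) debug=yes ;;
      f) force=yes ;;
      s) scala_version=$OPTARG ;;
      h) echo "usage: buildAlmond [-a version] [-d] [-f] [-h] [-s version]"; return 0 ;;
      *) return 1 ;;
    esac
  done
  echo "almond=$almond_version scala=$scala_version debug=$debug force=$force"
}
```

Calling parse_build_options -f -s 2.13.0 -a 0.7.0 prints the settings that would be used for a forced build of Almond 0.7.0 for Scala 2.13.0.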

Support Matrix

The matrix of available combinations of Scala version (-s switch) and Almond version (-a switch) is sparse. Here are the latest versions of each valid combination as of August 13, 2019.

Scala Version (`-s`)	Almond Version (`-a`)
2.11.12	0.6.0
2.12.9	0.4.0
2.13.0	0.7.0
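Because the matrix is sparse, a small lookup helps avoid requesting a combination that does not exist. The sketch below simply encodes the three rows of the table above, as of August 13, 2019; the function name almond_version_for is hypothetical.

```shell
# Print the Almond version that the table above pairs with a Scala version.
# Only the exact Scala versions listed in the table are recognized.
almond_version_for() {
  case "$1" in
    2.11.12) echo 0.6.0 ;;
    2.12.9)  echo 0.4.0 ;;
    2.13.0)  echo 0.7.0 ;;
    *) echo "no known Almond version for Scala $1" >&2; return 1 ;;
  esac
}
```

It could be combined with the build script like this: ./buildAlmond -s 2.13.0 -a "$(almond_version_for 2.13.0)"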

Installing Kernels

Shell

$ buildAlmond -f  # Overwrite any previous version of the same name
Building Almond 0.4.0 for Scala 2.12.9
Installed scala kernel under /home/mslinn/.local/share/jupyter/kernels/scala2.12.9
 
$ buildAlmond -s 2.11.12 -a 0.4.0
Building Almond 0.4.0 for Scala 2.11.12
Installed scala kernel under /home/mslinn/.local/share/jupyter/kernels/scala2.11.12
 
$ buildAlmond -s 2.13.0 -a 0.7.0
Building Almond 0.7.0 for Scala 2.13.0
Installed scala kernel under /home/mslinn/.local/share/jupyter/kernels/scala2.13.0

For Mac, Ubuntu and Windows Subsystem for Linux, verify that Jupyter kernels for Python and Scala are now available:

Shell

$ jupyter kernelspec list
Available kernels:
  scala2.11.12    /home/mslinn/.local/share/jupyter/kernels/scala2.11.12
  scala2.12.8     /home/mslinn/.local/share/jupyter/kernels/scala2.12.8
  scala2.13.0     /home/mslinn/.local/share/jupyter/kernels/scala2.13.0
  python2    /usr/local/share/jupyter/kernels/python2
  python3    /usr/local/share/jupyter/kernels/python3
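To check for one specific kernel from a script, the listing can be filtered. This is a minimal sketch; the function name has_kernel is hypothetical, and it assumes the kernel name is the first column of each line, as in the listing above.

```shell
# Read `jupyter kernelspec list` output on stdin and succeed if the
# kernel named by $1 appears in the first column.
has_kernel() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example: jupyter kernelspec list | has_kernel scala2.13.0 && echo "Scala 2.13.0 kernel is installed"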

The Almond installer has many options. Here is its help message:

Shell

$ almond --help
Usage: almond [options]
  --usage  <bool>
        Print usage and exit
  --help | -h  <bool>
        Print help message and exit
  --install  <bool>
  --force  <bool>
        erase any previously existing kernel with the same id
  --id  <string?>
        id for this kernel, instead of the default one
  --display-name  <string?>
        name for this kernel, instead of the default one
  --global  <bool>
        whether to install this kernel globally
  --jupyter-path  <string?>
  --logo  <string?>
        path to a 64x64 PNG logo for this kernel
  --arg  <string*>
        command to launch this kernel, specified argument per argument, like --arg /foo --arg some-arg
  --command  <string?>
        command to launch this kernel, as one block (then split, takes precedence over --arg)
  --interrupt-via-message  <bool>
        whether to request frontends to interrupt this kernel via a message
  --copy-launcher  <bool?>
        Whether to copy the kernel launcher in the kernelspec directory (default: false if --arg or --command specified, true else)
  --extra-repository  <string*>
  --banner  <string?>
  --link  <string*>
  --predef-code  <string>
  --predef  <string*>
  --auto-dependency  <string*>
  --force-property  <string*>
        Force Maven properties during dependency resolution
  --profile  <string*>
        Enable Maven profile (start with ! to disable)
  --log  <string>
        Log level (one of none, error, warn, info, debug)
  --log-to  </path/to/log-file>
        Send log to a file rather than stderr
  --connection-file  <string?>
  --specific-loader  <bool>
        Use class loader that loaded the api module rather than the context class loader
  --metabrowse  <bool>
        Start a metabrowse server for go to source navigation (linked from Jupyter inspections)
  --trap-output  <bool>
        Trap what user code sends to stdout and stderr
  --disable-cache  <bool>
        Disable ammonite compilation cache

Configuring (Optional)

Jupyter can be configured in many ways. Here is how to set an access token / password.

Create a default configuration file:

Shell

$ jupyter notebook --generate-config
Writing default config to: /home/mslinn/.jupyter/jupyter_notebook_config.py

Generate a password for Jupyter:

Shell

$ python
Python 2.7.12+ (default, Sep 17 2016, 12:08:02)
[GCC 6.2.0 20160914] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd; passwd()
Enter password:
Verify password:
'sha1:42728e3a3ac3:b77ba8516cefc2d6c33gswra4b14b5b1657f88ee'

Edit the configuration file you created above (~/.jupyter/jupyter_notebook_config.py) and save the password hash you generated as the value of c.NotebookApp.password (line 196 of the generated file at the time of writing).
Make any other changes you desire to the configuration file.
Save the configuration file.
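The password setting can also be appended from the shell instead of being edited by hand. This is a minimal sketch: the function name set_notebook_password is hypothetical, while c.NotebookApp.password is the Jupyter Notebook configurable that holds the hashed password. Because the configuration file is ordinary Python, a later assignment overrides an earlier one.

```shell
# Append a password setting to a Jupyter Notebook configuration file.
# $1 is the path to the configuration file, $2 is the hash printed by passwd().
set_notebook_password() {
  printf "c.NotebookApp.password = u'%s'\n" "$2" >> "$1"
}

# Typical use (paste your own hash):
# set_notebook_password ~/.jupyter/jupyter_notebook_config.py 'sha1:...'
```

Another option is the jupyter notebook password subcommand, which prompts for a password and stores it for you.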

Running a Command-Line Console

Run the command-line console this way. Notice that the Compiling messages make it unclear that the program is waiting for you to type something. I started by typing 1+1.

Shell

$ jupyter console --kernel scala2.13.0
Compiling (synthetic)/ammonite/predef/interpBridge.sc
Jupyter console 5.2.0

Almond 0.7.0
Ammonite 1.6.9-15-6720d42
Scala library version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
Java 11.0.3

In [1]: Compiling (synthetic)/ammonite/predef/replBridge.sc
Compiling (synthetic)/ammonite/predef/kernelBridge.sc
Compiling (synthetic)/ammonite/predef/defaultPredef.sc

You might need to press Enter now in order to see a prompt.

Jupyter REPL

In [1]: 1+1
Out[1]: res0: Int = 2

In [2]: val x = 3
Out[2]: x: Int = 3

In [3]: x*x
Out[3]: res2: Int = 9

In [4]: ^D
Do you really want to exit ([y]/n)? y
Shutting down kernel

Running the Web-Based Notebook

The complete documentation is here. You can view the notebook help by using the -h switch.

Shell

$ jupyter notebook -h
The Jupyter HTML Notebook.

This launches a Tornado based HTML Notebook Server that serves up an
HTML5/Javascript Notebook client.

Subcommands
-----------

Subcommands are launched as `jupyter-notebook cmd [args]`. For information on
using subcommand 'cmd', do: `jupyter-notebook cmd -h`.

list
    List currently running notebook servers.
stop
    Stop currently running notebook server for a given port
password
    Set a password for the notebook server.

Options
-------

Arguments that take values are actually convenience aliases to full
Configurables, whose aliases are listed on the help line. For more information
on full configurables, see '--help-all'.

--debug
    set log level to logging.DEBUG (maximize logging output)
--generate-config
    generate default config file
-y
    Answer yes to any questions instead of prompting.
--no-browser
    Don't open the notebook in a browser after startup.
--pylab
    DISABLED: use %pylab or %matplotlib in the notebook to enable matplotlib.
--no-mathjax
    Disable MathJax

    MathJax is the javascript library Jupyter uses to render math/LaTeX. It is
    very large, so you may want to disable it if you have a slow internet
    connection, or for offline use of the notebook.

    When disabled, equations etc. will appear as their untransformed TeX source.
--allow-root
    Allow the notebook to be run from root user.
--script
    DEPRECATED, IGNORED
--no-script
    DEPRECATED, IGNORED
--log-level=<Enum> (Application.log_level)
    Default: 30
    Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')
    Set the log level by value or name.
--config=<Unicode> (JupyterApp.config_file)
    Default: ''
    Full path of a config file.
--ip=<Unicode> (NotebookApp.ip)
    Default: 'localhost'
    The IP address the notebook server will listen on.
--port=<Int> (NotebookApp.port)
    Default: 8888
    The port the notebook server will listen on.
--port-retries=<Int> (NotebookApp.port_retries)
    Default: 50
    The number of additional ports to try if the specified port is not
    available.
--transport=<CaselessStrEnum> (KernelManager.transport)
    Default: 'tcp'
    Choices: ['tcp', 'ipc']
--keyfile=<Unicode> (NotebookApp.keyfile)
    Default: ''
    The full path to a private key file for usage with SSL/TLS.
--certfile=<Unicode> (NotebookApp.certfile)
    Default: ''
    The full path to an SSL/TLS certificate file.
--client-ca=<Unicode> (NotebookApp.client_ca)
    Default: ''
    The full path to a certificate authority certificate for SSL/TLS client
    authentication.
--notebook-dir=<Unicode> (NotebookApp.notebook_dir)
    Default: ''
    The directory to use for notebooks and kernels.
--browser=<Unicode> (NotebookApp.browser)
    Default: ''
    Specify what command to use to invoke a web browser when opening the
    notebook. If not specified, the default browser will be determined by the
    `webbrowser` standard library module, which allows setting of the BROWSER
    environment variable to override it.
--pylab=<Unicode> (NotebookApp.pylab)
    Default: 'disabled'
    DISABLED: use %pylab or %matplotlib in the notebook to enable matplotlib.
--gateway-url=<Unicode> (GatewayClient.url)
    Default: None
    The url of the Kernel or Enterprise Gateway server where kernel
    specifications are defined and kernel management takes place. If defined,
    this Notebook server acts as a proxy for all kernel management and kernel
    specification retrieval. (JUPYTER_GATEWAY_URL env var)

To see all available configurables, use `--help-all`

Examples
--------

    jupyter notebook                       # start the notebook
    jupyter notebook --certfile=mycert.pem # use SSL/TLS certificate
    jupyter notebook password              # enter a password to protect the server

Mac and Ubuntu

Run the web-based notebook by typing:

Shell

$ jupyter notebook
[I 20:14:42.225 NotebookApp] The port 8888 is already in use, trying another port.
[I 20:14:42.229 NotebookApp] Serving notebooks from local directory: /var/work/course_scala_intro_code
[I 20:14:42.229 NotebookApp] 0 active kernels
[I 20:14:42.230 NotebookApp] The Jupyter Notebook is running at: https://localhost:8889/?token=55c770ee6d0b0310979bd0d66681d0d640d88c7f34523def
[I 20:14:42.230 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Created new window in existing browser session.

Use the --port option to run Jupyter Notebook on another port. For example, here is how to run it on port 9000:

Shell

$ jupyter notebook --port 9000

Here is the script that I use to launch jupyter, called runJupyter. It filters out warning messages that I do not care about.

runJupyter

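#!/bin/bash

# Keep notebooks in a dedicated directory, creating it if necessary.
NOTEBOOK_DIR="$HOME/.jupyterNotebooks"
mkdir -p "$NOTEBOOK_DIR"

# Start Jupyter in the background; |& pipes both stdout and stderr (bash 4+)
# through grep, which drops the warning messages.
jupyter notebook --notebook-dir "$NOTEBOOK_DIR" |& \
  grep -v "Xlib:  extension" |& \
  grep -v "browser_gpu_channel_host_factory.cc" &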

Windows

If you try to run Jupyter Notebook from WSL the same way as for Mac and Ubuntu, the text-based Lynx web browser starts up. This is horrible. You need a Windows browser instead. Start the notebook server with the --no-browser option, then open the URL that it prints in a Windows browser:

Shell

$ jupyter notebook --no-browser
[I 16:52:20.884 NotebookApp] Serving notebooks from local directory: /home/mslinn
[I 16:52:20.884 NotebookApp] 0 active kernels
[I 16:52:20.885 NotebookApp] The Jupyter Notebook is running at: https://localhost:8888/?token=0954635c4075f23bcb0da7f12b05310130dc7f668ed6ff07
[I 16:52:20.886 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
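Copying the URL out of the log by hand is error-prone because of the long token. Here is a minimal sketch that extracts it; the function name notebook_url is hypothetical, and it assumes the URL contains token= and no spaces, as in the log above.

```shell
# Read notebook server log lines on stdin and print the first URL that carries a token.
notebook_url() {
  grep -Eo 'https?://[^[:space:]]+token=[^[:space:]]+' | head -n 1
}
```

While the server is running, jupyter notebook list | notebook_url in a second terminal should print the URL to paste into the Windows browser, since the list subcommand reports the currently running notebook servers.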

Working with Almond

I created a notebook file called Test.ipynb in the root of this course’s git project. If you launch Jupyter Notebook from that directory a new web browser window should open, showing a directory. If it does not open when running Ubuntu, right-click on the URL in the console and select Open URL from the context menu. Load the notebook in the web browser by double-clicking on it in the directory listing.

The notebook has a few cells which demonstrate that the Ammonite REPL works in the web browser in a similar fashion to how it works on the console. You can edit the contents of text cells by double-clicking on them.

Each cell can contain Scala code, Python code, or comments. There is a pull-down menu in the toolbar that allows you to set the type of contents for each cell. Use the Enter key to create a new line in a cell, and execute the contents of the cell with Ctrl-Enter.

Example: No Dependencies

Here is a method definition and usage.

Scala code

def repeat(x: String, n: Int) = x*n
val x = repeat("oink ", 3)

Here is how the output looks in Jupyter:

Output

defined function repeat
x: String = "oink oink oink "

Adding Dependencies

We learned how to add dependencies to an Ammonite script in the Ammonite lecture. Almond is built upon Ammonite, so the syntax is the same. Other syntaxes are supported, and you may prefer them.

Example: NScala-Time

Scala code

import $ivy.`com.github.nscala-time::nscala-time:2.14.0`
import com.github.nscala_time.time.Imports._
DateTime.now + 2.months

Press Ctrl-Enter to execute the cell. The dependencies are resolved and downloaded before the code runs. Here is how the output looks in Jupyter:

Output

Downloading https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom
Downloading https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom.sha1
Downloaded https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom.sha1
Downloaded https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom
Downloading https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom
Downloading https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom.sha1
Downloading https://repo1.maven.org/maven2/joda-time/joda-time/2.9.4/joda-time-2.9.4.pom.sha1
Downloaded https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom
Downloaded https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom.sha1
Downloaded https://repo1.maven.org/maven2/joda-time/joda-time/2.9.4/joda-time-2.9.4.pom.sha1
Downloaded https://repo1.maven.org/maven2/joda-time/joda-time/2.9.4/joda-time-2.9.4.pom

res0_2: org.joda.time.DateTime = 2017-02-13T15:04:31.465-08:00
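The URLs in the download log follow the standard Maven repository layout: the dots in the group id become slashes, followed by the artifact name, the version, and a file named artifact-version.pom. The double colon in the $ivy import asks for the Scala binary version to be appended to the artifact name, which is why the log shows nscala-time_2.11. The sketch below rebuilds such a URL; the function name pom_url is hypothetical.

```shell
# Print the Maven Central URL of the POM for group $1, artifact $2, version $3.
pom_url() {
  group_path=$(printf '%s' "$1" | tr . /)
  echo "https://repo1.maven.org/maven2/$group_path/$2/$3/$2-$3.pom"
}
```

For example, pom_url com.github.nscala-time nscala-time_2.11 2.14.0 prints the first URL in the log above.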

Example: Apache Commons IO

Scala code

import $ivy.`commons-io:commons-io:2.5`
import org.apache.commons.io.FileUtils
import java.io.File
val str = FileUtils.readFileToString(new File("/etc/passwd"), "UTF-8")

Here is how the output looks in Jupyter:

Output

import $ivy.$

  import org.apache.commons.io.FileUtils

  str: String = """
  root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
  bin:x:2:2:bin:/bin:/usr/sbin/nologin
  sys:x:3:3:sys:/dev:/usr/sbin/nologin
  sync:x:4:65534:sync:/bin:/bin/sync
  games:x:5:60:games:/usr/games:/usr/sbin/nologin
  man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
  lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
  mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
  news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
  uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin"""

As before, when we press Ctrl-Enter the dependencies are resolved and downloaded prior to executing the code.

Example: Apache Spark

The notebook has a big code cell that creates a Spark context called sc. Put that code into a cell of your own notebook to define sc.

Any subsequent cells that you define can reference the Spark context.

© Copyright 1994-2024 Michael Slinn. All rights reserved.