Published 2013-12-01.
Last modified 2019-08-12.
Time to read: 5 minutes.
Jupyter is a novel way to combine documentation with live code, which might run on powerful distributed systems like Apache Spark, Flink and Scalding. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. This open source project supports interactive data science and scientific computing with over 40 programming languages. Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer. Although Jupyter has its roots in big data, it is generally useful for all computing needs.
This lecture starts with instructions for installing Jupyter on Mac and Ubuntu. It then demonstrates how to install and work with the Almond Scala kernel for Jupyter, so students can use Scala with the Jupyter Notebook as well as the more traditional console REPL. The information provided is also largely applicable to JupyterHub, a multi-user server for Jupyter notebooks.
The Almond Scala kernel is useful for experimenting with code examples, documenting them and sharing them. The kernel lets you add support for various big data frameworks to Jupyter on demand. From the docs, Almond offers:
- all the Ammonite niceties
- an API that libraries can rely on to interact with Jupyter front-ends
- extensible plotting support
- extensible support for big data libraries
- Spark support, relying on ammonite-spark, extended to get progress bars among others.
It also provides libraries allowing one to write custom Jupyter kernels in Scala.
Almond is not tied to a specific version of big data frameworks. On 2019-08-13 Almond was available for Scala 2.11, 2.12 and 2.13; Scala 2.11 support will be dropped soon. Several versions of Almond can be installed side-by-side, so one notebook can support a certain version of Spark, and another notebook running simultaneously can support another version of Spark.
In addition to supporting Jupyter’s browser-based Scala REPL, Almond also supports the more traditional Scala console REPL Ammonite, which we discussed in the Ammonite lecture.
There are several other notebook user interfaces and Jupyter kernels that support Scala. This lecture does not discuss them beyond just mentioning them here and providing links.
- Apache Zeppelin, a JVM-based alternative to Jupyter, with some support for Spark, Flink, and Scalding.
- Apache Toree (incubating), a Jupyter kernel with Spark support.
- IScala and ISpark, which added some Spark support. These projects have not been updated in several years and should be considered dead.
- Bridgewater scala-notebook (this project also seems dead) and spark-notebook, which updated portions and added Spark support.
BTW, nbviewer.jupyter.org is a popular sharing mechanism for Jupyter code examples.
October 25, 2019: Polynote was just released. If you like Jupyter, you’ll love Polynote.
Installation
Jupyter requires Python. Once Python and Jupyter are installed, we will install Scala support for Jupyter.
Mac and Linux computers normally have Python installed, and Python is now available from the Windows store for free.
If for some reason you find pip is not installed, here are the official instructions to install pip.
For WSL, type:
$ sudo apt install python3-pip
Windows Subsystem for Linux (WSL)
Jupyter needs the gcc compiler, the ZeroMQ development package and the Python development package.
Install all of these dependencies by typing:
$ yes | sudo apt install build-essential python3-dev python3-pip libzmq3-dev
Installing Jupyter
This is easy.
Mac
For Mac with Python 2.7, type:
$ pip install jupyter
For Python 3.x on Mac, type:
$ pip3 install jupyter
Ubuntu and Windows Subsystem for Linux
If you installed Python 2.7, type:
$ yes | sudo -H pip install jupyter
If you installed Python 3.x, type:
$ yes | sudo -H pip3 install jupyter
All OSes
Check the version of Jupyter that was just installed:
$ jupyter --version
jupyter core     : 4.5.0
jupyter-notebook : 6.0.0
qtconsole        : 4.5.2
ipython          : 7.7.0
ipykernel        : 5.1.2
jupyter client   : 5.3.1
jupyter lab      : not installed
nbconvert        : 5.6.0
ipywidgets       : 7.5.1
nbformat         : 4.4.0
traitlets        : 4.3.2
If the jupyter core version is less than 4.0, upgrade Jupyter.
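The version check can also be scripted. The following is a minimal sketch, not part of the course code: the function names jupyter_core_version and needs_upgrade are made up for illustration, and the sample text is an excerpt of the output shown above. It parses the text printed by jupyter --version and applies the 4.0 threshold.

```python
import re


def jupyter_core_version(version_output):
    """Return the jupyter core version as a tuple of ints, or None if absent."""
    # Accept "jupyter core", "jupyter_core" or "jupyter-core" as the label.
    match = re.search(r"jupyter[ _-]core\s*:\s*(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(version_output, minimum=(4, 0)):
    """True when the core version is missing or lower than the minimum."""
    version = jupyter_core_version(version_output)
    return version is None or version < minimum


# Sample text copied from the output shown above (hypothetical input for the sketch).
sample = """jupyter core     : 4.5.0
jupyter-notebook : 6.0.0
"""

print(jupyter_core_version(sample))
print(needs_upgrade(sample))
```

To check a live installation, pass in the text captured from running jupyter --version, for example with subprocess.run.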
Upgrading Jupyter
For Mac, upgrade by typing:
$ pip install --upgrade jupyter
For Ubuntu and Windows Subsystem for Linux, upgrade by typing:
$ yes | sudo -H pip install --upgrade jupyter
For any OS, if you need to rerun the installation, use the -I switch:
$ sudo -H pip install -I jupyter
Installing Coursier
The Almond documentation says that Coursier must be installed. I suspect that Coursier gets installed automatically along with Almond, but have not been able to verify that yet. Coursier was discussed in the SBT Global Setup lecture. To install onto Linux and Windows Subsystem for Linux, follow the instructions in that lecture’s transcript.
For Mac, type:
$ brew install --HEAD paulp/extras/coursier
Installing Almond
It is simple to build the Almond kernel using the following bash script, which is provided in the git repository for this course as buildAlmond.
I was able to install this on Mac and Ubuntu without any problem.
For Windows 10 I used the Windows Subsystem for Linux to run the script.
Let’s look at the buildAlmond help.
$ ./buildAlmond -h
buildAlmond - Build Almond installer and execute it, then deletes installer and lists the Jupyter kernels.
almond is a Scala kernel for Jupyter. See https://almond.sh/docs

Options:
  -a  Specify Almond version (default is Almond 0.4.0)
  -d  Debug mode
  -f  Force overwrite of previously built kernel of the same name
  -h  Show help message
  -s  Specify Scala version (default is Scala 2.12.9)
Support Matrix
The matrix of available combinations of Scala version (-s switch) and Almond version (-a switch) is sparse.
Here are the latest versions of each valid combination as of August 13, 2019.
Scala Version (-s) | Almond Version (-a) |
---|---|
2.11.12 | 0.6.0 |
2.12.9 | 0.4.0 |
2.13.0 | 0.7.0 |
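Because the matrix is sparse, it can help to look up the Almond version rather than type it from memory. This is only a sketch under stated assumptions: the dictionary repeats the table above, the helper name build_almond_command is hypothetical, and the -s, -a and -f switches are the ones listed in the buildAlmond help.

```python
# Latest valid Scala/Almond pairs as of August 13, 2019, copied from the table above.
SUPPORT_MATRIX = {
    "2.11.12": "0.6.0",
    "2.12.9": "0.4.0",
    "2.13.0": "0.7.0",
}


def build_almond_command(scala_version, force=False):
    """Return the buildAlmond command line for a supported Scala version."""
    almond_version = SUPPORT_MATRIX.get(scala_version)
    if almond_version is None:
        raise ValueError("No known Almond version for Scala " + scala_version)
    command = ["./buildAlmond", "-s", scala_version, "-a", almond_version]
    if force:
        # -f overwrites a previously built kernel of the same name.
        command.insert(1, "-f")
    return command


print(" ".join(build_almond_command("2.13.0")))
```

The returned list could be passed to subprocess.run to launch the script; an unsupported Scala version raises ValueError instead of starting a build that cannot succeed.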
Installing Kernels
$ buildAlmond -f # Overwrite any previous version of the same name
Building Almond 0.4.0 for Scala 2.12.9
Installed scala kernel under /home/mslinn/.local/share/jupyter/kernels/scala2.12.9

$ buildAlmond -s 2.11.12 -a 0.4.0
Building Almond 0.4.0 for Scala 2.11.12
Installed scala kernel under /home/mslinn/.local/share/jupyter/kernels/scala2.11.12

$ buildAlmond -s 2.13.0 -a 0.7.0
Building Almond 0.7.0 for Scala 2.13.0
Installed scala kernel under /home/mslinn/.local/share/jupyter/kernels/scala2.13.0
For Mac, Ubuntu and Windows Subsystem for Linux, verify that Jupyter kernels for Python and Scala are now available.
$ jupyter kernelspec list
Available kernels:
  scala2.11.12    /home/mslinn/.local/share/jupyter/kernels/scala2.11.12
  scala2.12.8     /home/mslinn/.local/share/jupyter/kernels/scala2.12.8
  scala2.13.0     /home/mslinn/.local/share/jupyter/kernels/scala2.13.0
  python2         /usr/local/share/jupyter/kernels/python2
  python3         /usr/local/share/jupyter/kernels/python3
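The check for expected kernels can be automated. The sketch below is illustrative only: parse_kernelspec_list is a hypothetical helper, the sample is an excerpt of the listing above, and the parser assumes each kernel row has a name followed by an absolute Unix-style path.

```python
def parse_kernelspec_list(output):
    """Map each kernel name to its directory, skipping the header line."""
    kernels = {}
    for line in output.splitlines():
        parts = line.split()
        # Kernel rows have exactly two fields: a name and an absolute path.
        if len(parts) == 2 and parts[1].startswith("/"):
            kernels[parts[0]] = parts[1]
    return kernels


# Excerpt of the listing shown above (hypothetical input for the sketch).
sample = """Available kernels:
  scala2.13.0    /home/mslinn/.local/share/jupyter/kernels/scala2.13.0
  python3        /usr/local/share/jupyter/kernels/python3
"""

kernels = parse_kernelspec_list(sample)
missing = [name for name in ("scala2.13.0", "python3") if name not in kernels]
print("missing kernels:", missing)
```

An empty list means every kernel you asked about is installed.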
$ almond --help
Usage: almond [options]
  --usage  <bool>
        Print usage and exit
  --help | -h  <bool>
        Print help message and exit
  --install  <bool>
  --force  <bool>
        erase any previously existing kernel with the same id
  --id  <string?>
        id for this kernel, instead of the default one
  --display-name  <string?>
        name for this kernel, instead of the default one
  --global  <bool>
        whether to install this kernel globally
  --jupyter-path  <string?>
  --logo  <string?>
        path to a 64x64 PNG logo for this kernel
  --arg  <string*>
        command to launch this kernel, specified argument per argument, like --arg /foo --arg some-arg
  --command  <string?>
        command to launch this kernel, as one block (then split, takes precedence over --arg)
  --interrupt-via-message  <bool>
        whether to request frontends to interrupt this kernel via a message
  --copy-launcher  <bool?>
        Whether to copy the kernel launcher in the kernelspec directory (default: false if --arg or --command specified, true else)
  --extra-repository  <string*>
  --banner  <string?>
  --link  <string*>
  --predef-code  <string>
  --predef  <string*>
  --auto-dependency  <string*>
  --force-property  <string*>
        Force Maven properties during dependency resolution
  --profile  <string*>
        Enable Maven profile (start with ! to disable)
  --log  <string>
        Log level (one of none, error, warn, info, debug)
  --log-to  </path/to/log-file>
        Send log to a file rather than stderr
  --connection-file  <string?>
  --specific-loader  <bool>
        Use class loader that loaded the api module rather than the context class loader
  --metabrowse  <bool>
        Start a metabrowse server for go to source navigation (linked from Jupyter inspections)
  --trap-output  <bool>
        Trap what user code sends to stdout and stderr
  --disable-cache  <bool>
        Disable ammonite compilation cache
Configuring (Optional)
Jupyter can be configured in many ways. Here is how to set an access token / password.
- Create a default configuration file:
$ jupyter notebook --generate-config
Writing default config to: /home/mslinn/.jupyter/jupyter_notebook_config.py
- Generate a password for Jupyter:
$ python
Python 2.7.12+ (default, Sep 17 2016, 12:08:02)
[GCC 6.2.0 20160914] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd; passwd()
Enter password:
Verify password:
'sha1:42728e3a3ac3:b77ba8516cefc2d6c33gswra4b14b5b1657f88ee'
- Edit the configuration file you created above (~/.jupyter/jupyter_notebook_config.py) and save the password you generated in line 196.
- Make any other changes you desire to the configuration file.
- Save the configuration file.
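For reference, the setting that stores the hashed password in the classic Notebook server's configuration file is c.NotebookApp.password. The sketch below is an illustration, not part of the course code: the helper name password_config_line is hypothetical, and it assumes the hash has the three colon-separated parts seen in the passwd() output above. It builds the line to place in the configuration file.

```python
def password_config_line(hashed_password):
    """Return the config-file line that stores a hashed notebook password."""
    # The hash printed by passwd() has the form algorithm:salt:digest;
    # unpacking raises ValueError when fewer than three parts are present.
    algorithm, salt, digest = hashed_password.split(":", 2)
    if not (algorithm and salt and digest):
        raise ValueError("expected algorithm:salt:digest")
    return "c.NotebookApp.password = {!r}".format(hashed_password)


# Placeholder hash for illustration only; use the value passwd() printed for you.
print(password_config_line("sha1:abc:def"))
```

Paste the printed line into ~/.jupyter/jupyter_notebook_config.py, replacing the commented-out default for that setting.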
Running a Command-Line Console
Run the command-line console this way.
Notice that the Compiling messages make it unclear that the program is waiting for you to type something.
I started by typing 1+1.
$ jupyter console --kernel scala2.13.0
Compiling (synthetic)/ammonite/predef/interpBridge.sc
Jupyter console 5.2.0

Almond 0.7.0
Ammonite 1.6.9-15-6720d42
Scala library version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
Java 11.0.3

In [1]: Compiling (synthetic)/ammonite/predef/replBridge.sc
Compiling (synthetic)/ammonite/predef/kernelBridge.sc
Compiling (synthetic)/ammonite/predef/defaultPredef.sc

You might need to press Enter now in order to see a prompt.

In [1]: 1+1
Out[1]: res0: Int = 2

In [2]: val x = 3
Out[2]: x: Int = 3

In [3]: x*x
Out[3]: res2: Int = 9

In [4]: ^D
Do you really want to exit ([y]/n)? y
Shutting down kernel
Running the Web-Based Notebook
The complete documentation is here.
You can view the notebook help by using the -h switch.
$ jupyter notebook -h
The Jupyter HTML Notebook.

This launches a Tornado based HTML Notebook Server that serves up an HTML5/Javascript Notebook client.

Subcommands
-----------

Subcommands are launched as `jupyter-notebook cmd [args]`. For information on using subcommand 'cmd', do: `jupyter-notebook cmd -h`.

list
    List currently running notebook servers.
stop
    Stop currently running notebook server for a given port
password
    Set a password for the notebook server.

Options
-------

Arguments that take values are actually convenience aliases to full Configurables, whose aliases are listed on the help line. For more information on full configurables, see '--help-all'.

--debug
    set log level to logging.DEBUG (maximize logging output)
--generate-config
    generate default config file
-y
    Answer yes to any questions instead of prompting.
--no-browser
    Don't open the notebook in a browser after startup.
--pylab
    DISABLED: use %pylab or %matplotlib in the notebook to enable matplotlib.
--no-mathjax
    Disable MathJax

    MathJax is the javascript library Jupyter uses to render math/LaTeX. It is very large, so you may want to disable it if you have a slow internet connection, or for offline use of the notebook.

    When disabled, equations etc. will appear as their untransformed TeX source.
--allow-root
    Allow the notebook to be run from root user.
--script
    DEPRECATED, IGNORED
--no-script
    DEPRECATED, IGNORED
--log-level=<Enum> (Application.log_level)
    Default: 30
    Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')
    Set the log level by value or name.
--config=<Unicode> (JupyterApp.config_file)
    Default: ''
    Full path of a config file.
--ip=<Unicode> (NotebookApp.ip)
    Default: 'localhost'
    The IP address the notebook server will listen on.
--port=<Int> (NotebookApp.port)
    Default: 8888
    The port the notebook server will listen on.
--port-retries=<Int> (NotebookApp.port_retries)
    Default: 50
    The number of additional ports to try if the specified port is not available.
--transport=<CaselessStrEnum> (KernelManager.transport)
    Default: 'tcp'
    Choices: ['tcp', 'ipc']
--keyfile=<Unicode> (NotebookApp.keyfile)
    Default: ''
    The full path to a private key file for usage with SSL/TLS.
--certfile=<Unicode> (NotebookApp.certfile)
    Default: ''
    The full path to an SSL/TLS certificate file.
--client-ca=<Unicode> (NotebookApp.client_ca)
    Default: ''
    The full path to a certificate authority certificate for SSL/TLS client authentication.
--notebook-dir=<Unicode> (NotebookApp.notebook_dir)
    Default: ''
    The directory to use for notebooks and kernels.
--browser=<Unicode> (NotebookApp.browser)
    Default: ''
    Specify what command to use to invoke a web browser when opening the notebook. If not specified, the default browser will be determined by the `webbrowser` standard library module, which allows setting of the BROWSER environment variable to override it.
--pylab=<Unicode> (NotebookApp.pylab)
    Default: 'disabled'
    DISABLED: use %pylab or %matplotlib in the notebook to enable matplotlib.
--gateway-url=<Unicode> (GatewayClient.url)
    Default: None
    The url of the Kernel or Enterprise Gateway server where kernel specifications are defined and kernel management takes place.
    If defined, this Notebook server acts as a proxy for all kernel management and kernel specification retrieval. (JUPYTER_GATEWAY_URL env var)

To see all available configurables, use `--help-all`

Examples
--------

    jupyter notebook                       # start the notebook
    jupyter notebook --certfile=mycert.pem # use SSL/TLS certificate
    jupyter notebook password              # enter a password to protect the server
Mac and Ubuntu
Run the web-based notebook by typing:
$ jupyter notebook
[I 20:14:42.225 NotebookApp] The port 8888 is already in use, trying another port.
[I 20:14:42.229 NotebookApp] Serving notebooks from local directory: /var/work/course_scala_intro_code
[I 20:14:42.229 NotebookApp] 0 active kernels
[I 20:14:42.230 NotebookApp] The Jupyter Notebook is running at: https://localhost:8889/?token=55c770ee6d0b0310979bd0d66681d0d640d88c7f34523def
[I 20:14:42.230 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Created new window in existing browser session.
Use the --port option to run Jupyter Notebook on another port.
For example, here is how to run it on port 9000:
$ jupyter notebook --port 9000
Here is the script that I use to launch Jupyter, called runJupyter.
It filters out warning messages that I do not care about.
#!/bin/bash
NOTEBOOK_DIR="$HOME/.jupyterNotebooks"
mkdir -p "$NOTEBOOK_DIR"
jupyter notebook --notebook-dir "$NOTEBOOK_DIR" |& \
  grep -v "Xlib: extension" |& \
  grep -v "browser_gpu_channel_host_factory.cc" &
Windows
If you try to run Jupyter Notebook under WSL the same way as for Mac and Ubuntu, the text-based Lynx web browser starts up. This is horrible. You need a Windows browser instead. Start Jupyter with the --no-browser switch, then open the URL that it prints in a Windows browser:
$ jupyter notebook --no-browser
[I 16:52:20.884 NotebookApp] Serving notebooks from local directory: /home/mslinn
[I 16:52:20.884 NotebookApp] 0 active kernels
[I 16:52:20.885 NotebookApp] The Jupyter Notebook is running at: https://localhost:8888/?token=0954635c4075f23bcb0da7f12b05310130dc7f668ed6ff07
[I 16:52:20.886 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
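When no browser is launched, the URL with its access token has to be copied out of the log. As a rough sketch (the name notebook_urls is hypothetical and the sample log lines are taken from the output above, with the second one shortened), the URL can be pulled out with a regular expression:

```python
import re

# Matches an http(s) URL that ends with a hexadecimal ?token=... query string.
TOKEN_URL = re.compile(r"https?://\S+\?token=[0-9a-f]+")


def notebook_urls(log_text):
    """Return every token-bearing notebook URL found in the log text."""
    return TOKEN_URL.findall(log_text)


# Two log lines based on the output shown above.
sample_log = (
    "[I 16:52:20.885 NotebookApp] The Jupyter Notebook is running at: "
    "https://localhost:8888/?token=0954635c4075f23bcb0da7f12b05310130dc7f668ed6ff07\n"
    "[I 16:52:20.886 NotebookApp] Use Control-C to stop this server\n"
)

for url in notebook_urls(sample_log):
    print(url)
```

Paste the printed URL into the address bar of a Windows browser to open the notebook.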
Working with Almond
I created a notebook file called Test.ipynb in the root of this course’s git project.
If you launch Jupyter Notebook from that directory a new web browser window should open, showing a directory.
If it does not open when running Ubuntu, right-click on the URL in the console and select Open URL from the context menu.
Load the notebook in the web browser by double-clicking on it in the directory listing.
The notebook has a few cells which demonstrate that the Ammonite REPL works in the web browser much as it does on the console. You can edit the contents of text cells by double-clicking on them.
Each cell can contain Scala code, Python code, or comments. There is a pull-down menu in the toolbar that allows you to set the type of contents for each cell. Use the Enter key to create a new line in a cell, and execute the contents of the cell with Ctrl-Enter.
Example: No Dependencies
Here is a method definition and usage.
def repeat(x: String, n: Int) = x*n
val x = repeat("oink ", 3)
Here is how the output looks in Jupyter:
defined function repeat
x: String = "oink oink oink "
Adding Dependencies
We learned how to add dependencies to an Ammonite script in the Ammonite lecture. Almond is built upon Ammonite, so the syntax is the same. Other syntaxes are supported, and you may prefer them.
Example: NScala-Time
import $ivy.`com.github.nscala-time::nscala-time:2.14.0`
import com.github.nscala_time.time.Imports._
DateTime.now + 2.months
Ctrl-Enter
Downloading https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom
Downloading https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom.sha1
Downloaded https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom.sha1
Downloaded https://repo1.maven.org/maven2/com/github/nscala-time/nscala-time_2.11/2.14.0/nscala-time_2.11-2.14.0.pom
Downloading https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom
Downloading https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom.sha1
Downloading https://repo1.maven.org/maven2/joda-time/joda-time/2.9.4/joda-time-2.9.4.pom.sha1
Downloaded https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom
Downloaded https://repo1.maven.org/maven2/org/joda/joda-convert/1.2/joda-convert-1.2.pom.sha1
Downloaded https://repo1.maven.org/maven2/joda-time/joda-time/2.9.4/joda-time-2.9.4.pom.sha1
Downloaded https://repo1.maven.org/maven2/joda-time/joda-time/2.9.4/joda-time-2.9.4.pom
res0_2: org.joda.time.DateTime = 2017-02-13T15:04:31.465-08:00
Example: Apache Commons IO
import $ivy.`commons-io:commons-io:2.5`
import org.apache.commons.io.FileUtils
import java.io.File
val str = FileUtils.readFileToString(new File("/etc/passwd"), "UTF-8")
import $ivy.$
import org.apache.commons.io.FileUtils
str: String = """
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin"""
As before, when we press Ctrl-Enter the dependencies are resolved and downloaded before the code executes.
Example: Apache Spark
Put this into a cell to define a Spark context called sc.
Any subsequent cells that you define can reference the Spark context.
Next the notebook has a big code cell that creates a Spark context called sc.
© Copyright 1994-2024 Michael Slinn. All rights reserved.
If you would like to request to use this copyright-protected work in any manner,
please send an email.