Connecting to the cluster
SSH
SSH (Secure Shell) is a network protocol, that is, a procedure that enables computers to communicate with each other. Its purpose is to let a user on one machine open a shell/terminal on another machine. If the ssh program is installed on your machine, you can use it to connect to machines of the cluster by running ssh login@<address>, where login is your cluster login and <address> needs to be replaced by the address of the machine you want to connect to.
Which machines should I connect to?
There are two main entry points:
- pbil-deb is the submission node. If you want to run a program on the cluster, you need to use a system called Slurm, and that can only be done from pbil-deb. Note that you can connect to pbil-deb only from the University network, because the machine is not reachable (= visible) from outside.
- pbil-gates is the only machine that can be accessed from outside via SSH. If you're outside the University network (that includes Wi-Fi or phone connections, even if you're physically located at the University), you can connect to pbil-gates and then, from this machine, connect to pbil-deb (see below for a convenient way to do so). pbil-gates is NOT meant to run calculations; it serves only as an entry point to the cluster from outside.
From the University network, you can connect to pbil-deb directly with ssh, using the address pbil-deb.univ-lyon1.fr. If you're outside, first connect to pbil-gates (pbil-gates.univ-lyon1.fr) in the same way, and from pbil-gates connect to pbil-deb as above.
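As a minimal sketch of the two cases, the helper below only prints the command to run, so it can be checked before use. The function name cluster_ssh_cmd is hypothetical, and "login" stands for your own cluster user name.

```shell
# Hypothetical helper: print the ssh command for a given cluster machine.
cluster_ssh_cmd() {
    # $1: short machine name (pbil-deb or pbil-gates), $2: cluster login
    printf 'ssh %s@%s.univ-lyon1.fr\n' "$2" "$1"
}

# From the University network: connect to the submission node directly.
cluster_ssh_cmd pbil-deb login

# From outside: connect to the gateway first, then hop to pbil-deb from there.
cluster_ssh_cmd pbil-gates login
```

Copy the printed line into your terminal to open the connection.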
The principle is to connect to pbil-deb (the submission or frontend node) to submit jobs with Slurm. These jobs will then run on one or more of the compute nodes of the cluster.
Warning
Do not run intensive computing tasks on pbil-deb. The submission node should only be used for light data and project management tasks.
Key based authentication
By default, you can connect to pbil-deb and pbil-gates with the login and password given to you when your account was created.
It is possible to enable a more practical key-based authentication if you have already created an SSH key pair.
To enable key-based authentication on pbil-deb, you can run ssh-copy-id with your login and the address of the machine. The same applies to pbil-gates.
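A minimal sketch of both commands, assuming your key pair is in the default location and that "login" stands for your cluster user name. The function name install_key and the DRY_RUN switch are hypothetical conveniences, not part of ssh-copy-id.

```shell
# Hypothetical helper: install your public key on both entry points.
install_key() {
    # $1: cluster login
    for host in pbil-deb pbil-gates; do
        # With DRY_RUN set, the command is printed instead of executed.
        ${DRY_RUN:+echo} ssh-copy-id "$1@${host}.univ-lyon1.fr"
    done
}

# Print the two commands without contacting any machine.
DRY_RUN=1 install_key login
```

Running install_key login without DRY_RUN executes the commands and asks for your password once per machine. Keep in mind that pbil-deb is only reachable from the University network.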
Note
Using ssh-copy-id is equivalent to copy-pasting your public key into ~/.ssh/authorized_keys on the remote machine.
Easier connection
Here is an example configuration you can append to ~/.ssh/config on your personal computer (replace <user> with your cluster user name):
Host pbil
    User <user>
    HostName pbil-deb.univ-lyon1.fr

Host pbil-gates
    User <user>
    HostName pbil-gates.univ-lyon1.fr

Host pbil-ext
    User <user>
    HostName pbil-deb.univ-lyon1.fr
    ProxyJump pbil-gates
With this configuration you can connect to the frontend node with ssh pbil if you are on the University network, or with ssh pbil-ext if you are on another network and must connect via pbil-gates.
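One way to check such a configuration without opening a connection is the -G option of OpenSSH, which prints the options ssh would use for a given host. The sketch below writes the configuration to a temporary file rather than to ~/.ssh/config, and uses "login" as a placeholder user name; it assumes a reasonably recent OpenSSH client.

```shell
# Write a throwaway copy of the configuration ("login" is a placeholder).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host pbil
    User login
    HostName pbil-deb.univ-lyon1.fr

Host pbil-gates
    User login
    HostName pbil-gates.univ-lyon1.fr

Host pbil-ext
    User login
    HostName pbil-deb.univ-lyon1.fr
    ProxyJump pbil-gates
EOF

# -G prints the resolved options for the host and exits without connecting.
ssh -G -F "$cfg" pbil-ext | grep -E '^(user|hostname|proxyjump) '
```

To check your real configuration, drop the -F option so that ssh reads ~/.ssh/config.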
VSCode
It is possible to connect to the cluster frontend directly from inside Visual Studio Code by using its remote development feature.
- First, be sure you have working SSH access to pbil-deb. For easier setup, you can add pbil-deb to your ~/.ssh/config, for example with the pbil entry shown in the Easier connection section above.
- It is also highly recommended to set up key-based authentication to avoid entering your password multiple times.
- After that, you have to install the Remote SSH VSCode extension.
Once everything is set up, you can run the command Remote-SSH: Connect to host... in VSCode and either select pbil if it appears in the list, or choose Add New SSH Host. After that, VSCode should open a new window directly connected to pbil (it should show something like SSH: pbil in the bottom left corner of the window). The file explorer will then show the remote files, and if you open a terminal it will run on the remote machine.
Warning
If you use this method to work on the cluster, it is highly recommended to regularly back up your files locally on your machine.
For more details see the Remote development using SSH VSCode documentation.
Transferring data
To transfer data to/from the cluster, you can use scp or rsync.
Danger
scp and rsync will replace or modify existing files without warning, so be careful to avoid any data loss.
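With rsync, one way to reduce this risk is the -n (--dry-run) option, which lists what would be transferred without changing anything. The sketch below demonstrates it locally in a temporary directory, so nothing is sent to the cluster; the folder and file names are made up for the illustration.

```shell
# Local playground: a source and a destination holding different versions of a file.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "new version" > "$work/src/results.txt"
echo "old" > "$work/dst/results.txt"

# -n (--dry-run) reports the files that would be updated, but writes nothing.
rsync -avn "$work/src/" "$work/dst/"

# The destination file still has its original content after the dry run.
cat "$work/dst/results.txt"
```

The same option can be added to any of the rsync commands in this section before running them for real.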
scp
scp is the equivalent of the cp command, but allows you to copy files and folders to or from a remote machine via SSH.
Note
The -r option copies folders and their content recursively.
# Copy a folder to pbil-deb
scp -r project_folder login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/
# Copy everything from the current directory to pbil-deb (* does not match hidden files)
scp -r * login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/project_folder/
# Copy data back from pbil-deb to the current directory
# (the quotes keep the local shell from expanding the remote wildcard)
scp -r 'login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/project_folder/out/*' .
rsync
rsync is more efficient and versatile than scp. In particular, it only copies the differences between the source and the destination data, which minimizes transfers.
Here are some usage examples:
# Synchronize a local folder to a remote one
rsync -Pavz project_folder login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/
# Synchronize all files in the current directory with a remote one
rsync -Pavz * login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/project_folder/
# Synchronize a local folder to a remote one, excluding some subfolders
rsync -Pavz --exclude .git --exclude out project_folder login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/
One way to work with rsync is to create two shell scripts: one to synchronize code from local to remote (excluding results), and another to synchronize results from remote to local.
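A minimal sketch of this workflow, written as two functions in a single file for brevity; each body could just as well live in its own script. The function names, the remote path and the "out" results folder are assumptions based on the examples above, and the RSYNC variable is a hypothetical switch used to preview the commands.

```shell
#!/bin/sh
# Placeholders: adapt the login and the remote project path to your own setup.
REMOTE="login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/project_folder"

# Script 1: send local code to the cluster, leaving results and .git aside.
push_code() {
    ${RSYNC:-rsync} -Pavz --exclude .git --exclude out ./ "$REMOTE/"
}

# Script 2: bring results back from the cluster into the local "out" folder.
pull_results() {
    ${RSYNC:-rsync} -Pavz "$REMOTE/out/" ./out/
}
```

Call push_code from the root of the local project before submitting jobs, and pull_results once they have finished. Setting RSYNC=echo prints the rsync arguments instead of running the transfer.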
Tip
One thing to know about rsync is that it behaves differently depending on whether or not you append a trailing slash to the source folder:
- without a trailing slash, the folder itself is synced
- with a trailing slash, the content of the folder is synced
So for example:
# The following will create a "project" folder inside "/beegfs/data/login/myproject" and sync
# its content (-a is needed for rsync to recurse into folders)
rsync -av project login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/myproject/
# The following will sync the content of the local "project" folder with the remote
# "/beegfs/data/login/myproject" folder
rsync -av project/ login@pbil-deb.univ-lyon1.fr:/beegfs/data/login/myproject/
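The difference can be tried out safely on your own machine before touching the cluster. The sketch below uses a temporary directory and made-up folder names; nothing is transferred over the network.

```shell
# Local demonstration in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/project" "$demo/no_slash" "$demo/with_slash"
echo "hello" > "$demo/project/file.txt"

# Without a trailing slash: the "project" folder itself is created in the destination.
rsync -a "$demo/project" "$demo/no_slash/"

# With a trailing slash: only the content of "project" is copied.
rsync -a "$demo/project/" "$demo/with_slash/"

# List the resulting files to compare both layouts.
(cd "$demo" && find no_slash with_slash -type f | sort)
# prints:
#   no_slash/project/file.txt
#   with_slash/file.txt
```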