A guide to using NYU's high performance computing (for NYU students/staff)
There are two main ways of transferring data to and from HPC. The first is transferring data to and from your computer using scp (secure copy), which works over ssh. The second is transferring to and from Google Drive (or another cloud storage service), which can be handy when you have limited space available on your computer.
The first step is to set up an SSH tunnel. You only need to do this once.
On your computer, check whether you already have an .ssh directory:
ls ~/.ssh
If you see some filenames printed, skip to the next step.
If instead this returns "No such file or directory", create the directory with mkdir ~/.ssh, set its permissions with chmod 700 ~/.ssh, and create a new file called config with touch ~/.ssh/config
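The directory setup described above can be wrapped in a small function that is safe to run more than once. This is only a sketch: the name ensure_ssh_dir and its optional directory argument are illustrative and not part of any NYU tooling.

```shell
# Create an SSH directory with a config file if they do not exist yet.
# ensure_ssh_dir is a hypothetical helper name; $1 defaults to ~/.ssh.
ensure_ssh_dir() {
    dir=${1:-"$HOME/.ssh"}
    mkdir -p "$dir" || return 1     # -p: no error if the directory already exists
    chmod 700 "$dir" || return 1    # only the owner may read, write, or enter it
    touch "$dir/config"             # creates the file; existing contents are untouched
}
```

Calling ensure_ssh_dir with no arguments applies the steps to ~/.ssh.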
Open the config file in a text editor (e.g. vim ~/.ssh/config) and add this, replacing NETID with your NYU NetID:
# first we create the tunnel, with instructions to pass incoming
# packets on local port 8026 through it to port 22 (ssh) on
# prince.hpc.nyu.edu
Host hpcgwtunnel
HostName gw.hpc.nyu.edu
ForwardX11 no
LocalForward 8026 prince.hpc.nyu.edu:22
User NETID
# next we create an alias for incoming packets on the port. The
# alias corresponds to where the tunnel forwards these packets
Host prince
HostName localhost
Port 8026
ForwardX11 yes
User NETID
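If you would rather not edit the file by hand, the same configuration can be generated with your NetID already filled in. The sketch below assumes a hypothetical helper named render_hpc_ssh_config; the host names and port are copied from the config above, and the indentation is purely cosmetic.

```shell
# Print the SSH config from this guide with a real NetID filled in.
# render_hpc_ssh_config is a hypothetical helper name.
render_hpc_ssh_config() {
    netid=$1
    if [ -z "$netid" ]; then
        echo "usage: render_hpc_ssh_config NETID" >&2
        return 1
    fi
    cat <<EOF
Host hpcgwtunnel
    HostName gw.hpc.nyu.edu
    ForwardX11 no
    LocalForward 8026 prince.hpc.nyu.edu:22
    User $netid

Host prince
    HostName localhost
    Port 8026
    ForwardX11 yes
    User $netid
EOF
}
```

For example, render_hpc_ssh_config abc123 >> ~/.ssh/config would append the block for the placeholder NetID abc123. Review the output before appending it.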
**Transferring**
Once this is done, you’re ready to transfer files between your computer and HPC.
Open a terminal window and create the tunnel:
ssh hpcgwtunnel
Leave that window open, then open a new terminal window to transfer files.
The general format of the command to transfer files is:
scp {SOURCE PATH} {DESTINATION PATH}
Let’s assume you have a file called myfile.txt on your Desktop (on a Mac) that you want to transfer to the scratch file system on HPC, into a folder you called mydata. The command will be (replace NETID with your NYU NetID):
scp ~/Desktop/myfile.txt NETID@prince:/scratch/NETID/mydata/
To transfer the file back to your Desktop:
scp NETID@prince:/scratch/NETID/mydata/myfile.txt ~/Desktop/
If you want to transfer a directory to your mydata folder on HPC, you need to include the -r option (for recursive), like so:
scp -r ~/Desktop/myfolder NETID@prince:/scratch/NETID/mydata
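The scp commands above all follow the same pattern, so they can be captured in a pair of small helpers. This is a sketch only: hpc_scratch_path, hpc_upload, and the DRY_RUN variable are hypothetical names, and the code assumes the prince alias from the config file is in place and the tunnel is open.

```shell
# Hypothetical helpers; "prince" is the Host alias defined in ~/.ssh/config.

# Print the scp-style remote path for a location under /scratch/NETID.
hpc_scratch_path() {
    # $1 = NetID, $2 = path relative to /scratch/NETID
    printf '%s@prince:/scratch/%s/%s\n' "$1" "$1" "$2"
}

# Upload a file or directory, adding -r only when the source is a directory.
# With DRY_RUN=1 the command is printed instead of executed.
hpc_upload() {
    # $1 = NetID, $2 = local file or directory, $3 = folder under /scratch/NETID
    src=$2
    dest=$(hpc_scratch_path "$1" "$3")
    if [ -d "$src" ]; then
        set -- scp -r "$src" "$dest"
    else
        set -- scp "$src" "$dest"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running DRY_RUN=1 hpc_upload NETID ~/Desktop/myfolder mydata prints the scp command without contacting HPC, which is a cheap way to check the paths before a large transfer.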
Sometimes it's easier to store your files on Google Drive, since NYU accounts come with a large amount of space on Drive. You could collect the dataset on your computer and then store it on Drive to free up space on your machine.
To get this working, you need to configure HPC to communicate with your Drive account (you only need to do this once).
Log in to HPC.
Instead of scp, we will use rclone. It is already installed on HPC as a module, which we can load like so (more on modules in the XYZ file):
module load rclone/1.38
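If the module did not load (for example, because the version number has changed), rclone will not be on your PATH. A quick check along these lines can catch that early; require_cmd is a hypothetical helper name, while command -v and module avail are standard shell and module-system commands.

```shell
# Return success if a command is on PATH; otherwise print a hint and fail.
# require_cmd is a hypothetical helper name.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found; try 'module avail $1' to list available versions" >&2
    return 1
}
```

A typical use would be module load rclone/1.38 && require_cmd rclone.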
Start the configuration tool
rclone config
You will see this output:
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q>
Type in n to create a new remote, which should prompt you for a name. You can name your remote whatever you like, as long as you remember it (mygoogledrive in this case):
name> mygoogledrive
You will see a whole bunch of options, with numbers next to them:
Type of storage to configure.
Choose a number from below, or type in your own value
1 / Amazon Drive
\ "amazon cloud drive"
2 / Amazon S3 (also Dreamhost, Ceph, Minio)
\ "s3"
3 / Backblaze B2
\ "b2"
4 / Box
\ "box"
5 / Dropbox
\ "dropbox"
6 / Encrypt/Decrypt a remote
\ "crypt"
7 / FTP Connection
\ "ftp"
8 / Google Cloud Storage (this is not Google Drive)
\ "google cloud storage"
9 / Google Drive
\ "drive"
Storage>
Type in the number corresponding to Google Drive
Storage> 9
You can leave the next two prompts (for client_id and client_secret) blank; just hit Enter:
Google Application Client Id - leave blank normally.
client_id>
Google Application Client Secret - leave blank normally.
client_secret>
The tool will now ask you if you want to use auto config; select n, since you are working on a remote or headless machine:
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> n
The tool will now print a long URL. Your browser may or may not open automatically; if it doesn’t, navigate to the link in your browser and click on Allow Access.
Copy the verification code you are given and paste it back into the terminal:
Enter verification code> YOURCODE
The tool will ask about configuring this as a team drive; you can say no:
Configure this as a team drive?
y) Yes
n) No
y/n> n
It will then show you some details (the name you selected and a token); confirm that this is okay:
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
You’re done with the setup! Quit the config tool:
q) Quit config
e/n/d/r/c/s/q> q
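Before transferring anything, it can be worth confirming that the remote was actually saved. rclone listremotes prints one configured remote per line with a trailing colon; the sketch below filters that output for a given name. has_remote is a hypothetical helper name.

```shell
# Succeed if the remote name in $1 appears in `rclone listremotes` output
# supplied on standard input. has_remote is a hypothetical helper name.
has_remote() {
    # -x: match the whole line, -F: treat the name literally, -q: no output
    grep -qxF "$1:"
}
```

For example, rclone listremotes | has_remote mygoogledrive && rclone lsd mygoogledrive: would list the top-level folders on Drive only if the remote exists.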
Transferring
To transfer with rclone, you must load the module first. (If you’re doing this right after the setup, you have already loaded the module.)
module load rclone/1.38
The format for transferring with rclone is similar to scp:
rclone copy {SOURCE_PATH} {DEST_PATH}
Create a folder on your Google Drive to transfer the file to, e.g. hpc_uploads.
To transfer a file (myfile.txt) in your $HOME directory on HPC to Google Drive:
rclone copy /home/NETID/myfile.txt mygoogledrive:hpc_uploads
Replace NETID with your NYU NetID, and replace mygoogledrive with the name you chose for the remote during the setup phase.
Transferring a directory uses exactly the same command:
rclone copy /home/NETID/mydirectory mygoogledrive:hpc_uploads
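One thing to be aware of: rclone copy copies the contents of the source directory, not the directory itself, so the command above places the files from mydirectory directly inside hpc_uploads. To keep the directory name on Drive, include it in the destination. The sketch below builds such a destination; drive_dest_for is a hypothetical helper name.

```shell
# Build an rclone destination that keeps the source directory's name.
# drive_dest_for is a hypothetical helper name.
drive_dest_for() {
    # $1 = remote name, $2 = folder on Drive, $3 = local source directory
    printf '%s:%s/%s\n' "$1" "$2" "$(basename "$3")"
}
```

With this, rclone copy /home/NETID/mydirectory "$(drive_dest_for mygoogledrive hpc_uploads /home/NETID/mydirectory)" copies into mygoogledrive:hpc_uploads/mydirectory.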
To download files from Google Drive to HPC, just swap the source and destination paths:
rclone copy mygoogledrive:hpc_uploads/mydata /scratch/NETID/mydata