Compute Canada
----------

Compute Canada is Canada's national high-performance computing (HPC) system. The system gives users access to both storage and compute resources, making large data analyses more efficient to run than on a standard desktop computer. Users can access Compute Canada from any computer with an active internet connection.

The following information walks users through setting up a new account with Compute Canada and submitting jobs to ```graham```, the national system hosted by SHARCNET (a compute consortium headquartered at Western University). There is also information on how existing Compute Canada users can request a role on another allocation.

@[toc]

## Setting up an account

All non-Faculty user accounts must be approved by a Sponsor (Faculty member) who already has at least a default Compute Canada resource allocation. A resource allocation dictates how much storage and how many compute resources a Sponsor (and all sponsored users) can access on a given Compute Canada national system. (Faculty members can apply for additional resources through Compute Canada's [Resources for Research Groups Competition][1].)

Create an account at: [https://ccdb.computecanada.ca/security/login][2]

**Compute Canada requires that you use your institutional (e.g., UWO) email address for accounts.**

You will need to know the CCI (abc-123-01) of your Sponsor. If unsure about who to specify as your Sponsor, please contact Suzanne Witt (switt4 AT uwo DOT ca).

The following page walks users through setting up a new Compute Canada account: [https://www.computecanada.ca/research-portal/account-management/apply-for-an-account/][3]

Additional information on CCDB can be found here: [https://docs.computecanada.ca/wiki/Frequently_Asked_Questions_about_the_CCDB][4]

As part of Western University, your account will fall under [SHARCNET][5]. SHARCNET is a consortium of 18 Ontario universities (headquartered at Western University) and currently runs the national system, ```graham```.

*New users can register for a ['New User Seminar'][6] hosted by SHARCNET. This webinar covers a number of basic concepts about Compute Canada systems, focusing on graham, as well as more advanced topics like submitting parallel and array jobs.*

**N.B.:** It is no longer required to apply for a SHARCNET consortium account in order to access and use ```graham```.

----------

## Existing users: Requesting a role on another allocation

Compute Canada only allows **one account per user**. If you already have an account sponsored by one Faculty member but would like to be added as a user on another Faculty member's allocation, you can request access through CCDB.

Log into CCDB with your existing Compute Canada username/password. From the 'My Account' pull-down menu, select 'Add Role'. Fill out the requested information, including the new Faculty member's (Sponsor's) CCI. Once the new Faculty member has approved your new role, you will receive an email from Compute Canada and can then access the new allocation from ```~/projects```.

(Faculty who already have a Compute Canada allocation and wish to be added to another Faculty member's allocation should follow the same procedures as above but apply using a Sponsored User role such as 'Researcher'.)

----------

## Logging in

We are currently set up to assist users on the national system, ```graham```.

Windows users will need to use some sort of SSH client, such as [PuTTY][7], pointing the SSH client to:

```
graham.computecanada.ca
```

Mac and Unix/Linux users should just be able to use a standard Terminal window to ssh in:

```
ssh <username>@graham.computecanada.ca
```
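To avoid typing a password on every login, users can optionally set up SSH key-based authentication from their local machine. This is a generic SSH sketch rather than Compute Canada-specific instructions; ```<username>``` is a placeholder for your Compute Canada username:

```
# Run these on your LOCAL machine (Mac/Linux terminal), not on graham
ssh-keygen -t ed25519                              # generate a key pair (accept the default file location)
ssh-copy-id <username>@graham.computecanada.ca     # append your public key to ~/.ssh/authorized_keys on graham
ssh <username>@graham.computecanada.ca             # subsequent logins should use the key instead of a password
```

If ```ssh-copy-id``` is not available, the contents of the local ```~/.ssh/id_ed25519.pub``` file can be appended to ```~/.ssh/authorized_keys``` on ```graham``` manually.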
----------

## Standard account set up

Each user's account is associated with a number of standard directories (folders). These directories can be accessed directly from the user's home directory.

### Home directory

```
~/
```

Each user is given a home directory with a default maximum storage of 50G/500k files. This is an ideal place to store cloned git repositories, license files, scripts, etc. Due to the limited amount of storage, home directories should not be used to store data or analysis results.

### Projects directory

```
~/projects
```

The projects directory is where users can access their Sponsor's storage allocation. This storage allocation will be named either ```rrg-<SPONSORNAME>``` or ```def-<SPONSORNAME>```, depending on whether the Sponsor has been awarded additional storage during a resource allocation competition. All sponsored users should be able to read/write to the Sponsor's storage allocation. The default allocation (```def-```) size is 1T/500k files. Faculty who were awarded storage during a Compute Canada RRG competition can check the amount of storage in their allocation by entering the following at a command prompt:

```
diskusage_report
```

Each user will also have his/her own directory within the Sponsor's allocation. Unless otherwise specified, only the user has read/write access to this directory.

### Scratch directory

```
~/scratch
```

The scratch directory is a large storage partition granted to each user for storing active and/or temporary files and analysis results. **Files and directories stored in scratch are deleted after 62 days of inactivity.** (Users will receive at least one email from Compute Canada alerting them prior to any files or directories being deleted.) The scratch directory size is 20T/1000k files.

### Nearline directory

```
~/nearline
```

The nearline directory is set up in the same way as the projects directory. [Nearline storage][8] is a disk-tape hybrid filesystem that can virtualize files by moving them to a tape-based storage system using criteria such as age and size. Files in nearline can be retrieved again upon read or recall operations. Nearline storage is a way to manage infrequently used files. Files stored in nearline do not count against a resource allocation's storage quota. Users should note that access to files in nearline will be slower, so do not use nearline for frequently accessed or active files.

**N.B.:** Files stored in nearline cannot be accessed by the compute nodes.

### How to best make use of Compute Canada storage

Compute Canada storage is allocated based on both file size and number of files. Exceeding the number of allowed files is equivalent to exceeding the number of terabytes in an allocation. Do **not** store large numbers of files (e.g., DICOM files) in a Sponsor's allocated storage unless they have been compressed into a tarball/gzip file. Use ```~/scratch``` to store large numbers of uncompressed files instead.

Make sure that the group ownership of the file/directory is the same as the Sponsor's storage allocation. If the user is specified as both the owner and group, the file/directory will be counted against the user's much more limited storage allocation.
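As an illustration of both points, a hypothetical sequence for archiving a directory of DICOM files from ```~/scratch``` into a Sponsor's project space might look like the following (the directory name and ```def-<SPONSORNAME>``` are placeholders):

```
# Compress many small files into a single archive before moving them out of scratch
cd ~/scratch
tar -czf dicoms_sub-01.tar.gz dicoms_sub-01/

# Move the archive into the Sponsor's allocation and make sure it is owned by the allocation's group,
# so that it counts against the Sponsor's quota rather than the user's much smaller personal quota
mv dicoms_sub-01.tar.gz ~/projects/def-<SPONSORNAME>/
chgrp def-<SPONSORNAME> ~/projects/def-<SPONSORNAME>/dicoms_sub-01.tar.gz
ls -l ~/projects/def-<SPONSORNAME>/dicoms_sub-01.tar.gz   # the group column should now show def-<SPONSORNAME>
```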
----------

## Jobs

```graham``` does not have a graphical user interface (GUI) and instead uses a command line interface. Commands are generally not run directly but rather submitted as a job via a job file to a [scheduler][9]. (Compute Canada uses SLURM as its scheduler.)

A job is submitted as a simple text file that contains information about which allocation to run the job on, how many compute nodes the job needs, how much memory the job needs, and how long the job will take to run. Additional information about where to get input and where to output the results of the job can be supplied. Finally, each job file contains a series of commands to run. The main point of a job file is to call a separate Matlab/python/bash/etc. script or Singularity container in a **single** command line. The scheduler then puts the job in a queue until the requested resources are available.

It is important not to request significantly more resources than the job requires, as Compute Canada may choose to place jobs requesting excessive resources in a separate queue with significantly longer wait times. Additionally, the job will run only for as long as the job file indicates. (The clock starts when the job begins to run, not while it is pending.) So, if a job file requests 3 hours and the job is still running at the end of the 3 hours, the scheduler will kill the job instead of letting it finish. The log file for the job will indicate a 'time-out' error.

When running a new command in a job file, it is recommended to run it on a single dataset to gauge what resources the job needs (e.g., compute nodes, memory, time) before deploying it on all datasets.

----------

## Submitting SLURM jobs

Compute Canada provides instructions for submitting and monitoring SLURM jobs: [https://docs.computecanada.ca/wiki/Running_jobs][10]

Drawing from the Compute Canada website, a general template for a simple SLURM job file is something like:

```
#!/bin/bash
#SBATCH --account=def-<SPONSORNAME>
#SBATCH --time=00:01:00
#SBATCH --job-name=test
#SBATCH --output=%x-%j.out
echo 'Hello, world!'
```
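Assuming the template above is saved as ```simple_job.sh``` (the filename is arbitrary), submitting and checking it might look like:

```
sbatch simple_job.sh     # submit the job file; the scheduler prints the assigned job ID
squeue -u <username>     # check whether the job is pending (PD) or running (R)
cat test-<jobid>.out     # once the job finishes, output lands in <job-name>-<jobid>.out
```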
Compute Canada also provides [support for running Matlab code][11] (either within or outside the MCR environment). Users based at Western University (or Sponsored by Western University Faculty) wishing to use Matlab on Compute Canada should contact Compute Canada's technical support (support AT computecanada DOT ca) and request to be added to Western University's Matlab license access list. Support will ask you to verify your Sponsor before submitting a request to WTS to add you to the license access list. (Users based at other institutions or sponsored by non-Western University Faculty should check with Compute Canada support to see if their institution has a site license or if they need to supply their own Matlab license.)

A job file for submitting a Matlab job might look something like:

```
#!/bin/bash -l
#SBATCH --job-name=sc1sc2_connectivity_test
#SBATCH --account=def-<SPONSORNAME>
#SBATCH --time=0-01:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4000
#SBATCH --mail-user=<USER>@uwo.ca
#SBATCH --mail-type=ALL

# Load matlab module
module load matlab/2018a

# matlab command
srun matlab -nodisplay -singleCompThread -r "addpath(genpath('./')); sc1sc2_connectivity('test')"
```

Some helpful SLURM commands:

```
sbatch <jobfile.sl>      # submit job file to SLURM scheduler
squeue -u <username>     # what jobs are currently queued/running for a specific user
sshare -U <username>     # check the share usage for a specific user
scancel <jobid>          # cancel a specific job
scancel -u <username>    # cancel all queued and running jobs for a specific user
sacct -j <jobid> --format JobID,ReqMem,MaxRSS,Timelimit,Elapsed    # check completed job resource usage
```

### Do not try to circumvent the scheduler!

Per [Compute Canada][12]:

**All jobs must be submitted via the scheduler!**

**Exceptions are made for compilation and other tasks not expected to consume more than about 10 CPU-minutes and about 4 gigabytes of RAM. Such tasks may be run on a login node. In no case should you run processes on compute nodes except via the scheduler.**

It is possible to be granted a time-limited [interactive session][13] during which commands may be run directly on the compute nodes, outside of the batch queue. Users who have cloned the git repository [neuroglia-helpers][14] can make use of the ```regularInteractive``` command to automatically request a 3-hour interactive session.

**N.B.:** Interactive sessions count against the compute allocation resources in the same way that jobs submitted via the SLURM scheduler do.

----------

## Running BIDS apps on Compute Canada

Please consult the wiki page on [neuroglia-helpers][15] for documentation on how to make use of Khan Lab wrapper scripts for submitting parallelized BIDS app jobs.

[1]: https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/rrg/
[2]: https://ccdb.computecanada.ca/security/login
[3]: https://www.computecanada.ca/research-portal/account-management/apply-for-an-account/
[4]: https://docs.computecanada.ca/wiki/Frequently_Asked_Questions_about_the_CCDB
[5]: https://www.sharcnet.ca/my/front/
[6]: https://www.sharcnet.ca/my/news/events
[7]: https://www.putty.org/
[8]: https://docs.computecanada.ca/wiki/Using_nearline_storage
[9]: https://docs.computecanada.ca/wiki/What_is_a_scheduler?
[10]: https://docs.computecanada.ca/wiki/Running_jobs
[11]: https://docs.computecanada.ca/wiki/MATLAB
[12]: https://docs.computecanada.ca/wiki/Running_jobs
[13]: https://docs.computecanada.ca/wiki/Running_jobs#Interactive_jobs
[14]: https://osf.io/k89fh/wiki/neuroglia-helpers/
[15]: https://osf.io/k89fh/wiki/neuroglia-helpers/