Idbool

From Digitalis

Revision as of 14:24, 29 April 2015 by Neyron (Talk | contribs)


Overview

Idbool is a CC-NUMA system of 192 cores using the Numascale Numaconnect interconnect.

Technically, the machine is composed of 4 chassis/motherboards, each equipped with 3 AMD Opteron(tm) 6376 processors (Abu Dhabi, 16 cores each) and interconnected with the other chassis via the Numaconnect interconnect in a torus configuration with double links.

This Numaconnect interconnect provides a full hardware Single System Image (SSI) with a single, cache-coherent memory space. As a result, the system appears as a single Linux system.

Currently, the system runs Ubuntu 14.04 LTS.

Technical documentation and other resources

For questions related to the performance achievable on this machine, please look at:

Installation notes

Instructions for installing the system with Ubuntu 14.04 LTS:

  • Alter /etc/sysctl.conf and kernel parameters (taken from the Numascale wiki: https://wiki.numascale.com/tips/os-tips)
  • Apply the patch from http://askubuntu.com/questions/468466/why-this-occurs-error-diskfilter-writes-are-not-supported to work around the "diskfilter writes are not supported" error caused by software RAID
  • Remove irqbalance, as suggested by Numascale
  • Disable SELinux and AppArmor in /etc/default/grub, then run update-grub. Also disable the AppArmor startup script
  • Blacklist the edac drivers, because they caused an error during boot, seen in dmesg (/etc/modprobe.d/blacklist.conf)
    • Not recommended by Numascale, so the steps above were reverted; the traces can be considered warnings
    • This is due to scalability issues in the kernel, which should be fixed by the Numascale-provided kernel
  • Install the linux-image-3.15.10-numascale17+_3.15.10-numascale17+-2_amd64.deb patched kernel:
    • Works perfectly and scales well, but swap support is not built into this kernel, so no swap space is usable. Swapping on a Numascale system makes little sense anyway, since it slows things down even more than on a normal system
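
The sysctl settings from the first step above are not reproduced here; as a minimal sketch, NUMA-related tuning is typically applied through /etc/sysctl.conf with entries such as the following (the keys are standard Linux sysctls, but the values shown are illustrative assumptions, not the actual recommendations from the Numascale wiki):

```shell
# Illustrative /etc/sysctl.conf fragment -- values are assumptions, not the
# actual Numascale recommendations (see https://wiki.numascale.com/tips/os-tips)
kernel.numa_balancing = 0    # disable automatic NUMA page balancing
vm.swappiness = 1            # avoid swapping as much as possible
```

Settings placed in /etc/sysctl.conf are applied at boot, or immediately with `sysctl -p'.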

How to experiment

Reserving and accessing idbool

By default, OAR only gives access to 1 of the 4 hosts (motherboards) of the machine:
[pneyron@digitalis ~]$ oarsub -I -p "machine='idbool'"
Properties: machine='idbool'
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: /home/pneyron/.ssh/id_rsa
OAR_JOB_ID=8348
Interactive mode : waiting...
Starting...

Connect to OAR job 8348 via the node idbool.grenoble.grid5000.fr
[OAR] OAR_JOB_ID=8348
[OAR] Your nodes are:
      idbool-1.grenoble.grid5000.fr*48

[pneyron@idbool ~](8348-->60mn)$ 

Then check the cpuset assigned to your job:

[pneyron@idbool ~](8348-->57mn)$ cat /dev/cpuset/$(grep -o "/oar/.*" /proc/self/cgroup)/cpus
0-47
[pneyron@idbool ~](8348-->57mn)$ cat /dev/cpuset/$(grep -o "/oar/.*" /proc/self/cgroup)/mems
0-5

This job only gives access to the resources of the first host (motherboard) of the machine: logical CPUs (cores) 0 to 47 and NUMA nodes 0 to 5. The other resources of the machine can be seen (e.g. in `top') but are not reachable, because they are isolated by the Linux cpuset of your job.


To reserve the complete machine, one must specify -l machine=1.

Furthermore, we request a 4-hour job in the example below:

[pneyron@digitalis ~]$ oarsub -I -p "machine='idbool'" -l machine=1,walltime=4
Properties: machine='idbool'
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: /home/pneyron/.ssh/id_rsa
OAR_JOB_ID=8349
Interactive mode : waiting...
Starting...

Connect to OAR job 8349 via the node idbool.grenoble.grid5000.fr
[OAR] OAR_JOB_ID=8349
[OAR] Your nodes are:
      idbool-1.grenoble.grid5000.fr*48
      idbool-2.grenoble.grid5000.fr*48
      idbool-3.grenoble.grid5000.fr*48
      idbool-4.grenoble.grid5000.fr*48

[pneyron@idbool ~](8349-->239mn)$

Privileged commands

Currently, the following commands can be run via sudo in exclusive jobs:

  • sudo /usr/bin/whoami (provided for testing the mechanism, should return "root")
  • sudo /usr/bin/schedtool
  • sudo /usr/bin/opcontrol
  • sudo /usr/bin/perf
  • sudo /usr/bin/lstopo
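
As a minimal sketch, a session using these whitelisted commands might look as follows (the perf events and the target command are illustrative assumptions):

```shell
# Inside an exclusive job on idbool:
sudo /usr/bin/whoami        # test the sudo mechanism, should print "root"
sudo /usr/bin/lstopo        # display the machine topology (NUMA nodes, caches, cores)

# Count hardware events for a command (event list is illustrative):
sudo /usr/bin/perf stat -e cycles,instructions sleep 1
```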

Commands in the following directories can also be run with sudo (with no exclusive-job requirement for now, so please mind what you are doing with regard to other concurrent users):

  • /opt/nc-utils/os/nc_test/nc_perf/*
  • /opt/nc-utils/os/nc_test/nc_log_d/*
  • /opt/nc-utils/os/nc_test/nc_stat/*
  • /opt/nc-utils/os/nc_test/nc_stat_d/
  • /opt/nc-utils/os/numaplace/
  • /opt/nc-utils/tools/
  • /usr/local/bin/likwid-perfctr

Mind the fact that those commands might have side effects, so watch out and inform others via the mailing list if relevant.

Performance

In order to get good performance when using the whole machine (see the "machine=1" case above), special care must be taken with regard to the placement of data in memory relative to the CPUs. Indeed, the NUMA factor between NUMA nodes on different motherboards is very high: bandwidth can be as low as 90MB/s when a CPU accesses the memory of a remote NUMA node. Numascale strongly advises reading https://resources.numascale.com/numascale-scaling-best-practice.pdf.
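
A minimal sketch of such placement using numactl (the application name ./my_app is a hypothetical stand-in):

```shell
# Show the NUMA topology and the inter-node distance matrix:
numactl --hardware

# Confine both the CPUs and the memory of an application to the NUMA nodes
# of the first motherboard (nodes 0-5, as seen in the single-host job above):
numactl --cpunodebind=0-5 --membind=0-5 ./my_app
```

Keeping a process's memory on the same motherboard as its CPUs avoids paying the remote-access penalty described above.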
