Usage


Overview

Technically speaking, the Digitalis platform is composed of the hardware machines described below. Some of them are managed by the Grid'5000 team (as a national service), while others are managed locally.

This page describes how to use the locally managed machines.

Hardware description

Grid'5000 Grenoble clusters

The Grenoble Grid'5000 site is composed of 3 clusters (as of 2012-03): genepi, edel and adonis. More information can be found on the Grid'5000 Grenoble site pages. Those machines are handled by the Grid'5000 global (national) system; refer to the Grid'5000 documentation to learn how to use them. The remainder of this page is mostly not relevant to those clusters.

Grimage cluster

Grimage 10GE network

The Grimage cluster was originally dedicated to connecting the Grimage platform hardware (cameras, etc.) and processing its data (video captures, etc.).

More recently, 10GE Ethernet cards were added to some nodes for a new project, making the cluster a shared, multi-project platform. Currently, at least 4 projects are using the cluster, which requires a resource management system and a deployment system adapted to an experimental platform, just like Grid'5000.

The hardware configuration of the grimage nodes may change:
  • new video (GPU) cards may be installed over time
  • 10GE network connections may change
  • ...
The current 10GE network setup is as follows:
  • One Myricom dual port card is installed on each of grimage-{4,5,7,8}
  • One Intel dual port card is installed on each of grimage-{2,5,6,7}

Connections are point-to-point (NIC to NIC, no switch) as follows:

  • Myricom: grimage-7 <-> grimage-8 <-> grimage-4 <-> grimage-5
  • Intel: grimage-2 <=> grimage-5 and grimage-6 <=> grimage-7 (double links)

Special machines

Those machines are resources co-funded by several teams in order to provide experimental platforms for problems such as:

  • large and complex SMP configurations
  • complex processor/cache architecture analysis
  • multi-GPU configurations
  • etc

Currently the following machines are available:

idgraf

  • 2x Intel Xeon X5650 (Westmere, 6 cores each, total 12 cores)
  • 72 GB RAM
  • 8x Nvidia Tesla C2050

idfreeze

  • 4x AMD Opteron 6174 (total 48 cores)
  • 256 GB RAM

Hardware summary table

Platform: Grid'5000 -> access via frontend.grenoble.grid5000.fr

Machine | CPU | RAM | GPU | Network | Other
genepi-[1-34].grenoble.grid5000.fr | 2x Intel E5420 (8C) | 8GB DDR2 | - | IB DDR | -
edel-[1-72].grenoble.grid5000.fr | 2x Intel E5520 (8C) | 24GB DDR3 | - | IB QDR | -
adonis-[1-10].grenoble.grid5000.fr | 2x Intel E5520 (8C) | 24GB DDR3 | 1/2x S1070 (2GPU) | IB QDR | -

Platform: Digitalis -> access via digitalis.grenoble.grid5000.fr

Machine | CPU | RAM | GPU | Network | Other
grimage-1.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | 1x GTX-680 (1GPU) | IB DDR | Keyboard/Mouse/Screen attached (4/3 screen, on the left, same as grimage-7)
grimage-2.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | 2x Camera (firewire)
grimage-3.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | 1x GTX-680 (1GPU) | IB DDR | Keyboard/Mouse/Screen attached (16/9 screen, on the right) + 2x Camera (firewire)
grimage-4.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | 2x Camera (firewire)
grimage-5.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | - | IB DDR + 2x 10GE (DualPort) | 2x Camera (firewire)
grimage-6.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | -
grimage-7.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | 1x GTX-580 (1GPU) | IB DDR + 2x 10GE (DualPort) | Keyboard/Mouse/Screen attached (4/3 screen, on the left, same as grimage-1)
grimage-8.grenoble.grid5000.fr | 2x Intel E5530 (8C) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | -
grimage-9.grenoble.grid5000.fr | 2x Intel E5620 (8C) | 24GB DDR3 | 1x GTX-295 (2GPU) | IB DDR | -
grimage-10.grenoble.grid5000.fr | 2x Intel E5620 (8C) | 24GB DDR3 | 1x GTX-295 (2GPU) | IB DDR | -
idgraf.grenoble.grid5000.fr | 2x Intel X5650 (12C) | 72GB DDR3 | 8x Tesla C2050 (8GPU) | - | -
idfreeze.grenoble.grid5000.fr | 4x AMD 6174 (48C) | 256GB DDR3 | - | - | -

Platform: DMZ@ID -> access via incas.imag.fr

Machine | CPU | RAM | GPU | Network | Other
idkoiff.imag.fr | 8x AMD 875 (16C) | 32GB DDR2 | 1x GTX-280 (1GPU) | - | -

Services

Dedicated services

Dedicated services are provided for the management of our machines. Indeed, our machines could not fit into the Grid'5000 model, due to their special characteristics and usage: the Grimage cluster operates the Grimage platform, with cameras and other equipment attached, which makes its hardware configuration different; the other local machines are unique resources, which makes their usage model very different from that of a cluster of many identical machines, as found in the Grid'5000 clusters.

As a result, a dedicated resource management system (OAR) is provided to manage access to the machines, with special mechanisms (different from the ones provided in Grid'5000). A dedicated deployment system (kadeploy) is also provided to handle users' customized operating systems that can be deployed on the machines. Even though these instances are separate from the main Grid'5000 tools, much of the Grid'5000 documentation also applies to our dedicated services; this document only explains their specificities.

The OAR and Kadeploy frontend for our machines is digitalis.grenoble.grid5000.fr.
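
For example, once logged into the Grid'5000 Grenoble access or frontend machine, the Digitalis frontend is reachable over SSH (prompt and login shown are illustrative):

pneyron@frontend:~$ ssh digitalis.grenoble.grid5000.fr
pneyron@digitalis:~$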

Mutualised services (services provided by Grid'5000)

Many services we use on our local machines are provided by the Grid'5000 infrastructure at the national level. For instance, the following services are provided for Grid'5000 but also serve our local purposes (by courtesy):

  • access machines
  • NFS storage
  • proxying
  • and more.

Please keep in mind that these services are not dedicated to our local needs only.

Terms of service

Grid'5000 services are handled nationally for the global platform (11 sites, France-wide). As a result, some aspects may seem more complex than they should from a local perspective. Please keep in mind that these services do not exist for our local convenience only. Furthermore, the local platform is to be seen as an extension of the main Grid'5000 platform which is not supported by the Grid'5000 staff, even though we freely benefit from some services they provide.

As a result, we are subject to the rules set by the Grid'5000 platform:

  • Security policies: restricted access to the network, output traffic filtering.
  • Maintenance schedules: Thursday is the maintenance day, so do not be surprised if service interruptions happen on that day!
  • Rules of good behavior within the large Grid'5000 user community (reading the mailing lists is a must)

If one is using the "official" Grid'5000 nodes, one must comply with the UserCharter (as approved by every user when requesting a Grid'5000 account).

Data integrity

There is no guarantee against data loss on the Grid'5000 NFS (home directories), nor on the machines' local hard drives. No backup is performed, so in case of an incident, the Grid'5000 staff will not be able to help you recover any data.

As a result, if you have data you really care about and could not reproduce at an acceptable cost (computation time) should a loss occur (which rarely happens), you are strongly advised to back it up elsewhere.

(The NFS storage uses RAID to survive a disk failure, but RAID is not a backup.)

Platform usage

Machine access

Access to the machines is controlled by the resource manager. This means that users cannot just ssh to a machine and leave processes running on it indefinitely (e.g. a vi process).

Users must instead book the machine for a period of time (a job), during which access is granted to them, possibly with additional privileges (depending on the requested type of job). Once the period ends, all rights are revoked and all the user's processes are killed.

By default, users are not root on the machines. Some privileged commands may however be permitted (e.g. schedtool). Default access to a machine is not exclusive, which means that many users can have processes on the machine at the same time, unless a user requested exclusive access.

Some special use cases require full access to the machine: one may want to be root, to be able to reboot the machine, or even to install software or a different operating system. Just like on Grid'5000, this is possible, at the cost of using kadeploy.

The frontend machine to access the resources is: digitalis.grenoble.grid5000.fr

Use cases

I want to access a machine

To access a specific machine, just provide the machine name in the oarsub command:

pneyron@digitalis:~$ oarsub -I -p "machine = 'idgraf'"
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: .ssh/id_rsa
OAR_JOB_ID=1122
Interactive mode : waiting...
Starting... 

Connect to OAR job 1122 via the node idgraf.grenoble.grid5000.fr
pneyron@idgraf:~$ 

You then get access to the machine for 1 hour by default (add -l walltime=4 for 4 hours).
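
For instance, the same interactive request with an explicit 4-hour walltime:

pneyron@digitalis:~$ oarsub -I -p "machine = 'idgraf'" -l walltime=4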

Note that if the machine is not available (e.g. an exclusive job is already running), you will have to wait until it is freed up (see the resource usage visualization tools).

If no machine is specified, you get access to one of the grimage nodes.

You can use the oarsh command to open other shells to the machine, as long as the job is still running.
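
For instance, assuming the job above (1122) is still running, a second shell can be opened from the frontend by pointing oarsh at the job (reuse the job id printed by your own oarsub):

pneyron@digitalis:~$ OAR_JOB_ID=1122 oarsh idgraf.grenoble.grid5000.fr
pneyron@idgraf:~$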

Please read OAR's documentation for more details.

I want to gain exclusive access to a machine for N hours

To access a machine and be alone on it (to avoid noise from other users), give the exclusive type to your job:

pneyron@digitalis:~$ oarsub -I -p "machine = 'idgraf'" -t exclusive -l walltime=N
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: .ssh/id_rsa
OAR_JOB_ID=1122
Interactive mode : waiting...
Starting... 

Connect to OAR job 1122 via the node idgraf.grenoble.grid5000.fr
pneyron@idgraf:~$ 

You then get access to the machine for N hours; nobody else can access the machine during your job.

Note that if the machine is not available, you will have to wait until it is free (see the resource usage visualization tools).

Also, some privileged commands can be run via sudo in exclusive jobs (see below).

I want to execute privileged commands on my node

Within an exclusive job, some privileged commands can be run via sudo. These authorized commands typically have an impact on other users, hence they require exclusive access to the machine.

Currently, the following commands can be run via sudo in exclusive jobs:

on idgraf
  • sudo /usr/bin/whoami (provided for testing the mechanism, should return "root")
  • sudo /sbin/reboot
  • sudo /usr/bin/schedtool
  • sudo /usr/bin/nvidia-smi (please notify other users via the digitalis mailing list if you change parameters on GPUs that will not be reset to default after a reboot, e.g. the memory ECC configuration)
  • sudo /usr/local/bin/ipmi-reset
on idfreeze
  • sudo /usr/bin/whoami (provided for testing the mechanism, should return "root")
  • sudo /sbin/reboot
  • sudo /usr/bin/schedtool
  • sudo /usr/bin/opcontrol
on grimage
  • sudo /usr/bin/whoami (provided for testing the mechanism, should return "root")
  • sudo /sbin/reboot
  • sudo /usr/bin/schedtool
  • sudo /usr/bin/nvidia-smi
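
As a quick sanity check from within an exclusive job, the commands listed above run without any password prompt; for example on idgraf (output shortened, shown for illustration only):

pneyron@idgraf:~$ sudo /usr/bin/whoami
root
pneyron@idgraf:~$ sudo /usr/bin/nvidia-smi
[...status of the GPUs...]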

If the privileged command you need is not available (available commands run without any sudo password prompt), you can ask your administrator whether it is possible to enable it; however, commands considered harmful to the system will not be made available. Consider deploying your own operating system on the machine to get full privileges.

I want to be able to reboot a node without losing my reservation

Rebooting a node kills jobs, therefore a special job type is provided to overcome this and allow rebooting nodes while keeping them booked. Unsurprisingly, this job type is named reboot (-t reboot). This type of job does not provide a shell on a node but on the frontend instead (just like deploy jobs). To get access to the nodes, the user must then run an exclusive job concurrently, and possibly several of them if they get interrupted by reboots.

Example of use:

pneyron@digitalis:~$ oarsub -I -t reboot -p "host like 'grimage-4.%'"
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=1129
Interactive mode : waiting...
Starting...

Connect to OAR job 1129 via the node 127.0.0.1
pneyron@digitalis:~$ 

Note that you get a shell on digitalis instead of on grimage-4, unlike with an exclusive job.


While such a job is running, reboot can be performed either from the node (from the shell of an exclusive job) or from the frontend (digitalis).

Reboot from the node as follows:
pneyron@digitalis:~$ oarsub -I -t exclusive -p "host like 'grimage-4.%'"
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=1130
Interactive mode : waiting...
Starting...

Connect to OAR job 1130 via the node grimage-4.grenoble.grid5000.fr
pneyron@grimage-4:~$ 
pneyron@grimage-4:~$ sudo reboot
The system is going down for reboot NOW!enoble.grid5000.fr (pts/0) (Fri Jul 2
pneyron@grimage-4:~$ Connection to grimage-4.grenoble.grid5000.fr closed by remote host.
Connection to grimage-4.grenoble.grid5000.fr closed.
[ERROR] An unknown error occured : 65280
Disconnected from OAR job 1130
pneyron@digitalis:~$

(The interruption of the job caused by the reboot produces an error, which can of course be ignored.)

Reboot from the frontend as follows:
pneyron@digitalis:~$ sudo node-reboot grimage-4.grenoble.grid5000.fr
[sudo] password for pneyron: 
*** Checking if pneyron is allowed to reboot grimage-4.grenoble.grid5000.fr
OK, you have a job of type "reboot" on the node, firing a reboot command !
--- switch_pxe (grimage cluster)
  >>>  grimage-4.grenoble.grid5000.fr
--- reboot (grimage cluster)
  >>>  grimage-4.grenoble.grid5000.fr
  *** A soft reboot will be performed on the nodes grimage-4.grenoble.grid5000.fr
-------------------------
CMD: ssh -q -o BatchMode=yes -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -o ConnectTimeout=2 -o UserKnownHostsFile=/dev/null -i /etc/kadeploy3/keys/id_deploy root@grimage-4.grenoble.grid5000.fr "nohup /sbin/reboot -f &>/dev/null &"
grimage-4.grenoble.grid5000.fr -- EXIT STATUS: 0
-------------------------
--- set_vlan (grimage cluster)
  >>>  grimage-4.grenoble.grid5000.fr
  *** Bypass the VLAN setting

NB: Please note that reboot jobs are exclusive.

Once the node has rebooted, the user can get a new shell on it by resubmitting an exclusive job; the reboot job guarantees that no other user has reserved the node in the meantime.

I want to change the system (OS, software) on the machine

Use the deploy job type. See the Grid'5000 documentation about kadeploy; the kadeploy instance on digitalis works the same way.
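
A minimal sketch of the workflow, assuming an environment named my-debian-env is registered in kadeploy (the environment name is hypothetical, use one of yours):

pneyron@digitalis:~$ oarsub -I -t deploy -p "host like 'grimage-4.%'"
[...]
pneyron@digitalis:~$ kadeploy3 -f $OAR_NODE_FILE -e my-debian-env -k
[...]
pneyron@digitalis:~$ ssh root@grimage-4.grenoble.grid5000.fr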

I want to book the machine for next night

OAR allows advance reservations:

pneyron@digitalis:~$ oarsub -r "2012-04-01 20:00:00" -l walltime=4 -p "machine='idgraf'"
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: .ssh/id_rsa
OAR_JOB_ID=1125
Reservation mode : waiting validation...
Reservation valid --> OK
pneyron@digitalis:~$ 

Once your job starts (on April 1st, 8pm), you will be able to oarsh to the node.
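
For instance, once the reservation has started (a sketch reusing the job id printed above; oarsub -C 1125 also works):

pneyron@digitalis:~$ OAR_JOB_ID=1125 oarsh idgraf.grenoble.grid5000.fr
pneyron@idgraf:~$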

See OAR's documentation for more information.

Tips and tricks

I want to access digitalis directly without having to go through the access machine first

Add to your ssh configuration on your workstation (~/.ssh/config):

cat <<'EOF' >> ~/.ssh/config
Host *.g5k
ProxyCommand ssh pneyron@access.grid5000.fr "nc -q 0 `basename %h .g5k` %p"
User pneyron
ForwardAgent no
EOF

(replace pneyron with your Grid'5000 login)

Make sure you have pushed your SSH public key to Grid'5000; see https://api.grid5000.fr/sid/users/_admin/index.html

Then you should be able to ssh to digitalis directly:

neyron@workstation:~$ ssh digitalis.grenoble.g5k
Linux digitalis.grenoble.grid5000.fr 2.6.26-2-xen-amd64 #1 SMP Tue Jan 25 06:13:50 UTC 2011 x86_64
[...]
Last login: Thu Mar 22 14:36:05 2012 from access.grenoble.grid5000.fr
pneyron@digitalis:~$

I want to ssh directly from my workstation to my experimentation machine

(Note: This does not apply to the case of deploy jobs)

Make sure that jobs you create use a job key. For that, create a public/private key pair on digitalis (with no passphrase):

pneyron@digitalis:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pneyron/.ssh/id_rsa):
[...]

(For security reasons, do not use your existing SSH keys, which live on your workstation and are protected by a passphrase.)

Then add to your .bashrc in your home (on digitalis for instance):

cat <<EOF >> ~/.bashrc
export OAR_JOB_KEY_FILE=~/.ssh/id_rsa
EOF

(Make sure your .bashrc is sourced upon login, or look at your .profile...)

The oarsub command will now use this key for your jobs.

pneyron@digitalis:~$ oarsub -I
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: /home/pneyron/.ssh/id_rsa
OAR_JOB_ID=1119
[...]

Copy the keys to your workstation:

scp digitalis.grenoble.g5k:.ssh/id_rsa ~/.ssh/id_rsa_g5k
scp digitalis.grenoble.g5k:.ssh/id_rsa.pub ~/.ssh/id_rsa_g5k.pub

Add to your ssh configuration on your workstation (~/.ssh/config):

cat <<'EOF' >> ~/.ssh/config
Host *.g5koar
ProxyCommand ssh pneyron@access.grid5000.fr "nc -q 0 `basename %h .g5koar` 6667"
User oar
IdentityFile ~/.ssh/id_rsa_g5k
ForwardAgent no
EOF

(replace pneyron with your Grid'5000 login)

Then you should be able to ssh directly to a machine that you previously reserved in an OAR job:

neyron@workstation:~$ ssh idgraf.grenoble.g5koar
Linux idgraf.grenoble.grid5000.fr 3.2.0-2-amd64 #1 SMP Sun Mar 4 22:48:17 UTC 2012 x86_64
[...]
pneyron@idgraf:~$

I want my code to be pushed automatically to the machine

One can use inotifywait for instance.

To push files edited by vi for instance:

while f=$(inotifywait . --excludei '(\.swp)|(~)$' -e modify --format %f); do rsync -av "$f" remote_machine:remote_dir/; done

see

man inotifywait

A node is marked Absent or Suspected, how do I fix it?

Nodes sometimes stay Absent after deploy jobs. While a short Absent period is normal during the reboot phase that follows the termination of a deploy job, a long Absent period (more than 15 minutes) usually reveals a failed reboot. If you detect such a problem, feel free to reboot the node again from the frontend, as follows:

pneyron@digitalis:~$ sudo node-reboot grimage-9.grenoble.grid5000.fr
[sudo] password for pneyron: 
*** Checking if pneyron is allowed to reboot grimage-9.grenoble.grid5000.fr
OK, node is absent or suspected, firing a reboot command !
--- switch_pxe (grimage cluster)
  >>>  grimage-9.grenoble.grid5000.fr
--- reboot (grimage cluster)
  >>>  grimage-9.grenoble.grid5000.fr
  *** A soft reboot will be performed on the nodes grimage-9.grenoble.grid5000.fr
-------------------------
CMD: ssh -q -o BatchMode=yes -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -o ConnectTimeout=2 -o UserKnownHostsFile=/dev/null -i /etc/kadeploy3/keys/id_deploy root@grimage-9.grenoble.grid5000.fr "nohup /sbin/reboot -f &>/dev/null &"
grimage-9.grenoble.grid5000.fr -- EXIT STATUS: 0
-------------------------
--- set_vlan (grimage cluster)
  >>>  grimage-9.grenoble.grid5000.fr
  *** Bypass the VLAN setting
pneyron@digitalis:~$

Rarely, nodes can also be marked as Suspected for an unknown reason. If a node stays Suspected for a long time, you can also try to reboot it, using the same command.

kaconsole3 is not working on idgraf

The IPMI stack of the BMC of idgraf is buggy. If you want to use the console but see that it is broken (no prompt), you can try to fix the BMC.

This is possible if you are in an exclusive job, by running:

sudo ipmi-reset

This is also possible if you are root (i.e. in a deploy job), by running:

ipmitool mc reset cold

(Please do not play with other IPMI commands, since this will break the system).

NB: this reset takes a few minutes to complete.

How do I exit kaconsole3?

Type "&."

What is x2x and how to use it

Note

This tip is only useful for people who have to work in the Grimage room, with a screen attached to a Grimage machine.

x2x allows you to control the mouse pointer and keyboard input of a remote machine over the network (X11 protocol). For the Grimage nodes which have a screen attached, it is very practical because it lets you avoid the USB mouse and keyboard, which are sometimes buggy (because of the out-of-spec USB cable extensions).

To use x2x:

  1. log in locally on the machine (gdm)
  2. run xhost + to allow remote X connections.
  3. from your workstation, run:
ssh pneyron@grimage-1.grenoble.g5k -X x2x -to grimage-1:0 -west
NB
  • replace pneyron with your username
  • replace the 2 occurrences of grimage-1 with the name of the Grimage node you actually use.
  • make sure your ssh configuration includes the *.g5k trick (see the tip above)

Access seems to be broken, what can I do ?

If access to the Grid'5000 network is broken, e.g. the access machine is not reachable:

  1. Check the Grid'5000 incident page
  2. Check your emails about possible outage or maintenance (planned or exceptional)
  3. Try other access paths to the Grid'5000 network:
    1. access-north.grid5000.fr > digitalis.grenoble.grid5000.fr
    2. access-south.grid5000.fr > digitalis.grenoble.grid5000.fr
    3. navajo.imag.fr > access.grenoble.grid5000.fr > digitalis.grenoble.grid5000.fr
    4. bastion.inrialpes.fr > access.grenoble.grid5000.fr > digitalis.grenoble.grid5000.fr

Any other question?

Please visit the Grid'5000 website: http://www.grid5000.fr

Resource usage visualization tools

Two tools are available to see how resources are or will be used:

chandler

Chandler is a command-line tool to run on digitalis. It gives a view of the current usage of the machines.

pneyron@digitalis:~$ chandler

4 jobs, 92 resources, 60 used
         grimage-1 	TTTTTTTT grimage-2 	TTTTTTTT grimage-3 	
TTTTTTTT grimage-4 	TTTTTTTT grimage-5 	         grimage-6 	
         grimage-7 	JJJJJJJJ grimage-8 	JJJJJJJJ grimage-9 	
         grimage-10 	TTTTTTTTTTTT idgraf 	

 =Free  =Standby J=Exclusive job T=Timesharing job S=Suspected A=Absent D=Dead

grimage-2.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-3.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-4.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-5.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-8.grenoble.grid5000.fr
  [1115] pneyron (reboot)

grimage-9.grenoble.grid5000.fr
  [1115] pneyron (reboot)

idgraf.grenoble.grid5000.fr
  [1113] jvlima (shared)
  [1114] pneyron (shared)

Drawgantt

Drawgantt gives a view of the past, current and future usage of the machines.

https://intranet.grid5000.fr/oar/grenoble/digitalis/drawgantt.html

Other OAR tools

All OAR commands are available; see OAR's documentation.

  • oarstat: list current jobs
  • oarnodes: list the resources with their properties
  • etc.
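
A few typical invocations on digitalis (the job id is illustrative):

pneyron@digitalis:~$ oarstat -u $USER     # list my current jobs
pneyron@digitalis:~$ oarstat -f -j 1122   # full details of a given job
pneyron@digitalis:~$ oarnodes             # list all resources and their properties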

Platform information and technical contact

Mailing lists

Dedicated list

A mailing list is dedicated to communication about the locally managed machines: digitalis@lists.grid5000.fr. You will get information through emails sent to this list, and you can also write to it if you need to communicate something to the other users of the local machines.

You must be a member of the digitalis group (see/edit your affiliation in the Grid'5000 user management system) to receive or send e-mails on this mailing list.

Grid'5000 lists

Grid'5000 provides several mailing lists to which every Grid'5000 user is automatically subscribed (e.g. users@lists.grid5000.fr). Since the local machines benefit from global Grid'5000 services, you should keep an eye on the information sent to those mailing lists, for instance to be aware of exceptional maintenance schedules.

Be aware that Thursday is the maintenance day. Regular maintenance operations are scheduled, which may for instance impact the NFS service.

Please do not use the users@lists.grid5000.fr list for issues related to the local machines, since the Grid'5000 staff is not in charge of them.

Grid'5000 Platform Events

Please also bookmark the Grid'5000 platform events page, which lists upcoming events scheduled for the platform. You can also subscribe to the RSS feed.

Jabber

For any issue with the platform, you can contact me via Grid'5000 Jabber. Feel free to add me to your buddy list: pneyron@jabber.grid5000.fr
