Grimage

From Digitalis

Current revision as of 09:32, 29 August 2016

Overview

The Grimage cluster was originally dedicated to supporting the Grimage VR platform: handling hardware (cameras, etc.) and processing data (video captures, etc.).

More recently, 10GE Ethernet cards were added to some nodes for a new project, turning the cluster into a shared, multi-project platform. Currently, at least 4 projects use the cluster, which requires a resource management system and a deployment system suited to an experimental platform, as on Grid'5000.

Grimage nodes have large (4U) computer cases so that they can host various hardware.

By design, the hardware configuration of the Grimage nodes is subject to change:
  • new generations of video (GPU) cards may be installed over time
  • 10GE network connections may change
  • ...

Updates

2014-05

Intel 10GE cards are removed from the nodes to be used in the ppol nodes.

2015-03

The Grimage machines in the F110 machine room had to be shut down to reduce noise in the adjacent room. As a result, grimage-1 to grimage-8 are no longer available.

Grimage-9 and grimage-10, which are hosted in room F212, are still available.

2015-08

An Nvidia Tesla K40c GPU is installed in grimage-9. Grimage-10 now has 2 GeForce GTX 295 cards (4 GPUs in total).

Machines

Machine | CPU | RAM | GPU | Network | Other
grimage-1.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | 1x GTX-680 (1GPU) | IB DDR | Keyboard/Mouse/Screen attached (4/3 screen, on the left, same as grimage-7)
grimage-2.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | 2x Camera (firewire)
grimage-3.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | 1x GTX-680 (1GPU) | IB DDR | Keyboard/Mouse/Screen attached (16/9 screen, on the right) + 2x cameras (firewire)
grimage-4.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | 2x Camera (firewire)
grimage-5.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | - | IB DDR + 2x 10GE (DualPort) | 2x Camera (firewire)
grimage-6.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | -
grimage-7.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | 1x GTX-580 (1GPU) | IB DDR + 2x 10GE (DualPort) | Keyboard/Mouse/Screen attached (4/3 screen, on the left, same as grimage-1)
grimage-8.grenoble.grid5000.fr | 2x Intel Xeon E5530 (16 cores) | 12GB DDR3 | - | IB DDR + 1x 10GE (DualPort) | -
grimage-9.grenoble.grid5000.fr | 2x Intel Xeon E5620 (16 cores) | 24GB DDR3 | 1x Tesla K40c | IB DDR | -
grimage-10.grenoble.grid5000.fr | 2x Intel Xeon E5620 (16 cores) | 24GB DDR3 | 2x GTX-295 (4GPU) | IB DDR | -

Some references to the models of GPUs:
  • https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#Tesla

How to experiment

[Figure: Grimage 10GE network]

The default system of the Grimage nodes is designed to operate the Grimage VR room.

If the default system is not sufficient, kadeploy must be used to adapt the system to other needs.

Privileged commands

Currently, the following commands can be run via sudo in exclusive jobs:

  • sudo /usr/bin/whoami (provided for testing the mechanism, should return "root")
  • sudo /sbin/reboot
  • sudo /usr/bin/schedtool
  • sudo /usr/bin/nvidia-smi
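
The test command listed above can be wrapped in a small helper before relying on the other privileged commands. This is only a sketch: the function name check_privileged and the SUDO override variable are hypothetical and are not provided by the platform.

```shell
# Sketch: verify the sudo mechanism from inside an exclusive job.
# SUDO is a hypothetical override, so the check can be tried with a stub.
SUDO="${SUDO:-sudo}"

check_privileged() {
    # "sudo /usr/bin/whoami" should return "root" when the mechanism works.
    who="$($SUDO /usr/bin/whoami 2>/dev/null)" || return 1
    [ "$who" = "root" ]
}

# Usage inside an exclusive job:
#   check_privileged && sudo /usr/bin/nvidia-smi
```

The helper only defines a function; nothing privileged runs until it is called.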

What is x2x and how to use it

This tip is useful for people who have to work in the Grimage room, with a screen attached to a Grimage machine.

x2x lets you control the mouse pointer and keyboard input of a remote machine over the network (X11 protocol). For the Grimage nodes that have a screen attached, this is very practical because it avoids using the USB mouse and keyboard, which are sometimes buggy (because of the out-of-spec USB cable extension).

To use x2x:

  1. log in locally on the machine (gdm)
  2. run xhost + to allow remote X connections
  3. from your workstation, run:
ssh pneyron@grimage-1.grenoble.g5k -X x2x -to grimage-1:0 -west
NB
  • replace pneyron by your username
  • replace the 2 occurrences of grimage-1 by the name of the Grimage node you actually use
  • make sure your ssh configuration is set up so that the *.g5k trick works (see the tip above)
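
Since the username appears once and the node name twice in that command, a tiny helper can do the substitution. This is a minimal sketch; the function name x2x_command is hypothetical.

```shell
# Print the x2x command line for a given user and Grimage node.
# Arguments: username, short node name (e.g. grimage-1).
x2x_command() {
    user="$1"
    node="$2"
    printf 'ssh %s@%s.grenoble.g5k -X x2x -to %s:0 -west\n' "$user" "$node" "$node"
}

# Usage from your workstation (prints the command; review it, then run it):
#   x2x_command pneyron grimage-7
```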

10GE network setup

  • One Myricom dual port card is installed on each of grimage-{4,5,7,8}
  • One Intel dual port card is installed on each of grimage-{2,5,6,7}

Connections are point-to-point (NIC to NIC, no switch) as follows:

  • Myricom: grimage-7 <-> grimage-8 <-> grimage-4 <-> grimage-5
  • Intel: grimage-2 <=> grimage-5 and grimage-6 <=> grimage-7 (double links)
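
Because there is no switch, a node can only reach its direct neighbours over 10GE. The lookup below is a sketch that encodes the links exactly as listed above; the function names myricom_peers and intel_peer are hypothetical, and the table has to be updated by hand whenever the cabling changes.

```shell
# Direct Myricom neighbours on the chain
# grimage-7 <-> grimage-8 <-> grimage-4 <-> grimage-5.
myricom_peers() {
    case "$1" in
        grimage-7) echo "grimage-8" ;;
        grimage-8) echo "grimage-7 grimage-4" ;;
        grimage-4) echo "grimage-8 grimage-5" ;;
        grimage-5) echo "grimage-4" ;;
        *) return 1 ;;   # no Myricom card on this node
    esac
}

# Direct Intel neighbour (double links between each pair).
intel_peer() {
    case "$1" in
        grimage-2) echo "grimage-5" ;;
        grimage-5) echo "grimage-2" ;;
        grimage-6) echo "grimage-7" ;;
        grimage-7) echo "grimage-6" ;;
        *) return 1 ;;   # no Intel card on this node
    esac
}

# Usage:
#   myricom_peers grimage-8
#   intel_peer grimage-6
```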


Crash course

The examples below book a single Grimage machine, grimage-10 for instance.

Normal (no deploy) jobs

book the machine for 4 hours, to run a script
[pneyron@digitalis ~]$ oarsub -l node=1,walltime=4 -p "host like 'grimage-10.%'" /path/to/my/script
book the machine for 4 hours, for an interactive session
[pneyron@digitalis ~]$ oarsub -l node=1,walltime=4 -p "host like 'grimage-10.%'" -I
get exclusive access
[pneyron@digitalis ~]$ oarsub -l node=1,walltime=4 -p "host like 'grimage-10.%'" -t exclusive /path/to/my/script
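
The three commands above differ only in their last arguments; the property expression selecting the node is always the same. As a sketch, it can be generated for any node by a helper (the name host_property is hypothetical):

```shell
# Print the OAR property expression that selects one node by short name.
host_property() {
    printf "host like '%s.%%'\n" "$1"
}

# Usage on the frontend (requires OAR):
#   oarsub -l node=1,walltime=4 -p "$(host_property grimage-9)" -I
```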


Deploy jobs

To change the OS (install specific software, newer versions, etc.):

book the machine for 2 hours, for changing the OS
[pneyron@digitalis ~]$ oarsub -l node=1,walltime=2 -p "host like 'grimage-10.%'" -t redeploy "sleep 4h"
Look at the jobs
[pneyron@digitalis ~]$ chandler
deploy
[pneyron@digitalis ~]$ kadeploy3 -m grimage-10.grenoble.grid5000.fr -e idALL-default -u root -k
get access to the Unix console of the machine (mostly useful for low-level debugging, e.g. seeing boot/kernel messages)
[pneyron@digitalis ~]$ kaconsole3 -m grimage-10.grenoble.grid5000.fr
ssh to the machine as root
[pneyron@digitalis ~]$ ssh grimage-10 -l root
modify the operating system of the machine
...
save the modifications
root@grimage-10:~# tgz-g5k /tmp/idALL-my.tgz
then from the frontend
[pneyron@digitalis ~]$ scp root@grimage-10:/tmp/idALL-my.tgz ~/
create the environment file
[pneyron@digitalis ~]$ cat <<EOF > idALL-my.env
---
name: idALL-my
version: 20150925
description: my env
author: ...
visibility: shared
destructive: false
os: linux
image:
  file: $HOME/idALL-my.tgz
  kind: tar
  compression: gzip
postinstalls:
- archive: /var/lib/deploy/environments/postinstalls/idall-default-postinst.tgz-20141216-1
  compression: gzip
  script: run /rambin
boot:
  kernel: /vmlinuz
  initrd: /initrd.img
filesystem: ext4
partition_type: 131
multipart: false
EOF
then, instead of starting over for next deployments, do
[pneyron@digitalis ~]$ kadeploy3 -m grimage-10.grenoble.grid5000.fr -a ~/idALL-my.env -k
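
When several customized environments are saved, the description file can be generated rather than retyped. The sketch below reuses the fields of the example above unchanged and only parameterizes the name, version, and tarball path; the function name make_env is hypothetical, and the author field is left as "..." to be filled in.

```shell
# Print a kadeploy3 environment description on stdout.
# Arguments: environment name, version, path to the image tarball.
make_env() {
    cat <<EOF
---
name: $1
version: $2
description: my env
author: ...
visibility: shared
destructive: false
os: linux
image:
  file: $3
  kind: tar
  compression: gzip
postinstalls:
- archive: /var/lib/deploy/environments/postinstalls/idall-default-postinst.tgz-20141216-1
  compression: gzip
  script: run /rambin
boot:
  kernel: /vmlinuz
  initrd: /initrd.img
filesystem: ext4
partition_type: 131
multipart: false
EOF
}

# Usage on the frontend:
#   make_env idALL-my "$(date +%Y%m%d)" "$HOME/idALL-my.tgz" > ~/idALL-my.env
```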

System changelog

2015-03

Grimage-1 to grimage-8 are out of service (machine room shutdown). Grimage-9 and grimage-10, which are not located in the Grimage machine room, are still operational.

2015-01-25

Grimage machines now use the idALL system environment (same as idgraf and idfreeze).

Acknowledgment

The grimage machines were funded by the Grimage project.
