Usage

From Digitalis

Revision as of 12:53, 22 March 2012 by Neyron (Talk | contribs)

Overview

Technically speaking, the Digitalis platform is composed of the hardware machines described below. Some of them are managed by the Grid'5000 team (national service); the others are managed locally.

This page describes how to use the locally managed machines.

Hardware description

Grid'5000 Grenoble clusters

The Grenoble Grid'5000 site is composed of 3 clusters (as of 2012-03): genepi, edel and adonis. More information can be found on the Grid'5000 Grenoble site pages. Those machines are handled by the Grid'5000 global (national) system, so one must refer to the Grid'5000 documentation to learn how to use them. The remainder of this page is mostly not relevant to those clusters.

Grimage cluster

The Grimage cluster was originally dedicated to connecting the Grimage platform hardware (cameras, etc.) and processing its data (video captures, etc.). More recently, 10GE Ethernet cards were added to some nodes for a project, turning the cluster into a shared platform. Currently, at least 4 projects use the cluster, which requires a resource management system and a deployment system suited to an experimental platform, just like Grid'5000.

Special machines

Those machines are resources co-funded by several teams in order to provide experimental platforms for problems such as:

  • large and complex SMP configurations
  • complex processor/cache architecture analysis
  • multi-GPU configurations
  • etc

Currently, the following machines are available:

idfreeze

FIXME: idfreeze is not yet integrated into the platform

idgraf

  • 2x Intel Xeon X5650 (Westmere, 6 cores each)
  • 72 GB DDR3 RAM
  • 8x Nvidia Tesla C2050

Services

Dedicated services

Dedicated services are provided for the management of our machines, because they do not fit the Grid'5000 model, due to their special characteristics and usage. The Grimage cluster is special in that it operates the Grimage platform, with cameras and other equipment attached, which makes its hardware configuration different. The other local machines are special in that they are unique resources, which makes their usage model very different from that of a cluster of many identical machines, as found in Grid'5000.

As a result, a dedicated resource management system (OAR) is provided to manage access to the machines, with special mechanisms (different from the ones provided in Grid'5000). A dedicated deployment system (kadeploy) is also provided to handle users' customized operating systems that can be deployed on the machines. Although these services differ from the main Grid'5000 tools, much of the Grid'5000 documentation for those tools also applies to them. This document only explains their specificities.

The OAR and Kadeploy frontend for our machines is the machine named mu.grenoble.grid5000.fr.

Shared services (services provided by Grid'5000)

Many services we use on our local machines are provided by the Grid'5000 infrastructure, at the national level. For instance, the following services are provided for Grid'5000 but also serve our local purposes (as a courtesy):

  • access machines
  • NFS storage
  • proxying
  • and more.

Please keep in mind that these services are not dedicated solely to our local needs.

Terms of service

Grid'5000 services are handled nationally for the global platform (11 sites, France-wide). As a result, some aspects may seem more complex than they should from a local perspective. Please keep in mind that some services do not exist for our local convenience only. Furthermore, the local platform is to be seen as an extension of the main Grid'5000 platform that is not supported by the Grid'5000 staff, even if we can freely benefit from some of the services they provide.

As a result, we are subject to the rules set by the Grid'5000 platform:

  • Security policies: restricted access to the network, outgoing traffic filtering.
  • Maintenance schedules: Thursday is the maintenance day, so do not be surprised if service interruptions happen on that day!
  • Rules of good behavior within the large Grid'5000 user community (reading the mailing lists is a must).

If one is using the "official" Grid'5000 nodes, one must comply with the UserCharter (as approved by every user when requesting a Grid'5000 account).

Data integrity

There is no guarantee against data loss on the Grid'5000 NFS (home directories), nor on the machines' local hard drives. No backup is performed, so in case of an incident, the Grid'5000 staff will not be able to provide any way to recover your data.

As a result, if you have data you really care about and cannot reproduce at an acceptable cost (computation time), it is strongly suggested that you back it up elsewhere, even though data loss rarely happens.

(The NFS storage uses RAID to survive a disk failure, but RAID is not a backup.)

Platform usage

Machine access

Access to the machines is controlled by the resource manager. This means that users cannot just ssh to a machine and leave processes running on it indefinitely (e.g. a vi process).

Instead, a user must book the machine for a period of time, during which access is granted, possibly with some additional privileges (depending on the requested type of job). Once the period of time has ended, all rights are revoked and all processes of the user are killed.

By default, users are not root on the machines. Some privileged commands may however be permitted (e.g. schedtool). By default, access to a machine is not exclusive, which means that several users can have processes running at the same time, unless a user requested exclusive access.

Some special usages also require full access to the machine: one may want to be root, to be able to reboot the machine, or even to install software or a different operating system. Just like on Grid'5000, this is possible, at the cost of using kadeploy.
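As an illustration of the booking model described above, here is a minimal sketch that only builds an oarsub command line, without running it. The helper name oarsub_command is hypothetical. The -I, -r, -t and -l options are standard OAR options, but the resource expression (nodes=...) and the job type names accepted on mu are assumptions, since this page does not document them.

```python
import shlex


def oarsub_command(hours, nodes=1, job_type=None, start=None):
    """Build (but do not run) an oarsub command line as a list of arguments.

    hours    -- requested walltime, in whole hours
    nodes    -- number of machines to book (resource syntax is an assumption)
    job_type -- optional OAR job type passed to -t (names are site-specific)
    start    -- optional "YYYY-MM-DD HH:MM:SS" date for an advance reservation
    """
    if hours <= 0 or nodes <= 0:
        raise ValueError("hours and nodes must be positive")
    cmd = ["oarsub"]
    if start is not None:
        # -r asks for an advance reservation starting at the given date.
        cmd += ["-r", start]
    else:
        # -I opens an interactive shell as soon as the job starts.
        cmd.append("-I")
    if job_type is not None:
        cmd += ["-t", job_type]
    cmd += ["-l", f"nodes={nodes},walltime={hours}:00:00"]
    return cmd


if __name__ == "__main__":
    # Interactive access to one machine for 2 hours.
    print(shlex.join(oarsub_command(2)))
    # Booking one machine for 8 hours, starting at a given date.
    print(shlex.join(oarsub_command(8, start="2012-03-22 22:00:00")))
```

The printed command would then be typed on the frontend (mu). Whether exclusive access or reboot rights need a specific job type on this platform is not stated here, so the job_type argument is left to the caller.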

Use cases

I want to access a machine

I want to gain exclusive access to a machine for H hours

I want to be able to reboot the machine

I want to change the system (OS, software) on the machine

I want to book the machine for next night

Tips and tricks

I want to ssh directly from my workstation

I want my code to be pushed automatically to the machine

Resource usage visualization tools

Two tools are available to see how resources are, or will be, used:

chandler

Chandler is a command-line tool to run on mu. It gives a view of the current usage of the machines.

pneyron@mu:~$ chandler

4 jobs, 92 resources, 60 used
         grimage-1 	TTTTTTTT grimage-2 	TTTTTTTT grimage-3 	
TTTTTTTT grimage-4 	TTTTTTTT grimage-5 	         grimage-6 	
         grimage-7 	JJJJJJJJ grimage-8 	JJJJJJJJ grimage-9 	
         grimage-10 	TTTTTTTTTTTT idgraf 	

 =Free  =Standby J=Exclusive job T=Timesharing job S=Suspected A=Absent D=Dead

grimage-2.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-3.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-4.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-5.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-8.grenoble.grid5000.fr
  [1115] pneyron (reboot)

grimage-9.grenoble.grid5000.fr
  [1115] pneyron (reboot)

idgraf.grenoble.grid5000.fr
  [1113] jvlima (shared)
  [1114] pneyron (shared)
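For scripting purposes, the textual part of this output can be parsed. The following is a minimal sketch, assuming the format is exactly the one seen in the sample above (a summary line, then one host name per block followed by indented "[job id] user (label)" lines). The function name parse_chandler is hypothetical, and the meaning of the label in parentheses is not specified on this page.

```python
import re

# Excerpt copied from the chandler output shown above.
SAMPLE = """\
4 jobs, 92 resources, 60 used

grimage-2.grenoble.grid5000.fr
  [1101] eamat (shared)

grimage-8.grenoble.grid5000.fr
  [1115] pneyron (reboot)

idgraf.grenoble.grid5000.fr
  [1113] jvlima (shared)
  [1114] pneyron (shared)
"""

SUMMARY_RE = re.compile(r"^(\d+) jobs, (\d+) resources, (\d+) used$")
JOB_RE = re.compile(r"^\s+\[(\d+)\]\s+(\S+)\s+\((\w+)\)$")


def parse_chandler(text):
    """Return (summary, jobs_by_host) parsed from chandler-like text."""
    summary = None
    jobs_by_host = {}
    host = None
    for line in text.splitlines():
        m = SUMMARY_RE.match(line)
        if m:
            jobs, resources, used = map(int, m.groups())
            summary = {"jobs": jobs, "resources": resources, "used": used}
            continue
        m = JOB_RE.match(line)
        if m and host is not None:
            # Indented job line: attach it to the most recent host.
            job_id, user, label = m.groups()
            jobs_by_host[host].append((int(job_id), user, label))
            continue
        stripped = line.strip()
        # A host line starts in column 0 and is a full host name.
        if stripped and not line[0].isspace() and stripped.endswith(".grid5000.fr"):
            host = stripped
            jobs_by_host[host] = []
    return summary, jobs_by_host


if __name__ == "__main__":
    summary, jobs_by_host = parse_chandler(SAMPLE)
    print(summary)
    for name, jobs in jobs_by_host.items():
        print(name, jobs)
```

In this excerpt, the number of distinct job ids found in the listing (1101, 1113, 1114, 1115) matches the "4 jobs" of the summary line, which gives a simple consistency check.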

Drawgantt

Drawgantt gives a view of the past, current and future usage of the machines.

https://helpdesk.grid5000.fr/oar/grenoble/digitalis/drawgantt.html

Technical contact

Jabber

Mailing lists
