Idphix

From Digitalis

Revision as of 15:14, 19 February 2014 by Neyron

This page describes the first steps for experimenting with the Xeon Phi installed in idphix, a machine of the Digitalis platform.

Overview

A Xeon Phi 5110P coprocessor is installed in the idphix machine.

The setup is the following:

We use an internal network on the host (idphix) for the Xeon Phi: 192.168.1.0/24.

This gives the following scheme:

                          =======================================
                        | -------------             -----------  |
                        ||    host     |           |  Xeon Phi | |
                        ||   idphix    |           |idphix-mic0| |
Grid5000 network <---> eth0           mic0 <----> mic0         | |
                    172.16.21.6 192.168.1.254  192.168.1.1     | |
                        ||             |           |           | |
                        | -------------             -----------  |
                          =======================================


How to experiment?

As of 2013-09-13, the operating system is CentOS 6 (an OS officially supported by Intel for the Xeon Phi).

Use the machine as follows:

Connect to digitalis/grid5000

workstation$ ssh digitalis.grenoble.g5k

In case of any issue (network issue, connection hanging...) please read more info here or ask on the Digitalis mailing list.

Reserve the machine/run a job

digitalis$ oarsub -p "machine='idphix'" -l walltime=2 -I

Setup your access to the Xeon Phi

(This might not be relevant if you are using some level of abstraction such as COI.)

WARNING
There is currently a bug due to the latest version of Intel MPSS. Please report any issue to the mailing list.

Make sure you have created your G5K ssh keys (keys without passphrase protection, dedicated to G5K and stored in your NFS home directory).

Run:

idphix$ sudo /usr/local/bin/mic-setup-my-user

If you get the error:

Error: Cannot get your SSH keys, please run 'chmod 755 ~/.ssh && chmod 644 ~/.ssh/{authorized_keys,id_*.pub}'

Please run:

idphix$ chmod 755 ~/.ssh && chmod 644 ~/.ssh/{authorized_keys,id_*.pub}

Output should be:

idphix$ sudo /usr/local/bin/mic-setup-my-user 
*** Configure user pneyron:users (10106:8000) on mic0
*** Copy user files to mic0
id_rsa.pub                                    100%  420     0.4KB/s   00:00    
authorized_keys                               100%  916     0.9KB/s   00:00    
.profile                                      100%   62     0.1KB/s   00:00    
*** Force creation of pneyron:users on mic0 and fix file ownership and permission
Done
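
The permission error above can be anticipated before calling mic-setup-my-user. The following is a minimal sketch, assuming GNU stat (as shipped with CentOS 6); check_ssh_perms is a hypothetical helper name, not a tool installed on idphix. It only reports the chmod commands that would be needed and changes nothing.

```shell
#!/bin/sh
# Sketch: report entries of an SSH directory whose mode differs from what
# mic-setup-my-user asks for (755 on the directory, 644 on authorized_keys
# and on the public keys). check_ssh_perms is a hypothetical helper.
check_ssh_perms() {
    dir=${1:-$HOME/.ssh}
    status=0
    # The directory itself is expected to be 755.
    if [ "$(stat -c %a "$dir")" != "755" ]; then
        echo "fix: chmod 755 $dir"
        status=1
    fi
    # authorized_keys and the public keys are expected to be 644.
    for f in "$dir"/authorized_keys "$dir"/id_*.pub; do
        # Skip missing files and unmatched glob patterns.
        [ -e "$f" ] || continue
        if [ "$(stat -c %a "$f")" != "644" ]; then
            echo "fix: chmod 644 $f"
            status=1
        fi
    done
    return $status
}
```

Calling check_ssh_perms without argument inspects ~/.ssh; a zero exit status means nothing needs fixing.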

You should now be able to ssh to the coprocessor:

idphix$ ssh mic0
idphix-mic0$ uname -a
Linux idphix-mic0.grenoble.grid5000.fr 2.6.38.8-g5f2543d #2 SMP Tue Apr 30 14:05:06 PDT 2013 k1om k1om k1om GNU/Linux

Intel compiler, compilation for the MIC

The Intel compiler (icc & co) is available under /opt/intel and /applis/digitalis/intel.

It is configured to use Inria's license server (FlexLM).

This license server provides 2 tokens shared among all Inria users, so only 2 compilations can run at a time.

If icc complains about the license (an error message mentioning FlexLM), please report the issue to digitalis@lists.grid5000.fr.
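
As an illustration of compiling for the MIC, here is a minimal sketch of a native build-and-run cycle. It assumes that icc is already in your PATH, that your access to mic0 has been set up as described above, and that the program needs no extra runtime libraries on the coprocessor. mic_step and mic_native_run are hypothetical helper names; -mmic is the icc option that targets the coprocessor natively.

```shell
#!/bin/sh
# Sketch only: mic_step and mic_native_run are hypothetical helpers, not
# Digitalis tools. Set DRY_RUN=1 to print the commands instead of running them.
mic_step() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mic_native_run() {
    src=$1
    # hello.c -> hello.mic, created in the current directory.
    bin=$(basename "${src%.c}").mic
    # Build a native coprocessor (k1om) binary on the host.
    mic_step icc -mmic -o "$bin" "$src" || return 1
    # Copy it to the home directory on the coprocessor.
    mic_step scp "$bin" mic0: || return 1
    # Execute it there.
    mic_step ssh mic0 "./$bin"
}
```

For example, `DRY_RUN=1 mic_native_run hello.c` prints the three commands without consuming a license token, which is convenient given that only 2 compilations can run at a time.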

Rebooting the MIC

If needed, you can reboot the MIC from within an exclusive job (doing so in a shared job session would risk annoying other users).

oarsub -t exclusive -p "machine='idphix'" ....

Then, you will be able to run

sudo /usr/local/bin/mic-reboot

(no password should be requested).

Rebooting the MIC takes some time. Meanwhile, you can follow the progress in the output of dmesg or micctrl -s.
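
Instead of watching the reboot by hand, a script can wait until the coprocessor accepts ssh connections again. The sketch below is one possible approach, not an official procedure: wait_until is a hypothetical helper that retries any probe command, and the retry count and delay are arbitrary values.

```shell
#!/bin/sh
# Sketch: retry a probe command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper.
#   $1: maximum number of attempts
#   $2: delay in seconds between attempts
#   remaining arguments: the probe command
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        # Success of the probe ends the wait.
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible use after mic-reboot: up to 60 probes, 5 seconds apart.
# wait_until 60 5 ssh -o BatchMode=yes -o ConnectTimeout=5 mic0 true
```

Probing with ssh avoids depending on the exact output format of micctrl -s, which may differ between MPSS versions.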

Data on the mic: no persistence

Please keep in mind that data pushed to the MIC is not persistent: your home directory on idphix-mic0 may be emptied for various reasons.
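
One way to live with this is to keep the reference copy of your files in your NFS home directory on idphix, push them to the MIC at the start of each session and pull results back at the end. The sketch below assumes scp access to mic0 as set up earlier on this page; push_to_mic and pull_from_mic are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: push_to_mic and pull_from_mic are hypothetical helpers. The SCP
# variable can be overridden (for example SCP=echo) to preview the command.

# Copy local files or directories to the home directory on mic0.
push_to_mic() {
    ${SCP:-scp} -r "$@" mic0:
}

# Copy a path back from mic0.
#   $1: path relative to the home directory on mic0
#   $2: local destination
pull_from_mic() {
    ${SCP:-scp} -r "mic0:$1" "$2"
}
```

A session could then start with `push_to_mic data hello.mic` and end with `pull_from_mic results ~/results`, where data, hello.mic and results are placeholder names.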

System changelogs

2013-09-26

System image
idphix-default 20130926
  • CentOS 6
  • Linux 2.6.32-358.el6.x86_64
  • MPSS packages
intel-mic-flash-2.1.386-2.el6.x86_64
intel-mic-kmod-2.1.6720-13.2.6.32.358.el6.x86_64
intel-mic-sysmgmt-2.1.6720-13.el6.x86_64
intel-mic-gpl-2.1.6720-13.el6.x86_64
intel-mic-mpm-2.1.6720-13.el6.x86_64
intel-mic-2.1.6720-13.el6.x86_64
intel-mic-micmgmt-2.1.6720-13.2.6.32.358.el6.x86_64
intel-mic-perf-data-2.1.6720-13.el6.x86_64
intel-mic-gdb-2.1.6720-13.el6.x86_64
intel-mic-cdt-2.1.6720-13.el6.x86_64
intel-mic-perf-2.1.6720-13.el6.x86_64
  • Intel Cluster Studio XE 2013
  • MIC info
[root@idphix ~]# /opt/intel/mic/bin/micinfo 
MicInfo Utility Log

Created Mon Dec  9 20:13:37 2013


	System Info
		HOST OS			: Linux
		OS Version		: 2.6.32-358.el6.x86_64
		Driver Version		: 6720-13
		MPSS Version		: 2.1.6720-13
		Host Physical Memory	: 66077 MB

Device No: 0, Device Name: mic0

	Version
		Flash Version 		 : 2.1.02.0386
		SMC Firmware Version	 : 1.14.4616
		SMC Boot Loader Version	 : 1.8.4326
		uOS Version 		 : 2.6.38.8-g5f2543d
		Device Serial Number 	 : ADKC32100375

	Board
		Vendor ID 		 : 0x8086
		Device ID 		 : 0x2250
		Subsystem ID 		 : 0x2500
		Coprocessor Stepping ID	 : 3
		PCIe Width 		 : x16
		PCIe Speed 		 : 5 GT/s
		PCIe Max payload size	 : 256 bytes
		PCIe Max read req size	 : 512 bytes
		Coprocessor Model	 : 0x01
		Coprocessor Model Ext	 : 0x00
		Coprocessor Type	 : 0x00
		Coprocessor Family	 : 0x0b
		Coprocessor Family Ext	 : 0x00
		Coprocessor Stepping 	 : B1
		Board SKU 		 : INVALID SKU
		ECC Mode 		 : Enabled
		SMC HW Revision 	 : Product 225W Passive CS

	Cores
		Total No of Active Cores : 60
		Voltage 		 : 985000 uV
		Frequency		 : 1052631 kHz

	Thermal
		Fan Speed Control 	 : N/A
		Fan RPM 		 : N/A
		Fan PWM 		 : N/A
		Die Temp		 : 35 C

	GDDR
		GDDR Vendor		 : Elpida
		GDDR Version		 : 0x1
		GDDR Density		 : 2048 Mb
		GDDR Size		 : 7936 MB
		GDDR Technology		 : GDDR5 
		GDDR Speed		 : 5.000000 GT/s 
		GDDR Frequency		 : 2500000 kHz
		GDDR Voltage		 : 1501000 uV

2013-12-09

New system image
idphix-default 20131209
  • Updated CentOS packages, except the Linux kernel
  • Moved Intel Cluster Studio XE 2013 to NFS /applis/digitalis/intel
  • No change to the MIC software (MPSS)

2013-12-13

New system image
idphix-default 20131213
  • Activated Hyperthreading
[root@idphix ~]# lstopo --no-caches
Machine (63GB) + Socket L#0
  Core L#0
    PU L#0 (P#0)
    PU L#1 (P#8)
  Core L#1
    PU L#2 (P#1)
    PU L#3 (P#9)
  Core L#2
    PU L#4 (P#2)
    PU L#5 (P#10)
  Core L#3
    PU L#6 (P#3)
    PU L#7 (P#11)
  Core L#4
    PU L#8 (P#4)
    PU L#9 (P#12)
  Core L#5
    PU L#10 (P#5)
    PU L#11 (P#13)
  Core L#6
    PU L#12 (P#6)
    PU L#13 (P#14)
  Core L#7
    PU L#14 (P#7)
    PU L#15 (P#15)
  • Updated MPSS 2.1 -> 3.1.1
[root@idphix ~]# rpm -qa | grep -e intel -e mpss
mpss-miccheck-3.1.1-r1.glibc2.12.2.x86_64
mpss-sysmgmt-python-3.1.1-1.glibc2.12.2.x86_64
mpss-sciftutorials-3.1.1-1.glibc2.12.2.x86_64
mpss-eclipse-cdt-mpm-3.1.1-1.glibc2.12.2.x86_64
mpss-myo-doc-3.1.1-1.glibc2.12.2.x86_64
mpss-coi-doc-3.1.1-1.glibc2.12.2.x86_64
mpss-flash-3.1.1-1.glibc2.12.2.x86_64
mpss-myo-3.1.1-1.glibc2.12.2.x86_64
mpss-metadata-3.1.1-1.glibc2.12.2.x86_64
mpss-myo-dev-3.1.1-1.glibc2.12.2.x86_64
intel-composerxe-compat-k1om-3.1.1-1.x86_64
mpss-micmgmt-doc-3.1.1-1.glibc2.12.2.x86_64
mpss-sysmgmt-micsmc-gui-3.1.1-1.glibc2.12.2.x86_64
mpss-miccheck-bin-3.1.1-r1.glibc2.12.2.x86_64
mpss-sysmgmt-micras-3.1.1-1.glibc2.12.2.x86_64
mpss-modules-2.6.32-358.el6.x86_64-3.1.1-1.el6.x86_64
mpss-modules-dev-2.6.32-358.el6.x86_64-3.1.1-1.el6.x86_64
mpss-sciftutorials-doc-3.1.1-1.glibc2.12.2.x86_64
mpss-license-3.1.1-1.glibc2.12.2.x86_64
mpss-rasmm-kernel-3.1.1-1.glibc2.12.2.x86_64
mpss-micmgmt-python-3.1.1-1.glibc2.12.2.x86_64
mpss-coi-3.1.1-1.glibc2.12.2.x86_64
mpss-modules-headers-3.1.1-1.glibc2.12.2.x86_64
mpss-sdk-k1om-3.1.1-1.x86_64
mpss-metadata-dev-3.1.1-1.glibc2.12.2.x86_64
mpss-daemon-dev-3.1.1-1.glibc2.12.2.x86_64
mpss-coi-dev-3.1.1-1.glibc2.12.2.x86_64
mpss-micmgmt-3.1.1-1.glibc2.12.2.x86_64
mpss-mpm-doc-3.1.1-1.glibc2.12.2.x86_64
mpss-mpm-3.1.1-1.glibc2.12.2.x86_64
mpss-boot-files-3.1.1-1.glibc2.12.2.x86_64
mpss-daemon-3.1.1-1.glibc2.12.2.x86_64
mpss-modules-headers-dev-3.1.1-1.glibc2.12.2.x86_64
  • Updated the MIC SMC and Flash (part of the MPSS upgrade)
[root@idphix ~]# micinfo 
MicInfo Utility Log

Created Fri Dec 13 19:00:30 2013


	System Info
		HOST OS			: Linux
		OS Version		: 2.6.32-358.el6.x86_64
		Driver Version		: 3.1.1-1
		MPSS Version		: 3.1.1
		Host Physical Memory	: 66076 MB

Device No: 0, Device Name: mic0

	Version
		Flash Version 		 : 2.1.02.0390
		SMC Firmware Version	 : 1.16.5078
		SMC Boot Loader Version	 : 1.8.4326
		uOS Version 		 : 2.6.38.8+mpss3.1.1
		Device Serial Number 	 : ADKC32100375

	Board
		Vendor ID 		 : 0x8086
		Device ID 		 : 0x2250
		Subsystem ID 		 : 0x2500
		Coprocessor Stepping ID	 : 3
		PCIe Width 		 : x16
		PCIe Speed 		 : 5 GT/s
		PCIe Max payload size	 : 256 bytes
		PCIe Max read req size	 : 512 bytes
		Coprocessor Model	 : 0x01
		Coprocessor Model Ext	 : 0x00
		Coprocessor Type	 : 0x00
		Coprocessor Family	 : 0x0b
		Coprocessor Family Ext	 : 0x00
		Coprocessor Stepping 	 : B1
		Board SKU 		 : B1PRQ-5110P/5120D
		ECC Mode 		 : Enabled
		SMC HW Revision 	 : Product 225W Passive CS

	Cores
		Total No of Active Cores : 60
		Voltage 		 : 1021000 uV
		Frequency		 : 1052631 kHz

	Thermal
		Fan Speed Control 	 : N/A
		Fan RPM 		 : N/A
		Fan PWM 		 : N/A
		Die Temp		 : 36 C

	GDDR
		GDDR Vendor		 : Elpida
		GDDR Version		 : 0x1
		GDDR Density		 : 2048 Mb
		GDDR Size		 : 7936 MB
		GDDR Technology		 : GDDR5 
		GDDR Speed		 : 5.000000 GT/s 
		GDDR Frequency		 : 2500000 kHz
		GDDR Voltage		 : 1501000 uV

2014-02-17

Installed Intel Cluster Studio XE 2013.1 in /grid5000/software/. It comes with icc 14.

VTune needs more work to be fully functional.

2014-02-19

New system image
idphix-default 20140219
  • Fixed mic-setup-my-user, which had been broken since the upgrade to MPSS 3.1
  • Added mount point for /grid5000
  • Changed /opt/intel link to /grid5000/software/intel, which as of 2014-02-17 links to Intel Cluster Studio XE 2013.1
  • Added packages:
    • emacs-common-23.1-25.el6.x86_64
    • emacs-nox-23.1-25.el6.x86_64
    • autoconf-2.63-5.1.el6.noarch
    • automake-1.11.1-4.el6.noarch
    • automake16-1.6.3-18.el6.1.noarch
    • automake15-1.5-27.el6.1.noarch
    • automake14-1.4p6-19.2.el6.noarch
    • autoconf213-2.13-20.1.el6.noarch
    • autoconf-archive-2012.09.08-1.el6.noarch
    • tcl-8.5.7-6.el6.x86_64
    • expect-5.44.1.15-5.el6_4.x86_64
    • dejagnu-1.4.4-17.el6.noarch
    • tig-0.17-1.el6.x86_64
