
IPMI (Intelligent Platform Management Interface) is an open standard that most server manufacturers build into their hardware. IPMI can report real-time hardware information, including system temperatures, voltages, fan speeds and hardware status, and can identify the types of hardware installed in the system.
Using IPMI is simple. Once the ipmitool package is installed for your distribution, you pass command-line options to the ipmitool executable to obtain system information. One of the most useful things IPMI can do is show the error messages that the hardware has logged. If you have ever used Dell servers, IPMI can tell you in detail what caused the orange flashing light on your server. Some of the more useful commands are:
$ ipmitool sel list
This prints the system event log (SEL), similar to the following:
1 | 01/15/2008 | 11:37:43 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
2 | Pre-Init Time-stamp | Power Supply #0x65 | Failure detected | Asserted
3 | Pre-Init Time-stamp | Power Supply #0x65 | Power Supply AC lost | Asserted
4 | 01/22/2010 | 14:27:29 | Physical Security #0x73 | General Chassis intrusion | Asserted
5 | 01/22/2010 | 14:31:18 | Physical Security #0x73 | General Chassis intrusion | Deasserted
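When the event log gets long, it can help to turn it into structured records. The sketch below assumes each `ipmitool sel list` record sits on one line with `|`-separated columns, as in the sample, and that dates use the month/day/year order seen there. Records such as 2 and 3 carry "Pre-Init Time-stamp" instead of a date, so their time is stored as None. `SelRecord`, `parse_sel` and `asserted_duration` are hypothetical names used only for illustration, not part of ipmitool.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class SelRecord(NamedTuple):
    record_id: str
    when: Optional[datetime]  # None for "Pre-Init Time-stamp" records
    sensor: str               # e.g. "Physical Security #0x73"
    event: str                # e.g. "General Chassis intrusion"
    direction: str            # "Asserted", "Deasserted", or "" when absent


def parse_sel(text: str) -> List[SelRecord]:
    """Parse `ipmitool sel list` output, one record per line."""
    records = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # blank or non-record line
        if fields[1] == "Pre-Init Time-stamp":
            when, rest = None, fields[2:]
        else:
            # Assumes the month/day/year date format seen in the sample log.
            when = datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
            rest = fields[3:]
        if len(rest) < 2:
            continue  # no sensor/event columns
        direction = rest[2] if len(rest) > 2 else ""
        records.append(SelRecord(fields[0], when, rest[0], rest[1], direction))
    return records


def asserted_duration(records: List[SelRecord], sensor: str) -> Optional[timedelta]:
    """Time from the first timestamped "Asserted" record for `sensor`
    to the next "Deasserted" record, or None if there is no such pair."""
    start = None
    for rec in records:
        if rec.sensor != sensor or rec.when is None:
            continue
        if rec.direction == "Asserted" and start is None:
            start = rec.when
        elif rec.direction == "Deasserted" and start is not None:
            return rec.when - start
    return None
```

Fed the five records above, `asserted_duration(parse_sel(log_text), "Physical Security #0x73")` returns `timedelta(seconds=229)`, meaning the chassis intrusion condition lasted 3 minutes 49 seconds.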
In the sample log, records 2 and 3 show that a power supply detected a failure and lost AC power, and records 4 and 5 show that the chassis cover was opened at 14:27:29 on 01/22/2010 and closed again at 14:31:18. The log also records other useful information, such as errors with a specific DIMM:
5e | 04/06/2011 | 17:23:21 | Memory #0x1b | Transition to Non-critical from OK
5f | 04/06/2011 | 17:24:55 | Memory #0x1b | Transition to Critical from less severe
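Memory events like these report a change of state rather than a simple asserted/deasserted flag. A minimal sketch for picking out the records where a sensor reached the Critical state, assuming the "Transition to <state> ..." wording shown above; `new_state`, `is_critical` and `critical_lines` are hypothetical helper names.

```python
import re
from typing import List, Optional

# Matches the "Transition to <state> ..." wording of events such as
# "Transition to Non-critical from OK".
_TRANSITION = re.compile(r"Transition to (\S+)")


def new_state(sel_line: str) -> Optional[str]:
    """Return the state a sensor moved to, or None if the line is not a transition."""
    match = _TRANSITION.search(sel_line)
    return match.group(1) if match else None


def is_critical(sel_line: str) -> bool:
    # Exact comparison, so "Non-critical" is not counted as critical.
    return new_state(sel_line) == "Critical"


def critical_lines(sel_text: str) -> List[str]:
    """Return the SEL lines in which a sensor entered the Critical state."""
    return [line for line in sel_text.splitlines() if is_critical(line)]
```

Comparing the extracted state exactly, instead of searching for the substring "Critical", keeps the Non-critical warning at 17:23:21 out of the results.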
System temperature issues will show up as well:
5c | Pre-Init Time-stamp | Temperature #0x76 | State Asserted
5d | 01/07/2011 | 15:13:19 | Temperature #0x76 | State Asserted
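With memory, temperature and power events all mixed together, a tally per sensor type gives a quick overview of a long log. The sketch assumes the sensor column reads "<sensor type> #0x<sensor number>" as in the examples above (for instance "Temperature #0x76"); `count_by_sensor_type` is a hypothetical name.

```python
import re
from collections import Counter

# A sensor column looks like "<sensor type> #0x<sensor number>", e.g. "Memory #0x1b".
_SENSOR_FIELD = re.compile(r"^(?P<type>.+?) #0x[0-9a-fA-F]+$")


def count_by_sensor_type(sel_text: str) -> Counter:
    """Count `ipmitool sel list` records per sensor type."""
    counts: Counter = Counter()
    for line in sel_text.splitlines():
        for field in (part.strip() for part in line.split("|")):
            match = _SENSOR_FIELD.match(field)
            if match:
                counts[match.group("type")] += 1
                break  # one sensor column per record
    return counts
```

Run over the memory and temperature records shown above, this yields two Memory events and two Temperature events.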
Over time this list will grow very large. Also, if you fix the underlying fault while the system is running, you may have to clear the system event log manually before the flashing orange light goes away. You can do this by running:
$ ipmitool sel clear
It will report the following when cleared:
Clearing SEL. Please allow a few seconds to erase.
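Clearing erases every record, so it can be worth keeping a copy of the listing first. The sketch below runs `ipmitool sel list`, writes the output to a file, and only then runs `ipmitool sel clear`. `run_ipmitool` and `archive_and_clear_sel` are hypothetical helpers; the `run` parameter exists so a fake runner can stand in for ipmitool when no IPMI hardware is present.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes ipmitool arguments and returns ipmitool's standard output.
Runner = Callable[[Sequence[str]], str]


def run_ipmitool(args: Sequence[str]) -> str:
    """Default runner: invoke the local ipmitool binary."""
    result = subprocess.run(
        ["ipmitool", *args],
        capture_output=True,  # collect output instead of printing it
        text=True,            # return str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def archive_and_clear_sel(archive: Path, run: Runner = run_ipmitool) -> str:
    """Save the current SEL listing to `archive`, then clear the log.

    If listing or writing fails, the exception propagates and the clear
    command is never sent. Returns ipmitool's output for `sel clear`.
    """
    listing = run(["sel", "list"])
    archive.write_text(listing)
    return run(["sel", "clear"])
```

Because `check=True` raises on a failed `sel list` and `write_text` raises if the file cannot be written, the log is only cleared after a copy has been saved.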
IPMI can also provide real-time sensor readings. To see them, run:
$ ipmitool sdr
A list similar to the following will be provided:
Temp | -55 degrees C | ok
Temp | -58 degrees C | ok
Temp | 40 degrees C | ok
Temp | 40 degrees C | ok
Ambient Temp | 26 degrees C | ok
CMOS Battery | 0x00 | ok
ROMB Battery | 0x00 | ok
VCORE | 0x01 | ok
VCORE | 0x01 | ok
CPU VTT | 0x01 | ok
1.5V PG | 0x01 | ok
1.8V PG | 0x01 | ok
3.3V PG | 0x01 | ok
5V PG | 0x01 | ok
1.5V PXH PG | 0x01 | ok
5V Riser PG | 0x01 | ok
Backplane PG | 0x01 | ok
Linear PG | 0x01 | ok
0.9V PG | 0x01 | ok
0.9V Over Volt | 0x01 | ok
CPU Power Fault | 0x01 | ok
FAN MOD 1A RPM | 7575 RPM | ok
FAN MOD 1B RPM | 7650 RPM | ok
…
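On a server with dozens of sensors, scanning for anything that is not healthy is easier in code. A minimal sketch, assuming each `ipmitool sdr` row has exactly three `|`-separated columns (name, reading, status) as in the listing; `problem_sensors` is a hypothetical name. It only looks at the status column, so it may miss faults that a sensor reports in some other way.

```python
from typing import List, Tuple


def problem_sensors(sdr_text: str) -> List[Tuple[str, str, str]]:
    """Return (name, reading, status) for every `ipmitool sdr` row not marked "ok"."""
    problems = []
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a sensor row
        name, reading, status = fields
        if status != "ok":
            problems.append((name, reading, status))
    return problems
```

For the rows shown here, every status is ok, so the function returns an empty list.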
Any specific hardware failures will also show up in the sdr listing, typically as a status other than ok in the last column. A simpler summary of the hardware status is also available with:
$ ipmitool chassis status
System Power : on
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-off
Last Power Event :
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
Sleep Button Disable : not allowed
Diag Button Disable : allowed
Reset Button Disable : not allowed
Power Button Disable : allowed
Sleep Button Disabled: false
Diag Button Disabled : true
Reset Button Disabled: false
Power Button Disabled: false
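Because `ipmitool chassis status` prints one "key : value" pair per line, it is easy to turn into a dictionary and check from a monitoring script. The sketch below splits each line at the first colon and treats a short, hand-picked list of fields as faults whenever they read anything other than "false" or "inactive". `parse_chassis_status`, `active_faults` and the `FAULT_FIELDS` selection are illustrative assumptions, not an ipmitool feature.

```python
from typing import Dict, List

# Fields from the output above that read "false" or "inactive" on a healthy system.
FAULT_FIELDS = (
    "Power Overload",
    "Main Power Fault",
    "Power Control Fault",
    "Drive Fault",
    "Cooling/Fan Fault",
    "Chassis Intrusion",
)


def parse_chassis_status(text: str) -> Dict[str, str]:
    """Turn `ipmitool chassis status` output into a {field: value} dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            status[key.strip()] = value.strip()
    return status


def active_faults(status: Dict[str, str]) -> List[str]:
    """Return the fault fields that are present and not in a healthy state."""
    return [f for f in FAULT_FIELDS if status.get(f) not in (None, "false", "inactive")]
```

For the output shown above, `active_faults(parse_chassis_status(output))` returns an empty list.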
