This article explains Dell's implementation of the new power management and error recovery features introduced with Windows Server 2012. These features are enabled on Dell 12th Generation PowerEdge platforms and were available at Dell's Windows Server 2012 release.

Power Management Features

Collaborative Processor Performance Control (CPPC)

The rising cost of power consumption in datacenters is a growing concern for most organizations. Platforms and operating systems each provide their own power management solutions. Dell Active Power Controller (DAPC) is a hardware-based power management solution that is enabled by default on Dell PowerEdge servers.

With Windows Server 2012, Microsoft identified the need for a collaborative solution in which the operating system and the platform work together to optimize power efficiency. This resulted in the Collaborative Processor Performance Control model defined in the ACPI 5.0 specification. Dell has implemented this model in firmware as ‘Collaborative CPU Performance Control’. In this article, we refer to this feature as collaborative DAPC.

How collaborative DAPC works:

  • When collaborative DAPC is enabled on a Dell platform, the platform detects during the boot process whether the operating system supports CPPC.
  • The operating system calculates the required performance for each logical processor and sends it to the platform through the ACPI interface.
  • The platform reads this information from the operating system and manages the power controls of the hardware components to deliver the requested performance. The delivered performance is then reported back to the operating system.
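The three steps above form a request and feedback loop. The minimal Python sketch below models that loop under stated assumptions. Performance is expressed on CPPC's abstract, unitless scale. The OS-side policy (scale utilization to the highest level, then clamp) is a hypothetical stand-in, not Windows' actual algorithm. The `power_limit` parameter is also hypothetical. The feedback step computes delivered performance as a reference level multiplied by the ratio of the delivered-counter delta to the reference-counter delta, which is how the CPPC feedback counters are intended to be used. The reference level is assumed here to be nominal performance. All class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CppcCapabilities:
    """Hypothetical subset of the capabilities a platform advertises via CPPC.

    Values are on CPPC's abstract, unitless performance scale."""
    lowest: int   # lowest performance level the platform can deliver
    nominal: int  # sustained level; assumed here to be the counters' reference
    highest: int  # highest performance level the platform can deliver


def os_desired_performance(utilization: float, caps: CppcCapabilities) -> int:
    """OS side: choose a desired performance level for one logical processor.

    Assumed policy: scale utilization (0.0-1.0) to the highest level, then
    clamp the result to the platform's advertised range."""
    desired = round(utilization * caps.highest)
    return max(caps.lowest, min(caps.highest, desired))


def platform_deliver(desired: int, power_limit: int) -> int:
    """Platform side: deliver the requested level unless a power or thermal
    limit (hypothetical parameter) caps it lower."""
    return min(desired, power_limit)


def delivered_performance(reference_perf: int, delta_delivered: int,
                          delta_reference: int) -> float:
    """Feedback: derive delivered performance over a sampling interval from
    the deltas of the delivered and reference performance counters."""
    return reference_perf * delta_delivered / delta_reference
```

For example, with `CppcCapabilities(lowest=10, nominal=80, highest=100)`, utilizations of 0.05, 0.5, and 1.0 map to desired levels 10, 50, and 100. A platform limited to 90 delivers 90 when 100 is requested. Counter deltas of 1500 (delivered) and 2000 (reference), with a reference level of 80, report 60.0 back to the OS.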

 

Customers can enable collaborative DAPC in the BIOS by selecting Custom under System Profile Settings (see Figure 1).

DAPC (System Demand-Based Power Management) remains the default power management setting on 12th Generation platforms.

 

Logical Processor Idling (LPI)

LPI is a collaborative interface between the platform and the operating system that helps improve the energy efficiency of a system. It is intended for cases where the customer needs to enforce a power budget. LPI uses the operating system's core parking algorithm to park some of the logical processors in the system, which in turn lets the corresponding processor cores transition into a lower-power idle state. When a power budget is in force, using LPI instead of throttling^ delivers better performance.

 

Dell implemented LPI by enabling the _PUR (Processor Utilization Request) method in the BIOS, as defined in the ACPI 4.0 specification. When LPI is enabled and power consumption reaches the budget threshold, the firmware sends the operating system a request that specifies how many processors should be placed in an idle state, based on the configured power budget. This request is passed to the operating system's core parking engine, which parks as many cores as it can, up to the number requested by LPI.
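The exchange above can be sketched in Python as two halves. On the firmware side, a request is built in the shape of the _PUR return package, a revision ID plus the number of processors to idle. On the OS side, that request is honoured as far as possible. Both policies are hypothetical. The firmware half assumes every active logical processor draws the same `watts_per_lp`, and the revision value 1 is an assumption. The OS half parks the least-busy logical processors but keeps at least one running. Neither half is Dell's firmware logic or Windows' core parking engine.

```python
import math
from typing import Dict, NamedTuple, Set


class PurPackage(NamedTuple):
    """Shape of the two-element package returned by the ACPI 4.0 _PUR method."""
    revision_id: int
    num_processors: int  # logical processors the OS is asked to idle


def firmware_pur(power_w: float, budget_w: float, watts_per_lp: float,
                 online_lps: int) -> PurPackage:
    """Hypothetical firmware policy: request enough idle logical processors to
    bring estimated power back under budget, assuming a uniform per-LP draw."""
    excess = max(0.0, power_w - budget_w)
    needed = math.ceil(excess / watts_per_lp)
    # revision_id=1 is an assumed value for illustration.
    return PurPackage(revision_id=1, num_processors=min(needed, online_lps))


def os_park(request: PurPackage, utilization: Dict[int, float],
            min_active: int = 1) -> Set[int]:
    """Hypothetical OS response: park the least-busy logical processors, as
    many as requested but never leaving fewer than `min_active` running."""
    parkable = max(0, len(utilization) - min_active)
    count = min(request.num_processors, parkable)
    by_load = sorted(utilization, key=utilization.get)  # LP ids, least busy first
    return set(by_load[:count])
```

For example, at 520 W against a 500 W budget and an assumed 8 W per logical processor, the firmware asks for 3 processors to be idled. If the OS has only 4 logical processors and must keep one active, it parks at most 3 of them, even when more are requested.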

 

All Dell 12th Generation servers support this feature in current firmware. LPI is not enabled by default; you must enable it under Processor Settings in the BIOS, as shown in Figure 2.

Note: You can use Dell iDRAC to configure power budgeting for your servers.

  

Error Recovery Features

A memory error in a production environment may lead to a system crash, resulting in downtime of mission-critical business applications.

A correctable error is corrected by the hardware and reported to the operating system; it has no impact on running applications. Uncorrectable errors are handled by the operating system, which contains their impact. In a virtualized environment, this means restricting the impact to the affected virtual machine so that other virtual machines and the host system are not affected.

Windows Server 2012 improves error recovery so that these memory errors are handled more efficiently and their impact is minimized. Dell has enabled support for these features in the firmware of 12th Generation PowerEdge servers**.

Consumed Memory Error Recovery

Processor manufacturers have introduced a new capability in their latest processors (for example, the Intel Sandy Bridge EP4P processor) that enables the operating system to recover from a hardware memory error that has been consumed by the processor. Any consumed uncorrectable memory error is detected by the processor and reported to the operating system. Dell has enabled this feature in the BIOS so that the affected memory address is reported to the OS.

Microsoft has extended its Machine Check Architecture (MCA) error recovery support to Hyper-V, in addition to the host operating system. Hyper-V MCA error recovery leverages the existing Windows Hardware Error Architecture (WHEA) and the capabilities of the host operating system. When an uncorrectable error is detected, the processor interrupts Hyper-V and passes it the address of the faulty memory location. Hyper-V identifies the affected virtual machine and restarts it. The underlying host operating system and the other virtual machines are unaffected and continue running. In previous generations of Microsoft operating systems, this error would have brought down the whole system.
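The containment step above depends on mapping the reported physical address to the virtual machine whose memory contains it. The minimal Python sketch below models only that decision. It assumes each VM is described by a list of host-physical `[start, end)` ranges. `VirtualMachine`, `owner_of`, and `handle_uncorrectable` are hypothetical names, not Hyper-V or WHEA APIs. A "restart" is modelled simply as incrementing a counter on the affected VM.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VirtualMachine:
    name: str
    ranges: List[Tuple[int, int]]  # host-physical [start, end) ranges backing the VM
    restarts: int = 0              # stands in for "the VM was restarted"


def owner_of(address: int, vms: List[VirtualMachine]) -> Optional[VirtualMachine]:
    """Return the VM whose memory contains `address`, or None if no VM owns it."""
    for vm in vms:
        if any(start <= address < end for start, end in vm.ranges):
            return vm
    return None


def handle_uncorrectable(address: int, vms: List[VirtualMachine]) -> str:
    """Contain an uncorrectable error that was reported with its physical address."""
    vm = owner_of(address, vms)
    if vm is None:
        # Not guest memory: left to the host operating system's recovery path.
        return "host"
    vm.restarts += 1  # restart only the affected VM; the others are untouched
    return vm.name
```

For example, an error at an address inside the second VM's range restarts only that VM. An address that no VM owns falls through to host-level handling.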

The Dell engineering team validated this feature by providing an interface to inject the error (SET_ERROR_TYPE_WITH_ADDRESS) through the ACPI 5.0 Error Injection (EINJ) table.
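To make the injection request concrete, the Python sketch below packs the parameter structure that SET_ERROR_TYPE_WITH_ADDRESS consumes. The field layout follows our reading of the ACPI 5.0 EINJ definition: error type, vendor extension offset, flags, and APIC ID as 32-bit fields, then memory address and memory address range as 64-bit fields, then PCIe SBDF as a 32-bit field, all little-endian. Verify this layout against the specification before relying on it. The sketch only builds the byte buffer. It does not perform an injection, which requires the firmware to execute the EINJ actions, and it is not Dell's validation tool. The helper name is hypothetical.

```python
import struct

# Memory error-type bits from the EINJ error-type bitmap (subset).
MEM_CORRECTABLE = 1 << 3
MEM_UNCORRECTABLE_NONFATAL = 1 << 4
MEM_UNCORRECTABLE_FATAL = 1 << 5

# Flags bit 1: the memory address and memory address range fields are valid.
FLAG_MEMORY_ADDRESS_VALID = 1 << 1

# Assumed layout (36 bytes, no padding): error type, vendor extension offset,
# flags, APIC ID (u32 each); memory address, memory address range (u64 each);
# PCIe SBDF (u32).
_LAYOUT = struct.Struct("<IIIIQQI")


def set_error_type_with_address(error_type: int, address: int,
                                address_range: int = 0) -> bytes:
    """Hypothetical helper: pack a memory-error injection request.

    Vendor extension offset, APIC ID and PCIe SBDF are left as 0 (unused);
    only the memory-address-valid flag is set."""
    return _LAYOUT.pack(error_type, 0, FLAG_MEMORY_ADDRESS_VALID, 0,
                        address, address_range, 0)
```

A request for an uncorrectable, non-fatal memory error, the kind of consumed error this section discusses, would use `MEM_UNCORRECTABLE_NONFATAL` together with the target physical address.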

  

The complete error recovery process is shown in Figure 3, and error reporting by the operating system is shown in Figure 4.

Figure 3: Error recovery flow

Figure 4: Error reporting by the operating system

 

Recovery actions for a consumed memory error:

  • User Mode:
    • Terminate the process
    • Take memory offline
  • Kernel Mode:
    • Bugcheck the system or virtual machine
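The list above amounts to a decision on the execution context in which the corrupted memory was consumed. The Python sketch below encodes that decision. The `Context` enum, the `in_virtual_machine` flag, and the action strings are illustrative assumptions that mirror the bullets; they are not Windows APIs.

```python
from enum import Enum
from typing import List


class Context(Enum):
    """Execution context in which the corrupted memory was consumed."""
    USER_MODE = "user"
    KERNEL_MODE = "kernel"


def recovery_actions(context: Context, in_virtual_machine: bool) -> List[str]:
    """Map the consuming context to the recovery actions listed above."""
    if context is Context.USER_MODE:
        # Only the consuming process is lost; the bad page is retired.
        return ["terminate process", "take memory offline"]
    # Kernel-mode consumption cannot be contained to one process.
    target = "virtual machine" if in_virtual_machine else "system"
    return [f"bugcheck {target}"]
```

User-mode consumption ends only the affected process and retires the page. Kernel-mode consumption takes down the system, or, inside a guest, only that virtual machine.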

 

*The Logical Processor Idling feature is available in the current BIOS.

 

**This feature is available on Dell 12th Generation 4-socket platforms with Intel Sandy Bridge EP4P processors. Supported platforms include the Dell PowerEdge R820 and Dell PowerEdge M820.

 

^CPU throttling is a technique in computer architecture whereby the frequency of a microprocessor can be automatically adjusted "on the fly," either to conserve power or to reduce the amount of heat generated by the chip.