A Complete Review of the HPE NS204i OS Boot Device

The HPE NS204i OS Boot device is an internal boot storage device designed for HPE ProLiant servers and used to boot the operating system (OS). It is particularly useful in environments that need a fast, reliable boot solution. The sections below cover its features, use cases, advantages, and challenges.

Specifications and Features

  1. Storage type:
    • The HPE NS204i uses high-speed internal SSDs dedicated to booting the operating system, which shortens server boot time and lets the OS run more efficiently.
    • In some models it is offered as an M.2 module, typically installed in a PCIe slot.
  2. Dedicated boot storage:
    • The NS204i is designed specifically to hold and run the operating system on physical servers and virtualization hosts. Unlike traditional approaches that boot from the main hard disks or SSDs, the NS204i is used only for the operating system, keeping the boot volume separate from the primary data storage.
  3. RAID support:
    • The device supports RAID 1 (mirroring), which protects the OS boot data. In RAID 1, the operating system data is mirrored across two drives, so if one drive fails the other continues to operate and the system stays up.
  4. Compact design optimized for HPE servers:
    • The NS204i occupies very little space, since it is a small module mounted inside the server. This design conserves internal server space and leaves valuable slots and bays free for other uses.
  5. Support for multiple operating systems:
    • The device is compatible with most server operating systems, including Windows Server and Linux, as well as hypervisors such as VMware ESXi.

Use Cases

  • Booting virtualization hosts: The HPE NS204i is well suited to virtualization hosts (for example, VMware) that need a fast OS boot and continuous access to the boot volume.
  • Cloud and data center servers: In cloud and data center environments, a fast and stable OS boot is essential; the NS204i helps meet that requirement.
  • Hypervisor deployments: As a reliable boot device, it can be used to install hypervisors such as Hyper-V or XenServer.

Advantages

  1. Faster boot: The SSDs used in the HPE NS204i significantly reduce boot time, which is especially valuable for servers that are restarted frequently.
  2. Resilience and data protection: RAID 1 support means the operating system keeps running if one of the drives fails, an important property for environments that require high availability.
  3. Better use of primary storage: A dedicated boot device keeps the operating system off the server's main storage and improves overall system efficiency.
  4. Designed for HPE servers: The device is built specifically for HPE ProLiant servers and integrates tightly with them.
  5. Lower power and space requirements: Thanks to its SSDs and compact design, the device consumes less power and occupies less space, an advantage for large data centers that must optimize both.

Challenges and Drawbacks

  1. Limited to HPE servers: The NS204i is designed specifically for HPE ProLiant servers and may not be compatible with other server brands or models.
  2. Limited capacity: Although optimized for booting the OS, the NS204i's capacity is typically limited to a few hundred gigabytes, so it is not suitable as primary storage.
  3. Additional cost: Despite its benefits, adding the device to a server adds cost, especially in environments with many servers.
  4. Specialized support requirements: Correct installation and configuration require technical knowledge and may call for support from HPE.

Conclusion

The HPE NS204i OS Boot device is an efficient, purpose-built solution for booting the operating system on HPE ProLiant servers. By providing a fast, resilient, and protected OS boot, particularly in server and virtualization environments, it helps organizations manage their servers more effectively. Keep in mind, however, that it is limited to HPE servers and requires proper configuration and support.

 

 

DX320 Gen11 4LFF/DX360 Gen11 8SFF/DX360 Gen11 10NVMe/DX365 Gen11 10NVMe/DX380 Gen11 12LFF/DX380 Gen11 24SFF/DX385 Gen11 12LFF/DX4120 Gen11 24LFF

Last updated: 2023-11-21

Hypervisor Boot Drive Replacement

Overview

This document describes how to replace the hypervisor boot drive on HPE DX Gen11 platforms.

Note:

  • For information on the required tools and supplies, see the HPE ProLiant documentation.

  • Contact HPE for failed components.

Warning:

  • All servicing must be done by a qualified service technician. During the procedure, wear a grounding wrist strap to avoid ESD damage to the component or system. Handle all components with care: place them on soft, static-free surfaces.
  • Coordinate service with the customer for any operation that involves the hypervisor host (ESXi, Hyper-V or AHV), virtual machines, or Nutanix software.
  • If you have locked down the cluster or hypervisor, you must enable SSH access again. See the Cluster Access Control topic in the Web Console Guide.

Hypervisor Boot Drive Failure

Every node contains a hypervisor boot drive. On HPE DX Gen11 platforms, the NS204i-u is a universal-install, hot-pluggable OS boot device that contains two M.2 drives and automatically creates a RAID 1 volume.

Figure. HPE NS204i-u Gen11 Boot Device

  1. Drive carrier 1
  2. Universal Installation
  3. Bay 1 Fault LED
  4. Bay 2 Fault LED
  5. Drive carrier 2
Figure. Sample M.2 device used on HPE DX Gen11 platforms


Failure indications

If a single drive on the HPE DX NS204i NVMe boot device has a problem, the Prism web console may display an alert, and the hypervisor may still be available on a degraded RAID 1 volume. You can obtain more details about the drive from the iLO web interface of the node; note the bay number of the drive to be replaced.
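If you prefer a command line, the same drive details are also exposed through the iLO Redfish REST API. The following is a minimal sketch, assuming you have iLO credentials and network access to the iLO IP address; the exact storage collection path and member names vary with the iLO firmware version, so browse the returned links to locate the NS204i controller and its drives:

  nutanix@cvm$ curl -sk -u ilo_admin_user:ilo_password https://ilo_ip_addr/redfish/v1/Systems/1/Storage/

Replace ilo_admin_user, ilo_password, and ilo_ip_addr with your own values. Each drive resource reports a Status block and a physical location that correspond to the bays and fault LEDs on the boot device.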

If both drives on the HPE DX NS204i NVMe boot device have problems, or the boot device itself has a problem, the hypervisor becomes unavailable and does not restart.

The server video console might display messages similar to the following:

  • No bootable devices detected.
  • Boot partition not found.
  • Reboot and Select proper Boot device.
  • Insert Boot Media in the selected Boot device and press a key.

Summary Overview

Hypervisor Boot Drive Replacement Overview

Before you begin

  • Ensure that the software and firmware versions on all the nodes of the cluster are at least the versions in the following table:

    Software/Firmware    Version
    AOS                  6.5.3.6 or higher
    Foundation           5.4.2 or higher
    Firmware             SPP 2023.04.00.00 or higher
    NCC                  4.6.6 or higher
  • Ensure the following for the NS204i-u boot device that is to be used for the replacement:
    • The firmware version on the device is 1.2.14.1009 or higher, corresponding to SPP 2023.04.00.00.
    • The device contains two M.2 NVMe SSDs.
  • You might need to download hypervisor files from external web sites.
  • Follow the HPE ProLiant documentation for all hardware procedures.
Note: IPv6 must be enabled on the customer’s network for the cluster to communicate with the node that is booted into Phoenix. If IPv6 is not enabled, contact Nutanix for support before you start.
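As a quick sanity check, you can confirm from a Controller VM that IPv6 link-local traffic passes on the management network. This is only a sketch and assumes eth0 is the CVM's primary interface:

  nutanix@cvm$ ip -6 addr show eth0
  nutanix@cvm$ ping6 -c 3 ff02::1%eth0

The first command should show at least a link-local (fe80::) address; the second pings the all-nodes multicast address, and replies from other hosts' link-local addresses indicate that IPv6 is passing on that segment.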

About this task

This procedure provides the software steps for replacing a Hypervisor Boot Drive.

Procedure

  1. If the node is running, shut it down by following one of the hypervisor-specific procedures shown in Preparing for Hypervisor Boot Drive Replacement.
  2. Replace the hypervisor boot device by following Replacing the Hypervisor Boot Device.
  3. Verify that the node is still part of the cluster by following Verifying that the Node is Part of the Cluster.

    Note:

    • If you replaced both the drives on the HPE DX NS204i NVMe boot device in step 2, go to step 4.
    • If you replaced only a single drive on the HPE DX NS204i NVMe boot device, or you replaced the boot device itself, in step 2, start the node by following the hypervisor-specific procedure in the Starting a Node in a Cluster section under Node Management for your environment. Then, skip to step 8.

      After you replace a single drive and start the node, the boot device rebuilds the new drive automatically. Because the rebuild completes quickly, the RAID 1 status may already be displayed as OK by the time you log in to the iLO web interface to verify it.

  4. Begin the host boot disk repair by following Starting Host Boot Disk Repair (Failed Boot Device Procedure).
  5. Boot from Phoenix by following Booting the Node from the Phoenix Image (HPE ProLiant DX Platform).
  6. Perform the host boot disk repair by following Finishing the Hypervisor Installation (Failed Boot Device Procedure).
  7. (vSphere only) Verify the CPU type and set the EVC mode by following Verifying and Setting the CPU Type (vSphere).
  8. Run NCC by following Node Startup Post-check.

Preparing to Replace a Hypervisor Boot Drive

Preparing for Hypervisor Boot Drive Replacement

Procedure

  1. Run the node shutdown prechecks described in Node Shutdown Prechecks.
  2. If you are replacing the node itself, make a note of any customized IPMI settings (IP, LDAP, SMTP, additional users, and so on). These settings are not persistent, so you will need to re-apply them after replacing the node.
  3. Shut down the node by following the hypervisor-specific procedure.

Node Shutdown Prechecks

Check to make sure that there are no issues that might prevent the node from being shut down safely. (Even if the node does not need to be shut down, these checks are helpful.)

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Estimated time to complete: 5 minutes

Procedure

  1. In Prism, go to the Home page and make sure Data Resiliency Status displays a green OK.
  2. In Prism, go to the Health page and select Actions > Run NCC Checks.
  3. In the dialog box that appears, select All Checks and click Run.

    Alternatively, issue the following command from the CVM:

    nutanix@cvm$ ncc health_checks run_all

  4. If any checks fail, see the related KB article provided in the output and the Nutanix Cluster Check Guide: NCC Reference for information on resolving the issue.
  5. If you have any unresolvable failed checks, contact Nutanix Support before shutting down the node.
  6. Gather component details by running the NCC show_hardware_info command.
    nutanix@cvm$ ncc hardware_info show_hardware_info
  7. Save the output of the show_hardware_info command so that you can compare details when verifying the component replacement later.
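One convenient way to keep this output for later comparison is to redirect it to a file on the CVM. This is only a sketch; the file name is arbitrary:

  nutanix@cvm$ ncc hardware_info show_hardware_info > ~/hardware_info_before_replacement.txt

After the replacement, run the command again, redirect it to a second file, and compare the two (for example with diff) to confirm that the new component is reported as expected.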

Shutting Down a Node in a Cluster (vSphere Client)

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Caution: Verify the data resiliency status of your cluster. If the cluster has replication factor 2 (RF2), you can safely shut down only one node at a time. If more than one node in an RF2 cluster must be shut down, shut down the entire cluster instead.

Estimated time to complete: 15 minutes

Procedure

  1. Log on to vCenter with the vSphere Client.
    If vCenter is not available, log on to the ESXi host IP address with the vSphere Client.
  2. If DRS is not enabled, manually migrate all the VMs except the Controller VM to another host in the cluster or shut down any VMs other than the Controller VM that you do not want to migrate to another host.
    If DRS is enabled on the cluster, you can skip this step.
  3. Right-click the host and select Maintenance Mode > Enter Maintenance Mode.
  4. In the Enter Maintenance Mode dialog box, click OK.
  5. In the Confirm Maintenance Mode dialog box, uncheck Move powered off and suspended virtual machines to other hosts in the cluster and click Yes.

    The host gets ready to go into maintenance mode, which prevents VMs from running on this host. DRS automatically attempts to migrate all the VMs to another host in the cluster.

    Note: If DRS is not enabled, manually migrate or shut down all the VMs excluding the Controller VM. Some VMs may fail to migrate automatically even when DRS is enabled, typically because the VM uses a configuration option that is not available on the target host.
  6. Turn on the chassis identifier lights on the front and back of the node that you want to remove, using one of the following options.
    • Use the IPMI / BMC web GUI interface of your platform to turn on the UID LEDs.
    • Log on to a Controller VM, and issue the following command.
      nutanix@cvm$ ~/serviceability/bin/chassis-identify -s cvm_ip_addr -t 240

      Replace cvm_ip_addr with the Controller VM IP address for the node that you want to remove.

    • Log on to the hypervisor host (hypervisor_ip_addr) with SSH and issue the following command.
      root@esx# /ipmitool chassis identify 240

      This command returns the following:

      Chassis identify interval: 240 seconds

  7. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with the customer before performing the step.

    Log on to the Controller VM with SSH and shut down the Controller VM.

    nutanix@cvm$ cvm_shutdown -P now
    Note: Always use the cvm_shutdown command to reset or shut down the Controller VM; it notifies the cluster that the Controller VM is unavailable.
  8. After the Controller VM shuts down, wait for the host to go into maintenance mode (a command-line check is sketched after this procedure).
  9. Right-click the host and select Shut Down.
    Wait until vCenter Server displays that the host is not responding, which may take several minutes. If you are logged on to the ESXi host rather than to vCenter Server, the vSphere client disconnects when the host shuts down.
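To confirm from the command line that the host has entered maintenance mode (step 8) before you shut it down, you can SSH to the ESXi host and run the following. This is only a sketch and assumes SSH is enabled on the host:

  root@esx# esxcli system maintenanceMode get

If the host is in maintenance mode, the command returns Enabled.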

Shutting Down a Node in a Cluster (vSphere Command Line)

Before you begin

If DRS is not enabled, manually migrate all the VMs except the Controller VM to another host in the cluster or shut down any VMs other than the Controller VM that you do not want to migrate to another host. If DRS is enabled on the cluster, you can skip this pre-requisite.

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Caution: Verify the data resiliency status of your cluster. If the cluster has replication factor 2 (RF2), you can safely shut down only one node at a time. If more than one node in an RF2 cluster must be shut down, shut down the entire cluster instead.

Estimated time to complete: 10 minutes

You can put the ESXi host into maintenance mode and shut it down from the command line or by using the vSphere web client.

Procedure

  1. Turn on the chassis identifier lights on the front and back of the node that you want to remove, using one of the following options.
    • Use the IPMI / BMC web GUI interface of your platform to turn on the UID LEDs.
    • Log on to a Controller VM, and issue the following command.
      nutanix@cvm$ ~/serviceability/bin/chassis-identify -s cvm_ip_addr -t 240

      Replace cvm_ip_addr with the Controller VM IP address for the node that you want to remove.

    • Log on to the hypervisor host (hypervisor_ip_addr) with SSH and issue the following command.
      root@esx# /ipmitool chassis identify 240

      This command returns the following:

      Chassis identify interval: 240 seconds
  2. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with the customer before performing the step.

    Log on to the Controller VM with SSH and shut down the Controller VM.

    nutanix@cvm$ cvm_shutdown -P now
  3. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2).
  4. Put the ESXi host into maintenance mode and then shut it down.
    nutanix@cvm$ ~/serviceability/bin/esx-enter-maintenance-mode -s cvm_ip_addr

    If successful, this command returns no output. If it fails with a message like the following, VMs are probably still running on the host.

    CRITICAL esx-enter-maintenance-mode:42 Command vim-cmd hostsvc/maintenance_mode_enter failed with ret=-1

    Ensure that all VMs are shut down or moved to another host and try again before proceeding.

    nutanix@cvm$ ~/serviceability/bin/esx-shutdown -s cvm_ip_addr

    Replace cvm_ip_addr with the IP address of the Controller VM on the ESXi host (cvm_ip_addr from the worksheet).

    Alternatively, you can put the ESXi host into maintenance mode and shut it down using the vSphere Web Client.

    If the host shuts down, a message like the following is displayed.

    INFO esx-shutdown:67 Please verify if ESX was successfully shut down using ping hypervisor_ip_addr
  5. Confirm that the ESXi host has shut down.
    nutanix@cvm$ ping hypervisor_ip_addr

    Replace hypervisor_ip_addr with the IP address of the ESXi host.

    If no ping packets are answered, the ESXi host shuts down.

Shutting Down a Node in a Cluster (AHV)

Before you begin

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Estimated time to complete: 15 minutes

About this task

Caution: Verify the data resiliency status of your cluster. If the cluster has replication factor 2 (RF2), you can safely shut down only one node at a time. If more than one node in an RF2 cluster must be shut down, shut down the entire cluster instead.

You must shut down the Controller VM to shut down a node. Before you shut down the Controller VM, you must put the node in maintenance mode.

When a host is in maintenance mode, VMs that can be migrated are moved from that host to other hosts in the cluster. After exiting maintenance mode, those VMs are returned to the original host, eliminating the need to manually move them.

If a host is put in maintenance mode, the following VMs are not migrated:

  • VMs with GPUs, CPU passthrough, PCI passthrough, and host affinity policies are not migrated to other hosts in the cluster. You can shut down such VMs by setting the non_migratable_vm_action parameter to acpi_shutdown. If you do not want to shut down these VMs for the duration of maintenance mode, you can set the non_migratable_vm_action parameter to block, or manually move these VMs to another host in the cluster.
  • Agent VMs are always shut down if you put a node in maintenance mode and are powered on again after exiting maintenance mode.

Perform the following procedure to shut down a node.

Procedure

  1. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with the customer before performing the step.

    If the Controller VM is running, shut down the Controller VM.

    1. Log on to the Controller VM (cvm_ip_addr from the worksheet) with SSH.
    2. List all the hosts in the cluster.
      nutanix@cvm$ acli host.list

      Note the value of Hypervisor address for the node that you want to shut down.

    3. Put the node into maintenance mode.
      nutanix@cvm$ acli host.enter_maintenance_mode Hypervisor address [wait="{ true | false }" ] [non_migratable_vm_action="{ acpi_shutdown | block }" ]

      Replace Hypervisor address with either the IP address or host name of the AHV host you want to shut down.

      Set wait=true to wait for the host evacuation attempt to finish.

      Set non_migratable_vm_action=acpi_shutdown if you want to shut down VMs such as VMs with GPUs, CPU passthrough, PCI passthrough, and host affinity policies during the maintenance mode.

      If you do not want to shut down these VMs during maintenance mode, you can set the non_migratable_vm_action parameter to block, or manually move these VMs to another host in the cluster.

      If you set the non_migratable_vm_action parameter to block and the operation to put the host into the maintenance mode fails, exit the maintenance mode and then either manually migrate the VMs to another host or shut down the VMs by setting the non_migratable_vm_action parameter to acpi_shutdown.

    4. Shut down the Controller VM.
      nutanix@cvm$ cvm_shutdown -P now
  2. Log on to the AHV host with SSH.
  3. Shut down the host.
    root@ahv# shutdown -h now
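Putting the commands above together, a complete run might look like the following. This is only a sketch: 10.0.0.25 is a hypothetical AHV host address, and you must substitute your own Controller VM and host addresses.

  nutanix@cvm$ acli host.list
  nutanix@cvm$ acli host.enter_maintenance_mode 10.0.0.25 wait=true non_migratable_vm_action=acpi_shutdown
  nutanix@cvm$ cvm_shutdown -P now

Then log on to the AHV host with SSH and shut it down:

  root@ahv# shutdown -h now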

Shutting Down a Node in a Cluster (Hyper-V)

Shut down a node in a Hyper-V cluster.

Before you begin

Shut down guest VMs that are running on the node, or move them to other nodes in the cluster.

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Caution: Verify the data resiliency status of your cluster. If the cluster has replication factor 2 (RF2), you can safely shut down only one node at a time. If more than one node in an RF2 cluster must be shut down, shut down the entire cluster instead.

Estimated time to complete: 15 minutes

Perform the following procedure to shut down a node in a Hyper-V cluster.

Procedure

  1. Log on to the Hyper-V host with Remote Desktop Connection.
  2. Open the Failover Cluster Manager.
  3. Click on Nodes.
  4. Right-click the node name.
  5. Under Pause, click Drain Roles.

    Under Status, the node appears as Paused.

    Figure. Pause Node Options in Failover Cluster Manager

  6. (Optional) At the bottom of the center pane, click the Roles tab.
    Once all roles have moved off this node, it is safe to shut down the node (a PowerShell check is sketched after this procedure).
  7. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with the customer before performing the step.

    Log on to the Controller VM with SSH and shut down the Controller VM.

    nutanix@cvm$ cvm_shutdown -P now

    Note:

    Always use the cvm_shutdown command to reset or shut down the Controller VM. The cvm_shutdown command notifies the cluster that the Controller VM is unavailable.

  8. Turn on the chassis identifier lights on the front and back of the node that you want to remove, using one of the following options.
    • Log on to the hypervisor host (hypervisor_ip_addr) using RDP or the IPMI console and issue the following command.
      > ipmiutil alarms -i 240

      This command returns the following:

      Setting ID LED to 240 …

  9. Log on to the Hyper-V host with Remote Desktop Connection and start PowerShell.
  10. Do one of the following to shut down the node.
    • > shutdown /s /t 0
    • Stop-Computer -ComputerName localhost

    See the Microsoft documentation for up-to-date and additional details about how to shut down a Hyper-V node.
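If you prefer PowerShell over the Failover Cluster Manager GUI, the drain and the verification in steps 4 through 6 can also be done with the FailoverClusters cmdlets. This is only a sketch; node-name is a placeholder for your Hyper-V node name:

  > Suspend-ClusterNode -Name node-name -Drain
  > Get-ClusterNode -Name node-name
  > Get-ClusterGroup | Where-Object { $_.OwnerNode.Name -eq "node-name" }

The first command pauses the node and drains its roles, the second should report the node State as Paused, and the third should return nothing once all clustered roles have moved off the node.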

Physically Replacing a Hypervisor Boot Drive

Replacing the Hypervisor Boot Device

About this task

Who is responsible: This procedure can be performed by any service technician who is qualified to service the HPE DX platform.

Estimated time to complete: 15 minutes

Procedure

  1. Locate the physical node for replacement. If you have not already illuminated the locator UID LED, use the IPMI / BMC web GUI interface of your platform to turn on the UID LEDs.
  2. Consult the appropriate manufacturer’s documentation for the server.
  3. Replace the drives (a single drive or both drives) or the HPE DX NS204i boot device.
    If a single drive fails, the RAID volume is in a degraded state; after you replace the failed drive, the RAID volume returns to the OK state.
    Note: Nutanix recommends using a new or clean boot drive for the replacement.
  4. Re-assemble the server, place it in the rack, and cable the server in accordance with the manufacturer’s documentation.
  5. (Optional) You can turn off the UID LED now, using the button on the server, or turn off the UID LED later using the IPMI / BMC web GUI interface.

Verifying that the Node is Part of the Cluster

About this task

If you removed the node from the cluster by using the web console or the node remove command, you must expand the cluster after you replace the node. For multiple-node blocks, you must install the replacement node in the same position. For example, if the node was removed from position A in a block, the new node must be installed in position A of the same block.

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Estimated time to complete: 5 minutes

Procedure

Log on to the web console and check if the node is part of the cluster under Hardware > Table.

  • If the node is part of the cluster, continue with the node replacement.
  • If the node is not part of the cluster, call Nutanix support.
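You can also confirm the node's cluster membership from the command line of any Controller VM. This is only a sketch; field names in the output can vary slightly between AOS versions:

  nutanix@cvm$ ncli host list

Look for an entry whose Hypervisor Address or Controller VM Address matches the node that you replaced. If no such entry exists, the node is not part of the cluster.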

Completing Hypervisor Boot Drive Replacement

Starting Host Boot Disk Repair (Failed Boot Device Procedure)

The boot device can be an M.2 device or another boot device (sometimes called a host boot disk). If both hypervisor boot drives on the HPE DX NS204i-u boot device fail, the hypervisor must be reinstalled. This procedure describes how to install the hypervisor on a single node and works for all hypervisors. If you are replacing a boot drive that has not completely failed, use the proactive (graceful) replacement procedure, Installing the Hypervisor, instead; if that graceful or proactive boot disk restoration procedure fails, then perform the following procedure.

Before you begin

Customers must provide the hypervisor ISO images; they are not provided by Nutanix. You must have the hypervisor image before you begin because these procedures prompt you for the hypervisor ISO image. AHV ISO images might already be present on each Controller VM in the cluster. If the procedure prompts you to upload the AHV image, download it from the Nutanix portal.

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Estimated time to complete: 45 to 90 minutes

Procedure

  1. From the Prism web console, go to Hardware > Diagram view.
    Then, select the failed node.
    Figure. Select Node for Repair

  2. Click the Repair Host Boot Device button.
    Figure. Repair Host Boot Device option

    The Repair Host Boot Disk window appears.
  3. From the window, select Continue without snapshot and click Next.
    Figure. Repair Host Boot Disk Window


    The Download Phoenix Image window is displayed.

    Figure. Download phoenix image window


    You can use an existing ISO or download a new ISO.

  4. To download the phoenix.iso, click Download.

    Note:

    • Do not click Next until after you have booted the node with the downloaded phoenix.iso in the next step.
    • Phoenix is also available for download on the Nutanix portal. Log in to the portal, go to Downloads > Phoenix, and download the appropriate phoenix.iso file from this page. For assistance, contact Nutanix Support.
  5. Bring up the node with the phoenix.iso file by following the procedure: “Booting the Node from the Phoenix Image.”

Booting the Node from the Phoenix Image (HPE ProLiant DX Platform)

This procedure describes how to boot the node from the Phoenix image.

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Procedure

  1. Open a Web browser to the IPMI IP address of the node to be imaged.
  2. Enter the IPMI login credentials in the login screen.
    Figure. iLO Login Screen


    The iLO console main screen appears.

    Note: The following steps might vary depending on the iLO version on the node.
  3. Select Administration > Boot Order > One-Time Boot Status > Select One-Time Boot Option > CD/DVD Drive > Apply.
    Figure. Boot Order Screen

  4. Select Information from the left pane and then select the Overview tab. Under Information > Integrated Remote console, select one of the remote console options listed.
    Figure. IPMI Console Screen

  5. From the Virtual Drives menu, select Virtual Drive > Image File CD/DVD-ROM.
    Figure. Virtual Drive Menu

    The Choose Disk Image File window opens.
  6. In the Choose Disk Image File window, go to where the phoenix.iso file is located. Select that file, and then click the Open button.
  7. In the remote console, select Power Switch > Reset if the node is powered-on. If the node is powered-off, select Power Switch > Momentary Press.

    This causes the system to restart using the selected Phoenix image. The Nutanix Installer screen appears after restarting. Then, the system continues to boot into the Phoenix image with the Phoenix prompt displayed.

    Figure. Power Switch Reset (Power Control)


Finishing the Hypervisor Installation (Failed Boot Device Procedure)

Before you begin

Verify which hypervisor is supported on your platform before you begin. Customers must provide the ESXi or Hyper-V hypervisor ISO images; they are not provided by Nutanix. You must have the hypervisor image before you begin because these procedures prompt you for the hypervisor ISO image. AHV hypervisor ISO images might already be present on each Controller VM in the cluster. If the procedure prompts you to upload the AHV image, download it from the Nutanix portal.

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Estimated time to complete: 45 to 90 minutes

Procedure

  1. Verify which hypervisor is supported on your platform before you begin. Only install the supported hypervisor.
  2. Resume the boot device repair by selecting the node under repair and clicking the Repair Host Boot Device button at the bottom right of the window.
  3. Click Next and upload the hypervisor by clicking the Choose file button.
    Figure. Choose file menu (AHV Hypervisor example shown)

    Note: If the whitelist is out of date, you are prompted to upload the iso_whitelist.json file. Click Choose File for the .json file, and then click Choose File for the appropriate .iso file.
    1. In the file search window or finder window, navigate to the location of the appropriate hypervisor installer ISO file and select the file.
      The file that you select appears in the Available Installers menu after its checksum completes.
    2. From the Available Installers menu, click Proceed.
      Figure. Available installers menu (AHV hypervisor example shown)

  4. (Hyper-V) Repair the Boot Disk.
    1. Enter the domain username.
    2. Enter the domain password.
      Figure. Hyper-V domain information window

    3. Enter the Failover Cluster Name.
    4. Select the SKU.
    5. Click Yes.
  5. (vSphere) Repair the Boot Disk.

    If the vCenter login window is displayed, enter the IP address, username and password and click Yes.

    Figure. vCenter Login Window

  6. If a confirmation window is displayed asking “Are you sure you wish to repair the boot disk for this node?”, click Yes to repair the boot disk.
  7. Monitor the boot disk repair process.

    The window displays the text: Start host bootdisk repair precheck.

    Figure. Start host bootdisk repair precheck window


    The window displays the message: Waiting for imaging to complete. The imaging duration varies: AHV 30 to 40 minutes, Hyper-V 90 minutes, and ESXi 20 to 90 minutes.

    Figure. Waiting for imaging to complete window


    You can check the status in the Tasks menu, in a console window, or by clicking Repair Host Boot Disk.

    Figure. Tasks menu status view

    Figure. Imaging status screen


    The host boot disk repair is successful.

    Figure. Successful boot disk repair window

  8. (vSphere only) If the cluster has one or more NFS datastores, create the datastores on the host.
    1. Find the datastores that you intend to create (mount) on the new host.
      nutanix@cvm$ ncli datastore list | grep -A3 NFS
      NFS Datastore Name        : ctr1
      Host                      : 2
      Host Name                 : 10.4.176.80
      Container Name            : ctr1
      
    2. Create (mount) the datastore.
      nutanix@cvm$ ncli datastore create name=nfs_datastore_name ctr-name=container_name host-ids=host_id

      Replace both the nfs_datastore_name and the container_name with the names from the worksheet. (Enter both the datastore name and the container name, even if the container and datastore names are the same.) Replace host_id with the host ID of the new node.

    3. Verify that the datastore is visible to the cluster.
      nutanix@cvm$ ncli datastore list | grep -A3 NFS
    If the cluster has multiple NFS datastores, perform these steps once for each datastore (a loop sketch appears after this procedure).
  9. (All Hypervisors) If necessary, set the RAM allocation for the CVM.
    When the CVM is re-created, it is assigned the same amount of RAM as the original CVM. If the RAM is not set correctly, set it to the preferred size.
  10. If necessary, review or change the settings below.
    1. Hypervisor host name: The AHV and ESXi hypervisor host name might change. This is expected and does not affect functionality. To change it back to the preferred host name:
      • (ESXi): Refer to the VMware documentation.
      • (AHV): Refer to the AHV Administration Guide topic “Changing the Acropolis Host Name” for the AHV version that you are using.
    2. Name of the CVM Virtual Machine (This change is expected and doesn’t affect cluster functionality.)
      • In ESXi, after repair, the original name “NTNX-<serial-number>-<node-position>-CVM” is suffixed with “(1)” such as “NTNX-<serial-number>-<node-position>-CVM (1)”. This is expected and used to differentiate the newly installed CVM from the previously orphaned VM in vCenter’s inventory. If necessary, please refer to the VMware documentation to remove the orphaned VM from inventory.

Verifying and Setting the CPU Type (vSphere)

If the new node has a newer processor class than the existing nodes in the cluster, you must enable Enhanced vMotion Compatibility (EVC) with the lower feature set as the baseline before adding the node to vCenter. For example, if you add a node with a Haswell or Ivy Bridge CPU to a cluster that has older Sandy Bridge CPUs, you must set the EVC mode to the Sandy Bridge settings.

About this task

Warning: If you mix processor classes without enabling EVC, vMotion/live migration of VMs is not supported. If you add the host with the newer processor class to vCenter before enabling EVC and later need to enable EVC, cluster downtime is required because all VMs (including the Controller VM) must be shut down.

If all CPUs in the cluster are the same type, you can skip this procedure.

To determine the processor class of a node, see the Block Serial field on the Diagram or Table view of the Hardware Dashboard in the Nutanix web console.

If you have set the vSphere cluster Enhanced vMotion Compatibility (EVC) level, the minimum level must be L4 – Sandy Bridge.

Estimated time to complete: 3 minutes

Procedure

  1. Log on to the host with the vSphere client.
  2. Click the Summary tab of the host and verify the Processor Type. For example, the following are Ivy Bridge (v2) processors:
    • E5-2650v2
    • E5-2680v2
    • E5-2690v2
    • E5-2630v2
    • E5-2603v2

    The Haswell CPUs all have v3 in the processor type, and the Ivy Bridge CPUs all have v2. Verify the actual CPU in each node and select the corresponding EVC mode using the steps below (a CVM-side check is sketched after this procedure).

  3. Log on to vCenter with the vSphere client.
  4. Right-click the cluster and select Edit Settings.
  5. Click on VMware EVC > Change EVC Mode.
    1. Select the EVC mode appropriate for the node.
    2. Click OK.
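As a quick cross-check of the processor generation, you can read the CPU brand string from the Controller VM that runs on the new node. This is only a sketch and assumes the host CPU brand string is passed through to the guest, which is typical:

  nutanix@cvm$ grep -m1 "model name" /proc/cpuinfo

A v2 suffix in the model name indicates Ivy Bridge and v3 indicates Haswell.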

Node Startup Post-check

Check the health of the cluster after starting a node.

About this task

Who is responsible: This procedure can be performed by a customer or partner, under the guidance of a Nutanix support engineer if necessary.

Procedure

  1. In Prism, go to the Health page and select Actions > Run NCC Checks.
  2. In the dialog box that appears, select All Checks and click Run.

    Alternatively, issue the following command from the CVM:

    nutanix@cvm$ ncc health_checks run_all

  3. If any checks fail, see the related KB article provided in the output and the Nutanix Cluster Check Guide: NCC Reference for information on resolving the issue.
  4. If you have any unresolvable failed checks, contact Nutanix Support.
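In addition to NCC, you can confirm from any Controller VM that all cluster services are running. This is only a sketch; the grep simply filters out service lines that already report UP:

  nutanix@cvm$ cluster status | grep -v UP

If every service is up, the filtered output contains no service lines marked as down for any Controller VM.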

 
