|
1 | | -Sometimes an Azure virtual machine (VM) may reboot for no apparent reason, with no evidence of a user initiating the reboot operation. This article lists the actions and events that can cause VMs to reboot and provides insight into how to avoid unexpected reboot issues or reduce the impact of the issue. |
| 1 | +Azure virtual machines (VMs) might sometimes reboot for no apparent reason, without evidence of your having initiated the reboot operation. This article lists the actions and events that can cause VMs to reboot and provides insight into how to avoid unexpected reboot issues or reduce the impact of such issues. |
2 | 2 |
|
3 | 3 | ## Configure the VMs for high availability |
4 | | -The best way to protect your application running on Azure against any type of VM reboots and downtime is to configure the VMs for high availability. |
| 4 | +The best way to protect an application that's running on Azure against VM reboots and downtime is to configure the VMs for high availability. |
5 | 5 |
|
6 | | -To provide this level of redundancy to your application, we recommend that you group two or more VMs in an availability set. This configuration ensures that during either a planned or unplanned maintenance event, at least one VM is available and meets the 99.95% [Azure SLA](https://azure.microsoft.com/support/legal/sla/virtual-machines/v1_5/). |
| 6 | +To provide this level of redundancy to your application, we recommend that you group two or more VMs in an availability set. This configuration ensures that during either a planned or unplanned maintenance event, at least one VM is available and meets the 99.95 percent [Azure SLA](https://azure.microsoft.com/support/legal/sla/virtual-machines/v1_5/). |
7 | 7 |
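As a rough illustration of what the 99.95 percent figure means in practice, the following sketch converts an availability percentage into a downtime budget. The function name and the 30-day month are assumptions made for this example; they are not taken from the SLA text.

```python
def allowed_downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, for a given availability level.

    `days` is an assumed billing-period length used only for illustration.
    """
    total_minutes = days * 24 * 60
    unavailable_fraction = 1 - availability_percent / 100
    return total_minutes * unavailable_fraction


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.95)
    print(f"Downtime budget over 30 days: {budget:.1f} minutes")
```

Under these assumptions, 99.95 percent availability leaves a budget of roughly 21.6 minutes over a 30-day period.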
|
8 | | -For more information about availability set, see the following articles: |
| 8 | +For more information about availability sets, see the following articles: |
9 | 9 |
|
10 | 10 | - [Manage the availability of VMs](../articles/virtual-machines/windows/manage-availability.md) |
11 | 11 | - [Configure availability of VMs](../articles/virtual-machines/windows/classic/configure-availability.md) |
12 | 12 |
|
13 | | -## Resource Health Information |
14 | | -Azure Resource Health is a service that exposes the health of individual Azure resources and provides actionable guidance for troubleshooting problems. In a cloud environment where it isn’t possible to directly access servers or infrastructure elements, the goal for Resource Health is to reduce the time that customers spend on troubleshooting. Particularly, the time that spent determining if the root of the problem lies in the application or is caused by an event inside the Azure platform. For more information, see [Understand and use Resource Health](../articles/resource-health/resource-health-overview.md) |
| 13 | +## Resource Health information |
| 14 | +Azure Resource Health is a service that exposes the health of individual Azure resources and provides actionable guidance for troubleshooting problems. In a cloud environment where it isn’t possible to directly access servers or infrastructure elements, the goal of Resource Health is to reduce the time that you spend on troubleshooting. In particular, the aim is to reduce the time that you spend determining whether the root of the problem lies in the application or in an event inside the Azure platform. For more information, see [Understand and use Resource Health](../articles/resource-health/resource-health-overview.md). |
15 | 15 |
|
16 | 16 | ## Actions and events that can cause the VM to reboot |
17 | 17 |
|
18 | 18 | ### Planned maintenance |
19 | | -Microsoft Azure periodically performs updates across the globe to improve the reliability, performance, and security of the host infrastructure that underlies VMs. Many of these updates, including memory-preserving updates are performed without any impact to your VMs or cloud services. |
| 19 | +Microsoft Azure periodically performs updates across the globe to improve the reliability, performance, and security of the host infrastructure that underlies VMs. Many of these updates, including memory-preserving updates, are performed without any impact on your VMs or cloud services. |
20 | 20 |
|
21 | | -However, some updates do require a reboot. The VMs are shut down while we patch the infrastructure, and then the VMs are restarted. |
| 21 | +However, some updates do require a reboot. In such cases, the VMs are shut down while we patch the infrastructure, and then the VMs are restarted. |
22 | 22 |
|
23 | | -To understand what Azure planned maintenance is and how it can affect the availability of your Linux VMs, see the following articles. These articles provide background about the Azure planned maintenance process and how to schedule planned maintenance to further reduce the impact. |
| 23 | +To understand what Azure planned maintenance is and how it can affect the availability of your Linux VMs, see the articles listed here. The articles provide background about the Azure planned maintenance process and how to schedule planned maintenance to further reduce the impact. |
24 | 24 |
|
25 | 25 | - [Planned maintenance for VMs in Azure](../articles/virtual-machines/windows/planned-maintenance.md) |
26 | 26 | - [How to schedule planned maintenance on Azure VMs](../articles/virtual-machines/windows/planned-maintenance-schedule.md) |
27 | 27 |
|
28 | 28 | ### Memory-preserving updates |
29 | | -For this class of updates in Microsoft Azure, customers do not see any impact to their running VMs. Many of these updates are to components or services that can be updated without interfering with the running instance. Some are platform infrastructure updates on the host operating system that can be applied without a reboot of the VMs. |
| 29 | +For this class of updates in Microsoft Azure, users experience no impact on their running VMs. Many of these updates are to components or services that can be updated without interfering with the running instance. Some are platform infrastructure updates on the host operating system that can be applied without a reboot of the VMs. |
30 | 30 |
|
31 | | -These memory-preserving updates are accomplished with technology that enables in-place live migration. When updating, the VM is placed into a “paused” state, preserving the memory in RAM, while the underlying host operating system receives the necessary updates and patches. The VM is resumed within 30 seconds of being paused. After resuming, the clock of the VM is automatically synchronized. |
| 31 | +These memory-preserving updates are accomplished with technology that enables in-place live migration. When it is being updated, the VM is placed in a *paused* state. This state preserves the memory in RAM while the underlying host operating system receives the necessary updates and patches. The VM is resumed within 30 seconds of being paused. After the VM is resumed, its clock is automatically synchronized. |
32 | 32 |
|
33 | | -Not all updates can be deployed by using this mechanism, but given the short pause period, deploying updates in this way greatly reduces impact to VMs. |
| 33 | +Because of the short pause period, deploying updates through this mechanism greatly reduces the impact on the VMs. However, not all updates can be deployed in this way. |
34 | 34 |
|
35 | 35 | Multi-instance updates (for VMs in an availability set) are applied one update domain at a time. |
36 | 36 |
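The one-domain-at-a-time behavior can be sketched as follows. This is a minimal model, not Azure's actual placement logic: the round-robin assignment, the default of five update domains, and both function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def assign_update_domains(vm_names, domain_count=5):
    """Spread VMs across update domains round-robin (assumed placement rule)."""
    domains = defaultdict(list)
    for index, name in enumerate(vm_names):
        domains[index % domain_count].append(name)
    return dict(domains)


def rolling_update_plan(vm_names, domain_count=5):
    """List, for each step, which VMs are being updated and which stay available."""
    domains = assign_update_domains(vm_names, domain_count)
    plan = []
    for domain_id in sorted(domains):
        updating = domains[domain_id]
        available = [vm for vm in vm_names if vm not in updating]
        plan.append((domain_id, updating, available))
    return plan


if __name__ == "__main__":
    for domain_id, updating, available in rolling_update_plan(["vm0", "vm1", "vm2"], 2):
        print(f"Update domain {domain_id}: updating {updating}, available {available}")
```

In this model, as long as the VMs span at least two update domains, every step leaves at least one VM available.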
|
37 | | -> [!Note] |
38 | | -> Linux machines that have old kernel versions are affected by a kernel panic during this update method. To avoid this issue, update to kernel version 3.10.0-327.10.1 or a later version. For more information, see [An Azure Linux VM on a 3.10-based kernel panics after a host node upgrade](https://support.microsoft.com/help/3212236). |
| 37 | +> [!NOTE] |
| 38 | +> Linux machines that have old kernel versions are affected by a kernel panic during this update method. To avoid this issue, update to kernel version 3.10.0-327.10.1 or later. For more information, see [An Azure Linux VM on a 3.10-based kernel panics after a host node upgrade](https://support.microsoft.com/help/3212236). |
39 | 39 | |
40 | | -### User-initiated reboot/shutdown actions |
| 40 | +### User-initiated reboot or shutdown actions |
41 | 41 |
|
42 | | -If a reboot is performed from the Azure portal, Azure PowerShell, Command-Line interface, or Reset API, the event can be found in [Azure Activity Log](../articles/monitoring-and-diagnostics/monitoring-overview-activity-logs.md). |
| 42 | +If you perform a reboot from the Azure portal, Azure PowerShell, command-line interface, or Reset API, you can find the event in the [Azure Activity Log](../articles/monitoring-and-diagnostics/monitoring-overview-activity-logs.md). |
43 | 43 |
|
44 | | -If the action is performed from the VM's operation system, the event can be found in system logs. |
| 44 | +If you perform the action from the VM's operating system, you can find the event in the system logs. |
45 | 45 |
|
46 | | -Other scenario that usually causes the VM to reboot include multiple configuration change actions. Typically, the user sees a warning message indicating that executing a particular action will result in a reboot of the VM. Examples include any VM resize operations, changing the password of the administrative account and setting a static IP address. |
| 46 | +Other scenarios that usually cause the VM to reboot include multiple configuration-change actions. You'll ordinarily see a warning message indicating that executing a particular action will result in a reboot of the VM. Examples include any VM resize operations, changing the password of the administrative account, and setting a static IP address. |
47 | 47 |
|
48 | | -### Azure Security center and Windows Updates |
49 | | -Azure Security Center monitors daily Windows and Linux VMs for missing operating system updates. Security Center retrieves a list of available security and critical updates from Windows Update or Windows Server Update Services (WSUS), depending on which service is configured on a Windows VM. Security Center also checks for the latest updates for Linux systems. If your VM is missing a system update, Security Center will recommend that you apply system updates. Application of these system updates is controlled through the Security Center in Azure portal. After applying some updates, VM reboots may be required. For more information, see [Apply system updates in Azure Security Center](../articles/security-center/security-center-apply-system-updates.md). |
| 48 | +### Azure Security Center and Windows Update |
| 49 | +Azure Security Center monitors Windows and Linux VMs daily for missing operating-system updates. Security Center retrieves a list of available security and critical updates from Windows Update or Windows Server Update Services (WSUS), depending on which service is configured on a Windows VM. Security Center also checks for the latest updates for Linux systems. If your VM is missing a system update, Security Center recommends that you apply system updates. The application of these system updates is controlled through the Security Center in the Azure portal. After you apply some updates, VM reboots might be required. For more information, see [Apply system updates in Azure Security Center](../articles/security-center/security-center-apply-system-updates.md). |
50 | 50 |
|
51 | | -Like on-premises servers, Azure does not push Windows Updates to Windows Azure VMs since these machines are intended to be managed by the user. Customers are, however encouraged to leave the automatic Windows Update setting enabled. Automatic installation of Windows Updates can also cause reboots to occur after updates are applied. For more information, see [Windows Update FAQ](https://support.microsoft.com/help/12373/windows-update-faq). |
| 51 | +As is the case with on-premises servers, Azure does not push updates from Windows Update to Windows Azure VMs, because these machines are intended to be managed by their users. You are, however, encouraged to leave the automatic Windows Update setting enabled. Automatic installation of updates from Windows Update can also cause reboots to occur after the updates are applied. For more information, see [Windows Update FAQ](https://support.microsoft.com/help/12373/windows-update-faq). |
52 | 52 |
|
53 | 53 | ### Other situations affecting the availability of your VM |
54 | | -There are other cases in which Azure might actively suspend the use of a VM. Users receive email notifications before this action is taken, so they have a chance to resolve the underlying issues. Examples include security violations, and expired payment method having expired. |
| 54 | +There are other cases in which Azure might actively suspend the use of a VM. You'll receive email notifications before this action is taken, so you'll have a chance to resolve the underlying issues. Examples of issues that affect VM availability include security violations and the expiration of payment methods. |
55 | 55 |
|
56 | | -### Host Server Faults |
57 | | -The VM is hosted on a physical server that is running inside an Azure datacenter. The physical server runs an agent called the Host Agent in addition to a few other Azure components. When these Azure software components on the physical server become unresponsive, the monitoring system triggers a reboot of the host server to attempt recovery. The VM is typically available again within five minutes and continues to live on the same host as previously. |
| 56 | +### Host server faults |
| 57 | +The VM is hosted on a physical server that is running inside an Azure datacenter. The physical server runs an agent called the Host Agent in addition to a few other Azure components. When these Azure software components on the physical server become unresponsive, the monitoring system triggers a reboot of the host server to attempt recovery. The VM is usually available again within five minutes and continues to live on the same host as previously. |
58 | 58 |
|
59 | | -Server faults are typically caused by hardware failure such as failure of a hard disk or solid-state drive. Azure continuously monitors these occurrences, identifies the underlying bugs, and rolls out updates after the mitigation has been implemented and tested. |
| 59 | +Server faults are usually caused by hardware failure, such as the failure of a hard disk or solid-state drive. Azure continuously monitors these occurrences, identifies the underlying bugs, and rolls out updates after the mitigation has been implemented and tested. |
60 | 60 |
|
61 | | -Since some host server faults can be specific to that server, a repeated VM reboot situation might be improved by manually redeploying it to another host server. This operation can be triggered by using the “redeploy” option on the details page of the VM, or by stopping and restarting the VM in the Azure portal. |
| 61 | +Because some host server faults can be specific to that server, a repeated VM reboot situation might be improved by manually redeploying the VM to another host server. This operation can be triggered by using the **redeploy** option on the details page of the VM, or by stopping and restarting the VM in the Azure portal. |
62 | 62 |
|
63 | 63 | ### Auto-recovery |
64 | | -In case the host server cannot reboot for any reason, the Azure platform initiates an auto-recovery action to take the faulty host server out of rotation for further investigation. |
65 | | -All VMs on that host are automatically relocated to a different, healthy host server. This process is typically complete within 15 minutes. This blog describes the auto-recovery process: [Auto-recovery of VMs](https://azure.microsoft.com/blog/service-healing-auto-recovery-of-virtual-machines). |
| 64 | +If the host server cannot reboot for any reason, the Azure platform initiates an auto-recovery action to take the faulty host server out of rotation for further investigation. |
| 65 | + |
| 66 | +All VMs on that host are automatically relocated to a different, healthy host server. This process is usually complete within 15 minutes. To learn more about the auto-recovery process, see [Auto-recovery of VMs](https://azure.microsoft.com/blog/service-healing-auto-recovery-of-virtual-machines). |
66 | 67 |
|
67 | 68 | ### Unplanned maintenance |
68 | | -On rare occasions, the Azure operations team may need to perform maintenance activities to ensure the overall health of the Azure platform. This behavior may affect VM availability and typically results in the same auto-recovery action as described earlier. |
| 69 | +On rare occasions, the Azure operations team might need to perform maintenance activities to ensure the overall health of the Azure platform. This behavior might affect VM availability, and it usually results in the same auto-recovery action as described earlier. |
69 | 70 |
|
70 | | -Unplanned maintenance s include the following: |
| 71 | +Unplanned maintenance events include the following: |
71 | 72 |
|
72 | 73 | - Urgent node defragmentation |
73 | 74 | - Urgent network switch updates |
74 | 75 |
|
75 | | -### VM Crashes |
76 | | -VMs may restart due to issues within the VM itself. The work load or role running on the VM may trigger a bug check within the guest operating system. For help determining the reason for the crash, view system and application logs for Windows VMs, and serial logs for Linux VMs. |
| 76 | +### VM crashes |
| 77 | +VMs might restart because of issues within the VM itself. The workload or role that's running on the VM might trigger a bug check within the guest operating system. For help determining the reason for the crash, view the system and application logs for Windows VMs, and the serial logs for Linux VMs. |
77 | 78 |
|
78 | 79 | ### Storage-related forced shutdowns |
79 | | -VMs in Azure rely on virtual disks for operating system and data storage that is hosted on the Azure Storage infrastructure. Whenever the availability or connectivity between the VM and the associated virtual disks is impacted for more than 120 seconds, the Azure platform performs a forced shutdown of the VMs to avoid data corruption. The VMs are automatically powered back on after storage connectivity has been restored. |
| 80 | +VMs in Azure rely on virtual disks for operating system and data storage that is hosted on the Azure Storage infrastructure. Whenever the availability or connectivity between the VM and the associated virtual disks is affected for more than 120 seconds, the Azure platform performs a forced shutdown of the VMs to avoid data corruption. The VMs are automatically powered back on after storage connectivity has been restored. |
80 | 81 |
|
81 | 82 | The duration of the shutdown can be as short as five minutes but can be significantly longer. The following is one of the specific cases that is associated with storage-related forced shutdowns: |
82 | 83 |
|
83 | 84 | **Exceeding IO limits** |
84 | 85 |
|
85 | | -VMs might be temporarily shut down when I/O requests are consistently throttled due to a volume of input/output operations per second (IOPS) that exceeds the I/O limits for disk (Standard disk storage is limited to 500 IOPS). To mitigate this issue, use disk striping or configure storage space inside the guest VM, depending on the workload. For details, see [Configuring Azure VMs for Optimal Storage Performance](http://blogs.msdn.com/b/mast/archive/2014/10/14/configuring-azure-virtual-machines-for-optimal-storage-performance.aspx). |
| 86 | +VMs might be temporarily shut down when I/O requests are consistently throttled because the volume of I/O operations per second (IOPS) exceeds the I/O limits for the disk. (Standard disk storage is limited to 500 IOPS.) To mitigate this issue, use disk striping or configure the storage space inside the guest VM, depending on the workload. For details, see [Configuring Azure VMs for Optimal Storage Performance](http://blogs.msdn.com/b/mast/archive/2014/10/14/configuring-azure-virtual-machines-for-optimal-storage-performance.aspx). |
86 | 87 |
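To make the striping guidance concrete, the sketch below estimates how many standard disks a striped volume would need to reach a target IOPS figure. It assumes that IOPS scale linearly with the number of striped disks and ignores any VM-level caps or striping overhead; the function name and constant are hypothetical.

```python
import math

# Per-disk limit for standard disk storage, as quoted in the text above.
STANDARD_DISK_IOPS_LIMIT = 500


def disks_needed(target_iops: int, per_disk_iops: int = STANDARD_DISK_IOPS_LIMIT) -> int:
    """Estimate the disk count for a striped volume (assumes linear IOPS scaling)."""
    if target_iops <= 0 or per_disk_iops <= 0:
        raise ValueError("IOPS values must be positive")
    return math.ceil(target_iops / per_disk_iops)


if __name__ == "__main__":
    print(disks_needed(1800))
```

For example, a workload that needs 1,800 IOPS would call for four standard disks under these assumptions, because three disks would provide only 1,500 IOPS.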
|
87 | | -Higher IOPS limits are available via Azure Premium Storage with up to 80,000 IOPs. For more information, See [High-Performance Premium Storage](../articles/storage/storage-premium-storage.md). |
| 88 | +Higher IOPS limits are available via Azure Premium Storage with up to 80,000 IOPS. For more information, see [High-Performance Premium Storage](../articles/storage/storage-premium-storage.md). |
88 | 89 |
|
89 | 90 | ### Other incidents |
90 | | -In rare circumstances, a wide spread issue can impact multiple servers in an Azure data center. If this occurs, the Azure team sends email notifications to affected subscriptions. You can check the [Azure Service Health Dashboard](https://azure.microsoft.com/status/) and Azure portal for the status of on going outages and past incidents. |
| 91 | +In rare circumstances, a widespread issue can affect multiple servers in an Azure datacenter. If this issue occurs, the Azure team sends email notifications to the affected subscriptions. You can check the [Azure Service Health dashboard](https://azure.microsoft.com/status/) and the Azure portal for the status of ongoing outages and past incidents. |