~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  ResourceManager Restart
  ---
  ---
  ${maven.build.timestamp}

ResourceManager Restart

%{toc|section=1|fromDepth=0}

* {Overview}

  The ResourceManager is the central authority that manages resources and
  schedules applications running atop YARN. Hence, it is potentially a single
  point of failure in an Apache YARN cluster.

  This document gives an overview of ResourceManager Restart, a feature that
  enhances the ResourceManager to keep functioning across restarts and makes
  its down-time invisible to end-users.

  The ResourceManager Restart feature is divided into two phases:

  ResourceManager Restart Phase 1: Enhance the RM to persist application/attempt
  state and other credential information in a pluggable state-store. The RM will
  reload this information from the state-store upon restart and relaunch the
  previously running applications. Users are not required to re-submit their
  applications.

  ResourceManager Restart Phase 2: Focus on re-constructing the running state of
  the ResourceManager by reading back the container statuses from NodeManagers
  and the container requests from ApplicationMasters upon restart. The key
  difference from Phase 1 is that previously running applications will not be
  killed after the RM restarts, so applications will not lose their work because
  of an RM outage.

  As of the Hadoop 2.4.0 release, only ResourceManager Restart Phase 1 is
  implemented; it is described below.

* {Feature}

  The overall concept is that the RM persists the application metadata
  (i.e. the ApplicationSubmissionContext) in a pluggable state-store when a
  client submits an application, and also saves the final status of the
  application, such as the completion state (failed, killed, finished) and the
  diagnostics, when the application completes. In addition, the RM saves
  credentials such as security keys and tokens in order to work in a secure
  environment. Any time the RM shuts down, as long as the required information
  (i.e. the application metadata and, when running in a secure environment, the
  accompanying credentials) is available in the state-store, the RM can pick up
  the application metadata from the state-store on restart and re-submit the
  application. The RM will not re-submit applications that had already completed
  (i.e. failed, killed, finished) before it went down.

  NodeManagers and clients keep polling the RM during its down-time until the RM
  comes back up. When the RM becomes alive again, it sends a re-sync command to
  all the NodeManagers and ApplicationMasters it was talking to, via heartbeats.
  Today, NodeManagers and ApplicationMasters handle this command as follows:
  NMs kill all of their managed containers and re-register with the RM; from the
  RM's perspective, these re-registered NodeManagers are similar to newly
  joining NMs. AMs (e.g. the MapReduce AM) are expected to shut down when they
  receive the re-sync command. After the RM restarts, loads all the application
  metadata and credentials from the state-store, and populates them into memory,
  it creates a new attempt (i.e. ApplicationMaster) for each application that
  has not yet completed and relaunches that application as usual. As described
  before, the previously running applications' work is lost in this manner,
  since they are essentially killed by the RM via the re-sync command on
  restart.

* {Configurations}

  This section describes the configurations involved in enabling the RM Restart
  feature.

  * Enable ResourceManager Restart functionality.

    To enable RM Restart functionality, set the following property in
    <<conf/yarn-site.xml>> to true:

*--------------------------------------+--------------------------------------+
|| Property || Value |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.recovery.enabled>>> | |
| | <<<true>>> |
*--------------------------------------+--------------------------------------+

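    For example, a minimal entry to turn recovery on might look like the
    following sketch, placed inside the <<<configuration>>> element of
    <<conf/yarn-site.xml>>:

+---+
<!-- Illustrative sketch: enables the RM Restart (state recovery) feature. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
+---+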

  * Configure the state-store that is used to persist the RM state.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.store.class>>> | |
| | The class name of the state-store to be used for saving application/attempt |
| | state and the credentials. The available state-store implementations are |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore>>>, |
| | a ZooKeeper based state-store implementation, and |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>>, |
| | a state-store implementation based on a Hadoop FileSystem such as HDFS. |
| | The default value is |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore>>>. |
*--------------------------------------+--------------------------------------+

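    For example, an illustrative <<conf/yarn-site.xml>> entry selecting the
    ZooKeeper based state-store (the FileSystem based store is the default)
    might be:

+---+
<!-- Illustrative sketch: use the ZooKeeper based RM state-store. -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
+---+
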
  * Configurations when using the Hadoop FileSystem based state-store implementation.

    Configure the URI where the RM state will be saved in the Hadoop FileSystem
    state-store.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.fs.state-store.uri>>> | |
| | URI pointing to the location of the FileSystem path where the RM state will |
| | be stored (e.g. hdfs://localhost:9000/rmstore). |
| | Default value is <<<${hadoop.tmp.dir}/yarn/system/rmstore>>>. |
| | If the FileSystem name is not provided, <<<fs.default.name>>> specified in |
| | <<conf/core-site.xml>> will be used. |
*--------------------------------------+--------------------------------------+

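    For example, re-using the example location from the table above, an
    illustrative setting that stores the RM state under <<</rmstore>>> on a
    local HDFS instance might be:

+---+
<!-- Illustrative sketch: the URI below is an example, not a recommendation. -->
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://localhost:9000/rmstore</value>
</property>
+---+
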
    Configure the retry policy the state-store client uses to connect with the
    Hadoop FileSystem.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.fs.state-store.retry-policy-spec>>> | |
| | Hadoop FileSystem client retry policy specification. The Hadoop FileSystem |
| | client retry is always enabled. It is specified in pairs of sleep-time and |
| | number-of-retries, i.e. (t0, n0), (t1, n1), ...: the first n0 retries sleep |
| | t0 milliseconds on average, the following n1 retries sleep t1 milliseconds |
| | on average, and so on. |
| | Default value is (2000, 500). |
*--------------------------------------+--------------------------------------+

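    For example, an illustrative entry that spells out the documented default
    retry policy of (2000, 500), i.e. up to 500 retries sleeping about 2000
    milliseconds each, might be (the exact value string is an assumption based
    on the pair format described above):

+---+
<!-- Illustrative sketch: sleep-time/number-of-retries pairs. -->
<property>
  <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
  <value>2000, 500</value>
</property>
+---+
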
  * Configurations when using the ZooKeeper based state-store implementation.

    Configure the ZooKeeper server address and the root path where the RM state
    is stored.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-address>>> | |
| | Comma separated list of Host:Port pairs, each corresponding to a ZooKeeper |
| | server (e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"), to be used by |
| | the RM for storing RM state. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-state-store.parent-path>>> | |
| | The full path of the root znode where the RM state will be stored. |
| | Default value is /rmstore. |
*--------------------------------------+--------------------------------------+

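    For example, re-using the example quorum addresses and the default root
    znode from the table above (both values are illustrative):

+---+
<!-- Illustrative sketch: a three-node ZooKeeper quorum and the default root znode. -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
  <value>/rmstore</value>
</property>
+---+
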
    Configure the retry policy the state-store client uses to connect with the
    ZooKeeper server.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-num-retries>>> | |
| | Number of times the RM tries to connect to the ZooKeeper server if the |
| | connection is lost. Default value is 500. |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-retry-interval-ms>>> | |
| | The interval in milliseconds between retries when connecting to a ZooKeeper |
| | server. Default value is 2000 milliseconds (2 seconds). |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-timeout-ms>>> | |
| | ZooKeeper session timeout in milliseconds. This configuration is used by |
| | the ZooKeeper server to determine when the session expires. Session |
| | expiration happens when the server does not hear from the client (i.e. no |
| | heartbeat) within the session timeout period specified by this |
| | configuration. Default value is 10000 milliseconds (10 seconds). |
*--------------------------------------+--------------------------------------+

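    For example, an illustrative entry that states the documented defaults
    explicitly (all values are in milliseconds except the retry count):

+---+
<!-- Illustrative sketch: these are the documented default values. -->
<property>
  <name>yarn.resourcemanager.zk-num-retries</name>
  <value>500</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-retry-interval-ms</name>
  <value>2000</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-timeout-ms</name>
  <value>10000</value>
</property>
+---+
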
    Configure the ACLs to be used for setting permissions on ZooKeeper znodes.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.zk-acl>>> | |
| | ACLs to be used for setting permissions on ZooKeeper znodes. |
| | Default value is <<<world:anyone:rwcda>>>. |
*--------------------------------------+--------------------------------------+

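    For example, an illustrative entry that states the default open ACL
    explicitly (a secured deployment would typically use a more restrictive
    ACL):

+---+
<!-- Illustrative sketch: world-readable/writable ACL, the documented default. -->
<property>
  <name>yarn.resourcemanager.zk-acl</name>
  <value>world:anyone:rwcda</value>
</property>
+---+
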
  * Configure the maximum number of application attempt retries.

*--------------------------------------+--------------------------------------+
|| Property || Description |
*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.am.max-attempts>>> | |
| | The maximum number of application attempts. This is a global |
| | setting for all ApplicationMasters. Each ApplicationMaster can specify |
| | its own maximum number of application attempts via the API, but the |
| | individual number cannot be more than the global upper bound. If it is, |
| | the RM will override it. The default value is set to 2, to |
| | allow at least one retry for the AM. |
*--------------------------------------+--------------------------------------+

  This configuration's impact in fact goes beyond the scope of RM Restart: it
  controls the maximum number of attempts an application can have. In RM Restart
  Phase 1, this configuration is needed because, as described earlier, each time
  the RM restarts it kills the previously running attempt (i.e. the
  ApplicationMaster) and creates a new attempt. Therefore, each occurrence of an
  RM restart causes the attempt count to increase by 1. In RM Restart Phase 2,
  this configuration will not be needed, since the previously running
  ApplicationMaster will not be killed and will simply re-sync with the RM after
  the RM restarts.
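
  For example, an administrator who wants applications to survive more RM
  restarts under Phase 1 might raise the global bound above the default of 2
  (the value 4 below is purely illustrative):

+---+
<!-- Illustrative sketch: allow up to 4 attempts per application. -->
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>
+---+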