|
| 1 | +Feature |
| 2 | +------- |
| 3 | + |
| 4 | +Gluster logging enhancements to support message IDs per message |
| 5 | + |
| 6 | +Summary |
| 7 | +------- |
| 8 | + |
| 9 | +Enhance gluster logging to provide the following features, SubFeature |
| 10 | +--\> SF |
| 11 | + |
| 12 | +- SF1: Add message IDs to message |
| 13 | + |
| 14 | +- SF2: Standardize error num reporting across messages |
| 15 | + |
| 16 | +- SF3: Enable repetitive message suppression in logs |
| 17 | + |
| 18 | +- SF4: Log location and hierarchy standardization (in case anything is |
| 19 | +further required here, analysis pending) |
| 20 | + |
| 21 | +- SF5: Enable per sub-module logging level configuration |
| 22 | + |
| 23 | +- SF6: Enable logging to other frameworks, than just the current gluster |
| 24 | +logs |
| 25 | + |
| 26 | +- SF7: Generate a catalogue of these message, with message ID, message, |
| 27 | +reason for occurrence, recovery/troubleshooting steps. |
| 28 | + |
| 29 | +Owners |
| 30 | +------ |
| 31 | + |
| 32 | +Balamurugan Arumugam <[email protected]> |
| 33 | +Krishnan Parthasarathi <[email protected]> |
| 34 | +Krutika Dhananjay <[email protected]> |
| 35 | +Shyamsundar Ranganathan <[email protected]> |
| 36 | + |
| 37 | +Current status |
| 38 | +-------------- |
| 39 | + |
| 40 | +### Existing infrastructure: |
| 41 | + |
| 42 | +Currently gf\_logXXX exists as an infrastructure API for all logging |
| 43 | +related needs. This (typically) takes the form, |
| 44 | + |
| 45 | +gf\_log(dom, levl, fmt...) |
| 46 | + |
| 47 | +where, |
| 48 | + |
| 49 | + dom: Open format string usually the xlator name, or "cli" or volume name etc. |
| 50 | + levl: One of, GF_LOG_EMERG, GF_LOG_ALERT, GF_LOG_CRITICAL, GF_LOG_ERROR, GF_LOG_WARNING, GF_LOG_NOTICE, GF_LOG_INFO, GF_LOG_DEBUG, GF_LOG_TRACE |
| 51 | + fmt: the actual message string, followed by the required arguments in the string |
| 52 | + |
| 53 | +The log initialization happens through, |
| 54 | + |
| 55 | +gf\_log\_init (void \*data, const char \*filename, const char \*ident) |
| 56 | + |
| 57 | +where, |
| 58 | + |
| 59 | + data: glusterfs_ctx_t, largely unused in logging other than the required FILE and mutex fields |
| 60 | + filename: file name to log to |
| 61 | + ident: Like syslog ident parameter, largely unused |
| 62 | + |
| 63 | +The above infrastructure leads to logs of type, (sample extraction from |
| 64 | +nfs.log) |
| 65 | + |
| 66 | + [2013-12-08 14:17:17.603879] I [socket.c:3485:socket_init] 0-socket.ACL: SSL support is NOT enabled |
| 67 | + [2013-12-08 14:17:17.603937] I [socket.c:3500:socket_init] 0-socket.ACL: using system polling thread |
| 68 | + [2013-12-08 14:17:17.612128] I [nfs.c:934:init] 0-nfs: NFS service started |
| 69 | + [2013-12-08 14:17:17.612383] I [dht-shared.c:311:dht_init_regex] 0-testvol-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$ |
| 70 | + |
| 71 | +### Limitations/Issues in the infrastructure |
| 72 | + |
| 73 | +1) Auto analysis of logs needs to be done based on the final message |
| 74 | +string. Automated tools that can help with log message and related |
| 75 | +troubleshooting options need to use the final string, which needs to be |
| 76 | +intelligently parsed and also may change between releases. It would be |
| 77 | +desirable to have message IDs so that such tools and trouble shooting |
| 78 | +options can leverage the same in a much easier fashion. |
| 79 | + |
| 80 | +2) The log message itself currently does not use the \_ident\_ which |
| 81 | +can help as we move to more common logging frameworks like journald, |
| 82 | +rsyslog (or syslog as the case maybe) |
| 83 | + |
| 84 | +3) errno is the primary identifier of errors across gluster, i.e we do |
| 85 | +not have error codes in gluster and use errno values everywhere. The log |
| 86 | +messages currently do not lend themselves to standardization like |
| 87 | +printing the string equivalent of errno rather than the actual errno |
| 88 | +value, which \_could\_ be cryptic to administrators |
| 89 | + |
| 90 | +4) Typical logging infrastructures provide suppression (on a |
| 91 | +configurable basis) for repetitive messages to prevent log flooding, |
| 92 | +this is currently missing in the current infrastructure |
| 93 | + |
| 94 | +5) The current infrastructure cannot be used to control log levels at a |
| 95 | +per xlator or sub module, as the \_dom\_ passed is a string that change |
| 96 | +based on volume name, translator name etc. It would be desirable to have |
| 97 | +a better module identification mechanism that can help with this |
| 98 | +feature. |
| 99 | + |
| 100 | +6) Currently the entire logging infrastructure resides within gluster. |
| 101 | +It would be desirable in scaled situations to have centralized logging |
| 102 | +and monitoring solutions in place, to be able to better analyse and |
| 103 | +monitor the cluster health and take actions. |
| 104 | + |
| 105 | +This requires some form of pluggable logging frameworks that can be used |
| 106 | +within gluster to enable this possibility. Currently the existing |
| 107 | +framework is used throughout gluster and hence we need only to change |
| 108 | +configuration and logging.c to enable logging to other frameworks (as an |
| 109 | +example the current syslog plug that was provided). |
| 110 | + |
| 111 | +It would be desirable to enhance this to provide a more robust framework |
| 112 | +for future extensions to other frameworks. This is not a limitation of |
| 113 | +the current framework, so much as a re-factor to be able to switch |
| 114 | +logging frameworks with more ease. |
| 115 | + |
| 116 | +7) For centralized logging in the future, it would need better |
| 117 | +identification strings from various gluster processes and hosts, which |
| 118 | +is currently missing or suppressed in the logging infrastructure. |
| 119 | + |
| 120 | +Due to the nature of enhancements proposed, it is required that we |
| 121 | +better the current infrastructure for the stated needs and do some |
| 122 | +future proofing in terms of newer messages that would be added. |
| 123 | + |
| 124 | +Detailed Description |
| 125 | +-------------------- |
| 126 | + |
| 127 | +NOTE: Covering details for SF1, SF2, and partially SF3, SF5, SF6. SF4/7 |
| 128 | +will be covered in later revisions/phases. |
| 129 | + |
| 130 | +### Logging API changes: |
| 131 | + |
| 132 | +1) Change the logging API as follows, |
| 133 | + |
| 134 | +From: gf\_log(dom, levl, fmt...) |
| 135 | + |
| 136 | +To: gf\_msg(dom, levl, errnum, msgid, fmt...) |
| 137 | + |
| 138 | +Where: |
| 139 | + |
| 140 | + dom: Open string as used in the current logging infrastructure (helps in backward compat) |
| 141 | + levl: As in current logging infrastructure (current levels seem sufficient enough to not add more levels for better debuggability etc.) |
| 142 | + <new fields> |
| 143 | + msgid: A message identifier, unique to this message FMT string and possibly this invocation. (SF1, lending to SF3) |
| 144 | + errnum: The errno that this message is generated for (with an implicit 0 meaning no error number per se with this message) (SF2) |
| 145 | + |
| 146 | +NOTE: Internally the gf\_msg would still be a macro that would add the |
| 147 | +\_\_FILE\_\_ \_\_LINE\_\_ \_\_FUNCTION\_\_ arguments |
| 148 | + |
| 149 | +2) Enforce \_ident\_ in the logging initialization API, gf\_log\_init |
| 150 | +(void \*data, const char \*filename, const char \*ident) |
| 151 | + |
| 152 | +Where: |
| 153 | + |
| 154 | + ident would be the identifier string like, nfs, <mountpoint>, brick-<brick-name>, cli, glusterd, as is the case with the log file name that is generated today (lending to SF6) |
| 155 | + |
| 156 | +#### What this achieves: |
| 157 | + |
| 158 | +With the above changes, we now have a message ID per message |
| 159 | +(\_msgid\_), location of the message in terms of which component |
| 160 | +(\_dom\_) and which process (\_ident\_). The further identification of |
| 161 | +the message location in terms of host (ip/name) can be done in the |
| 162 | +framework, when centralized logging infrastructure is introduced. |
| 163 | + |
| 164 | +#### Log message changes: |
| 165 | + |
| 166 | +With the above changes to the API the log message can now appear in a |
| 167 | +compatibility mode to adhere to current logging format, or be presented |
| 168 | +as follows, |
| 169 | + |
| 170 | +log invoked as: gf\_msg(dom, levl, ENOTSUP, msgidX) |
| 171 | + |
| 172 | +Example: gf\_msg ("logchecks", GF\_LOG\_CRITICAL, 22, logchecks\_msg\_4, |
| 173 | +42, "Forty-Two", 42); |
| 174 | + |
| 175 | +Where: logchecks\_msg\_4 (GLFS\_COMP\_BASE + 4), "Critical: Format |
| 176 | +testing: %d:%s:%x" |
| 177 | + |
| 178 | +1) Gluster logging framework (logged as) |
| 179 | + |
| 180 | + [2014-02-17 08:52:28.038267] I [MSGID: 1002] [logchecks.c:44:go_log] 0-logchecks: Informational: Format testing: 42:Forty-Two:2a [Invalid argument] |
| 181 | + |
| 182 | +2) syslog (passed as) |
| 183 | + |
| 184 | + Feb 17 14:17:42 somari logchecks[26205]: [MSGID: 1002] [logchecks.c:44:go_log] 0-logchecks: Informational: Format testing: 42:Forty-Two:2a [Invalid argument] |
| 185 | + |
| 186 | +3) journald (passed as) |
| 187 | + |
| 188 | + sd_journal_send("MESSAGE=<vasprintf(dom, msgid(fmt))>", |
| 189 | + "MESSAGE_ID=msgid", |
| 190 | + "PRIORITY=levl", |
| 191 | + "CODE_FILE=`<fname>`", "CODE_LINE=`<lnum>", "CODE_FUNC=<fnnam>", |
| 192 | + "ERRNO=errnum", |
| 193 | + "SYSLOG_IDENTIFIER=<ident>" |
| 194 | + NULL); |
| 195 | + |
| 196 | +4) CEE (Common Event Expression) format string passed to any CEE |
| 197 | +consumer (say lumberjack) |
| 198 | + |
| 199 | +Based on generating @CEE JSON string as per specifications and passing |
| 200 | +it the infrastructure in question. |
| 201 | + |
| 202 | +#### Message ID generation: |
| 203 | + |
| 204 | +1) Some rules for message IDs |
| 205 | + |
| 206 | +- Every message, even if it is the same message FMT, will have a unique |
| 207 | +message ID - Changes to a specific message string, hence will not change |
| 208 | +its ID and also not impact other locations in the code that use the same |
| 209 | +message FMT |
| 210 | + |
| 211 | +2) A glfs-message-id.h file would contain ranges per component for |
| 212 | +individual component based messages to be created without overlapping on |
| 213 | +the ranges. |
| 214 | + |
| 215 | +3) <component>-message.h would contain something as follows, |
| 216 | + |
| 217 | + #define GLFS_COMP_BASE GLFS_MSGID_COMP_<component> |
| 218 | + #define GLFS_NUM_MESSAGES 1 |
| 219 | + #define GLFS_MSGID_END (GLFS_COMP_BASE + GLFS_NUM_MESSAGES + 1) |
| 220 | + /* Messaged with message IDs */ |
| 221 | + #define glfs_msg_start_x GLFS_COMP_BASE, "Invalid: Start of messages" |
| 222 | + /*------------*/ |
| 223 | + #define <component>_msg_1 (GLFS_COMP_BASE + 1), "Test message, replace with"\ |
| 224 | + " original when using the template" |
| 225 | + /*------------*/ |
| 226 | + #define glfs_msg_end_x GLFS_MSGID_END, "Invalid: End of messages" |
| 227 | + |
| 228 | +5) Each call to gf\_msg hence would be, |
| 229 | + |
| 230 | + gf_msg(dom, levl, errnum, glfs_msg_x, ...) |
| 231 | + |
| 232 | +#### Setting per xlator logging levels (SF5): |
| 233 | + |
| 234 | +short description to be elaborated later |
| 235 | + |
| 236 | +Leverage this-\>loglevel to override the global loglevel. This can be |
| 237 | +also configured from gluster CLI at runtime to change the log levels at |
| 238 | +a per xlator level for targeted debugging. |
| 239 | + |
| 240 | +#### Multiple log suppression(SF3): |
| 241 | + |
| 242 | +short description to be elaborated later |
| 243 | + |
| 244 | +1) Save the message string as follows, Msg\_Object(msgid, |
| 245 | +msgstring(vasprintf(dom, fmt)), timestamp, repetitions) |
| 246 | + |
| 247 | +2) On each message received by the logging infrastructure check the |
| 248 | +list of saved last few Msg\_Objects as follows, |
| 249 | + |
| 250 | +2.1) compare msgid and on success compare msgstring for a match, compare |
| 251 | +repetition tolerance time with current TS and saved TS in the |
| 252 | +Msg\_Object |
| 253 | + |
| 254 | +2.1.1) if tolerance is within limits, increment repetitions and do not |
| 255 | +print message |
| 256 | + |
| 257 | +2.1.2) if tolerance is outside limits, print repetition count for saved |
| 258 | +message (if any) and print the new message |
| 259 | + |
| 260 | +2.2) If none of the messages match the current message, knock off the |
| 261 | +oldest message in the list printing any repetition count message for the |
| 262 | +same, and stash new message into the list |
| 263 | + |
| 264 | +The key things to remember and act on here would be to, minimize the |
| 265 | +string duplication on each message, and also to keep the comparison |
| 266 | +quick (hence base it off message IDs and errno to start with) |
| 267 | + |
| 268 | +#### Message catalogue (SF7): |
| 269 | + |
| 270 | +<short description to be elaborated later> |
| 271 | + |
| 272 | +The idea is to use Doxygen comments in the <component>-message.h per |
| 273 | +component, to list information in various sections per message of |
| 274 | +consequence and later use Doxygen to publish this catalogue on a per |
| 275 | +release basis. |
| 276 | + |
| 277 | +Benefit to GlusterFS |
| 278 | +-------------------- |
| 279 | + |
| 280 | +The mentioned limitations and auto log analysis benefits would accrue |
| 281 | +for GlusterFS |
| 282 | + |
| 283 | +Scope |
| 284 | +----- |
| 285 | + |
| 286 | +### Nature of proposed change |
| 287 | + |
| 288 | +All gf\_logXXX function invocations would change to gf\_msgXXX |
| 289 | +invocations. |
| 290 | + |
| 291 | +### Implications on manageability |
| 292 | + |
| 293 | +None |
| 294 | + |
| 295 | +### Implications on presentation layer |
| 296 | + |
| 297 | +None |
| 298 | + |
| 299 | +### Implications on persistence layer |
| 300 | + |
| 301 | +None |
| 302 | + |
| 303 | +### Implications on 'GlusterFS' backend |
| 304 | + |
| 305 | +None |
| 306 | + |
| 307 | +### Modification to GlusterFS metadata |
| 308 | + |
| 309 | +None |
| 310 | + |
| 311 | +### Implications on 'glusterd' |
| 312 | + |
| 313 | +None |
| 314 | + |
| 315 | +How To Test |
| 316 | +----------- |
| 317 | + |
| 318 | +A separate test utility that tests various logs and formats would be |
| 319 | +provided to ensure that functionality can be tested independent of |
| 320 | +GlusterFS |
| 321 | + |
| 322 | +User Experience |
| 323 | +--------------- |
| 324 | + |
| 325 | +Users would notice changed logging formats as mentioned above, the |
| 326 | +additional field of importance would be the MSGID: |
| 327 | + |
| 328 | +Dependencies |
| 329 | +------------ |
| 330 | + |
| 331 | +None |
| 332 | + |
| 333 | +Documentation |
| 334 | +------------- |
| 335 | + |
| 336 | +Intending to add a logging.md (or modify the same) to elaborate on how a |
| 337 | +new component should now use the new framework and generate messages |
| 338 | +with IDs in the same. |
| 339 | + |
| 340 | +Status |
| 341 | +------ |
| 342 | + |
| 343 | +In development (see, <http://review.gluster.org/#/c/6547/> ) |
| 344 | + |
| 345 | +Comments and Discussion |
| 346 | +----------------------- |
| 347 | + |
| 348 | +<Follow here> |
0 commit comments