Skip to content

Commit 149ac7b

Browse files
committed
adding feature planning pages
Signed-off-by: shravantc <[email protected]>
1 parent 7097607 commit 149ac7b

14 files changed

+2293
-2
lines changed
Lines changed: 348 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,348 @@
1+
Feature
2+
-------
3+
4+
Gluster logging enhancements to support message IDs per message
5+
6+
Summary
7+
-------
8+
9+
Enhance gluster logging to provide the following features, SubFeature
10+
--\> SF
11+
12+
- SF1: Add message IDs to message
13+
14+
- SF2: Standardize error num reporting across messages
15+
16+
- SF3: Enable repetitive message suppression in logs
17+
18+
- SF4: Log location and hierarchy standardization (in case anything is
19+
further required here, analysis pending)
20+
21+
- SF5: Enable per sub-module logging level configuration
22+
23+
- SF6: Enable logging to other frameworks, than just the current gluster
24+
logs
25+
26+
- SF7: Generate a catalogue of these message, with message ID, message,
27+
reason for occurrence, recovery/troubleshooting steps.
28+
29+
Owners
30+
------
31+
32+
Balamurugan Arumugam <[email protected]>
33+
Krishnan Parthasarathi <[email protected]>
34+
Krutika Dhananjay <[email protected]>
35+
Shyamsundar Ranganathan <[email protected]>
36+
37+
Current status
38+
--------------
39+
40+
### Existing infrastructure:
41+
42+
Currently gf\_logXXX exists as an infrastructure API for all logging
43+
related needs. This (typically) takes the form,
44+
45+
gf\_log(dom, levl, fmt...)
46+
47+
where,
48+
49+
   dom: Open format string usually the xlator name, or "cli" or volume name etc.
50+
   levl: One of, GF_LOG_EMERG, GF_LOG_ALERT, GF_LOG_CRITICAL, GF_LOG_ERROR, GF_LOG_WARNING, GF_LOG_NOTICE, GF_LOG_INFO, GF_LOG_DEBUG, GF_LOG_TRACE
51+
   fmt: the actual message string, followed by the required arguments in the string
52+
53+
The log initialization happens through,
54+
55+
gf\_log\_init (void \*data, const char \*filename, const char \*ident)
56+
57+
where,
58+
59+
   data: glusterfs_ctx_t, largely unused in logging other than the required FILE and mutex fields
60+
   filename: file name to log to
61+
   ident: Like syslog ident parameter, largely unused
62+
63+
The above infrastructure leads to logs of type, (sample extraction from
64+
nfs.log)
65+
66+
    [2013-12-08 14:17:17.603879] I [socket.c:3485:socket_init] 0-socket.ACL: SSL support is NOT enabled
67+
    [2013-12-08 14:17:17.603937] I [socket.c:3500:socket_init] 0-socket.ACL: using system polling thread
68+
    [2013-12-08 14:17:17.612128] I [nfs.c:934:init] 0-nfs: NFS service started
69+
    [2013-12-08 14:17:17.612383] I [dht-shared.c:311:dht_init_regex] 0-testvol-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
70+
71+
### Limitations/Issues in the infrastructure
72+
73+
​1) Auto analysis of logs needs to be done based on the final message
74+
string. Automated tools that can help with log message and related
75+
troubleshooting options need to use the final string, which needs to be
76+
intelligently parsed and also may change between releases. It would be
77+
desirable to have message IDs so that such tools and trouble shooting
78+
options can leverage the same in a much easier fashion.
79+
80+
​2) The log message itself currently does not use the \_ident\_ which
81+
can help as we move to more common logging frameworks like journald,
82+
rsyslog (or syslog as the case maybe)
83+
84+
​3) errno is the primary identifier of errors across gluster, i.e we do
85+
not have error codes in gluster and use errno values everywhere. The log
86+
messages currently do not lend themselves to standardization like
87+
printing the string equivalent of errno rather than the actual errno
88+
value, which \_could\_ be cryptic to administrators
89+
90+
​4) Typical logging infrastructures provide suppression (on a
91+
configurable basis) for repetitive messages to prevent log flooding,
92+
this is currently missing in the current infrastructure
93+
94+
​5) The current infrastructure cannot be used to control log levels at a
95+
per xlator or sub module, as the \_dom\_ passed is a string that change
96+
based on volume name, translator name etc. It would be desirable to have
97+
a better module identification mechanism that can help with this
98+
feature.
99+
100+
​6) Currently the entire logging infrastructure resides within gluster.
101+
It would be desirable in scaled situations to have centralized logging
102+
and monitoring solutions in place, to be able to better analyse and
103+
monitor the cluster health and take actions.
104+
105+
This requires some form of pluggable logging frameworks that can be used
106+
within gluster to enable this possibility. Currently the existing
107+
framework is used throughout gluster and hence we need only to change
108+
configuration and logging.c to enable logging to other frameworks (as an
109+
example the current syslog plug that was provided).
110+
111+
It would be desirable to enhance this to provide a more robust framework
112+
for future extensions to other frameworks. This is not a limitation of
113+
the current framework, so much as a re-factor to be able to switch
114+
logging frameworks with more ease.
115+
116+
​7) For centralized logging in the future, it would need better
117+
identification strings from various gluster processes and hosts, which
118+
is currently missing or suppressed in the logging infrastructure.
119+
120+
Due to the nature of enhancements proposed, it is required that we
121+
better the current infrastructure for the stated needs and do some
122+
future proofing in terms of newer messages that would be added.
123+
124+
Detailed Description
125+
--------------------
126+
127+
NOTE: Covering details for SF1, SF2, and partially SF3, SF5, SF6. SF4/7
128+
will be covered in later revisions/phases.
129+
130+
### Logging API changes:
131+
132+
​1) Change the logging API as follows,
133+
134+
From: gf\_log(dom, levl, fmt...)
135+
136+
To: gf\_msg(dom, levl, errnum, msgid, fmt...)
137+
138+
Where:
139+
140+
   dom: Open string as used in the current logging infrastructure (helps in backward compat)
141+
   levl: As in current logging infrastructure (current levels seem sufficient enough to not add more levels for better debuggability etc.)
142+
   <new fields>
143+
   msgid: A message identifier, unique to this message FMT string and possibly this invocation. (SF1, lending to SF3)
144+
   errnum: The errno that this message is generated for (with an implicit 0 meaning no error number per se with this message) (SF2)
145+
146+
NOTE: Internally the gf\_msg would still be a macro that would add the
147+
\_\_FILE\_\_ \_\_LINE\_\_ \_\_FUNCTION\_\_ arguments
148+
149+
​2) Enforce \_ident\_ in the logging initialization API, gf\_log\_init
150+
(void \*data, const char \*filename, const char \*ident)
151+
152+
Where:
153+
154+
ident would be the identifier string like, nfs, <mountpoint>, brick-<brick-name>, cli, glusterd, as is the case with the log file name that is generated today (lending to SF6)
155+
156+
#### What this achieves:
157+
158+
With the above changes, we now have a message ID per message
159+
(\_msgid\_), location of the message in terms of which component
160+
(\_dom\_) and which process (\_ident\_). The further identification of
161+
the message location in terms of host (ip/name) can be done in the
162+
framework, when centralized logging infrastructure is introduced.
163+
164+
#### Log message changes:
165+
166+
With the above changes to the API the log message can now appear in a
167+
compatibility mode to adhere to current logging format, or be presented
168+
as follows,
169+
170+
log invoked as: gf\_msg(dom, levl, ENOTSUP, msgidX)
171+
172+
Example: gf\_msg ("logchecks", GF\_LOG\_CRITICAL, 22, logchecks\_msg\_4,
173+
42, "Forty-Two", 42);
174+
175+
Where: logchecks\_msg\_4 (GLFS\_COMP\_BASE + 4), "Critical: Format
176+
testing: %d:%s:%x"
177+
178+
​1) Gluster logging framework (logged as)
179+
180+
[2014-02-17 08:52:28.038267] I [MSGID: 1002] [logchecks.c:44:go_log] 0-logchecks: Informational: Format testing: 42:Forty-Two:2a [Invalid argument]
181+
182+
​2) syslog (passed as)
183+
184+
Feb 17 14:17:42 somari logchecks[26205]: [MSGID: 1002] [logchecks.c:44:go_log] 0-logchecks: Informational: Format testing: 42:Forty-Two:2a [Invalid argument]
185+
186+
​3) journald (passed as)
187+
188+
   sd_journal_send("MESSAGE=<vasprintf(dom, msgid(fmt))>",
189+
                       "MESSAGE_ID=msgid",
190+
                       "PRIORITY=levl",
191+
                       "CODE_FILE=`<fname>`", "CODE_LINE=`<lnum>", "CODE_FUNC=<fnnam>",
192+
                       "ERRNO=errnum",
193+
                       "SYSLOG_IDENTIFIER=<ident>"
194+
                       NULL);
195+
196+
​4) CEE (Common Event Expression) format string passed to any CEE
197+
consumer (say lumberjack)
198+
199+
Based on generating @CEE JSON string as per specifications and passing
200+
it the infrastructure in question.
201+
202+
#### Message ID generation:
203+
204+
​1) Some rules for message IDs
205+
206+
- Every message, even if it is the same message FMT, will have a unique
207+
message ID - Changes to a specific message string, hence will not change
208+
its ID and also not impact other locations in the code that use the same
209+
message FMT
210+
211+
​2) A glfs-message-id.h file would contain ranges per component for
212+
individual component based messages to be created without overlapping on
213+
the ranges.
214+
215+
​3) <component>-message.h would contain something as follows,
216+
217+
    #define GLFS_COMP_BASE         GLFS_MSGID_COMP_<component>
218+
    #define GLFS_NUM_MESSAGES       1
219+
    #define GLFS_MSGID_END          (GLFS_COMP_BASE + GLFS_NUM_MESSAGES + 1)
220+
    /* Messaged with message IDs */
221+
    #define glfs_msg_start_x GLFS_COMP_BASE, "Invalid: Start of messages"
222+
    /*------------*/
223+
    #define <component>_msg_1 (GLFS_COMP_BASE + 1), "Test message, replace with"\
224+
                       " original when using the template"
225+
    /*------------*/
226+
    #define glfs_msg_end_x GLFS_MSGID_END, "Invalid: End of messages"
227+
228+
​5) Each call to gf\_msg hence would be,
229+
230+
   gf_msg(dom, levl, errnum, glfs_msg_x, ...)
231+
232+
#### Setting per xlator logging levels (SF5):
233+
234+
short description to be elaborated later
235+
236+
Leverage this-\>loglevel to override the global loglevel. This can be
237+
also configured from gluster CLI at runtime to change the log levels at
238+
a per xlator level for targeted debugging.
239+
240+
#### Multiple log suppression(SF3):
241+
242+
short description to be elaborated later
243+
244+
​1) Save the message string as follows, Msg\_Object(msgid,
245+
msgstring(vasprintf(dom, fmt)), timestamp, repetitions)
246+
247+
​2) On each message received by the logging infrastructure check the
248+
list of saved last few Msg\_Objects as follows,
249+
250+
2.1) compare msgid and on success compare msgstring for a match, compare
251+
repetition tolerance time with current TS and saved TS in the
252+
Msg\_Object
253+
254+
2.1.1) if tolerance is within limits, increment repetitions and do not
255+
print message
256+
257+
2.1.2) if tolerance is outside limits, print repetition count for saved
258+
message (if any) and print the new message
259+
260+
2.2) If none of the messages match the current message, knock off the
261+
oldest message in the list printing any repetition count message for the
262+
same, and stash new message into the list
263+
264+
The key things to remember and act on here would be to, minimize the
265+
string duplication on each message, and also to keep the comparison
266+
quick (hence base it off message IDs and errno to start with)
267+
268+
#### Message catalogue (SF7):
269+
270+
<short description to be elaborated later>
271+
272+
The idea is to use Doxygen comments in the <component>-message.h per
273+
component, to list information in various sections per message of
274+
consequence and later use Doxygen to publish this catalogue on a per
275+
release basis.
276+
277+
Benefit to GlusterFS
278+
--------------------
279+
280+
The mentioned limitations and auto log analysis benefits would accrue
281+
for GlusterFS
282+
283+
Scope
284+
-----
285+
286+
### Nature of proposed change
287+
288+
All gf\_logXXX function invocations would change to gf\_msgXXX
289+
invocations.
290+
291+
### Implications on manageability
292+
293+
None
294+
295+
### Implications on presentation layer
296+
297+
None
298+
299+
### Implications on persistence layer
300+
301+
None
302+
303+
### Implications on 'GlusterFS' backend
304+
305+
None
306+
307+
### Modification to GlusterFS metadata
308+
309+
None
310+
311+
### Implications on 'glusterd'
312+
313+
None
314+
315+
How To Test
316+
-----------
317+
318+
A separate test utility that tests various logs and formats would be
319+
provided to ensure that functionality can be tested independent of
320+
GlusterFS
321+
322+
User Experience
323+
---------------
324+
325+
Users would notice changed logging formats as mentioned above, the
326+
additional field of importance would be the MSGID:
327+
328+
Dependencies
329+
------------
330+
331+
None
332+
333+
Documentation
334+
-------------
335+
336+
Intending to add a logging.md (or modify the same) to elaborate on how a
337+
new component should now use the new framework and generate messages
338+
with IDs in the same.
339+
340+
Status
341+
------
342+
343+
In development (see, <http://review.gluster.org/#/c/6547/> )
344+
345+
Comments and Discussion
346+
-----------------------
347+
348+
<Follow here>

0 commit comments

Comments
 (0)