
Commit 80db5c9

adding feature planning
Signed-off-by: shravantc <[email protected]>
1 parent f0f84ed commit 80db5c9

30 files changed: +4701 -0
Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@

Feature
-------

AFR CLI enhancements

Summary
-------

Presently, AFR reporting via the CLI has many problems in how it
represents logs, because of which users may not be able to use the
data effectively. This feature corrects these problems and provides a
coherent mechanism to present heal status, heal information, and the
associated logs.

Owners
------

Venkatesh Somayajulu
Raghavan

Current status
--------------

There are many bugs related to this which indicate the current status
and why these requirements are needed.

1) 924062 - gluster volume heal info shows only gfids in some cases and
sometimes names. This is very confusing for the end user.

2) 852294 - gluster volume heal info hangs/crashes when there is a
large number of entries to be healed.

3) 883698 - When the self-heal daemon is turned off, heal info does not
show any output. But healing can still happen because of lookups from
the IO path, so the list of entries to be healed still needs to be
shown.

4) 921025 - Directories are not reported when the list of split-brain
entries needs to be displayed.

5) 981185 - When the self-heal daemon process is offline, volume heal
info gives the error "staging failure".

6) 952084 - We need a command to resolve files in split-brain state.

7) 986309 - We need to report source information for files which got
healed during a self-heal session.

8) 986317 - Sometimes the list of files to be healed also includes
files to which IO is being done, since the entries for these files
could be in the xattrop directory. This can be confusing for the user.

There is a master bug, 926044, that sums up most of the above problems.
It gives the QA perspective on the current representation produced by
the present reporting infrastructure.

Detailed Description
--------------------

1) One common thread among all the above complaints is that the
information presented to the user spreads **FUD** (fear, uncertainty,
and doubt), for the following reasons:

(a) Split brain itself is a scary scenario, especially with VMs.
(b) The data that we present to users cannot be consumed reliably to
get at the list of affected files. *For example*, we need to provide
mechanisms by which the user can automate the resolution of split
brain.
(c) The logs that are generated are scarier still, since we see some
error lines repeated hundreds of times. Our mailing lists are filled
with such emails from end users.

Any data is useless unless it is associated with an event. For self
heal, the event that leads to self heal is the loss of connectivity to
a brick from a client. So all healing info, and especially split-brain
info, should be associated with such events.

The following is hence the proposed mechanism:

(a) Every loss of a brick from the client's perspective is logged and
made available via some ID. The information covers the time from when
the brick went down to when it came up, and should also report the
number of IO transactions (modifies) that happened during the event.
(b) The list of these events is made available via some CLI command.
The actual command needs to be detailed as part of this feature.
(c) All volume info commands regarding the list of files to be healed,
files healed, and split-brain files should be associated with these
events.
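
As a purely hypothetical sketch of what (a) and (b) could look like
(the command name and every field below are placeholders, since the
actual CLI is left to be detailed as part of this feature):

    # Hypothetical sketch only; no such command exists yet.
    $ gluster volume heal VOLNAME info events
    Event ID : 7
    Brick    : server2:/bricks/brick1
    Down     : 2013-06-12 10:04:31
    Up       : 2013-06-12 10:19:02
    Modifies : 4096 IO transactions while the brick was down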

2) Provide a mechanism to show statistics at the volume and replica
group level. It should show the number of files to be healed and the
number of split-brain files at both the volume and replica group level.

3) Provide a mechanism to show, per volume, the list of files to be
healed/files healed/split-brain files, with the following information:

(a) File name
(b) Brick location
(c) Event association (brick going down)
(d) Source
(e) Sink
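
Purely illustrative, with placeholder names and layout: a per-file
entry carrying the fields above could read:

    # Illustrative entry only; the real output format is part of the design.
    File   : /vm-images/vm01.img
    Bricks : server1:/bricks/brick1, server2:/bricks/brick1
    Event  : 7 (server2:/bricks/brick1 went down)
    Source : server1:/bricks/brick1
    Sink   : server2:/bricks/brick1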

4) Self-heal crawl statistics - introduce new CLI commands for showing
more information on the self-heal crawl, per volume:

(a) Display why a self-heal crawl ran (timeouts, brick coming up)
(b) Start time and end time
(c) Number of files it attempted to heal
(d) Location of the self-heal daemon
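
The patches listed under Status below introduce a statistics command
along these lines; the output here is abridged and indicative rather
than exact (see the afr-statistics.md document linked under
Documentation for the final format):

    $ gluster volume heal VOLNAME statistics
    Crawl statistics for brick no 0
    Hostname of brick server1
    Starting time of crawl: Wed Jun 12 10:19:02 2013
    Ending time of crawl:   Wed Jun 12 10:19:05 2013
    Type of crawl: INDEX
    No. of entries healed: 4
    No. of entries in split-brain: 0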

5) Scale the logging infrastructure to handle the huge list of files
that may need to be displayed as part of the logging:

(a) Right now the system crashes or hangs when the number of files
is high.
(b) It causes CLI timeouts arbitrarily. The latencies involved in
the logging have to be studied (profiled) and mechanisms to
circumvent them have to be introduced.
(c) All files are displayed in the output. Have a better way of
representing them.

Options are:

(a) Maybe write to a glusterd log file, or have a separate directory
for AFR heal logs.
(b) Have a status kind of command. This would display the current
status of the log building, and could present the output in batches
when the list is huge.

6) We should provide a mechanism whereby the user can heal split brain
according to some pre-established policies:

(a) Let the system figure out the latest files (assuming all nodes
are in time sync) and choose the copies that have the latest time.
(b) Choose one particular brick as the source for split brain and
heal all split brains from this brick.
(c) Just remove the split-brain information from the changelog. We
leave it to the user to repair the split brain by rewriting the
split-brained files. (Right now the user is forced to remove the
xattrs manually for this step.)
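
For reference, the manual workaround that policy (c) would replace
looks roughly as follows when run directly on a brick. The paths,
volume name, and client index are examples; the exact trusted.afr
xattr names depend on the volume's bricks:

    # Inspect the AFR changelog xattrs of the split-brained file on each brick.
    getfattr -d -m trusted.afr -e hex /bricks/brick1/path/to/file

    # On the brick whose copy is to be discarded (the sink), zero its
    # accusation of the surviving copy (data/metadata/entry pending
    # counters, 4 bytes each), so the next heal uses the other brick
    # as the source.
    setfattr -n trusted.afr.VOLNAME-client-0 \
             -v 0x000000000000000000000000 /bricks/brick1/path/to/file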

Benefits to GlusterFS
---------------------

Makes the end user more aware of healing status and provides statistics.

Scope
-----

6.1. Nature of proposed change

Modification to the AFR, CLI, and glusterd code.

6.2. Implications on manageability

New CLI commands to be added. Existing commands to be improved.

6.3. Implications on presentation layer

N/A

6.4. Implications on persistence layer

N/A

6.5. Implications on 'GlusterFS' backend

N/A

6.6. Modification to GlusterFS metadata

N/A

6.7. Implications on 'glusterd'

Changes for healing-specific commands will be introduced.

How To Test
-----------

See the Documentation section.

User Experience
---------------

*Changes in CLI, effect on User experience...*

Documentation
-------------

<http://review.gluster.org/#/c/7792/1/doc/features/afr-statistics.md>

Status
------

Patches:

<http://review.gluster.org/6044> <http://review.gluster.org/4790>

Status:

Merged
Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@

Feature
-------

Provide a capability to:

- Probe the type (posix or bd) of a volume.
- Provide the list of capabilities of an xlator/volume. For example,
  the posix xlator could support zerofill, and the BD xlator could
  support offloaded copy, thin provisioning, etc.

Summary
-------

With multiple storage translators (posix and bd) being supported in
GlusterFS, it becomes necessary to know the volume type so that the
user can issue the calls that are relevant only to a given volume type.
Hence there needs to be a way to expose the type of the storage
translator of the volume to the user.

The BD xlator is capable of providing server-offloaded file copy,
server/storage-offloaded zeroing of a file, etc. These capabilities
should be visible to the client/user, so that the features can be
exploited.

Owners
------

M. Mohan Kumar
Bharata B Rao

Current status
--------------

The BD xlator exports capability information through the gluster volume
info (and --xml) output. For example:

*snip of gluster volume info output for a BD based volume*

    Xlator 1: BD
    Capability 1: thin

*snip of gluster volume info --xml output for a BD based volume*

    <xlators>
      <xlator>
        <name>BD</name>
        <capabilities>
          <capability>thin</capability>
        </capabilities>
      </xlator>
    </xlators>

But this capability information should also be exposed through some
other means, so that a host which is not part of the Gluster peer group
can also make use of these capabilities.

Exposing the type of a volume (i.e. posix or BD) is still at a
conceptual stage and needs discussion.

Detailed Description
--------------------

1. Type
    - The BD translator supports both regular files and block devices;
      i.e., one can create files on a GlusterFS volume backed by the
      BD translator, and such a file can end up as a regular posix
      file or as a logical volume (block device), based on the user's
      choice. The user can do a setxattr on the created file to
      convert it to a logical volume (an illustrative sketch follows
      this list).
    - Users of a BD-backed volume, such as QEMU, would like to know
      that they are working with a BD type of volume so that they can
      issue an additional setxattr call after creating a VM image on
      the GlusterFS backend. This is necessary to ensure that the
      created VM image is backed by an LV instead of a file.
    - There are different ways to expose this information (BD type of
      volume) to the user. One way is to export it via a getxattr
      call.

2. Capabilities
    - The BD xlator supports new features such as server-offloaded
      file copy and thin-provisioned VM images (there is a patch
      posted to Gerrit to add server-offloaded file zeroing to the
      posix xlator). There is no standard way of exploiting these
      features from the client side (such as a syscall to exploit
      server-offloaded copy), so these features need to be exported to
      the client so that they can be used. The BD xlator V2 patch
      exports this capability information through the gluster volume
      info (and --xml) output. But if a client is not part of the
      GlusterFS peer group, it can't run the volume info command to
      get the list of capabilities of a given GlusterFS volume. Also,
      the GlusterFS block driver in QEMU needs the capability list so
      that these features can be used.
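
As an illustration of the setxattr step mentioned under Type above.
The xattr key and value here are assumptions drawn from the BD xlator
V2 patch rather than a settled interface, so treat them as
placeholders:

    # Create a VM image file on a BD-backed volume, then request that
    # it be backed by a logical volume. Key/value are assumptions;
    # consult the BD xlator documentation for the actual interface.
    truncate -s 10G /mnt/bdvol/vm01.img
    setfattr -n user.glusterfs.bd -v lv /mnt/bdvol/vm01.img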

Benefit to GlusterFS
--------------------

Enables proper consumption of the BD xlator, and lets clients exploit
the new features added in both the posix and BD xlators.

### Scope

Nature of proposed change
-------------------------

- The quickest way to expose the volume type to a client is via the
  getxattr fop. When a client issues getxattr("volume_type") on the
  root gfid, the bd xlator returns 1, implying it is the BD xlator,
  whereas the posix xlator returns ENODATA, which client code can
  interpret as the posix xlator.

- The capability list can likewise be returned via getxattr("caps")
  on the root gfid.
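
A minimal client-side sketch, assuming the illustrative xattr keys
named above (the final key names depend on the merged implementation)
and a FUSE mount at /mnt/glustervol:

    # Probe the volume type at the root of the mount: a BD volume
    # returns a value, while a posix volume fails with ENODATA
    # ("No such attribute").
    getfattr -n volume_type /mnt/glustervol

    # Fetch the capability list, e.g. "thin" for a
    # thin-provisioning-capable BD volume.
    getfattr -n caps /mnt/glustervol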

Implications on manageability
-----------------------------

None.

Implications on presentation layer
----------------------------------

N/A

Implications on persistence layer
---------------------------------

N/A

Implications on 'GlusterFS' backend
-----------------------------------

N/A

Modification to GlusterFS metadata
----------------------------------

N/A

Implications on 'glusterd'
--------------------------

N/A

How To Test
-----------

User Experience
---------------

Dependencies
------------

Documentation
-------------

Status
------

Patch : <http://review.gluster.org/#/c/4809/>

Status : Merged

Comments and Discussion
-----------------------
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@

GlusterFS 3.5 Release
---------------------

Tentative Dates:

**Latest: 13-Nov, 2014 GlusterFS 3.5.3**

17th Apr, 2014 - 3.5.0 GA

GlusterFS 3.5
-------------

### Features in 3.5.0

- [Features/AFR CLI enhancements](./AFR CLI enhancements.md)
- [Features/exposing volume capabilities](./Exposing Volume Capabilities.md)

Proposing New Features
----------------------

New feature proposals should be built using the New Feature Template in
the GlusterFS 3.7 planning page.
