Description
Is there an existing issue for this?
- I have searched the existing issues
Description
When Hue attempts to autocomplete Hive queries, it fails with an IndexError while parsing HiveServer2 addresses from ZooKeeper. The error occurs because:
The current get_zk_hs2() function assumes all child nodes under the ZooKeeper path (HIVE_DISCOVERY_HIVESERVER2_ZNODE) contain a sequence=\d+ pattern.
If a node does not match this pattern, re.findall() returns an empty list, causing [0] to throw IndexError: list index out of range.
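The failure can be reproduced outside Hue with a minimal sketch. The node names below are made up for illustration; the second one stands in for any child node without a sequence suffix. `sort_like_dbms` is a hypothetical wrapper around the same key function used on the failing line.

```python
import re

# Hypothetical child node names; the second has no "sequence=" suffix.
hiveservers = [
    "serverUri=host1:10000;version=3.1.2;sequence=0000000052",
    "some-other-child-node",
]

def sort_like_dbms(nodes):
    # Same key function as the failing line in get_zk_hs2().
    nodes.sort(key=lambda x: re.findall(r'sequence=\d+', x)[0])
    return nodes

try:
    sort_like_dbms(list(hiveservers))
    error = None
except IndexError as exc:
    error = str(exc)

print(error)  # list index out of range
```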
Steps To Reproduce
The sequence_nodes in hiveservers are: ['serverUri=xxx:10000;version=3.1.2;sequence=0000000052', 'serverUri=xxx:10000;version=3.1.2;sequence=0000000051']
Error Trace:
File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 107, in get_zk_hs2
hiveservers.sort(key=lambda x: re.findall(r'sequence=\d+', x)[0])
IndexError: list index out of range
Root Cause
Unsafe Sorting: The code blindly sorts all child nodes using sequence=\d+ without validating if the pattern exists.
No Error Handling: Missing checks for malformed/non-sequence nodes (e.g., empty or invalid entries).
Expected Behavior
Filter Valid Nodes: Only nodes matching sequence=\d+ should be processed.
Graceful Degradation: If no valid nodes exist, log a warning and return an empty list instead of crashing.
Correct Sorting: Sort nodes by their numeric sequence value (e.g., 0000000052 → 52).
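The three expectations above could be combined in one helper. This is only a sketch under assumptions: `sort_sequence_nodes`, `SEQUENCE_RE`, and the `/hiveserver2` path in the usage lines are hypothetical names chosen for illustration, not part of Hue.

```python
import logging
import re

LOG = logging.getLogger(__name__)
SEQUENCE_RE = re.compile(r'sequence=(\d+)')  # hypothetical module-level pattern

def sort_sequence_nodes(nodes, znode):
    """Return only nodes carrying a sequence number, ordered numerically."""
    keyed = []
    for node in nodes:
        match = SEQUENCE_RE.search(node)
        if match:
            # Convert the captured digits so 0000000052 is compared as 52.
            keyed.append((int(match.group(1)), node))
        else:
            LOG.debug("Skipping node without sequence number: %s", node)
    if not keyed:
        LOG.warning("No nodes matching 'sequence=<digits>' found under %s", znode)
        return []
    keyed.sort(key=lambda pair: pair[0])
    return [node for _, node in keyed]

# Usage with made-up node names and a made-up znode path.
nodes = [
    "serverUri=a:10000;version=3.1.2;sequence=0000000052",
    "lock-node",
    "serverUri=b:10000;version=3.1.2;sequence=0000000051",
]
result = sort_sequence_nodes(nodes, "/hiveserver2")
print(result)
```

Returning an empty list leaves it to the caller to decide how to proceed when no HiveServer2 instance is discovered.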
Proposed Fix
The fixed version (already implemented) includes:
Pre-Filtering: Uses re.search() to validate nodes before sorting.
Safe Extraction: Extracts sequence numbers as integers (sequence=(\d+)) to avoid string comparison issues.
Logging: Adds debug logs to track valid/invalid nodes.
Code Snippet:
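# Keep only child nodes that carry a sequence number
sequence_nodes = [x for x in hiveservers if re.search(r'sequence=\d+', x)]
if sequence_nodes:
    # Sort by the numeric value of the captured digits
    sequence_nodes.sort(key=lambda x: int(re.findall(r'sequence=(\d+)', x)[0]))
    hiveservers = sequence_nodes
else:
    # Raw string so that \d is not treated as an invalid escape sequence
    LOG.warning(r"No nodes matching 'sequence=\d+' found under {0}".format(znode))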
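A short sketch of the string comparison issue that integer extraction avoids. When all sequence numbers are zero-padded to the same width, both keys give the same order. The difference shows up only when widths differ. The two node names here are made up for illustration.

```python
import re

# Hypothetical nodes whose sequence numbers have different widths.
nodes = ["sequence=10", "sequence=9"]

# Original key: compares the matched text character by character.
by_string = sorted(nodes, key=lambda x: re.findall(r'sequence=\d+', x)[0])

# Fixed key: compares the captured digits as integers.
by_number = sorted(nodes, key=lambda x: int(re.findall(r'sequence=(\d+)', x)[0]))

print(by_string)  # ['sequence=10', 'sequence=9']
print(by_number)  # ['sequence=9', 'sequence=10']
```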
Logs
[27/Apr/2025 11:02:26 +0800] decorators ERROR Error running autocomplete
Traceback (most recent call last):
File "/usr/local/hue/desktop/libs/notebook/src/notebook/decorators.py", line 119, in wrapper
return f(*args, **kwargs)
File "/usr/local/hue/desktop/libs/notebook/src/notebook/api.py", line 752, in autocomplete
autocomplete_data = get_api(request, snippet).autocomplete(snippet, database, table, column, nested, action)
File "/usr/local/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 107, in decorator
return func(*args, **kwargs)
File "/usr/local/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 551, in autocomplete
db = self._get_db(snippet, interpreter=self.interpreter)
File "/usr/local/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 810, in _get_db
return dbms.get(self.user, query_server=get_query_server_config(name=name, connector=interpreter))
File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 226, in get_query_server_config
hiveservers = get_zk_hs2()
File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 107, in get_zk_hs2
hiveservers.sort(key=lambda x: re.findall(r'sequence=\d+', x)[0])
File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 107, in <lambda>
hiveservers.sort(key=lambda x: re.findall(r'sequence=\d+', x)[0])
IndexError: list index out of range
Hue version
4.11