
Fix IndexError in HiveServer2 Address Parsing via ZooKeeper #4146

Open
@dill21yu


Is there an existing issue for this?

  • I have searched the existing issues

Description

When Hue attempts to autocomplete Hive queries, it fails with an IndexError while parsing HiveServer2 addresses from ZooKeeper. The error occurs because:

  • The current get_zk_hs2() function assumes all child nodes under the ZooKeeper path (HIVE_DISCOVERY_HIVESERVER2_ZNODE) contain a sequence=\d+ pattern.
  • If a node does not match this pattern, re.findall() returns an empty list, so indexing it with [0] raises IndexError: list index out of range.
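The failure can be reproduced outside Hue with only the standard re module. This is a minimal sketch: the well-formed name is copied from the Steps To Reproduce sample, while `leader` is a hypothetical child node name chosen only because it has no sequence= field.

```python
import re

# Well-formed child node name, copied from the Steps To Reproduce sample.
good = 'serverUri=xxx:10000;version=3.1.2;sequence=0000000052'
# Hypothetical child node name with no sequence=\d+ field.
bad = 'leader'

print(re.findall(r'sequence=\d+', good))  # ['sequence=0000000052']
print(re.findall(r'sequence=\d+', bad))   # []

error = None
try:
    # Same key function as the original sort in get_zk_hs2().
    sorted([good, bad], key=lambda x: re.findall(r'sequence=\d+', x)[0])
except IndexError as exc:
    error = str(exc)

print(error)  # list index out of range
```

A single non-matching name is enough: sort() evaluates the key for every element, so one empty findall() result aborts the whole sort.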

Steps To Reproduce

The sequence nodes in hiveservers are: ['serverUri=xxx:10000;version=3.1.2;sequence=0000000052', 'serverUri=xxx:10000;version=3.1.2;sequence=0000000051']
Error Trace:
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 107, in get_zk_hs2
    hiveservers.sort(key=lambda x: re.findall(r'sequence=\d+', x)[0])
IndexError: list index out of range
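Each node name in the sample above is a semicolon-separated list of key=value pairs, and the serverUri value carries the HiveServer2 host and port. The sketch below splits one name into its fields; `parse_node_fields` is a hypothetical helper written for illustration, not necessarily how Hue parses these names.

```python
def parse_node_fields(node):
    """Split 'k1=v1;k2=v2;...' into a dict; pieces without '=' are skipped."""
    fields = {}
    for part in node.split(';'):
        # Split on the first '=' only, so values may contain ':' or '='.
        key, sep, value = part.partition('=')
        if sep:
            fields[key] = value
    return fields


node = 'serverUri=xxx:10000;version=3.1.2;sequence=0000000052'
fields = parse_node_fields(node)
# rpartition keeps everything before the last ':' as the host.
host, _, port = fields['serverUri'].rpartition(':')

print(fields['sequence'], int(fields['sequence']))  # 0000000052 52
print(host, port)                                   # xxx 10000
```

A name such as 'leader' simply yields an empty dict here, which a caller can treat as "not a HiveServer2 node" instead of indexing into it.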

Root Cause

  • Unsafe Sorting: The code sorts all child nodes by their sequence=\d+ match without first checking that each node contains the pattern.
  • No Error Handling: There is no check for malformed or non-sequence nodes (e.g., empty or invalid entries).

Expected Behavior

  • Filter Valid Nodes: Only nodes matching sequence=\d+ should be processed.
  • Graceful Degradation: If no valid nodes exist, log a warning and return an empty list instead of crashing.
  • Correct Sorting: Sort nodes by their numeric sequence value (e.g., 0000000052 → 52).
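The three expectations can be expressed as one function. A minimal sketch, assuming the caller passes the child node names plus the znode path for the log message; the name `sort_sequence_nodes` and the path `/hiveserver2` are hypothetical. Unlike the Code Snippet under Proposed Fix, whose else branch leaves hiveservers unchanged, this version returns an empty list when nothing matches, as Graceful Degradation describes.

```python
import logging
import re

LOG = logging.getLogger(__name__)
# Capture only the digits so they can be converted to int.
SEQUENCE_RE = re.compile(r'sequence=(\d+)')


def sort_sequence_nodes(children, znode):
    """Return the children that carry sequence=<digits>, ordered numerically.

    Logs a warning and returns [] when no child matches.
    """
    keyed = []
    for child in children:
        match = SEQUENCE_RE.search(child)
        if match:
            keyed.append((int(match.group(1)), child))
        else:
            LOG.debug('Skipping node without a sequence number: %s', child)
    if not keyed:
        LOG.warning(r'No nodes matching sequence=\d+ found under %s', znode)
        return []
    keyed.sort()  # tuples compare by the integer sequence first
    return [child for _, child in keyed]


n51 = 'serverUri=xxx:10000;version=3.1.2;sequence=0000000051'
n52 = 'serverUri=xxx:10000;version=3.1.2;sequence=0000000052'
print(sort_sequence_nodes([n52, 'leader', n51], '/hiveserver2'))  # n51 then n52; 'leader' dropped
print(sort_sequence_nodes(['leader'], '/hiveserver2'))            # []
```

Searching each name once and keeping (number, name) pairs avoids running the regex twice per node, as filter-then-sort does.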

Proposed Fix

The fixed version (already implemented) includes:

  • Pre-Filtering: Uses re.search() to validate nodes before sorting.
  • Safe Extraction: Captures the digits with sequence=(\d+) and converts them to int, so nodes are ordered numerically rather than as strings.
  • Logging: Adds debug logs to track valid/invalid nodes.
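On the integer key: ZooKeeper zero-pads sequential-node counters to ten digits (as in 0000000052), so equal-width values order the same whether compared as strings or as integers. The int() conversion matters when widths differ. A small demonstration, using hypothetical unpadded sequence values:

```python
import re

# Hypothetical node names with unpadded sequence numbers of different widths.
nodes = ['serverUri=a:10000;sequence=10', 'serverUri=b:10000;sequence=9']

# String key: compares 'sequence=10' with 'sequence=9' character by character.
by_string = sorted(nodes, key=lambda x: re.findall(r'sequence=\d+', x)[0])
# Integer key: compares 10 with 9 numerically.
by_int = sorted(nodes, key=lambda x: int(re.findall(r'sequence=(\d+)', x)[0]))

print(by_string)  # 'sequence=10' sorts before 'sequence=9'
print(by_int)     # 'sequence=9' sorts before 'sequence=10'
```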
Code Snippet:
# hiveservers and znode come from the enclosing get_zk_hs2().
sequence_nodes = [x for x in hiveservers if re.search(r'sequence=\d+', x)]
if sequence_nodes:
    sequence_nodes.sort(key=lambda x: int(re.findall(r'sequence=(\d+)', x)[0]))
    hiveservers = sequence_nodes
else:
    LOG.warning(r"No nodes matching 'sequence=\d+' found under {0}".format(znode))

Logs

[27/Apr/2025 11:02:26 +0800] decorators ERROR Error running autocomplete
Traceback (most recent call last):
  File "/usr/local/hue/desktop/libs/notebook/src/notebook/decorators.py", line 119, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/hue/desktop/libs/notebook/src/notebook/api.py", line 752, in autocomplete
    autocomplete_data = get_api(request, snippet).autocomplete(snippet, database, table, column, nested, action)
  File "/usr/local/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 107, in decorator
    return func(*args, **kwargs)
  File "/usr/local/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 551, in autocomplete
    db = self._get_db(snippet, interpreter=self.interpreter)
  File "/usr/local/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 810, in _get_db
    return dbms.get(self.user, query_server=get_query_server_config(name=name, connector=interpreter))
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 226, in get_query_server_config
    hiveservers = get_zk_hs2()
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 107, in get_zk_hs2
    hiveservers.sort(key=lambda x: re.findall(r'sequence=\d+', x)[0])
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 107, in <lambda>
    hiveservers.sort(key=lambda x: re.findall(r'sequence=\d+', x)[0])
IndexError: list index out of range

Hue version

4.11

Metadata

Assignees

No one assigned

    Labels

    BUG

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
