Relax the dir name check, username can contain .. #47
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem
Between Hadoop 3.3.1 and 3.3.6, new checks were introduced for security purposes. However, those checks were not nearly precise enough, searching for the existence of
..
in any part of a path. Usernames can include..
, which is acknowledged in some paths (user specific ones), but not in ones that include user as part of a greater path.Solution
Rather than use the simplistic
strstr
check, we're usingstrtok_r
to tokenize each part of a path to check if any of those elements exactly matches..
. This better captures the intent of the check rather than search for it anywhere in the path string.These changes are all targeted exclusively at checks introduced between HubSpot 3.3.1 (working) and 3.3.6 (currently incompatible with some jobs).
Alternate Solution Considered
We also considered changing the pattern for kerberos ID generation to remap any instances of
..
to another character(s). However, there are very specific conditions and requirements at play here that make this solution non-viable..
was chosen as an alternative to-
it wasn't used by URL safe base64 encoding, but wasn't dangerous to our use cases. Any character that can be used by that encoding is invalid as an option (this includes_
).hs-secret-agent
use, meaning condensing the characters down is also not viable.-
.Ultimately, changing this source code to be more specific is the safest option for currently running jobs and risks fewer future incompatibilities. It's also advantageous as it doesn't require changing multiple other pieces.
Testing
Sadly, the unit test framework for this code is not great and instead relies upon the username of the build entity to do validation. I wrote an impromptu test program to validate the function (see gist below). This will be further tested in QA.