Skip to content

Use opaque id in task cancellation assertion #110680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 12, 2024

Conversation

mhl-b
Copy link
Contributor

@mhl-b mhl-b commented Jul 9, 2024

Add use of Opaque ID HTTP header in task cancellation assertion. In some tests, like this #88201 testCatSegmentsRestCancellation, we assert that all tasks related to specific HTTP request are cancelled. But we do blanket approach in assertion block catching all tasks by action name. I think narrowing down assertion to specific http request in this case would be more accurate.

It is still not clear why test mentioned above failing, but after hours of investigation and injecting random delays, I'm inclining more to @DaveCTurner's comment about interference from other tests or cluster activity. I added additional log that will report when we spot task with different opaque id.

@mhl-b mhl-b added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.15.1 labels Jul 9, 2024
@mhl-b mhl-b requested review from pxsalehi and DaveCTurner July 9, 2024 23:40
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good idea to me. I left some inline comments mainly about checking expectations more strictly.

@@ -70,6 +72,14 @@ public static void assertAllCancellableTasksAreCancelled(String actionPrefix) th
for (CancellableTask cancellableTask : taskManager.getCancellableTasks().values()) {
if (cancellableTask.getAction().startsWith(actionPrefix)) {
logger.trace("--> found task with prefix [{}]: [{}]", actionPrefix, cancellableTask);
if (taskOpaqueIdMatches(cancellableTask.headers(), opaqueId) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not leniently skip non-matching tasks: we shouldn't have any such tasks in the first place. If this is the problem then I'd prefer to track down where those tasks are coming from and get rid of them at source.

Let's assert that the task ID matches here instead.

// returns new or existing id
private static String setOpaqueIdIfNotExists(Request request) {
String opaqueId;
if (request.getOptions().containsHeader(Task.X_OPAQUE_ID_HTTP_HEADER)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is dead code today since none of the requests include an opaque-id header. Let's just assert it's absent.

}

private static boolean taskOpaqueIdMatches(Map<String, String> taskHeaders, @Nullable String opaqueId) {
if (opaqueId == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should assert the opaque ID is missing if we didn't expect to see one.

@mhl-b
Copy link
Contributor Author

mhl-b commented Jul 11, 2024

@DaveCTurner thank you for feedback. I added assertions and test names to the opaque id. Also added more explicit exception message in assertAllCancellableTasksAreCancelled that will show all tasks rather than the one that does not pass.

@mhl-b mhl-b requested a review from DaveCTurner July 11, 2024 03:30
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one suggestion about how to identify the test more neatly

@@ -71,11 +72,16 @@ protected boolean addMockInternalEngine() {
return false;
}

void runTest(Request request, String actionPrefix) throws Exception {
void runTest(String testName, Request request, String actionPrefix) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than passing in a string that (hopefully) identifies the test, I'd suggest using org.apache.lucene.tests.util.LuceneTestCase#getTestName (and maybe org.apache.lucene.tests.util.LuceneTestCase#getTestClass).

@@ -19,9 +19,9 @@
*/
public class CancellableTask extends Task {

private final SubscribableListener<Void> listeners = new SubscribableListener<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: did you mean to move this field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied formatting to entire file...

public static void assertAllTasksHaveFinished(String actionPrefix) throws Exception {
logger.info("--> checking that all tasks with prefix {} have finished", actionPrefix);
assertBusy(() -> {
final List<TaskInfo> tasks = client().admin().cluster().prepareListTasks().get().getTasks();
assertTrue(tasks.toString(), tasks.stream().noneMatch(t -> t.action().startsWith(actionPrefix)));
});
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra line?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mhl-b mhl-b added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 11, 2024
@mhl-b mhl-b changed the title Use optional opaque id in task cancellation assertion Use opaque id in task cancellation assertion Jul 11, 2024
@elasticsearchmachine elasticsearchmachine merged commit c1c5fe6 into elastic:main Jul 12, 2024
16 checks passed
tvernum pushed a commit that referenced this pull request Feb 25, 2025
Add use of Opaque ID HTTP header in task cancellation assertion. In some
tests, like this #88201 `testCatSegmentsRestCancellation`, we assert
that all tasks related to specific HTTP request are cancelled. But we do
blanket approach in assertion block catching all tasks by action name. I
think narrowing down assertion to specific http request in this case
would be more accurate.

It is still not clear why test mentioned above failing, but after hours
of investigation and injecting random delays, I'm inclining more to
@DaveCTurner's comment about interference from other tests or cluster
activity. I added additional log that will report when we spot task with
different opaque id.
tvernum pushed a commit that referenced this pull request Feb 25, 2025
Add use of Opaque ID HTTP header in task cancellation assertion. In some
tests, like this #88201 `testCatSegmentsRestCancellation`, we assert
that all tasks related to specific HTTP request are cancelled. But we do
blanket approach in assertion block catching all tasks by action name. I
think narrowing down assertion to specific http request in this case
would be more accurate.

It is still not clear why test mentioned above failing, but after hours
of investigation and injecting random delays, I'm inclining more to
@DaveCTurner's comment about interference from other tests or cluster
activity. I added additional log that will report when we spot task with
different opaque id.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.15.1 v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants