Skip to content

Fix use of precedence in endpoint routing DFA (#20801) #21200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2020

Conversation

rynowak
Copy link
Member

@rynowak rynowak commented Apr 24, 2020

Port of #20801

Description

Endpoint routing in 2.2->3.1 builds its execution graph incorrectly when a catch-all route (ex: {**path}) is used with higher order than a non-catchall route that overlaps it /test/route/{id?}.

As a result the matching behavior of routing does not match the documented behavior. The act of adding a route can cause an existing route to stop matching requests that previously worked.

The case that's impacted (using catch-all) is typical for a CMS-like "slug route" or for other frameworks like OData that integrate with routing or for a proxy.

Customer Impact

  • Was the bug reported by a customer?

Yes. #18677 and #16579.

We've also received reports from internal partners.

  • How does the bug impact the customer

Routing does not behave as documented. Requests that should match the catch-all route will not be matched. They may either be handled by lower priority routes, or "fall through" and result in a 404.

  • Are there any workarounds? If so, why are they not acceptable alternatives?

Depending on the scenario it may be possible to remove or reorder routes. It might be acceptable for the user's use case, but there's no workaround to restore the documented behavior of routing.

The other possible workaround is to disable endpoint routing and use the legacy routing system. This is a big backwards change for many applications. Depending on what features they are using this is not an option - Blazor Server and for instance is not supported on legacy routing.

In practice the cases that I have seen impacted are totally blocked. The user is either blocked from writing an application that functions the way they want, or they are blocked from adding features to an existing application.

Regression?

This is not a regression compared to the previous versions of endpoint routing, this bug existed with endpoint routing in 2.2 and 3.0.

Compared to legacy routing this is a regression. An app that functions on 2.1 could go through our migration guide and be broken by this bug.

Risk

Medium: The bug and fix are well understood and easily testable. We have a matrix of tests that runs with new and old implementations and are able to verify that the behavior is the same with 2.1 after the fix.

However, there's a risk that we could break an application by fixing the bug. Routing is a complex area and it's reasonable that some user might have tested their application and been happy with how it worked.

There's a set of possible cases where the fixed behavior will cause a different route to match than with the bug. Put another way the fix takes the application from a well behaved and stable (but incorrect) state to another well behaved and stable (but correct) state


  • Fix use of precedence in endpoint routing DFA

Fixes: #18677
Fixes: #16579

This is a change to how sorting is use when building endpoint routing's graph of
nodes that is eventually transformed into the route table. There were
bugs in how this was done that made it incompatible in some niche
scenarios both with previous implementations and how we describe the
features in the abstract.

There are a wide array of cases that might have been impacted by this
bug because routing is a pattern language. Generally the bugs will involve a
catch-all, and some something that changes ordering of templates.

Issue #18677 has the simplest repro for this, the following templates
would not behave as expected:

a/{*b}
{a}/{b}

One would expect any URL Path starting with /a to match the first
route, but that's not what happens.


The change supports an opt-out via the following AppContext switch:

Microsoft.AspNetCore.Routing.Use30CatchAllBehavior

Set to true to enable the buggy behavior for compatibility.


The root cause of this bug was an issue in how the algorithm used to be
build the DFA was designed. Specifically that it uses a BFS to build the
graph, and it uses an up-front one-time sort of endpoints in order to
drive that BFS.

The building of the graph has the expectation that at each level, we
will process all literal segments (/a) and then all parameter
segments (/{a}) and then all catch-all segments (/{*a}). Routing
defines a concept called precedence that defines the conceptual
order in while segments types are ordered.

So there are two problems:


The fix is to repeat the sort operation at each level and use precedence
as the only key for sorting (as dictated by the graph building algo).

We do a sort of the matches of each node after building the
precedence-based part of the DFA, based on the full sorting criteria, to
maintain compatibility.

@rynowak rynowak added the * NO MERGE * Do not merge this PR as long as this label is present. label Apr 24, 2020
@ghost ghost added the area-servers label Apr 24, 2020
@rynowak
Copy link
Member Author

rynowak commented Apr 24, 2020

This is a PR against release/3.1 mostly for staging purposes and so we can review the change with the compat switch in place.

@rynowak rynowak linked an issue Apr 24, 2020 that may be closed by this pull request
@rynowak rynowak requested a review from JamesNK April 24, 2020 23:49
Copy link
Member

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+need shiproom template

@rynowak
Copy link
Member Author

rynowak commented Apr 25, 2020

+need shiproom template

Sure, I don't really intend for this to be ready for tactics yet. We don't have a branch for it to land in right now...

@Pilchie Pilchie added this to the 3.1.x milestone Apr 25, 2020
@Pilchie
Copy link
Member

Pilchie commented Apr 25, 2020

👀 Tactics can review it ahead of time, and then we can hold it until the branch opens if you want.

@rynowak
Copy link
Member Author

rynowak commented Apr 25, 2020

👀 Tactics can review it ahead of time, and then we can hold it until the branch opens if you want.

I suppose that's ultimately a good thing. Since we're giving out a build of this to someone, I'd really like to give out a build that has whatever solution we decide (opt-in vs opt-out).

@rynowak rynowak added the Servicing-consider Shiproom approval is required for the issue label Apr 28, 2020
@ghost
Copy link

ghost commented Apr 28, 2020

Hello human! Please make sure you've included the Shiproom Template in a comment or (preferably) the PR description. Also, make sure this PR is not marked as a draft and is ready-to-merge.

@Pilchie
Copy link
Member

Pilchie commented Apr 28, 2020

Tactics approved this for 3.1.5 today, but would prefer to have it be an opt-in switch for 3.1 for now. We can revisit that if we continue to hear from people affected by this who aren't finding the switch, but we should default to not changing behavior unless it's from clearly broken to working in servicing.

@Pilchie Pilchie modified the milestones: 3.1.x, 3.1.5 Apr 28, 2020
@rynowak
Copy link
Member Author

rynowak commented Apr 28, 2020

Tactics approved this for 3.1.5 today, but would prefer to have it be an opt-in switch for 3.1 for now. We can revisit that if we continue to hear from people affected by this who aren't finding the switch, but we should default to not changing behavior unless it's from clearly broken to working in servicing.

Should we set this switch in the template? I would expect we're going to hear a lot more about this. The fixed behavior is needed by anyone using the proxy as a library for example.

@Tratcher
Copy link
Member

The fixed behavior is needed by anyone using the proxy as a library for example.

? The proxy uses 5.0.

@rynowak
Copy link
Member Author

rynowak commented Apr 28, 2020

Ok that helps a lot then.

@rynowak rynowak force-pushed the rynowak/3.1-routing-fixes branch from 994bdf0 to 4f5bfd9 Compare April 29, 2020 01:19
@rynowak
Copy link
Member Author

rynowak commented Apr 29, 2020

@JamesNK - can you take a second look. I changed the pattern for the appcontext switch to be similar to our discussion, and I flipped the default behavior from fixed to broken:

  • Renamed switch
  • Inverted polarity
  • Lots of special casing is needed now that the broken behavior is default

Copy link
Member

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will test duplication be removable in 5.0 when the default behavior changes? If so, worth adding a comment to those tests for future generations?

@rynowak
Copy link
Member Author

rynowak commented Apr 29, 2020

Will test duplication be removable in 5.0 when the default behavior changes? If so, worth adding a comment to those tests for future generations?

The fix is already in 5.0. This change will/should be noop merged into 5.0 and all of this code will just melt away.

We're not porting forward the appcontext switch, etc.

* Fix use of precedence in endpoint routing DFA

Fixes: #18677
Fixes: #16579

This is a change to how sorting is use when building endpoint routing's graph of
nodes that is eventually transformed into the route table. There were
bugs in how this was done that made it incompatible in some niche
scenarios both with previous implementations and how we describe the
features in the abstract.

There are a wide array of cases that might have been impacted by this
bug because routing is a pattern language. Generally the bugs will involve a
catch-all, and some something that changes ordering of templates.

Issue #18677 has the simplest repro for this, the following templates
would not behave as expected:

```
a/{*b}
{a}/{b}
```

One would expect any URL Path starting with `/a` to match the first
route, but that's not what happens.

---

The change supports an opt-in via the following AppContext switch:

```
Microsoft.AspNetCore.Routing.UseCorrectCatchAllBehavior
```

Set to true to enable the correct behavior.

---

The root cause of this bug was an issue in how the algorithm used to be
build the DFA was designed. Specifically that it uses a BFS to build the
graph, and it uses an up-front one-time sort of endpoints in order to
drive that BFS.

The building of the graph has the expectation that at each level, we
will process **all** literal segments (`/a`) and then **all** parameter
segments (`/{a}`) and then **all** catch-all segments (`/{*a}`). Routing
defines a concept called *precedence* that defines the *conceptual*
order in while segments types are ordered.

So there are two problems:

- We sort based on criteria other than precedence (#16579)
- We can't rely on a one-time sort, it needs to be done at each level
(#18677)

---

The fix is to repeat the sort operation at each level and use precedence
as the only key for sorting (as dictated by the graph building algo).

We do a sort of the matches of each node *after* building the
precedence-based part of the DFA, based on the full sorting criteria, to
maintain compatibility.

* Add test
@rynowak rynowak force-pushed the rynowak/3.1-routing-fixes branch from 4f5bfd9 to e317089 Compare April 29, 2020 18:25
@rynowak
Copy link
Member Author

rynowak commented Apr 29, 2020

This fix is ready to go once we have a branch. I've kept the no-merge label until then.

@analogrelay analogrelay added Servicing-approved Shiproom has approved the issue and removed Servicing-consider Shiproom approval is required for the issue labels Apr 30, 2020
@analogrelay
Copy link
Contributor

Marked as approved. Noticed it was still in TacticView but was approved last meeting :).

@analogrelay
Copy link
Contributor

Still No Merge until 3.1.5 opens, as @rynowak said.

@javiercn javiercn removed the * NO MERGE * Do not merge this PR as long as this label is present. label May 13, 2020
@wtgodbe wtgodbe merged commit b194b6c into release/3.1 May 13, 2020
@wtgodbe wtgodbe deleted the rynowak/3.1-routing-fixes branch May 13, 2020 18:12
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Servicing-approved Shiproom has approved the issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix behavior of catch-all routes in 3.1
8 participants