A customer recently hit what looks like IP address exhaustion while trying to provision instances in a VPC Subnet. They received the error "No available IP addresses for interface" when creating only 4 instances in a /26 subnet. That subnet should have space for 2**6 - 6 == 58 addresses, so we're definitely not allocating addresses correctly.
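As a quick sanity check on that arithmetic (assuming 6 reserved addresses per subnet, which is what the issue's `2**6 - 6` implies):

```rust
fn main() {
    // A /26 leaves 32 - 26 = 6 host bits, i.e. 2^6 = 64 addresses.
    let total = 2u32.pow(32 - 26);
    // Subtract the 6 reserved addresses implied by the issue's arithmetic.
    let usable = total - 6;
    assert_eq!(usable, 58);
    println!("usable addresses in a /26: {usable}");
}
```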
The address allocation is handled by a "next item" query, specifically this one:
omicron/nexus/db-queries/src/db/queries/network_interface.rs
Lines 521 to 529 in 068bf5d
```rust
pub struct NextIpv4Address {
    inner: NextItemSelfJoined<
        nexus_db_schema::schema::network_interface::table,
        IpNetwork,
        nexus_db_schema::schema::network_interface::dsl::ip,
        Uuid,
        nexus_db_schema::schema::network_interface::dsl::subnet_id,
    >,
}
```
This is a thin wrapper around this query:
omicron/nexus/db-queries/src/db/queries/next_item.rs
Lines 600 to 674 in 068bf5d
````rust
/// Select the next available item from a table, using a self-join.
///
/// This query is similar in spirit to the above `NextItem` query. However, its
/// implementation is different, and specifically designed to limit memory
/// consumption of the overall query.
///
/// CockroachDB eagerly evaluates subqueries. This leads to high memory usage
/// for the basic `NextItem` query, since that relies on a subquery that uses
/// `generate_series()` to create the list of all possible items. That entire
/// list must be materialized and held in memory.
///
/// In contrast, this query is implemented using a self-join between the
/// existing table to be searched and the "next entry" (e.g., the value of the
/// target column plus 1). This can result in lower memory consumption, since
/// that series need not be stored in memory. Note that this relies on an index
/// on the item column, which may be partial.
///
/// The full query looks like this:
///
/// ```sql
/// SELECT IF(
///     -- This condition detects if the table is empty.
///     EXISTS(
///         SELECT
///             1
///         FROM
///             <table>
///         WHERE
///             <scope_column> = <scope_key> AND
///             time_deleted IS NULL
///         LIMIT 1
///     ),
///     -- If the table is _not_ empty, we do the self-join between `item` and
///     -- `item + 1`, and take the first value where `item` is NULL, i.e., the
///     -- smallest value of `item + 1` where there is no corresponding `item`
///     -- in the table.
///     (
///         SELECT next_item AS <item_column>
///         FROM (
///             SELECT
///                 <item_column> + 1 AS next_item
///             FROM
///                 <table>
///             WHERE
///                 <scope_column> = <scope_key> AND
///                 time_deleted IS NULL
///         )
///         LEFT OUTER JOIN
///             <table>
///         ON
///             (<scope_column>, next_item <= <max_item>, time_deleted IS NULL) =
///             (<scope_key>, TRUE, TRUE)
///         WHERE
///             <item_column> IS NULL AND next_item <= <max_item>
///         ORDER BY next_item
///         LIMIT 1
///     ),
///     -- If the table _is_ empty, just insert the minimum value.
///     <min_item>
/// )
/// ```
///
/// # Which one to use?
///
/// One should probably prefer to use this form of the query, if possible. Both
/// queries appear to read roughly the same amount of data from disk, but the
/// series-based implementation uses far more memory and runs more slowly.
///
/// One downside of this query is that it always allocates the next _smallest_
/// item that's available. The `NextItem` query lets the caller choose a base
/// from which to start the search, which is useful in situations where one
/// would prefer to randomly distribute items rather than sequentially allocate
/// them.
#[derive(Debug, Clone, Copy)]
pub(super) struct NextItemSelfJoined<
````
That query uses a self-join between the table's "item column" (the thing we're trying to allocate) and that item column plus one. This self-join version was written to reduce the memory consumption of the vanilla version of the query, which uses the `generate_series()` SQL function to join the table against the sequence of all possible items.
But while the self-join version uses less memory, it behaves differently from the original query. Suppose you allocate two items and then deallocate the first. On the next allocation, the original query will return that lowest / first item again, while the self-join version will return the third item in the sequence. So if you allocate right up to the limit of your range and then delete almost all of the rows, the next allocation will still fail.
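To make the difference concrete, here's a minimal model of the two allocation strategies (hypothetical code, not from the repo; names and the 0..=7 range are illustrative). The self-join's only candidates are `item + 1` for each existing item, so a freed minimum with later items still allocated is never considered:

```rust
use std::collections::BTreeSet;

const MIN_ITEM: u32 = 0;
const MAX_ITEM: u32 = 7;

// Series-based allocation: consider every value in MIN_ITEM..=MAX_ITEM and
// return the smallest one not already in the table.
fn next_item_series(allocated: &BTreeSet<u32>) -> Option<u32> {
    (MIN_ITEM..=MAX_ITEM).find(|item| !allocated.contains(item))
}

// Self-join allocation: the candidates are `item + 1` for each existing item
// (or MIN_ITEM if the table is empty). A free MIN_ITEM with later items still
// allocated is never a candidate.
fn next_item_self_joined(allocated: &BTreeSet<u32>) -> Option<u32> {
    if allocated.is_empty() {
        return Some(MIN_ITEM);
    }
    allocated
        .iter()
        .map(|item| item + 1)
        .filter(|next| *next <= MAX_ITEM && !allocated.contains(next))
        .min()
}

fn main() {
    // Allocate items 0 and 1, then free item 0: only item 1 remains.
    let allocated: BTreeSet<u32> = [1].into_iter().collect();

    // The series-based query reuses the freed minimum...
    assert_eq!(next_item_series(&allocated), Some(0));
    // ...but the self-join query skips it and hands out 2 instead.
    assert_eq!(next_item_self_joined(&allocated), Some(2));
}
```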
There's a unit test that's supposed to catch this, here:
```rust
async fn test_next_item_self_joined_with_gaps() {
```
But that isn't quite exhaustive enough. It checks that we correctly fill gaps in the middle of the sequence, but not that we fill gaps at the beginning of the sequence. I've rewritten that test to skip the first value in the range too, and it immediately fails.
We need to make sure this query can also allocate free items at the beginning of the range, similar to the original query but without the memory consumption issues.
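One possible shape for a fix, sketched as standalone logic rather than a concrete SQL patch (the function and constants here are illustrative, not from the repo): keep the self-join's `item + 1` candidates, but also treat the range minimum as a candidate whenever no live row holds it.

```rust
use std::collections::BTreeSet;

const MIN_ITEM: u32 = 0;
const MAX_ITEM: u32 = 7;

// Candidate set: MIN_ITEM plus `item + 1` for each existing item. Making
// MIN_ITEM an explicit candidate closes the gap-at-the-start hole without
// materializing the full series.
fn next_item_fixed(allocated: &BTreeSet<u32>) -> Option<u32> {
    std::iter::once(MIN_ITEM)
        .chain(allocated.iter().map(|item| item + 1))
        .filter(|cand| *cand <= MAX_ITEM && !allocated.contains(cand))
        .min()
}

fn main() {
    // Items 0 and 1 were allocated, then 0 was freed.
    let allocated: BTreeSet<u32> = [1].into_iter().collect();
    // The freed minimum is now reused instead of skipped.
    assert_eq!(next_item_fixed(&allocated), Some(0));
}
```

In SQL terms this would presumably mean one extra candidate row (a constant `<min_item>`) unioned into the self-join's derived table, at negligible memory cost; the sketch only illustrates the candidate set.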