Skip to content

Introduce prefix allocation #2611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

fmoehler
Copy link
Contributor

@fmoehler fmoehler commented Apr 15, 2025

What is this change about?

As described in RFC https://github.com/cloudfoundry/community/blob/main/toc/rfc/rfc-0038-ipv6-dual-stack-for-cf.md bosh shall be enabled to allocate prefix ip addresses (ipv4 and ipv6). Currently bosh only supports attaching ip addresses with a /32 (ipv4) or /128 (ipv6) prefix, which are single ip addresses. To apply these changes a new property called 'prefix' is introduced in the cloud config networks section. Please refer to the example below for a manual network:

- name: diego-cells-ipv6
  subnets:
  - az: z1
    cloud_properties:
      security_groups:
      - sg-0005f94257313417d
      - sg-06acfe8fb0a6247f0
      - sg-05a8b2b2e26ac1d5d
      - sg-064a667fb375e2dac
      - sg-01bb6e1e1f821fe4c
      subnet: subnet-0beae7541e0ebf5f1
    dns:
    - 2600:1f18:7415:8f00:0000:0000:0000:0253
    prefix: 80
    gateway: 2600:1f18:7415:8f00:0000:0000:0000:0001
    range: 2600:1f18:7415:8f00:0000:0000:0000:0000/56
    reserved:
    - 2600:1f18:7415:8f00:0000:0000:0000:0002
    - 2600:1f18:7415:8f00:0000:0000:0000:0003
    - 2600:1f18:7415:8f00:ffff:ffff:ffff:ffff
  type: manual

This example network tells bosh that instead of assigning a single ip address, to assign a prefix. So to slash the /56 network into multiple /80 blocks.

The ip address allocation of the bosh director is adapted to consider these prefixes (previously the director was just counting up by 1)

One major change is how the ip addresses are stored inside the database. The address_str field in the ip_addresses table will change from storing the ip address as an integer representation of the ip to store the ip address in cidr notation. This change is necessary to not "lose" the prefix information when storing the ip address. Also it has the advantage that you can directly create an IpAddrOrCidr Object out of the string coming from the database.

This PR also changes the RPC Interface for the create_vm method. It will include a separate field called "prefix". We will send the Prefix information in a separate field to not break existing cpis. Older CPIs that do not support prefix allocation will just ignore this field. Below you can find an example network section of the create_vm call:

{"ha_proxy":{"type":"manual","ip":"10.0.72.1","prefix":"32","netmask":"255.255.224.0","cloud_properties":{"security_groups":"<redacted>","subnet":"<redacted>"},"default":["dns","gateway"],"dns":["10.0.0.2"],"gateway":"10.0.64.1"}

The prefix here is 32 indicating a single ip address.

Once the vm creation is done. The bosh director api will return prefix ips with the normal ips. E.g. the bosh cli would display this as it displayed single ip addresses before. Here one example with one ipv4, one ipv4 prefix, one ipv6 and one ipv6 prefix with two instances:

image

What tests have you run against this PR?

Unit tests
Acceptance Tests in progress

How should this change be described in bosh release notes?

Enable prefix allocation support for manual networks.

Does this PR introduce a breaking change?

No

Tag your pair, your PM, and/or team!

@anshrupani @DennisAhausSAP

@peanball
Copy link

peanball commented Apr 20, 2025

Hi @aramprice, I just watched the recording of the foundational infra meeting from April 17th.

To clarify a bit:
Diego is currently using Silk as overlay network that is independent from the IP address(es) of the Diego VM itself.

The idea for IPv6 is to use as much native network routing as possible, i.e. not use an overlay network for IPv6. This means that the IP addresses that would be assigned to the containers will be delegated from an IPv6 prefix that is assigned to the Diego VM.

Your question about whether those IP addresses would move in case of an evacuation: no. These IP addresses are not "sticky" to the app or app instance and would not move.

The goal is to make the networking setup for Diego simpler by using "native" IPv6 addressing. Traffic aimed at a particular container will reach the Diego VM via its CIDR range, and then the Diego VM's kernel can forward the traffic to the container's virtual NIC.

Please also note that the networks are supposed to be dual stack, not pure IPv6. So you would want to assign multiple (at least one IPv4 and one IPv6) networks to the same VM, as @fmoehler mentioned in the call already.

The "prefix" parameter is also the size of the prefix to delegate to each VM from a larger range. Have a look at the discussion while creating the RFC for a more extensive example.

The VM is supposed to self-assign an address to itself. Usually this is the x:y:z::1 address (i.e. the "first" address in the provided CIDR range). Addressing the VM is done via that IP address. The "remaining" addresses are then to do for the VM as it wishes.

@fmoehler fmoehler force-pushed the introduce-prefix-allocation branch 9 times, most recently from 354d3e8 to 7cdb704 Compare April 23, 2025 19:58
@aramprice
Copy link
Member

Hi @peanball,

Thanks for that additional context - the info and the RFC were helpful.

In thinking about the overall changes to Bosh a few goals or principles (maybe too strong a word) came to mind:

  • Make conditional behavior / IP-version specific code and settings as limited as possible
    • I really like that prefix: is used and not ipv6_prefix:
    • Should an IPv4 network allow/require a prefix: (one that selects a single IP) for consistency?
  • Reduce the back and forth conversion / formatting where possible (format_ip(), ip.to_i and similar)
    • Ideally a single IpAddrOrCidr (I apologize for the name) or IpAddress representation would be used, and conversions would be handled only where need by the class itself, ex: #to_db_value - for persistence, or #to_s - for logs
  • Avoid assumption in the code that bosh will always be dual-stack
    • I realize the system will start out dual-stack but this may not be forever

This isn't to imply disagreement about the current state of things, only to captures my thoughts at the moment.

@fmoehler
Copy link
Contributor Author

@aramprice thanks for your feedback!

I just want to elaborate a little on our current idea, but of course this is open for ideas. Regarding your points:

  1. We will introduce a "prefix" property in the subnet section of the network (Maybe we can also put it to the network section itself, so that it applies automatically to all subnets?). This property will be considered for ipv4 and ipv6 networks. However if the property is not defined (e.g. for existing networks), the director will consider the "prefix" as /32 (ipv4) or /128 (ipv6) and maintain this accordingly in the database.
  2. Yes we will try to get rid of as many conversion as possible. Sometimes it can be handy to have the ips in integer representation, but most of the time we will pass them as IpAddrOrCidr Objects
  3. Agree with that. I did not test this (yet), but it might not even be specific for a dual stack setup. Probably it would also need changes in the bosh agent.

@peanball
Copy link

Thanks @aramprice, I fully agree with you there. The logical unit of "IP address", with or without netmask (i.e. prefix) should be supported in either scenario and contain the logic of representation.

As @fmoehler mentioned, omitting the prefix will default to a single address (previous behavior).

And there should not be assumptions about dual stack but dual stack should be possible. So far, BOSH supported either v4 or v6, not both at the same time. The mechanism to support more than one network is a classic n+1 problem and can be solved as such where possible.

We just happen to choose n=2 with one v4 and one v6 network. This should be the mindset.

@peanball
Copy link

@fmoehler I have a general comment on the term prefix in the config. It's a bit terse and does not convey the meaning very clearly.

In the end this is the prefix (size) delegated to each instance attached to this subnet, right? Can we somehow express this better? My suggestion would be

  • delegate_prefix
  • delegate_prefix_size
  • delegate_cidr_size

alternatively s/delegate/assign/g.

@fmoehler fmoehler force-pushed the introduce-prefix-allocation branch 3 times, most recently from 9f17169 to 4b6f351 Compare May 6, 2025 13:47
@DennisAhausSAP
Copy link
Contributor

One thing regarding the display of the ip addresses. The bosh-cli is showing the ips as they come from the directors API. As @fmoehler mentioned the prefixes are stored in the db with their prefixes. This also means that the bosh-cli will show the ip addresses with their prefix. Because all the docs are showing examples with ipv4 only this would also mean that everybody has to be prepared that with this change the cli could possibly show the ips (i.e. with command bosh ... vms) in format "ip<v4\v6>/prefix".

@fmoehler
Copy link
Contributor Author

fmoehler commented Jun 2, 2025

One thing regarding the display of the ip addresses. The bosh-cli is showing the ips as they come from the directors API. As @fmoehler mentioned the prefixes are stored in the db with their prefixes. This also means that the bosh-cli will show the ip addresses with their prefix. Because all the docs are showing examples with ipv4 only this would also mean that everybody has to be prepared that with this change the cli could possibly show the ips (i.e. with command bosh ... vms) in format "ip<v4\v6>/prefix".

So far we ensured not to change the API and only return the ip (without the prefix), please refer: https://github.com/cloudfoundry/bosh/pull/2611/files#diff-c9cf4222f4ba33b6b3f35d7f32bfbb2c03a91470e18672bd3dda5468c38a0808R22 so it should not change the way ip addresses are displayed.

@fmoehler fmoehler force-pushed the introduce-prefix-allocation branch 2 times, most recently from 01c56da to acc71bd Compare June 18, 2025 08:32
@fmoehler fmoehler force-pushed the introduce-prefix-allocation branch 3 times, most recently from ab6edde to e5e6af9 Compare June 18, 2025 10:56
@fmoehler fmoehler force-pushed the introduce-prefix-allocation branch from 3be74c9 to a52e682 Compare June 26, 2025 14:15
commit a52e682
Author: Felix Moehler <[email protected]>
Date:   Tue Jun 24 13:24:21 2025 +0200

    adapt migration

commit 3cdec28
Author: Felix Moehler <[email protected]>
Date:   Mon Jun 23 15:50:05 2025 +0200

    fix unit tests

commit 3ed1a7a
Author: Felix Moehler <[email protected]>
Date:   Fri Jun 20 11:50:46 2025 +0200

    cleanup and bug fix

commit 1c092e7
Author: Felix Moehler <[email protected]>
Date:   Wed Jun 18 15:42:41 2025 +0200

    fixes after db migration

commit 0a1f304
Author: Felix Moehler <[email protected]>
Date:   Wed Jun 18 15:41:07 2025 +0200

    fix bug subnet and prefix reservation

commit 925a6e6
Author: Felix Moehler <[email protected]>
Date:   Wed Jun 18 10:22:45 2025 +0200

    perform migration

commit ebef432
Author: Felix Moehler <[email protected]>
Date:   Tue Jun 17 17:57:47 2025 +0200

    fix bug where prefix between subnet and reservation did not match

commit a9ad2ff
Author: Felix Moehler <[email protected]>
Date:   Mon Jun 16 11:19:27 2025 +0200

    refactor ip_repo

commit 9f31e94
Author: Felix Moehler <[email protected]>
Date:   Mon Jun 16 11:13:45 2025 +0200

    refactor instance network reservations

commit 9097d56
Author: Felix Moehler <[email protected]>
Date:   Mon Jun 16 09:50:32 2025 +0200

    adapt dynamic network

commit fa2ad8f
Author: Felix Moehler <[email protected]>
Date:   Thu Jun 12 20:20:08 2025 +0200

    fix unit next part

commit d5c1dbe
Author: Felix Moehler <[email protected]>
Date:   Thu Jun 12 20:09:59 2025 +0200

    fix unit tests and error message for invalid address

commit fcaf710
Author: Felix Moehler <[email protected]>
Date:   Thu Jun 12 19:38:45 2025 +0200

    fix bug and unit tests

commit ff30c40
Author: Felix Moehler <[email protected]>
Date:   Wed Jun 11 10:28:35 2025 +0200

    fix compatibility issue between new and old ip_address format

commit 48c90fa
Author: Felix Moehler <[email protected]>
Date:   Tue Jun 10 16:41:06 2025 +0200

    dont update restricted ips in place

commit bceda51
Author: Felix Moehler <[email protected]>
Date:   Tue Jun 10 11:39:39 2025 +0200

    return prefix for prefix addresses

commit 55bad9a
Author: Ansh Rupani <[email protected]>
Date:   Mon Jun 2 14:53:00 2025 +0200

    fix unit tests

commit 776fb94
Author: Felix Moehler <[email protected]>
Date:   Fri May 9 13:30:58 2025 +0200

    fix prefix for ipv6

commit acd894a
Author: Felix Moehler <[email protected]>
Date:   Thu May 8 16:11:16 2025 +0200

    continue

commit 3a616e8
Author: Felix Moehler <[email protected]>
Date:   Wed Apr 30 09:29:36 2025 +0200

    export ips_cidr in different field

commit 139e0ba
Author: Felix Moehler <[email protected]>
Date:   Tue Apr 29 12:58:16 2025 +0200

    introduce static ip check for prefixes

commit f53be1a
Author: Felix Moehler <[email protected]>
Date:   Mon Apr 28 17:05:50 2025 +0200

    more work

commit d0af77f
Author: Felix Moehler <[email protected]>
Date:   Mon Apr 28 14:35:18 2025 +0200

    return prefix in create_vm_response

commit 5a6dfa4
Author: Felix Moehler <[email protected]>
Date:   Mon Apr 28 14:14:13 2025 +0200

    unit test fixes + adaptions

commit 5da086f
Author: Felix Moehler <[email protected]>
Date:   Thu Apr 24 16:05:03 2025 +0200

    fixes

commit bb89a65
Author: Felix Moehler <[email protected]>
Date:   Thu Apr 24 15:20:53 2025 +0200

    fixes

commit 97230a3
Author: Felix Moehler <[email protected]>
Date:   Thu Apr 24 15:09:07 2025 +0200

    fixes

commit 0a63fd8
Author: Felix Moehler <[email protected]>
Date:   Thu Apr 24 14:36:46 2025 +0200

    fixes

commit 473d022
Author: Felix Moehler <[email protected]>
Date:   Thu Apr 24 12:39:42 2025 +0200

    fixes

commit 14901f4
Author: Felix Moehler <[email protected]>
Date:   Thu Apr 24 09:48:42 2025 +0200

    fixes

commit 25d00c2
Author: Felix Moehler <[email protected]>
Date:   Thu Apr 24 09:36:07 2025 +0200

    remove prefix from job network

commit a4f8444
Author: Felix Moehler <[email protected]>
Date:   Tue Apr 15 19:46:19 2025 +0200

    introduce new prefix column in ip addresses table

commit ca54445
Author: Felix Moehler <[email protected]>
Date:   Tue Apr 15 09:03:07 2025 +0200

    create seperate reservation for prefix

commit 00d71a5
Author: Felix Moehler <[email protected]>
Date:   Mon Apr 14 17:05:30 2025 +0200

    introduce prefix property for networks
@fmoehler fmoehler force-pushed the introduce-prefix-allocation branch from a52e682 to 888683f Compare June 26, 2025 14:18
@fmoehler fmoehler marked this pull request as ready for review June 26, 2025 14:24
@rkoster rkoster requested review from a team, rajathagasthya, xtreme-nitin-ravindran, benjaminguttmann-avtq, beyhan and rkoster and removed request for a team and xtreme-nitin-ravindran June 26, 2025 15:07
@rkoster rkoster moved this from Inbox to Pending Review | Discussion in Foundational Infrastructure Working Group Jun 26, 2025
@rkoster rkoster requested a review from Copilot June 26, 2025 15:15
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces support for prefix allocation when assigning IP addresses by storing IP addresses in CIDR notation rather than as integer values. Key changes include modifications to IP conversion logic in multiple models and utilities, updates to numerous tests for both IPv4 and IPv6 with explicit prefix values, and changes to RPC/API behavior to include a separate prefix field.

Reviewed Changes

Copilot reviewed 56 out of 56 changed files in this pull request and generated 2 comments.

File Description
src/bosh-director/lib/bosh/director/models/vm.rb Updates methods for extracting IPs, introducing logic that strips CIDR for /32 and /128 addresses.
src/bosh-director/lib/bosh/director/deployment_plan/manual_network.rb Validates that the reserved IP’s prefix matches the subnet prefix and propagates the prefix into network settings.
src/bosh-director/lib/bosh/director/ip_provider/ip_repo.rb Refactors IP allocation methods, introducing new logic to handle CIDR conversion and next-available IP calculation.
Comments suppressed due to low confidence (1)

src/bosh-director/lib/bosh/director/deployment_plan/manual_network.rb:64

  • [nitpick] It would be helpful to add a comment explaining why the subnet prefix must exactly match the reservation prefix to ensure future maintainability and clarity for other developers.
        unless subnet.prefix.to_i == ip_or_cidr.prefix.to_i

Comment on lines +22 to +28
ips_cidr.map do | cidr_ip |
if ( cidr_ip.include?(':') && cidr_ip.include?('/128') ) || ( cidr_ip.include?('.') && cidr_ip.include?('/32') )
cidr_ip.split('/')[0]
else
cidr_ip
end
end
Copy link
Preview

Copilot AI Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition that strips the CIDR suffix for /32 and /128 addresses may lead to inconsistent behavior if non-standard prefix sizes are used; consider returning the full CIDR representation to align with the new prefix allocation support.

Suggested change
ips_cidr.map do | cidr_ip |
if ( cidr_ip.include?(':') && cidr_ip.include?('/128') ) || ( cidr_ip.include?('.') && cidr_ip.include?('/32') )
cidr_ip.split('/')[0]
else
cidr_ip
end
end
ips_cidr

Copilot uses AI. Check for mistakes.

Comment on lines +114 to +135
filtered_ips = ip_entries.sort_by { |ip| ip.to_i }.reject { |ip| ip.to_i < first_range_address.to_i } #remove ips that are below subnet range

current_ip = Bosh::Director::IpAddrOrCidr.new(first_range_address.to_i + 1)
found = false

while found == false
current_prefix = Bosh::Director::IpAddrOrCidr.new("#{current_ip}/#{prefix}")

if filtered_ips.any? { |ip| current_prefix.include?(ip) }
filtered_ips.reject! { |ip| ip.to_i < current_prefix.to_i }
actual_ip_prefix = filtered_ips.first.count
if actual_ip_prefix > current_prefix.count
current_ip = Bosh::Director::IpAddrOrCidr.new(current_ip.to_i + actual_ip_prefix)
else
current_ip = Bosh::Director::IpAddrOrCidr.new(current_ip.to_i + current_prefix.count)
end
else
found_cidr = current_prefix
found = true
end
end

Copy link
Preview

Copilot AI Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The algorithm for determining the next available IP is fairly complex; consider refactoring this logic or adding inline comments to clarify the adjustments made to 'current_ip' based on the filtered IP entries.

Suggested change
filtered_ips = ip_entries.sort_by { |ip| ip.to_i }.reject { |ip| ip.to_i < first_range_address.to_i } #remove ips that are below subnet range
current_ip = Bosh::Director::IpAddrOrCidr.new(first_range_address.to_i + 1)
found = false
while found == false
current_prefix = Bosh::Director::IpAddrOrCidr.new("#{current_ip}/#{prefix}")
if filtered_ips.any? { |ip| current_prefix.include?(ip) }
filtered_ips.reject! { |ip| ip.to_i < current_prefix.to_i }
actual_ip_prefix = filtered_ips.first.count
if actual_ip_prefix > current_prefix.count
current_ip = Bosh::Director::IpAddrOrCidr.new(current_ip.to_i + actual_ip_prefix)
else
current_ip = Bosh::Director::IpAddrOrCidr.new(current_ip.to_i + current_prefix.count)
end
else
found_cidr = current_prefix
found = true
end
end
# Sort IP entries by their integer value and filter out IPs below the subnet range
filtered_ips = ip_entries.sort_by { |ip| ip.to_i }.reject { |ip| ip.to_i < first_range_address.to_i }
# Start searching from the first address in the range + 1
current_ip = Bosh::Director::IpAddrOrCidr.new(first_range_address.to_i + 1)
found = false
# Iterate until a valid CIDR is found
while found == false
# Create a CIDR prefix using the current IP and the given prefix length
current_prefix = Bosh::Director::IpAddrOrCidr.new("#{current_ip}/#{prefix}")
# Check if any filtered IPs fall within the current prefix
if filtered_ips.any? { |ip| current_prefix.include?(ip) }
# Remove IPs below the current prefix from the filtered list
filtered_ips.reject! { |ip| ip.to_i < current_prefix.to_i }
actual_ip_prefix = filtered_ips.first.count
# Adjust current_ip based on the prefix count of the first filtered IP
if actual_ip_prefix > current_prefix.count
current_ip = Bosh::Director::IpAddrOrCidr.new(current_ip.to_i + actual_ip_prefix)
else
current_ip = Bosh::Director::IpAddrOrCidr.new(current_ip.to_i + current_prefix.count)
end
else
# If no IPs are within the current prefix, the CIDR is valid
found_cidr = current_prefix
found = true
end
end
# Return the found CIDR

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Pending Review | Discussion
Development

Successfully merging this pull request may close these issues.

4 participants