#3761 : Update createfederalportfolio script match fed agency [dg] #3941

DaisyGuti · 2025-07-02T19:38:30Z

Ticket

Resolves #3761

Changes

Enhanced fuzzy string matching using the RapidFuzz library (a new dependency) to find organization name variations. Matching combines multiple scoring strategies (token_sort_ratio, token_set_ratio, partial_ratio) with federal agency name variant generation for common variations such as U.S./US prefixes and Department/Dept abbreviations
create_suborganizations() - Now uses fuzzy matching for both domains and requests
update_domains() - Now finds domain information records via fuzzy organization name matching
update_requests() - Now finds domain requests via fuzzy organization name matching and shows a detailed change preview
Added options to tune matching and preview changes before execution:
--fuzzy_threshold parameter - Configurable similarity threshold (default 85%)
--dry_run flag - Shows exactly which records would be created or updated, with before/after values
Created a generic fuzzy matching utility (fuzzy_string_matcher.py)
Added a test for the fuzzy string matcher utility and updated the existing tests
Added the new RapidFuzz library to the Pip files and requirements.txt
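
To make the variant generation and threshold idea concrete, here is a minimal sketch. It is not the code in fuzzy_string_matcher.py: the function names and variant rules are hypothetical, and the standard library's difflib stands in for RapidFuzz so the example runs without extra dependencies. The scores it produces will differ from RapidFuzz's token_sort_ratio, token_set_ratio, and partial_ratio.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())
    return " ".join(cleaned.split())


def generate_variants(agency_name: str) -> set[str]:
    """Return normalized spellings of an agency name (hypothetical rules)."""
    variants = {agency_name}
    # Add the name with and without a leading "U.S." / "US" prefix.
    stripped = re.sub(r"^(U\.S\.|US)\s+", "", agency_name)
    variants.update({stripped, f"U.S. {stripped}", f"US {stripped}"})
    # Swap "Department" and "Dept" in both directions.
    for variant in list(variants):
        variants.add(re.sub(r"\bDepartment\b", "Dept", variant))
        variants.add(re.sub(r"\bDept\b\.?", "Department", variant))
    return {normalize(v) for v in variants}


def similarity(a: str, b: str) -> float:
    """Best of a plain ratio and a token-sorted ratio, on a 0-100 scale."""
    plain = SequenceMatcher(None, a, b).ratio()
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    token_sorted = SequenceMatcher(None, sorted_a, sorted_b).ratio()
    return 100 * max(plain, token_sorted)


def matches_agency(org_name: str, agency_name: str, threshold: float = 85) -> bool:
    """True if org_name scores at or above the threshold against any variant."""
    candidate = normalize(org_name)
    return any(
        similarity(candidate, variant) >= threshold
        for variant in generate_variants(agency_name)
    )


print(matches_agency("U.S. Dept. of State", "Department of State"))
print(matches_agency("Department of Energy", "Department of State"))
```

Taking the maximum over several scores mirrors the "multiple scoring strategies" approach described above: a name only has to look similar under one strategy to count as a match, so the threshold is the main guard against false positives.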

Context for reviewers

Some background on what the script does, because it is a bit unwieldy:

It creates a portfolio for a federal agency.
It finds existing domain requests and looks through them for all the different organization name values.
It then uses fuzzy string matching to determine which organization names belong to which federal agency.
It also creates suborganizations from the different organization names found.
It packages everything together: portfolio to suborganizations to domain requests.
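
As a rough illustration of the suborganization step, the sketch below derives suborganization names from the organization names found on matched requests. The function name is hypothetical, and the rules (case-insensitive de-duplication, skipping a name equal to the portfolio's own) are assumptions for illustration rather than a description of what create_suborganizations() does.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def suborganization_names(org_names: list[str], portfolio_name: str) -> list[str]:
    """Return distinct organization names, excluding the portfolio's own name."""
    # Seeding `seen` with the portfolio name is an assumption: it keeps the
    # agency from becoming a suborganization of itself.
    seen = {normalize(portfolio_name)}
    result = []
    for name in org_names:
        key = normalize(name)
        if key and key not in seen:
            seen.add(key)
            result.append(name.strip())
    return result


names = suborganization_names(
    [
        "Bureau of Consular Affairs",
        "bureau of consular  affairs",
        "Department of State",
        "Office of Inspector General",
    ],
    "Department of State",
)
print(names)
```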

Previously, the script only matched requests whose federal_agency pointed to the target agency, missing requests where:
The organization name was, e.g., "Department of State" but federal_agency was set to something else
The organization name was "U.S. Department of State" but the agency was stored as "Department of State"

Now, it finds requests by:
Direct relationship: federal_agency = target_agency
Name matching: organization_name matches any variant of the agency name (subject to the configured fuzzy match threshold)
Normalization: All comparisons use normalized strings for consistent matching
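
The lookup rules above can be sketched as follows. This is a simplified stand-in, not the script's query logic: DomainRequest here is a plain dataclass rather than the Django model, find_requests is a hypothetical helper, and the name matcher is passed in as a callable so that a real fuzzy comparison could replace the exact set lookup used in the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DomainRequest:
    """Stand-in for the model; only the two fields discussed above."""
    organization_name: str
    federal_agency: Optional[str]


def normalize(name: str) -> str:
    """Lowercase, drop periods, and collapse whitespace before comparing."""
    return " ".join(name.lower().replace(".", "").split())


def find_requests(
    requests: list[DomainRequest],
    target_agency: str,
    name_matches: Callable[[str], bool],
) -> list[DomainRequest]:
    """Keep requests linked to the agency directly or by organization name."""
    matched = []
    for request in requests:
        direct = request.federal_agency == target_agency
        by_name = name_matches(normalize(request.organization_name))
        if direct or by_name:
            matched.append(request)
    return matched


# Exact membership in a set of normalized variants; a fuzzy scorer with a
# threshold would slot in here instead.
variants = {"department of state", "us department of state"}
requests = [
    DomainRequest("Bureau of Consular Affairs", "Department of State"),
    DomainRequest("U.S. Department of State", "Other"),
    DomainRequest("Department of Energy", "Department of Energy"),
]
found = find_requests(requests, "Department of State", lambda name: name in variants)
print([r.organization_name for r in found])
```

The second request is the case the old behavior missed: its federal_agency points elsewhere, but its organization name matches a variant of the target agency.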

Setup

To test locally, ensure you have good test data by editing some organization names tied to the portfolio or agency you use for testing. I used DOS (Department of State) in my tests:

$ docker compose up

Open another terminal (or run the command above in the background) to test the commands below:

cd to /src

Preview what would change:
$ docker compose exec app ./manage.py create_federal_portfolio --agency_name "Department of State" --parse_requests --dry_run --debug

Test with a higher threshold:
$ docker compose exec app ./manage.py create_federal_portfolio --agency_name "Department of State" --parse_requests --dry_run --fuzzy_threshold 95
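
For reviewers who want a mental model of the two new options, here is a minimal sketch using plain argparse (Django management commands receive an argparse-based parser in add_arguments). The option names come from the commands above; the help text, the apply_change helper, and the record dictionary are hypothetical and are not taken from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the options used in the commands above (help text is illustrative)."""
    parser = argparse.ArgumentParser(prog="create_federal_portfolio")
    parser.add_argument("--agency_name", required=True)
    parser.add_argument("--parse_requests", action="store_true")
    parser.add_argument(
        "--dry_run",
        action="store_true",
        help="Show planned changes without saving them.",
    )
    parser.add_argument(
        "--fuzzy_threshold",
        type=int,
        default=85,
        help="Minimum similarity score (0-100) for a name to count as a match.",
    )
    return parser


def apply_change(record: dict, field: str, new_value, dry_run: bool) -> str:
    """Describe a change; only mutate the record when this is not a dry run."""
    message = f"{field}: {record.get(field)!r} -> {new_value!r}"
    if dry_run:
        return f"[DRY RUN] would update {message}"
    record[field] = new_value
    return f"updated {message}"


args = build_parser().parse_args(
    ["--agency_name", "Department of State", "--dry_run", "--fuzzy_threshold", "95"]
)
record = {"portfolio": None}
print(apply_change(record, "portfolio", args.agency_name, args.dry_run))
```

Routing every write through one helper that checks the dry-run flag is one way to keep the preview output and the real update in sync, since both print the same before/after values.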

Code Review Verification Steps

As the original developer, I have

Satisfied acceptance criteria and met development standards

Met the acceptance criteria, or will meet them in a subsequent PR
Created/modified automated tests
Updated documentation in READMEs and/or onboarding guide

Ensured code standards are met (Original Developer)

If any updated dependencies on Pipfile, also update dependencies in requirements.txt.
Interactions with external systems are wrapped in try/except
Error handling exists for unusual or missing values

Validated user-facing changes (if applicable)

Tag gov-designers in this PR's Reviewers section for design review. If code is not user-facing, delete design reviewer checklist
Verify new pages have been added to .pa11yci file so that they will be tested with our automated accessibility testing
Checked keyboard navigability
Tested general usability, landmarks, page header structure, and links with a screen reader (such as Voiceover or ANDI)

As a code reviewer, I have

Reviewed, tested, and left feedback about the changes

Pulled this branch locally and tested it
Verified code meets all checks above. Address any checks that are not satisfied
Reviewed this code and left comments. Indicate if comments must be addressed before code is merged
Checked that all code is adequately covered by tests
Verify migrations are valid and do not conflict with existing migrations

Validated user-facing changes as a developer

Note: Multiple code reviewers can share the checklists above; a second reviewer should not make a duplicate checklist. All checks should be checked before approving, even those labeled N/A.

New pages have been added to .pa11yci file so that they will be tested with our automated accessibility testing
Checked keyboard navigability
Meets all designs and user flows provided by design/product
Tested general usability, landmarks, page header structure, and links with a screen reader (such as Voiceover or ANDI)
(Rarely needed) Tested as both an analyst and applicant user

As a designer reviewer, I have

Verified that the changes match the design intention

Checked that the design translated visually
Checked behavior. Comment any found issues or broken flows.
Checked different states (empty, one, some, error)
Checked for landmarks, page heading structure, and links

Validated user-facing changes as a designer

Checked keyboard navigability
Tested general usability, landmarks, page header structure, and links with a screen reader (such as Voiceover or ANDI)
Tested with multiple browsers (check off which ones were used)
- Chrome
- Microsoft Edge
- Firefox
- Safari
(Rarely needed) Tested as both an analyst and applicant user

References

Code review best practices

Screenshots

github-actions · 2025-07-02T19:43:17Z

🥳 Successfully deployed to developer sandbox dg.

DaisyGuti and others added 11 commits June 24, 2025 17:01

Bringing over changes from original PR

50a5b9b

Added the rapidfuzz lib to pip

2a7e370

Added the lib to requirements

c17eb0c

Refactored the fuzzy matcher out to a generic util, updated the create federal porfolio.

e1c5c8f

linter fixes

a971ce6

lint fixes

d54aba1

Adjusting loop to skip index (correct testing)

6fb9d52

Created test for the fuzzy string match fixed any issues that were found. Set back the version of set up tools to what it should be.

d9d5388

Linter and Black changes.

2adde76

Merge branch 'main' into dg/3761-update_creatfederalportfolio_script_match_fed_agency_new

ccb9806

cleaning up updates

576aaa7

github-project-automation bot added this to .gov Product Board Jul 2, 2025

DaisyGuti requested a review from a team July 2, 2025 19:38

DaisyGuti changed the title ~~#3761 : Update createfederalportfolio script match fed agency~~ #3761 : Update createfederalportfolio script match fed agency [dg] Jul 2, 2025
