
#3761 : Update createfederalportfolio script match fed agency [dg] #3941



Open. Wants to merge 11 commits into main.


DaisyGuti
Contributor

Ticket

Resolves #3761

Changes

  • Added fuzzy string matching with the RapidFuzz library (a new dependency) to find variations of organization names. It combines multiple scoring strategies (token_sort_ratio, token_set_ratio, partial_ratio) with generated federal agency name variants that cover common differences such as U.S./US prefixes and Department/Dept abbreviations. Updated functions:
    create_suborganizations() - Now uses fuzzy matching for both domains and requests
    update_domains() - Now finds domain information records via fuzzy organization name matching
    update_requests() - Now finds domain requests via fuzzy organization name matching and shows a detailed preview of changes
  • Added command options for tuning the matching and previewing changes before they are applied:
    --fuzzy_threshold parameter - Configurable similarity threshold (default 85%)
    --dry_run flag - Shows exactly which records would be created or updated, with before/after values
  • Created a generic fuzzy matching utility (fuzzy_string_matcher.py)
  • Added a test for the fuzzy string matcher utility and updated existing tests
  • Added the RapidFuzz library to the Pipfiles and requirements.txt
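For reviewers unfamiliar with the new options, here is a minimal standalone sketch of how --dry_run and --fuzzy_threshold might fit together. It uses plain argparse (Django's BaseCommand.add_arguments receives an argparse-based parser); the flag names come from this PR, but their exact definitions, the integer threshold type, and the Change/preview_or_apply helpers are hypothetical and are not the actual command code.

```python
import argparse
from dataclasses import dataclass


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical standalone mirror of the command's options.

    In the real Django command these would be registered in
    add_arguments(parser); the definitions below are assumptions.
    """
    parser = argparse.ArgumentParser(prog="create_federal_portfolio")
    parser.add_argument("--agency_name")
    parser.add_argument("--parse_requests", action="store_true")
    parser.add_argument("--debug", action="store_true")
    # Preview changes without writing anything.
    parser.add_argument("--dry_run", action="store_true")
    # Minimum 0-100 similarity score for a fuzzy name match (85 per this PR).
    parser.add_argument("--fuzzy_threshold", type=int, default=85)
    return parser


@dataclass
class Change:
    """Hypothetical pending update to one field of one record."""
    record: str
    field: str
    before: object
    after: object


def preview_or_apply(changes, dry_run, apply):
    """Print each change's before/after values; only apply when not a dry run."""
    lines = [f"{c.record}.{c.field}: {c.before!r} -> {c.after!r}" for c in changes]
    prefix = "[DRY RUN] " if dry_run else ""
    for line in lines:
        print(prefix + line)
    if not dry_run:
        for change in changes:
            apply(change)
    return lines
```

Keeping the preview and the write behind one function means a dry run and a real run print the same before/after lines, so what you see in the preview is what would be applied.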

Context for reviewers

Some background on what the script does, since it is a bit unwieldy:

  • Creates a portfolio for a federal agency.
  • Looks through existing domain requests for all of the different organization name values.
  • Uses fuzzy string matching to determine which organization names belong to which federal agency.
  • Creates suborganizations from the different organization names found.
  • Ties everything together: portfolio to suborganizations to domain requests.
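The flow above can be sketched roughly as follows. This is a hypothetical, in-memory illustration only: plan_portfolio, the (request_id, organization_name) input shape, and the rule that blank names or the agency's own name attach directly to the portfolio are assumptions. It also compares normalized names exactly, where the script uses fuzzy matching.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def plan_portfolio(agency_name, matched_requests):
    """Group requests already matched to an agency into a portfolio plan.

    matched_requests: list of (request_id, organization_name) pairs.
    Returns the portfolio name, a {suborganization name: [request ids]} map,
    and the ids of requests that attach directly to the portfolio.
    """
    portfolio_key = normalize(agency_name)
    display_names = {}  # normalized name -> first spelling seen
    suborganizations = defaultdict(list)
    direct_requests = []
    for request_id, org_name in matched_requests:
        key = normalize(org_name or "")
        # Assumption: blank names or the agency's own name create no suborganization.
        if not key or key == portfolio_key:
            direct_requests.append(request_id)
            continue
        # Spelling variants of one name collapse into a single suborganization.
        display_name = display_names.setdefault(key, org_name.strip())
        suborganizations[display_name].append(request_id)
    return {
        "portfolio": agency_name,
        "suborganizations": dict(suborganizations),
        "direct_requests": direct_requests,
    }
```

For example, "Bureau of Consular Affairs" and "bureau of consular affairs." normalize to the same key, so both requests end up under one suborganization.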

Previously, the script only matched requests whose federal_agency pointed to the target agency, missing requests where:
  • The organization name was, e.g., "Department of State" but federal_agency pointed to a different agency
  • The organization name was "U.S. Department of State" but the agency was stored as "Department of State"

Now, it finds requests by:
  • Direct relationship: federal_agency = target_agency
  • Name matching: organization_name matches any variant of the agency name (at or above the configured fuzzy match threshold)
  • Normalization: all comparisons use normalized strings for consistent matching
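A minimal sketch of the matching rules above, assuming plain string names rather than model instances. The PR uses RapidFuzz scorers (token_sort_ratio, token_set_ratio, partial_ratio); to keep this self-contained, the sketch swaps in difflib.SequenceMatcher plus a token-sorted comparison as a stand-in. The variant rules (U.S./US prefix, Department/Dept) follow the PR description, but the helper names and exact rules are assumptions, not the contents of fuzzy_string_matcher.py.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def agency_variants(agency_name: str) -> set[str]:
    """Normalized spellings of an agency name (assumed variant rules)."""
    base = normalize(agency_name)
    # Drop an existing "U.S."/"US" prefix so both prefixed forms can be re-added.
    base = re.sub(r"^(u s|us) ", "", base)
    prefixed = {base, "u s " + base, "us " + base}
    # Add a "dept" abbreviation of every form that contains "department".
    abbreviated = {re.sub(r"\bdepartment\b", "dept", form) for form in prefixed}
    return prefixed | abbreviated


def similarity(a: str, b: str) -> float:
    """0-100 score: best of a plain and a token-sorted difflib ratio.

    Stand-in for RapidFuzz scorers such as fuzz.token_sort_ratio.
    """
    plain = SequenceMatcher(None, a, b).ratio()
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    token_sorted = SequenceMatcher(None, sorted_a, sorted_b).ratio()
    return 100 * max(plain, token_sorted)


def matches_agency(request_agency, org_name, target_agency, threshold=85):
    """Decide whether a request belongs to target_agency.

    request_agency stands in for the request's federal_agency (by name here).
    """
    # 1. Direct relationship: the request already points at the target agency.
    if request_agency and normalize(request_agency) == normalize(target_agency):
        return True
    # 2. Name matching: the organization name is close to any agency variant.
    if not org_name:
        return False
    name = normalize(org_name)
    return any(similarity(name, v) >= threshold for v in agency_variants(target_agency))
```

With these rules, matches_agency(None, "US Dept of State", "Department of State") returns True because the normalized name is one of the generated variants, while "Small Business Administration" stays well below the default threshold.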

Setup

To test locally, make sure you have good test data by editing some organization names tied to the portfolio/agency you are testing with. I used DOS (Department of State) in my tests:

$ docker compose up

Open another terminal (or run the command above in the background), cd to /src, and test the commands below:

Preview what would change:
$ docker compose exec app ./manage.py create_federal_portfolio --agency_name "Department of State" --parse_requests --dry_run --debug

Test with a higher threshold:
$ docker compose exec app ./manage.py create_federal_portfolio --agency_name "Department of State" --parse_requests --dry_run --fuzzy_threshold 95

Code Review Verification Steps

As the original developer, I have

Satisfied acceptance criteria and met development standards

  • Met the acceptance criteria, or will meet them in a subsequent PR
  • Created/modified automated tests
  • Updated documentation in READMEs and/or onboarding guide

Ensured code standards are met (Original Developer)

  • If any dependencies were updated in the Pipfile, also update them in requirements.txt.
  • Interactions with external systems are wrapped in try/except
  • Error handling exists for unusual or missing values

Validated user-facing changes (if applicable)

  • Tag gov-designers in this PR's Reviewers section for design review. If code is not user-facing, delete design reviewer checklist
  • Verified new pages have been added to .pa11yci file so that they will be tested with our automated accessibility testing
  • Checked keyboard navigability
  • Tested general usability, landmarks, page header structure, and links with a screen reader (such as VoiceOver or ANDI)

As a code reviewer, I have

Reviewed, tested, and left feedback about the changes

  • Pulled this branch locally and tested it
  • Verified code meets all checks above. Address any checks that are not satisfied
  • Reviewed this code and left comments. Indicate if comments must be addressed before code is merged
  • Checked that all code is adequately covered by tests
  • Verified migrations are valid and do not conflict with existing migrations

Validated user-facing changes as a developer

Note: Multiple code reviewers can share the checklists above; a second reviewer should not make a duplicate checklist. All checks should be checked before approving, even those labeled N/A.

  • New pages have been added to .pa11yci file so that they will be tested with our automated accessibility testing
  • Checked keyboard navigability
  • Meets all designs and user flows provided by design/product
  • Tested general usability, landmarks, page header structure, and links with a screen reader (such as VoiceOver or ANDI)
  • (Rarely needed) Tested as both an analyst and applicant user

As a designer reviewer, I have

Verified that the changes match the design intention

  • Checked that the design translated visually
  • Checked behavior. Comment any found issues or broken flows.
  • Checked different states (empty, one, some, error)
  • Checked for landmarks, page heading structure, and links

Validated user-facing changes as a designer

  • Checked keyboard navigability
  • Tested general usability, landmarks, page header structure, and links with a screen reader (such as VoiceOver or ANDI)
  • Tested with multiple browsers (check off which ones were used)
    • Chrome
    • Microsoft Edge
    • Firefox
    • Safari
  • (Rarely needed) Tested as both an analyst and applicant user

References

Screenshots

DaisyGuti requested a review from a team on July 2, 2025 19:38

github-actions bot commented Jul 2, 2025

🥳 Successfully deployed to developer sandbox dg.

DaisyGuti changed the title from "#3761 : Update createfederalportfolio script match fed agency" to "#3761 : Update createfederalportfolio script match fed agency [dg]" on Jul 2, 2025

Successfully merging this pull request may close these issues.

Update create_federal_portfolio --parse_requests command to match by domain requests' federal agency
2 participants