[release/4.0] Moved SpecialTokens assignment after the modification to avoid "Collection Modified" error #7330

Merged: 8 commits merged into release/4.0 on Dec 9, 2024

Conversation

github-actions bot (Contributor) commented Dec 5, 2024

Backport of #7328 to release/4.0

/cc @tarekgh @shaltielshmid

Customer Impact

Users of the BERT Tokenizer who provide a custom list of special tokens during tokenizer creation may encounter exceptions if the lowercasing option is enabled.

Testing

This has been manually tested, with new tests added, and all regression tests have passed successfully.

Risk

Low. This change does not alter any behavior or logic; it simply ensures that the supplied special tokens are handled correctly.
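
For readers unfamiliar with the "Collection Modified" error named in the title, the following is a minimal C# sketch of the general pattern, not the code from BertTokenizer.cs. The class and method names (`SpecialTokenDemo`, `AddLowercasedInPlace`, `WithLowercased`) and the token ID are hypothetical and chosen only for illustration. It assumes the failure comes from adding lowercased entries to a dictionary of special tokens while that same dictionary is being enumerated.

```csharp
using System;
using System.Collections.Generic;

public static class SpecialTokenDemo
{
    // Hypothetical illustration of the failure mode: adds a lowercased copy
    // of each token to the dictionary that is currently being enumerated.
    // Adding a new key invalidates the enumerator, so the next iteration step
    // throws InvalidOperationException ("Collection was modified...").
    public static void AddLowercasedInPlace(Dictionary<string, int> tokens)
    {
        foreach (KeyValuePair<string, int> pair in tokens)
        {
            string lower = pair.Key.ToLowerInvariant();
            if (!tokens.ContainsKey(lower))
            {
                tokens.Add(lower, pair.Value);
            }
        }
    }

    // One way to avoid the exception (an assumption, not necessarily what the
    // PR does): enumerate the input and write into a separate dictionary, then
    // use the finished result.
    public static Dictionary<string, int> WithLowercased(
        IReadOnlyDictionary<string, int> tokens)
    {
        var result = new Dictionary<string, int>();
        foreach (KeyValuePair<string, int> pair in tokens)
        {
            result[pair.Key] = pair.Value;
            result[pair.Key.ToLowerInvariant()] = pair.Value;
        }
        return result;
    }
}
```

Calling `AddLowercasedInPlace` with a dictionary that contains a token with uppercase letters, such as `"[CLS]"`, throws, while `WithLowercased` returns a new dictionary holding both the original and the lowercased key. The PR title suggests the actual fix is a reordering (assigning `SpecialTokens` only after the modification is complete) rather than this exact copy-based approach.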

tarekgh requested a review from Copilot on December 5, 2024 17:02
Copilot AI (Contributor) left a comment

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 suggestions.

tarekgh (Member) commented Dec 5, 2024

@ericstj @michaelgsharp could you please help approving this one? Thanks!

codecov bot commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 98.86364% with 1 line in your changes missing coverage. Please review.

Project coverage is 68.89%. Comparing base (26bb7cb) to head (1362232).
Report is 2 commits behind head on release/4.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Microsoft.ML.Tokenizers/Model/BertTokenizer.cs | 94.44% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##           release/4.0    #7330   +/-   ##
============================================
  Coverage        68.88%   68.89%           
============================================
  Files             1470     1470           
  Lines           274005   274081   +76     
  Branches         28403    28405    +2     
============================================
+ Hits            188752   188828   +76     
  Misses           77936    77936           
  Partials          7317     7317           

| Flag | Coverage Δ |
|---|---|
| Debug | 68.89% <98.86%> (+<0.01%) ⬆️ |
| production | 63.30% <94.73%> (+<0.01%) ⬆️ |
| test | 89.42% <100.00%> (+0.01%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...icrosoft.ML.Tokenizers/Model/WordPieceTokenizer.cs | 75.69% <100.00%> (ø) |
| ...icrosoft.ML.Tokenizers.Tests/BertTokenizerTests.cs | 100.00% <100.00%> (ø) |
| src/Microsoft.ML.Tokenizers/Model/BertTokenizer.cs | 63.70% <94.44%> (+4.01%) ⬆️ |

... and 6 files with indirect coverage changes

ericstj merged commit cfa306e into release/4.0 on Dec 9, 2024
25 checks passed
github-actions bot locked and limited conversation to collaborators on Jan 9, 2025

3 participants