@axunonb (Member) commented Nov 9, 2025

Changes

Allowed Selector characters can be defined by an allow list or a block list.

  • Add enum FilterType { Allowlist, Blocklist }. With Blocklist, all Unicode characters are allowed except those in the block list. With Allowlist, only the listed characters are allowed; the default list is the alphanumeric characters (upper and lower case) plus '_' and '-'. FilterType.Allowlist is the default, which avoids breaking changes in the v3 major version.
  • Add FilterType SelectorCharFilter to ParserSettings
  • Add class CharSet. It represents a set of characters that supports efficient storage and lookup for both ASCII and non-ASCII characters. The Parser uses it as the allow list or block list. Parsing placeholders takes ~25% less time than in v3.2.0 through v3.6.1.
  • Update Parser to use CharSet and handle the defined FilterType
  • Refactor ParserSettings: Re-order members, update internal properties to better align with class CharSet.
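
To illustrate the idea behind the list above, here is a minimal sketch of a character set with a fast path for ASCII and a fallback for everything else, plus an allowlist/blocklist check. It is not the `CharSet` class from this PR: the names `SketchCharSet`, `SketchFilterType`, and `SelectorCharCheck.IsAllowed` are hypothetical, and the storage layout (a flag array for ASCII, a hash set for non-ASCII) is an assumption about one reasonable way to get cheap lookups.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the PR's FilterType enum.
public enum SketchFilterType { Allowlist, Blocklist }

// Hypothetical sketch; NOT the CharSet implementation from this PR.
public sealed class SketchCharSet
{
    // One flag per ASCII code point (0..127): lookup is a single array read.
    private readonly bool[] _ascii = new bool[128];

    // Non-ASCII characters are assumed to be rare, so a hash set is enough.
    private readonly HashSet<char> _nonAscii = new HashSet<char>();

    public SketchCharSet(IEnumerable<char> chars)
    {
        foreach (var c in chars) Add(c);
    }

    public void Add(char c)
    {
        if (c < 128) _ascii[c] = true;
        else _nonAscii.Add(c);
    }

    public bool Contains(char c)
    {
        return c < 128 ? _ascii[c] : _nonAscii.Contains(c);
    }
}

public static class SelectorCharCheck
{
    // Allowlist: the character must be in the set.
    // Blocklist: the character must NOT be in the set.
    public static bool IsAllowed(char c, SketchCharSet set, SketchFilterType type)
    {
        return type == SketchFilterType.Allowlist ? set.Contains(c) : !set.Contains(c);
    }
}
```

With an allowlist built from `"abc_-"`, `IsAllowed('a', …, Allowlist)` is true and `IsAllowed('ä', …, Allowlist)` is false; with a blocklist built from `"{}"`, `'ä'` is allowed and `'{'` is not. Sharing one set type between both modes keeps the parser's per-character check to a single lookup and one comparison.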

Benchmark

Parser.ParseFormat("{SomePlaceholder1}{SomePlaceholder2}{SomePlaceholder3}{SomePlaceholder4}{SomePlaceholder5}");

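| Method  | N    | Mean     | Error  | StdDev  | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---------|------|----------|--------|---------|-------|---------|------|-----------|-------------|
| v3.6.1  | 1000 | 1,065 us | 231 us | 12.7 us | 7.48  | 0.02    | -    | 406.25 KB | 3.25        |
| This PR | 1000 | 813 us   | 90 us  | 4.9 us  | 5.91  | 0.04    | -    | 406.25 KB | 3.25        |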

@codecov (bot) commented Nov 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98%. Comparing base (6b1dbec) to head (8a598b1).
⚠️ Report is 1 commits behind head on pr/unicode-in-selectors.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           pr/unicode-in-selectors   #511   +/-   ##
======================================================
  Coverage                       97%    98%           
======================================================
  Files                           99    100    +1     
  Lines                         3426   3480   +54     
======================================================
+ Hits                          3340   3406   +66     
+ Misses                          86     74   -12     

@axunonb force-pushed the pr/selector-whitelist branch from d4c57eb to 8a598b1 on November 9, 2025 19:27
@sonarqubecloud (bot) commented Nov 9, 2025

@axunonb merged commit b787b49 into pr/unicode-in-selectors Nov 9, 2025
9 checks passed
@axunonb deleted the pr/selector-whitelist branch November 9, 2025 19:43
@axunonb added a commit that referenced this pull request Nov 11, 2025
* Allow Unicode characters in `Selector`s

Resolves #454

* Refactor internal `ParserSettings` to use static or const members

Refactored internal `ParserSettings` to convert instance-level properties and methods to static or const members.

* feat: Filter `Selector` chars by allowlist or blocklist (#511)

* Change enum `FilterType` to `SelectorFilterType`

Implement proposals from review:

* SelectorFilterType.Alphanumeric: alphanumeric characters (upper and lower case), plus '_' and '-'
* SelectorFilterType.VisualUnicodeChars: All Unicode characters are allowed in a selector, except 68 non-visual characters: Control Characters (U+0000–U+001F, U+007F), Format Characters (Category: Cf), Directional Formatting (Category: Cf), Invisible Separator, Common Combining Marks (Category: Mn), Whitespace Characters (non-glyph spacing).

* Make `NonVisualUnicodeCharacters` read-only
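
The character groups named for `SelectorFilterType.VisualUnicodeChars` above can be approximated with .NET's Unicode category lookup. The sketch below is an assumption-laden illustration, not the commit's code: the commit describes a fixed list of 68 characters, whereas this hypothetical helper `NonVisualSketch.IsLikelyNonVisual` rejects whole categories (for example every `Mn` mark, not only the common ones), so it is deliberately broader.

```csharp
using System.Globalization;

// Hypothetical helper; the commit uses a fixed list of 68 characters instead.
public static class NonVisualSketch
{
    // Returns true when the character falls into a category that usually
    // has no visible glyph of its own.
    public static bool IsLikelyNonVisual(char c)
    {
        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            case UnicodeCategory.Control:            // Cc: U+0000..U+001F, U+007F, ...
            case UnicodeCategory.Format:             // Cf: zero-width and directional marks
            case UnicodeCategory.NonSpacingMark:     // Mn: combining marks
            case UnicodeCategory.SpaceSeparator:     // Zs: space characters
            case UnicodeCategory.LineSeparator:      // Zl: U+2028
            case UnicodeCategory.ParagraphSeparator: // Zp: U+2029
                return true;
            default:
                return false;
        }
    }
}
```

A fixed list, as the commit describes, trades this breadth for predictability: the set of rejected characters does not change when the runtime's Unicode tables are updated.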