-
Notifications
You must be signed in to change notification settings - Fork 239
add attribute filtering to icu4x-datagen #7236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sffc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
09bc171 to
f9975b2
Compare
f9975b2 to
1053183
Compare
fce061b to
49aab1b
Compare
sffc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
provider/core/src/hello_world.rs
Outdated
| ("vi", "", "Xin chào thế giới"), | ||
| ("zh", "", "你好世界"), | ||
| // tests for the attribute filter tests | ||
| ("de", "de", "Hallo Welt"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional Suggestion: Make new strings for the data marker attributes instead of copying the locale strings. There is already one, "reverse". How about:
- rotate1 (rotate the string 1 character, like "dHello Worl")
- rotate2 (rotate the string 2 characters, like "ldHello Wor")
- rotate3 (rotate the string 3 characters, like "rldHello Wo")
- uppercase (like "HELLO WORLD")
- lowercase (like "hello world")
I think that should be enough to make an interesting test case. Interesting regexes could be:
/^r.*$//^uppercase$/-/^.*3$/
which should resolve to: "reverse", "rotate1", "rotate2", and "uppercase", but exclude "rotate3" and "lowercase"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok then maybe my implementation is incorrect: when specifying multiple filter for the same domain, then all of them have to match to keep the element, doing /r.*/ and /uppercase/ means to let through all that start with an "r" and are equal to the string "uppercase", so nothing would be generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the suggestion, i am always pretty bad at coming up with examples for tests, but yours are pretty good! for the filters i have chosen to do /r.*?|.*?case/, -/lowercase/ and -/.*3/, which results in the same things as yours.
cec660d to
7dc2d3c
Compare
|
I pulled the changes to |
7dc2d3c to
cfb9b73
Compare
|
ok i am a little confused about what to do with the tests: do you want me to only test for the argument parsing in the |
|
I'll push something |
Useful for testing, e.g. #7236
9832238 to
cdae493
Compare
sffc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @m4rch3n1ng for the contribution and @robertbastian for the test improvement
|
Broadly in favor, removing myself from the blocking review path. |
add a
--attribute-filtercli flag to icu4x-datagen, that implements a regex search like suggested in this review comment chain.as i have implemented something extremely similarly sometime in the past i have some thoughts:
-as suggested in this review, but i wanted to share an alternative: in my previous implementation i used regex "flags" similarly to ecmascript regexes, so i can do a case-insensitive match like/regex/iand an inverted like/regex/v(-vflag from grep) or i can combine them.^(?:)$. this does however have one side effect: the error messages are worse, as the error message now point to the wrapped regex and)x(is now allowed as an input. this is fixable by parsing the regex twice, which i can do if you want me to. however, this is also an issue in ripgrep, so it is probably fine for a cli.resolves #6851
supersedes/closes #7166