-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add PCRE2 support #3432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add PCRE2 support #3432
Conversation
55d7ade to
8f73733
Compare
|
Looking at the download stats maybe it would be ok to skip some arch? and then we could probably also gets away with using docker for cross builds. $ curl -s https://api.github.com/repos/jqlang/jq/releases | jq -c '.[] | select(.tag_name == "jq-1.8.1").assets | map({name,download_count}) | sort_by(-.download_count)[]'
{"name":"jq-linux64","download_count":1557635}
{"name":"jq-linux-amd64","download_count":1175823}
{"name":"jq-windows-amd64.exe","download_count":264340}
{"name":"jq-osx-amd64","download_count":250878}
{"name":"jq-macos-amd64","download_count":218746}
{"name":"sha256sum.txt","download_count":208786}
{"name":"jq-macos-arm64","download_count":150022}
{"name":"jq-1.8.1.tar.gz","download_count":97404}
{"name":"jq-linux-arm64","download_count":88325}
{"name":"jq-win64.exe","download_count":73076}
{"name":"jq-windows-i386.exe","download_count":8321}
{"name":"jq-1.8.1.zip","download_count":3963}
{"name":"jq-linux-i386","download_count":2137}
{"name":"jq-linux-ppc64el","download_count":1778}
{"name":"jq-linux-s390x","download_count":1451}
{"name":"jq-linux-armhf","download_count":294}
{"name":"jq-linux-armel","download_count":194}
{"name":"jq-linux-mips64","download_count":156}
{"name":"jq-linux-mips","download_count":146}
{"name":"jq-linux-mips64el","download_count":140}
{"name":"jq-linux-riscv64","download_count":128}
{"name":"jq-linux-mips64r6el","download_count":110}
{"name":"jq-linux-mipsel","download_count":109}
{"name":"jq-linux-powerpc","download_count":109}
{"name":"jq-linux-mipsr6","download_count":105}
{"name":"jq-linux-mipsr6el","download_count":105}
{"name":"jq-linux-mips64r6","download_count":100}skip |
475cbff to
ab80536
Compare
b49be27 to
be3937d
Compare
This is still work in progress. Work on oniguruma (https://github.com/kkos/oniguruma) has been discontinued so we need to find some other regex implementation. To build and test: ./configure --with-oniguruma=no --with-pcre2 make make check ./jq --run-tests < ./tests/onig.test ./jq --run-tests < ./tests/pcre2.test Known TODOs: - Proper configure options (skip vendor for pcre2?) - Support both for a while? - libjq usage? dont builtin and require libpcre2? (seems to be what debian does) - pcre2 in ci/release workflows - Drop some archs and use docker to simplify things? - For macos use native builder instead of cross compile? - Update Dockerfile - Update docs - Breaking changes - Drop "l" modifier? - Could possibly reimplement - Empty pattern and multi-byte code points behave differntly. I think pcre2 is more correct? - onig: jq -n '["🚀" | match(""; "g")] | length' -> 5 (per byte it seems) - pcre2: jq -n '["🚀" | match(""; "g")] | length' -> 2 (per code point) Good references: - https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2demo.c Pre pcre2_next_match usage: - https://github.com/PCRE2Project/pcre2/blob/eb3bd3cf1418cb1a0eabf984b0b1e80b6bdd9314~1/src/pcre2demo.c (pre pcre2_next_match usage) - pcre2test cli tools is a good Playaround for pcre2 Notes: ".+?\\b" test in onig.test seems to be behaves differently depending on pcre2 version. I suspect this fix in 10.43: PCRE2Project/pcre2@0a55280 I noticed a clang -fsanitize=memory use-of-uninitialized-valu issue but it seems to go awa with pcre2 master Related to jqlang#3313
| git diff --exit-code | ||
| docker build --platform ${{ matrix.platform }} -t jq-build-${{ env.BIN_NAME }} . | ||
| docker cp $(docker create jq-build-${{ env.BIN_NAME }}):/jq ${{ env.BIN_NAME }} | ||
| file ${{ env.BIN_NAME }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this got complicated with pcre2 is that we're not vendoring it (i assume we don't want?) so to cross build like before we would have to cross build some version of pcre2 here. the new alternative is that we use docker build with emulation using the our Dockerfile which will end up using pcre2 from debian, that in turn would limit us in archs that can use this approach https://hub.docker.com/_/debian/tags?name=13-slim but maybe it's ok to limit us to whatever debian supports?
This is still work in progress.
Work on oniguruma (https://github.com/kkos/oniguruma) has been discontinued
so we need to find some other regex implementation.
To build and test:
./configure --with-oniguruma=no --with-pcre2
make
make check
./jq --run-tests < ./tests/onig.test
./jq --run-tests < ./tests/pcre2.test
Known TODOs:
Good references:
Pre pcre2_next_match usage:
Notes:
".+?\b" test in onig.test seems to be behaves differently depending on pcre2 version.
I suspect this fix in 10.43:
PCRE2Project/pcre2@0a55280
I noticed a clang -fsanitize=memory use-of-uninitialized-valu issue but it seems to go awa with pcre2 master
Related to #3313