Skip to content

Conversation

@wader
Copy link
Member

@wader wader commented Nov 5, 2025

This is still work in progress.

Work on oniguruma (https://github.com/kkos/oniguruma) has been discontinued
so we need to find some other regex implementation.

To build and test:
./configure --with-oniguruma=no --with-pcre2
make
make check
./jq --run-tests < ./tests/onig.test
./jq --run-tests < ./tests/pcre2.test

Known TODOs:

  • Proper configure options (skip vendor for pcre2?)
    • Support both for a while?
  • libjq usage? dont builtin and require libpcre2? (seems to be what debian does)
  • pcre2 in ci/release workflows
    • Drop some archs and use docker to simplify things?
    • For macos use native builder instead of cross compile?
  • Update Dockerfile
  • Update docs
    • Breaking changes
      • Drop "l" modifier?
        • Could possibly reimplement
      • Empty pattern and multi-byte code points behave differntly. I think pcre2 is more correct?
        • onig: jq -n '["🚀" | match(""; "g")] | length' -> 5 (per byte it seems)
        • pcre2: jq -n '["🚀" | match(""; "g")] | length' -> 2 (per code point)

Good references:

Notes:
".+?\b" test in onig.test seems to be behaves differently depending on pcre2 version.
I suspect this fix in 10.43:
PCRE2Project/pcre2@0a55280

I noticed a clang -fsanitize=memory use-of-uninitialized-valu issue but it seems to go awa with pcre2 master

Related to #3313

@wader wader marked this pull request as draft November 5, 2025 23:49
@wader wader force-pushed the pcre2 branch 13 times, most recently from 55d7ade to 8f73733 Compare November 7, 2025 16:19
@wader
Copy link
Member Author

wader commented Nov 7, 2025

Looking at the download stats maybe it would be ok to skip some arch? and then we could probably also gets away with using docker for cross builds.

$ curl -s https://api.github.com/repos/jqlang/jq/releases | jq -c '.[] | select(.tag_name == "jq-1.8.1").assets | map({name,download_count}) | sort_by(-.download_count)[]'
{"name":"jq-linux64","download_count":1557635}
{"name":"jq-linux-amd64","download_count":1175823}
{"name":"jq-windows-amd64.exe","download_count":264340}
{"name":"jq-osx-amd64","download_count":250878}
{"name":"jq-macos-amd64","download_count":218746}
{"name":"sha256sum.txt","download_count":208786}
{"name":"jq-macos-arm64","download_count":150022}
{"name":"jq-1.8.1.tar.gz","download_count":97404}
{"name":"jq-linux-arm64","download_count":88325}
{"name":"jq-win64.exe","download_count":73076}
{"name":"jq-windows-i386.exe","download_count":8321}
{"name":"jq-1.8.1.zip","download_count":3963}
{"name":"jq-linux-i386","download_count":2137}
{"name":"jq-linux-ppc64el","download_count":1778}
{"name":"jq-linux-s390x","download_count":1451}
{"name":"jq-linux-armhf","download_count":294}
{"name":"jq-linux-armel","download_count":194}
{"name":"jq-linux-mips64","download_count":156}
{"name":"jq-linux-mips","download_count":146}
{"name":"jq-linux-mips64el","download_count":140}
{"name":"jq-linux-riscv64","download_count":128}
{"name":"jq-linux-mips64r6el","download_count":110}
{"name":"jq-linux-mipsel","download_count":109}
{"name":"jq-linux-powerpc","download_count":109}
{"name":"jq-linux-mipsr6","download_count":105}
{"name":"jq-linux-mipsr6el","download_count":105}
{"name":"jq-linux-mips64r6","download_count":100}

skip armhf and below possibly?

@wader wader force-pushed the pcre2 branch 15 times, most recently from 475cbff to ab80536 Compare November 8, 2025 11:22
@wader wader force-pushed the pcre2 branch 3 times, most recently from b49be27 to be3937d Compare November 9, 2025 09:20
@wader wader marked this pull request as ready for review November 9, 2025 10:13
This is still work in progress.

Work on oniguruma (https://github.com/kkos/oniguruma) has been discontinued
so we need to find some other regex implementation.

To build and test:
./configure --with-oniguruma=no --with-pcre2
make
make check
./jq --run-tests < ./tests/onig.test
./jq --run-tests < ./tests/pcre2.test

Known TODOs:
- Proper configure options (skip vendor for pcre2?)
  - Support both for a while?
- libjq usage? dont builtin and require libpcre2? (seems to be what debian does)
- pcre2 in ci/release workflows
  - Drop some archs and use docker to simplify things?
  - For macos use native builder instead of cross compile?
- Update Dockerfile
- Update docs
  - Breaking changes
    - Drop "l" modifier?
      - Could possibly reimplement
    - Empty pattern and multi-byte code points behave differntly. I think pcre2 is more correct?
      - onig: jq -n '["🚀" | match(""; "g")] | length' -> 5 (per byte it seems)
      - pcre2: jq -n '["🚀" | match(""; "g")] | length' -> 2 (per code point)

Good references:
- https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2demo.c
Pre pcre2_next_match usage:
- https://github.com/PCRE2Project/pcre2/blob/eb3bd3cf1418cb1a0eabf984b0b1e80b6bdd9314~1/src/pcre2demo.c (pre pcre2_next_match usage)
- pcre2test cli tools is a good Playaround for pcre2

Notes:
".+?\\b" test in onig.test seems to be behaves differently depending on pcre2 version.
I suspect this fix in 10.43:
PCRE2Project/pcre2@0a55280

I noticed a clang -fsanitize=memory use-of-uninitialized-valu issue but it seems to go awa with pcre2 master

Related to jqlang#3313
git diff --exit-code
docker build --platform ${{ matrix.platform }} -t jq-build-${{ env.BIN_NAME }} .
docker cp $(docker create jq-build-${{ env.BIN_NAME }}):/jq ${{ env.BIN_NAME }}
file ${{ env.BIN_NAME }}
Copy link
Member Author

@wader wader Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this got complicated with pcre2 is that we're not vendoring it (i assume we don't want?) so to cross build like before we would have to cross build some version of pcre2 here. the new alternative is that we use docker build with emulation using the our Dockerfile which will end up using pcre2 from debian, that in turn would limit us in archs that can use this approach https://hub.docker.com/_/debian/tags?name=13-slim but maybe it's ok to limit us to whatever debian supports?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant