Skip to content

Conversation

@scherna
Copy link

@scherna scherna commented Nov 3, 2025

This is an attempt to fix the memory corruption seen on aarch64-linux (in issue 144). On aarch64-linux, tree-sitter test would crash, while tree-sitter test --debug-build would pass (except for one test, varsym: error: carrow:, which would fail no matter what, but with a failure seemingly unrelated to the memory corruption). With this change, tree-sitter test passes all tests but varsym: error: carrow: and does not exhibit any crashes.

Claude helped me track down that the discrepancy between --debug-build and not is that --debug-build uses lower optimization flags- -O0 and -O1 are fine, but -O2 causes the crash. Claude then identified the change to scanner.c as a potential fix (the remainder of the diff is generated by tree-sitter generate --build). I can't say I fully understand it, but seemingly at higher optimization levels, PEEK might be incremented simultaneously with array_push leading to out-of-bounds memory access?

@scherna
Copy link
Author

scherna commented Nov 3, 2025

I see that the changes outside of scanner.c are generated automatically for me when running tree-sitter generate, even on master. anyone know why?

@WillLillis
Copy link
Member

I'm not entirely sure why this is the issue, advance in its current state on master (with its macros expanded) looks like this:

static void advance(Env *env) {
  if (not_eof(env)) {
    (_array__grow((Array *)(&env->state->lookahead), 1,
                  (sizeof *(&env->state->lookahead)->contents)),
     (&env->state->lookahead)->contents[(&env->state->lookahead)->size++] =
         (env->lexer->lookahead));
    env->lexer->advance(env->lexer, false);
  }
}

and with your changes it looks like this:

static void advance(Env *env) {
  if (not_eof(env)) {
    int32_t c = env->lexer->lookahead;
    (_array__grow((Array *)(&env->state->lookahead), 1,
                  (sizeof *(&env->state->lookahead)->contents)),
     (&env->state->lookahead)->contents[(&env->state->lookahead)->size++] =
         (c));
    env->lexer->advance(env->lexer, false);
  }
}

The order of evaluation for comma separated expressions should be well defined in the C standard, so I'm not sure why we're seeing different behavior only on aarch64. Maybe we have a misbehaving compiler optimization? Regardless, I'm not able to test locally, but if this solves the issue nice work!

I see that the changes outside of scanner.c are generated automatically for me when running tree-sitter generate, even on master. anyone know why?

These other changes are just because you've generated with a different (newer) version of the tree-sitter CLI than the one that was committed. This generation introduces some breaking changes, which should happen but don't need to in this PR. You should be able to keep the changes to src/scanner.c and drop everything else for the purpose of the bug fix. If you can do that, I'll retrigger CI and we should be able to get the tests to run.

Change-Id: Ieb0ebae8376b0e45ebce88215e0cf5436a6a6964
@scherna
Copy link
Author

scherna commented Nov 4, 2025

Thanks @WillLillis, I have now removed the extraneous changes. I agree that this isn't quite a satisfying resolution, but I'm hoping that someone with deeper knowledge of the compiler optimizations might have an idea. In the mean time, this at least lets myself and others continue using tree-sitter-haskell. As I'm on aarch64-linux, I'd be happy to test locally any alternative fixes that you or others may have in mind.

@WillLillis
Copy link
Member

Not super sure what's going on in CI but I don't think it's anything you've done. I'll try to take a look sometime this week. Thanks again for the patch! 🙂

@scherna scherna changed the title Fix aarch64-linux malloc Fix aarch64-linux memory corruption Nov 4, 2025
@WillLillis
Copy link
Member

I unfortunately don't have the permissions on this repo to push this through right now. In the meantime, there's a fork of the repo under tree-sitter-grammars here with the change merged.

@skewb1k
Copy link

skewb1k commented Nov 11, 2025

The order of evaluation for comma separated expressions should be well defined in the C standard, so I'm not sure why we're seeing different behavior only on aarch64.

Actually, it's well defined that the order is unspecified. According to the C11 standart:

10 There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.94)
...
12 EXAMPLE In the function call
(*pf[f1()]) (f2(), f3() + f4())
the functions f1, f2, f3, and f4 may be called in any order. All side effects have to be completed before the function pointed to by pf[f1()] is called.

The same applies for other versions of the C and C++: see https://en.cppreference.com/w/cpp/language/eval_order.html

So the compiler on aarch64 probably chooses a different evaluation order, whether intentionally or not.

@ObserverOfTime
Copy link
Member

The issue here is likely a GCC optimizer bug like the one observed in tree-sitter/py-tree-sitter#330.

If you want a deep dive into it, you can compile with -O1 and all -O2 flags explicitly enabled.
Assuming this crashes, you can then start removing flags until it stops crashing to figure out which one causes it.
(Though it's possible that the flags make no difference and it's caused by a parameter instead.)

In any case, this is something we should fix in core by not relying on the comma operation order in the macro.

@414owen
Copy link
Contributor

414owen commented Nov 18, 2025

I'm seeing this issue (at least, I'm pretty sure it's this issue) on arm64 linux, in helix, but not when running tree-sitter test on master, in this repo.


I'm the person who rewrote the scanner in C, here, so this bug is probably my fault. Hey, it was for a good cause, right? Speed, portability, simplicity, easy compilation, preventing people from using incredibly slow C++ features...


Aaaanyway, @skewb1k and @WillLillis, you two had a little miscommunication. @WillLillis was talking about the comma operator adding a sequence point, but @skewb1k the part of the C spec you linked to was about function arguments (the commas between function arguments don't count as uses of the comma operator).

That being the case, I'm not actually sure why this PR fixes the problem...

I'd be inclined to change the array_push macro, rather than the call-site, in case there are other, trickier to reach, cases...
However, I can't seem to reproduce this without firing up my editor, which is annoying!
If anyone can point me to a way to reproduce this without opening my editor, I'd appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants