Support load sinking across notrap loads

#### Feature

For example, subtracting two 3-element mathematical vectors like so:

```
function u0:0(i64 sret, i64, i64) system_v {
block0(v0: i64, v1: i64, v2: i64):
    v3 = load.f64 notrap v1
    v4 = load.f64 notrap v2
    v6 = load.f64 notrap v1+8
    v7 = load.f64 notrap v2+8
    v9 = load.f64 notrap v1+16
    v10 = load.f64 notrap v2+16
    v5 = fsub v3, v4
    store notrap v5, v0
    v8 = fsub v6, v7
    store notrap v8, v0+8
    v11 = fsub v9, v10
    store notrap v11, v0+16
    return
}
```

Cranelift generates 6 load instructions followed by 3 pairs of `subsd` + store:

```asm
0000000000000000 <sub>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   f2 0f 10 3e             movsd  xmm7,QWORD PTR [rsi]
   8:   f2 0f 10 2a             movsd  xmm5,QWORD PTR [rdx]
   c:   f2 0f 10 46 08          movsd  xmm0,QWORD PTR [rsi+0x8]
  11:   f2 0f 10 72 08          movsd  xmm6,QWORD PTR [rdx+0x8]
  16:   f2 0f 10 4e 10          movsd  xmm1,QWORD PTR [rsi+0x10]
  1b:   f2 0f 10 52 10          movsd  xmm2,QWORD PTR [rdx+0x10]
  20:   f2 0f 5c fd             subsd  xmm7,xmm5
  24:   f2 0f 11 3f             movsd  QWORD PTR [rdi],xmm7
  28:   f2 0f 5c c6             subsd  xmm0,xmm6
  2c:   f2 0f 11 47 08          movsd  QWORD PTR [rdi+0x8],xmm0
  31:   f2 0f 5c ca             subsd  xmm1,xmm2
  35:   f2 0f 11 4f 10          movsd  QWORD PTR [rdi+0x10],xmm1
  3a:   48 89 f8                mov    rax,rdi
  3d:   48 89 ec                mov    rsp,rbp
  40:   5d                      pop    rbp
  41:   c3                      ret
```

whereas LLVM sinks half of the loads into the `subsd` instructions themselves, even at `-O0`:

```asm
  sub:
        mov     rax, rdi
        movsd   xmm2, qword ptr [rsi]
        subsd   xmm2, qword ptr [rdx]
        movsd   xmm1, qword ptr [rsi + 8]
        subsd   xmm1, qword ptr [rdx + 8]
        movsd   xmm0, qword ptr [rsi + 16]
        subsd   xmm0, qword ptr [rdx + 16]
        movsd   qword ptr [rdi], xmm2
        movsd   qword ptr [rdi + 8], xmm1
        movsd   qword ptr [rdi + 16], xmm0
        ret
```

Cranelift is not entirely incapable of load sinking, as seen in a dot product, where it does sink a single load:

```
function u0:0(i64, i64) -> f64 system_v {
block0(v0: i64, v1: i64):
    v3 = load.f64 notrap v0
    v4 = load.f64 notrap v1
    v6 = load.f64 notrap v0+8
    v7 = load.f64 notrap v1+8
    v10 = load.f64 notrap v0+16
    v11 = load.f64 notrap v1+16
    v5 = fmul v3, v4
    v8 = fmul v6, v7
    v9 = fadd v5, v8
    v12 = fmul v10, v11
    v13 = fadd v9, v12
    return v13
}
```

```asm
0000000000000000 <dot>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   f2 0f 10 07             movsd  xmm0,QWORD PTR [rdi]
   8:   f2 0f 10 2e             movsd  xmm5,QWORD PTR [rsi]
   c:   f2 0f 10 4f 08          movsd  xmm1,QWORD PTR [rdi+0x8]
  11:   f2 0f 10 76 08          movsd  xmm6,QWORD PTR [rsi+0x8]
  16:   f2 0f 10 57 10          movsd  xmm2,QWORD PTR [rdi+0x10]
  1b:   f2 0f 59 c5             mulsd  xmm0,xmm5
  1f:   f2 0f 59 ce             mulsd  xmm1,xmm6
  23:   f2 0f 58 c1             addsd  xmm0,xmm1
  27:   f2 0f 59 56 10          mulsd  xmm2,QWORD PTR [rsi+0x10]
  2c:   f2 0f 58 c2             addsd  xmm0,xmm2
  30:   48 89 ec                mov    rsp,rbp
  33:   5d                      pop    rbp
  34:   c3                      ret
```

but again, even LLVM at `-O0` sinks all 3 possible loads:

```asm
dot:
        mov     qword ptr [rsp - 8], rdi
        movsd   xmm0, qword ptr [rdi]
        mulsd   xmm0, qword ptr [rsi]
        movsd   xmm1, qword ptr [rdi + 8]
        mulsd   xmm1, qword ptr [rsi + 8]
        addsd   xmm0, xmm1
        movsd   xmm1, qword ptr [rdi + 16]
        mulsd   xmm1, qword ptr [rsi + 16]
        addsd   xmm0, xmm1
        ret
```

These examples are taken from https://github.com/ebobby/simple-raytracer/blob/496b6164b9f16250f99b91327da8f01acc1e3534/src/vector.rs, compiled with both cg_clif (`-Copt-level=3`) and cg_llvm (`-Copt-level=0`).

#### Benefit

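Improves runtime performance: each sunk load removes a separate `movsd` instruction and frees the register that would otherwise hold the loaded value.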

#### Implementation

I think this is caused by `get_value_as_source_or_const` considering loads as having side-effects even when they are `notrap`.
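To make the suspected behaviour concrete, here is a toy model in Rust. It is only a sketch: `Op`, `can_sink`, and the two barrier policies are hypothetical names invented for illustration, and none of this is Cranelift's actual code or data structures. It assumes a load can merge into its user only when no "barrier" instruction sits between them, and compares a conservative policy (every load is a barrier) with a relaxed one (`notrap` loads are not).

```rust
/// Toy instruction kinds for illustration only; these are not Cranelift types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    /// A load, with a flag mirroring the `notrap` annotation in the CLIF above.
    Load { notrap: bool },
    /// A pure arithmetic instruction such as `fmul` or `fadd`.
    Arith,
    /// A store to memory.
    Store,
}

/// Conservative policy (assumed): every memory access blocks sinking past it.
fn conservative_barrier(op: Op) -> bool {
    matches!(op, Op::Load { .. } | Op::Store)
}

/// Relaxed policy (assumed): only stores and possibly-trapping loads block sinking.
fn relaxed_barrier(op: Op) -> bool {
    matches!(op, Op::Store | Op::Load { notrap: false })
}

/// A load at index `load` can merge into its user at index `user` only if no
/// barrier instruction sits strictly between them.
fn can_sink(insts: &[Op], load: usize, user: usize, is_barrier: fn(Op) -> bool) -> bool {
    insts[load + 1..user].iter().all(|&op| !is_barrier(op))
}

/// Count how many (load, user) pairs are sinkable under the given policy.
fn count_sinkable(insts: &[Op], pairs: &[(usize, usize)], is_barrier: fn(Op) -> bool) -> usize {
    pairs
        .iter()
        .filter(|&&(load, user)| can_sink(insts, load, user, is_barrier))
        .count()
}

/// The dot product from above: six notrap loads, then fmul, fmul, fadd, fmul, fadd.
fn dot_product() -> (Vec<Op>, Vec<(usize, usize)>) {
    let mut insts = vec![Op::Load { notrap: true }; 6];
    insts.extend([Op::Arith; 5]);
    // (load index, user index) for the second operand of each fmul:
    // v4 -> v5, v7 -> v8, v11 -> v12.
    (insts, vec![(1, 6), (3, 7), (5, 9)])
}
```

Under these assumptions, `count_sinkable` on `dot_product()` gives 1 with `conservative_barrier` (only the last load, which has no other load between it and its `fmul`) and 3 with `relaxed_barrier`. In this model a store between a load and its user still blocks sinking under both policies.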

#### Alternatives

TODO: What are the alternative implementation approaches or alternative ways to
solve the problem that this feature would solve? How do these alternatives
compare to this proposal?

