Skip to content

Conversation

@bobbyiliev
Copy link
Contributor

@bobbyiliev bobbyiliev commented Oct 7, 2025

Motivation

The yaml merge issue should now be fixed in v0.5.3 as per: MaterializeInc/terraform-aws-materialize#78 so re-enabling the tests.

Missed re-enabling the upgrade pipeline when initially re-enabled the other AWS test here: #33789

Testing this here: https://buildkite.com/materialize/nightly/builds/13686#0199bf4d-2cfe-4c58-9577-a7e6d31b738c

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bobbyiliev bobbyiliev requested review from antiguru and def- October 7, 2025 15:30
@bobbyiliev bobbyiliev requested a review from a team as a code owner October 7, 2025 15:30
@bobbyiliev
Copy link
Contributor Author

bobbyiliev commented Oct 7, 2025

The upgrade is failing because of:

2025-10-07T20:24:59.747575Z  thread 'main' panicked at src/environmentd/src/environmentd/main.rs:670:9:
--
  | environmentd: fatal: incompatible persist version 0.160.0-dev.0, current: 0.147.17, make sure to upgrade the catalog one version forward at a time
  | 6: core::panicking::panic_fmt
  | 7: mz_environmentd::environmentd::main::main
  | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
  | /usr/local/bin/entrypoint.sh: line 19:     8 Aborted                 (core dumped) environmentd "$@"

@def-
Copy link
Contributor

def- commented Oct 7, 2025

It's an upgrade test, so it's expected that we start out at v0.147.X (self-managed v25.2) and then upgrade to main, which should work! This sounds like our allowlist for what versions we allow upgrading from might be failing?

// Ignores the patch version
const SELF_MANAGED_VERSIONS: &[Version] = &[
// 25.1
Version::new(0, 130, 0),
// 25.2
Version::new(0, 147, 0),
];

@bobbyiliev
Copy link
Contributor Author

It's an upgrade test, so it's expected that we start out at v0.147.X (self-managed v25.2) and then upgrade to main, which should work! This sounds like our allowlist for what versions we allow upgrading from might be failing?

Oh I see, you mean we need 0.160.x in there? Is it because we're now doing weekly releases of the Helm chart that we'd need to bump the SELF_MANAGED_VERSIONS list every week to match the latest dev releases so the upgrade tests pass? I might be missing some of the recent context here on how the new SH releases work now.

@def-
Copy link
Contributor

def- commented Oct 8, 2025

No. The way it's supposed to work is that you can upgrade to main from self-managed v25.1 and v25.2. @bkirwi or @DAlperin might have an idea as to why it's failing now. This definitely worked before I disabled the tests a few months ago.

@bkirwi
Copy link
Contributor

bkirwi commented Oct 14, 2025

That error is reporting a downgrade: from 0.160.0-dev.0 in the shard to a current version of 0.147.17. Since we don't support downgrades, I suspect the test is either doing something weird or this is correctly fencing out an old version and your problem is elsewhere.

Note that the test reports a second panic:


2025-10-07T20:28:19.857220Z  thread 'tokio:work-3' panicked at src/repr/src/row/encode.rs:150:9:
--
  | tried pushing Null into non-nullable column
  | 6: core::panicking::panic_fmt
  | 7: <mz_repr::row::encode::RowColumnarEncoder as mz_persist_types::columnar::ColumnEncoder<mz_repr::row::Row>>::append
  | 8: <mz_persist_types::part::PartBuilder<mz_storage_types::sources::SourceData, mz_repr::relation::RelationDesc, (), mz_persist_types::codec_impls::UnitSchema>>::push::<mz_repr::timestamp::Timestamp, i64>

This error was introduced temporarily, and then fixed, on main recently. You may want to rebase to avoid the issue.

@def- def- force-pushed the terraform-tests-enable-upgrade-ci branch from 12a11da to 5b9e397 Compare October 15, 2025 07:52
@def-
Copy link
Contributor

def- commented Oct 15, 2025

Rebased and new run triggered: https://buildkite.com/materialize/nightly/builds/13747

@bobbyiliev
Copy link
Contributor Author

It seems to have worked this time: https://buildkite.com/materialize/nightly/builds/13754#0199e885-7df0-4d35-b5f7-3936937ae827

I'll do another test just to be safe.

Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you Bobby for getting this working!

@bobbyiliev bobbyiliev merged commit 35b9815 into MaterializeInc:main Oct 15, 2025
132 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants