@congqixia
Contributor

This PR switches the Loon FFI packed writer from the high-level FFI interface to the lower-level internal C++ API to fix a critical bug in which properties map keys become nullptr when writing large amounts of data.

Problem

When the original FFI interface (writer_new, writer_write, writer_close, transaction_begin, transaction_commit) is used to write large datasets, the properties map keys become nullptr after extended operation. This manifests as:

  • Segmentation faults during write operations
  • Corrupted manifest data
  • Unpredictable behavior during transaction commits

The root cause appears to be related to memory management in the FFI boundary layer when handling long-lived properties objects across multiple FFI calls.
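
The description above only says the root cause "appears to be" tied to memory management at the FFI boundary, so the following is not the milvus-storage code. It is a minimal sketch of one defensive pattern for this class of bug: deep-copying borrowed C-style key/value pairs into an owning C++ map as soon as they cross the boundary, and rejecting null keys instead of dereferencing them. All names here (`FfiProperty`, `FfiProperties`, `CopyProperties`) are hypothetical and chosen for illustration only.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical C-style view of a properties map crossing an FFI boundary.
// The pointers are borrowed: the caller owns the underlying memory.
struct FfiProperty {
    const char* key;
    const char* value;
};

struct FfiProperties {
    const FfiProperty* items;
    std::size_t count;
};

// Deep-copies the borrowed properties into an owning map so that later calls
// do not depend on the caller keeping the original buffers alive.
// Returns false (leaving *out untouched) if the input is malformed.
bool CopyProperties(const FfiProperties& props,
                    std::map<std::string, std::string>* out) {
    if (out == nullptr || (props.count > 0 && props.items == nullptr)) {
        return false;
    }
    std::map<std::string, std::string> owned;
    for (std::size_t i = 0; i < props.count; ++i) {
        const FfiProperty& p = props.items[i];
        if (p.key == nullptr) {
            return false;  // corrupted entry: report it rather than crash
        }
        // Assumption of this sketch: a null value means an empty string.
        owned.emplace(p.key, p.value != nullptr ? p.value : "");
    }
    out->swap(owned);  // commit only after every entry was validated
    return true;
}
```

Copying at the boundary trades a small allocation cost for independence from the caller's buffer lifetimes. Whether that matches the actual failure mode here is not established by the description.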

Solution

Replace the FFI-based writer implementation with direct calls to the internal milvus-storage C++ API:

  • Use milvus_storage::api::Writer directly instead of the FFI wrapper
  • Use milvus_storage::api::transaction::TransactionImpl for commits
  • Construct arrow::RecordBatch directly from imported arrays

This bypasses the problematic FFI properties handling while maintaining the same functionality.

Also bumps the milvus-storage version to pick up the fix for the get chunk bug.

Related to #44956

Signed-off-by: Congqi Xia <[email protected]>
@sre-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines.) on Nov 26, 2025
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia

@mergify bot added the dco-passed (DCO check passed.) and kind/enhancement (Issues or changes related to enhancement) labels on Nov 26, 2025
@sre-ci-robot
Contributor

[ci-v2-notice]
Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected, you can just ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm [master branch only]
  • /ci-rerun-e2e-default // for ci-v2/e2e-default [master branch only]

If you have any questions or requests, please contact @zhikunyao.

@mergify
Contributor

mergify bot commented Nov 26, 2025

@congqixia The go-sdk check failed. Comment `rerun go-sdk` to trigger the job again.

Signed-off-by: Congqi Xia <[email protected]>
@codecov

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 0.51813% with 192 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.11%. Comparing base (6c0a80d) to head (05851fc).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/storagev2/packed/packed_writer_ffi.go | 0.00% | 96 Missing ⚠️ |
| ...nternal/core/src/storage/loon_ffi/ffi_writer_c.cpp | 0.00% | 93 Missing ⚠️ |
| ...rnal/core/src/segcore/ChunkedSegmentSealedImpl.cpp | 0.00% | 1 Missing ⚠️ |
| ...re/storagev2translator/ManifestGroupTranslator.cpp | 0.00% | 1 Missing ⚠️ |
| internal/datanode/index/task_index.go | 0.00% | 1 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (0.51%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (76.11%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.
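
For readers checking the numbers, here is a minimal sketch of the arithmetic behind the two failed checks. It assumes coverage is simply covered lines divided by tracked lines. The patch total of 193 lines (1 covered, 192 missing) is an inference from the reported 0.51813%, not a figure stated in the report. The head figures (222603 hits out of 292469 lines) come from the coverage diff. The function names are hypothetical.

```cpp
#include <cassert>

// Coverage as a percentage: covered lines over total tracked lines.
double CoveragePercent(long covered, long total) {
    assert(total > 0);  // a percentage is undefined for an empty change set
    return 100.0 * static_cast<double>(covered) / static_cast<double>(total);
}

// A check passes only when coverage meets or exceeds the target.
bool MeetsTarget(double coverage_percent, double target_percent) {
    return coverage_percent >= target_percent;
}
```

With these definitions, `CoveragePercent(1, 193)` is about 0.518 and fails an 80.00 target, and `CoveragePercent(222603, 292469)` is about 76.11 and fails a 77.00 target, which agrees with both messages above.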

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #45871      +/-   ##
==========================================
- Coverage   76.15%   76.11%   -0.05%     
==========================================
  Files        1881     1870      -11     
  Lines      294229   292469    -1760     
==========================================
- Hits       224082   222603    -1479     
+ Misses      62701    62489     -212     
+ Partials     7446     7377      -69     

| Components | Coverage Δ |
| --- | --- |
| Client | 78.17% <ø> (ø) |
| Core | 82.66% <0.00%> (-0.10%) ⬇️ |
| Go | 74.26% <1.02%> (+<0.01%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| internal/datacoord/task_index.go | 76.36% <100.00%> (+0.08%) ⬆️ |
| ...rnal/core/src/segcore/ChunkedSegmentSealedImpl.cpp | 60.62% <0.00%> (ø) |
| ...re/storagev2translator/ManifestGroupTranslator.cpp | 0.00% <0.00%> (ø) |
| internal/datanode/index/task_index.go | 77.43% <0.00%> (-0.27%) ⬇️ |
| ...nternal/core/src/storage/loon_ffi/ffi_writer_c.cpp | 0.00% <0.00%> (ø) |
| internal/storagev2/packed/packed_writer_ffi.go | 0.00% <0.00%> (ø) |

... and 43 files with indirect coverage changes

@congqixia
Contributor Author

/ci-rerun-ut-integration

Labels

  • approved
  • area/compilation
  • dco-passed: DCO check passed.
  • kind/enhancement: Issues or changes related to enhancement
  • size/L: Denotes a PR that changes 100-499 lines.
