@Hogglo Hogglo commented Nov 6, 2025

📝 Description

Related Issue:

💡 Summary of Changes

  • Adds support for Db2 Vector Indexes to db2vs
  • Adds demo of Vector Index support in Jupyter notebook

✅ PR Checklist

  • PR Title Format: {TYPE}({SCOPE}): {DESCRIPTION}

    • Examples:
      • feat(langchain-ibm): add multi-tenant support
      • fix(langchain-db2): resolve flag parsing error
      • docs(langchain-ibm): update API usage examples
    • Allowed {TYPE} values:
      feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, release
    • Allowed {SCOPE} values (optional):
      langchain-ibm, langchain-db2
  • PR Description: The Description section clearly lists what was changed, why, and how it was tested.

  • Vector index support for DB2VS (DiskANN)
    • Implemented end-to-end creation and setup of Db2 vector (DiskANN) indexes in libs/langchain-db2/langchain_db2/db2vs.py.
    • Index creation is driven by create_index, which calls _create_diskann_index. That method creates 32K page size tablespaces and bufferpools to hold the index (the largest page size, required to fit 768-dimension float vectors), then creates or recreates the vector index on the specified vector store. This is a high-level summary; the implementation involves a few more steps.
  • Added a section to the Jupyter notebook libs/langchain-db2/docs/db2.ipynb that demonstrates the full flow, including creation of a vector index on a vector store.
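
As a rough illustration of the storage setup described above, the sketch below builds the kind of DDL a 32K bufferpool and tablespace need, and computes the raw size of a 768-dimension FLOAT32 vector. The helper names (storage_ddl, raw_vector_bytes) and the object names in the usage line are hypothetical and are not taken from db2vs.py; the statements actually issued by _create_diskann_index may differ, and the Early Access vector index DDL itself is deliberately left out.

```python
import re

PAGE_SIZE = "32K"  # page size named in the PR description

# Hypothetical guard: plain, unqualified identifiers only.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def storage_ddl(bufferpool: str, tablespace: str) -> list[str]:
    """Return DDL for a 32K bufferpool and a tablespace that uses it."""
    for name in (bufferpool, tablespace):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return [
        f"CREATE BUFFERPOOL {bufferpool} PAGESIZE {PAGE_SIZE}",
        f"CREATE TABLESPACE {tablespace} PAGESIZE {PAGE_SIZE} "
        f"BUFFERPOOL {bufferpool}",
    ]


def raw_vector_bytes(dimensions: int, bytes_per_component: int = 4) -> int:
    """Raw payload of one vector; FLOAT32 uses 4 bytes per component."""
    return dimensions * bytes_per_component


if __name__ == "__main__":
    for statement in storage_ddl("VEC_BP32K", "VEC_TS32K"):
        print(statement)
    print(raw_vector_bytes(768), "bytes per 768-dimension FLOAT32 vector")
```

The byte count covers only the vector payload; any per-row or per-index-node overhead is not modeled here.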

The purpose of this change is to provide an interface for creating a vector index using Db2 LUW's vector index solution, which is currently in Early Access. Ongoing development will take place in the db2_vector_index branch. Once the feature is generally available (GA), this work will be merged into main.

Testing was primarily done by running the Jupyter notebook and validating that the Db2 LUW server behaves as expected. More testing, in particular unit tests, should be delivered before merging this into main.

  • Formatting, Linting & Tests
    Run the following commands from the root of the modified package(s):
    make format
    make lint
    make test
    ⚠️ PRs will only be considered if all checks pass in CI.
    See the contribution guidelines for more details.

🧪 Testing (optional)

Testing was primarily done by running the Jupyter notebook and validating that the Db2 LUW server behaves as expected. More testing, in particular unit tests, should be delivered before merging this into main.

I ran make format and make lint, and both report the following errors in code that existed prior to this PR:

$ make format
[ "." = "" ] || uv run --all-groups ruff format .
      Built langchain-db2 @ file:///home/zacharyh/langchain-ibm/libs/langchain-db2
Uninstalled 1 package in 2ms
Installed 1 package in 7ms
12 files left unchanged
[ "." = "" ] || uv run --all-groups ruff check --fix .
S608 Possible SQL injection vector through string-based query construction
   --> langchain_db2/db2vs.py:751:17
    |
749 |           # If a vector index exists on the embedding with a matching distance type,
750 |           # approximate nearest neighbor (ANN) search will be used by default.
751 |           query = f"""
    |  _________________^
752 | |         SELECT id,
753 | |           {self._text_field},
754 | |           SYSTOOLS.BSON2JSON(metadata),
755 | |           vector_distance(embedding, VECTOR('{embedding}', {embedding_len}, FLOAT32),
756 | |           {_get_distance_function(self.distance_strategy)}) as distance
757 | |         FROM {self.table_name}
758 | |         ORDER BY distance
759 | |         FETCH FIRST {k} ROWS ONLY
760 | |         """
    | |___________^
761 |
762 |           # Execute the query
    |

S608 Possible SQL injection vector through string-based query construction
   --> langchain_db2/db2vs.py:815:17
    |
813 |           # If a vector index exists on the embedding with a matching distance type,
814 |           # approximate nearest neighbor (ANN) search will be used by default.
815 |           query = f"""
    |  _________________^
816 | |         SELECT id,
817 | |           {self._text_field},
818 | |           SYSTOOLS.BSON2JSON(metadata),
819 | |           vector_distance(embedding, VECTOR('{embedding}', {embedding_len}, FLOAT32),
820 | |           {_get_distance_function(self.distance_strategy)}) as distance,
821 | |           embedding
822 | |         FROM {self.table_name}
823 | |         ORDER BY distance
824 | |         FETCH FIRST {k} ROWS ONLY
825 | |         """
    | |___________^
826 |
827 |           # Execute the query
    |

Found 2 errors.
make: *** [Makefile:52: format] Error 1
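
For context on the S608 findings, here is a minimal sketch of one way the interpolated pieces of such a query could be constrained before they reach the SQL string. It is not the code in db2vs.py: build_search_query and checked_identifier are hypothetical names, the identifier rule is an assumption (it rejects schema-qualified names, for example), and the distance metric is treated as a plain keyword. Because the statement is still assembled from strings, ruff would likely still report S608 on it, so a targeted suppression with a justification may be needed as well.

```python
import math
import re
from typing import Sequence

# Assumed rule: unqualified identifiers made of letters, digits, underscores.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def checked_identifier(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def build_search_query(
    table_name: str,
    text_field: str,
    embedding: Sequence[float],
    k: int,
    distance_fn: str,
) -> str:
    """Build the similarity query from validated parts only."""
    table = checked_identifier(table_name)
    text = checked_identifier(text_field)
    metric = checked_identifier(distance_fn)

    # Coerce every component to a finite float so the literal can only
    # contain digits, signs, dots, and exponents.
    values = [float(x) for x in embedding]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains a non-finite value")
    vector_literal = "[" + ", ".join(repr(v) for v in values) + "]"

    limit = int(k)
    if limit <= 0:
        raise ValueError("k must be a positive integer")

    return (
        f"SELECT id, {text}, SYSTOOLS.BSON2JSON(metadata), "
        f"vector_distance(embedding, "
        f"VECTOR('{vector_literal}', {len(values)}, FLOAT32), "
        f"{metric}) AS distance "
        f"FROM {table} ORDER BY distance FETCH FIRST {limit} ROWS ONLY"
    )
```

Validating identifiers and coercing numeric inputs narrows what can be interpolated; it does not replace parameter markers where the driver and statement support them.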

🗒️ Notes (optional)

Thank you again for helping improve LangChain-IBM! 🚀
Your contribution makes the project better for everyone.

@Hogglo Hogglo changed the title from "Beta support for Vector Indexes on a Vector Store" to "feat: Beta support for Vector Indexes on a Vector Store" Nov 6, 2025
@Hogglo Hogglo changed the title from "feat: Beta support for Vector Indexes on a Vector Store" to "feat(langchain-db2): Beta support for Vector Indexes on a Vector Store" Nov 6, 2025
@Hogglo Hogglo marked this pull request as ready for review November 19, 2025 14:22
@Hogglo Hogglo marked this pull request as draft November 19, 2025 14:23