Commit ec2c488

feat: Support pgvector as a custom x-sql-datatype
Signed-off-by: Edgar Ramírez-Mondragón <[email protected]>
1 parent 91266be commit ec2c488

5 files changed: +399 -82 lines changed

README.md

Lines changed: 36 additions & 5 deletions
@@ -312,11 +312,12 @@ This target supports the [`x-sql-datatype` extension](https://sdk.meltano.com/en
 
 <!-- insert a table with the mapping -->
 
-| `x-sql-datatype` | Postgres | Description |
-| :--------------- | :------- | :----------------------------------------------------------------- |
-| smallint | smallint | small-range integer (-32768 to +32767) |
-| integer | integer | typical choice for integer (-2147483648 to +2147483647) |
-| bigint | bigint | large-range integer (-9223372036854775808 to +9223372036854775807) |
+| `x-sql-datatype` | Postgres | Description |
+| :--------------- | :------- | :------------------------------------------------------------------------------------------------------------------------------- |
+| smallint | smallint | small-range integer (-32768 to +32767) |
+| integer | integer | typical choice for integer (-2147483648 to +2147483647) |
+| bigint | bigint | large-range integer (-9223372036854775808 to +9223372036854775807) |
+| pgvector | vector | vector similarity search (requires [pgvector](https://github.com/pgvector/pgvector) extension and the `pgvector` Python package) |
 
 ### Using the Singer catalog to narrow down the Postgres data types
 
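The mapping table in this hunk can be read as a simple lookup from the `x-sql-datatype` annotation to a Postgres type name. The sketch below is illustrative only: the dictionary and the `postgres_type_for` helper are hypothetical names, not part of the target's code, and the values are copied from the README table.

```python
from typing import Optional

# Hypothetical lookup table; the entries mirror the README table in this commit.
X_SQL_DATATYPE_TO_POSTGRES = {
    "smallint": "smallint",
    "integer": "integer",
    "bigint": "bigint",
    "pgvector": "vector",
}


def postgres_type_for(prop: dict) -> Optional[str]:
    """Return the Postgres type named by `x-sql-datatype`, or None if absent or unknown."""
    return X_SQL_DATATYPE_TO_POSTGRES.get(prop.get("x-sql-datatype"))


# A JSON Schema property annotated the same way as the README's `embedding` example.
embedding = {
    "type": "array",
    "items": {"type": "number"},
    "x-sql-datatype": "pgvector",
}
print(postgres_type_for(embedding))  # vector
```

A property without the annotation returns `None` here, which a caller could treat as "use the default JSON Schema type mapping".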
@@ -350,6 +351,36 @@ plugins:
           x-sql-datatype: smallint
 ```
 
+For vector embeddings:
+
+```yaml
+# meltano.yml
+plugins:
+  extractors:
+  - name: tap-my-tap
+    schema:
+      some_stream_id:
+        embedding:
+          type: array
+          items:
+            type: number
+          x-sql-datatype: pgvector
+```
+
+**Important:** To use `pgvector` data types:
+1. The [pgvector extension](https://github.com/pgvector/pgvector) **MUST** be installed and enabled in your PostgreSQL database:
+   ```sql
+   CREATE EXTENSION IF NOT EXISTS vector;
+   ```
+2. The `pgvector` Python package **MUST** be installed in your environment:
+   ```bash
+   pip install pgvector
+   # or with the target
+   pip install meltanolabs-target-postgres pgvector
+   ```
+
+If the `pgvector` Python package is not installed, the target will fall back to using `ARRAY(INTEGER)` with a warning.
+
 ## Content Encoding Support
 
 Json Schema supports the [`contentEncoding` keyword](https://datatracker.ietf.org/doc/html/rfc4648#section-8), which can be used to specify the encoding of input string types.
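The fallback described in the README addition (use `vector` when the `pgvector` Python package is available, otherwise warn and use `ARRAY(INTEGER)`) could be sketched roughly as follows. This is a minimal standard-library illustration, not the target's actual implementation: the function names are hypothetical, and the type names are returned as plain strings rather than SQLAlchemy types.

```python
import importlib.util
import warnings


def pgvector_installed() -> bool:
    """Return True if the `pgvector` Python package can be found on the import path."""
    return importlib.util.find_spec("pgvector") is not None


def vector_column_type(installed: bool) -> str:
    """Pick a column type name for a property annotated with `x-sql-datatype: pgvector`.

    Hypothetical helper: returns "vector" when the package is available, otherwise
    emits a warning and returns the fallback named in the README.
    """
    if installed:
        return "vector"
    warnings.warn(
        "pgvector Python package not installed; falling back to ARRAY(INTEGER)",
        stacklevel=2,
    )
    return "ARRAY(INTEGER)"


print(vector_column_type(pgvector_installed()))
```

Passing the availability check in as an argument keeps the decision easy to exercise in tests without installing or uninstalling the package.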

pyproject.toml

Lines changed: 9 additions & 7 deletions
@@ -56,13 +56,15 @@ lint = [
   "ruff>=0.1.14",
 ]
 testing = [
+  "pgvector>=0.4.1",
   "pytest>=9",
   "tap-countries",
   "tap-fundamentals",
 ]
 typing = [
+  { include-group = "testing" },
   "mypy>=1.6.1",
-  "types-paramiko>=3.3.0.0,<4",
+  "types-paramiko>=4",
   "types-simplejson>=3.19.0.2",
   "types-sqlalchemy>=1.4.53.38",
   "types-jsonschema>=4.19.0.3",
@@ -83,6 +85,12 @@ warn_redundant_casts = true
 warn_unused_configs = true
 warn_unused_ignores = true
 
+[[tool.mypy.overrides]]
+follow_untyped_imports = true
+module = [
+  "pgvector.*",
+]
+
 [build-system]
 requires = [
   "hatchling==1.27.0",
@@ -119,12 +127,6 @@ select = [
   "RUF", # ruff
 ]
 
-[tool.ruff.lint.flake8-import-conventions]
-banned-from = ["sqlalchemy"]
-
-[tool.ruff.lint.flake8-import-conventions.extend-aliases]
-sqlalchemy = "sa"
-
 [tool.ruff.lint.pydocstyle]
 convention = "google"
 