Design Document
mrbean edited this page Nov 10, 2022 · 7 revisions
- Support sending vectors from Spark, with an option to have Weaviate do the vectorizing instead
- Support sending only text when vectorization happens within the DB (should the write fail if vectors are also passed?)
- Support specifying a rate limit for sending
- Support automatic rate limiting with backoff (maybe v1.1.0?)
- Support inferring/creating the schema in Weaviate based on the Spark DataFrame schema. Open question: how do we infer which columns should be vectorized?
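To make the two rate limiting goals above more concrete, here is a minimal sketch in plain Scala with no Spark or Weaviate dependency. Every name in it (`BackoffSketch`, `pauseBetweenBatchesMs`, `backoffDelayMs`, `sendWithBackoff`) and every default value is an assumption made for illustration, not part of the connector. It shows a fixed pause derived from a user-specified rate, plus a capped exponential backoff around a failing send.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of the connector: all names and defaults are assumptions.
object BackoffSketch {
  // Fixed rate limit: pause between batches for a user-specified batches-per-second value.
  def pauseBetweenBatchesMs(batchesPerSecond: Int): Long = {
    require(batchesPerSecond > 0, "batchesPerSecond must be positive")
    1000L / batchesPerSecond
  }

  // Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
  def backoffDelayMs(attempt: Int, baseMs: Long = 100L, maxMs: Long = 10000L): Long =
    math.min(maxMs, baseMs << math.min(attempt, 20))

  // Run `send`, retrying non-fatal failures up to `maxRetries` times with growing delays.
  // `sleep` is injectable so the logic can be exercised without real waiting.
  def sendWithBackoff[A](
      maxRetries: Int,
      sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)
  )(send: () => A): Try[A] = {
    @tailrec
    def loop(attempt: Int): Try[A] =
      Try(send()) match {
        case s @ Success(_) => s
        case Failure(_) if attempt < maxRetries =>
          sleep(backoffDelayMs(attempt))
          loop(attempt + 1)
        case f @ Failure(_) => f
      }
    loop(0)
  }
}
```

A writer could wrap each batch request in `sendWithBackoff` and sleep for `pauseBetweenBatchesMs(...)` between successful batches. Whether backoff should react to every failure or only to specific responses from Weaviate is left open here.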
Example where Weaviate does the vectorizing:
df.write.options(WeaviateOptions()).format("io.weaviate").save()
Example where a column already holds the vectors:
val weaviateOptions = Map("vector" -> "vector_column_name")
ds.write
  .format("io.weaviate.spark.DataSource")
  .option("url", "http://localhost:8080")
  // options (plural) accepts a Map; option takes a single key/value pair
  .options(weaviateOptions)
  .save()
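The example above raises the question of how the data source would use the `vector` option internally. The following is a rough sketch under stated assumptions, not the connector's implementation: a row is modelled as a plain `Map[String, Any]` rather than a Spark `Row`, and `VectorOptionSketch`, `WeaviateObject` and `toWeaviateObject` are hypothetical names. If the option is set, the named column is pulled out as the vector. If it is absent, only properties are sent and Weaviate is expected to vectorize.

```scala
// Hypothetical sketch, not the connector's code: all names here are assumptions.
object VectorOptionSketch {
  // Payload for one object: its properties plus an optional precomputed vector.
  final case class WeaviateObject(properties: Map[String, Any], vector: Option[Seq[Float]])

  // Split a row into properties and an optional vector, driven by the "vector" option.
  def toWeaviateObject(row: Map[String, Any], options: Map[String, String]): WeaviateObject =
    options.get("vector") match {
      case Some(vectorColumn) =>
        // Accept Float or Double elements and normalise them to Float.
        val vector = row.get(vectorColumn).collect { case values: Seq[_] =>
          values.collect {
            case f: Float  => f
            case d: Double => d.toFloat
          }
        }
        // The vector column is removed so it is not also sent as a property.
        WeaviateObject(row - vectorColumn, vector)
      case None =>
        // No vector column configured: send properties only.
        WeaviateObject(row, None)
    }
}
```

With `Map("vector" -> "vector_column_name")` as options, a row containing `vector_column_name` yields an object whose vector is taken from that column and whose remaining columns become properties. This sketch deliberately does not decide the open question of whether passing vectors should fail when Weaviate is configured to vectorize.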