
Design Document

mrbean edited this page Nov 10, 2022 · 7 revisions

Initial Milestone (v1.0.0)

  • Support sending vectors from Spark, with the option to have Weaviate do the vectorizing instead
  • Support sending only text when vectorization happens within the DB (should the write fail if vectors are also passed?)
  • Support specifying the rate limit for sending
  • Support automatic rate limiting with backoff (maybe v1.1.0?)
  • Support inferring/creating the schema in Weaviate from the Spark DataFrame schema. Open question: how do we infer which columns should be vectorized?
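The rate-limit and backoff items above could be sketched in plain Scala as follows. This is a minimal sketch under stated assumptions, not the connector's implementation: `RateLimitSketch`, `FixedRateLimiter` and `withBackoff` are hypothetical names, the limiter simply spaces sends evenly at `requestsPerSecond`, and the backoff doubles the wait after each failed attempt up to a cap. It treats every non-fatal exception as retryable, which a real writer would probably narrow (for example to throttling or timeout errors).

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helpers for the rate-limiting milestone items; not an existing API.
object RateLimitSketch {

  /** Fixed-rate limiter: spaces calls to acquire() at least
    * 1 / requestsPerSecond seconds apart. */
  final class FixedRateLimiter(requestsPerSecond: Double) {
    require(requestsPerSecond > 0, "rate must be positive")
    private val intervalNanos = (1e9 / requestsPerSecond).toLong
    private var nextAllowed = System.nanoTime()

    def acquire(): Unit = synchronized {
      val now = System.nanoTime()
      if (nextAllowed > now) {
        // Sleep until the next send slot opens.
        val waitNanos = nextAllowed - now
        Thread.sleep(waitNanos / 1000000, (waitNanos % 1000000).toInt)
      }
      // Reserve the following slot.
      nextAllowed = math.max(nextAllowed, now) + intervalNanos
    }
  }

  /** Retries `send` with exponential backoff: waits base, 2*base, 4*base, ...
    * milliseconds (capped at maxDelayMillis) between attempts and rethrows the
    * last error once maxAttempts is reached. Assumption: any non-fatal error is retryable. */
  def withBackoff[T](maxAttempts: Int, baseDelayMillis: Long, maxDelayMillis: Long)(send: => T): T = {
    require(maxAttempts >= 1, "need at least one attempt")

    @tailrec
    def loop(attempt: Int): T = Try(send) match {
      case Success(result) => result
      case Failure(e) if attempt >= maxAttempts => throw e
      case Failure(_) =>
        // Shift is clamped so the multiplier cannot overflow.
        val delay = math.min(maxDelayMillis, baseDelayMillis * (1L << math.min(attempt - 1, 30)))
        Thread.sleep(delay)
        loop(attempt + 1)
    }

    loop(1)
  }
}
```

A writer task could call `limiter.acquire()` before each batch request and wrap the request itself in `withBackoff(...)`, so the configured rate and the automatic backoff compose.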
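For schema inference, one possible starting point is a fixed mapping from Spark SQL type names (the strings returned by Spark's `DataType.simpleString`, such as `string`, `bigint` or `array<double>`) to Weaviate property data types. The mapping below is an assumption, as are the names `SchemaInferenceSketch`, `toWeaviateType` and `inferProperties`. The sketch skips the column named by the `vector` option and fails on unsupported types; it does not answer which columns should be vectorized.

```scala
// Hypothetical schema-inference helper; the type mapping is an assumption.
object SchemaInferenceSketch {
  // Spark SQL simpleString type name -> Weaviate property data type.
  private val scalarTypes: Map[String, String] = Map(
    "string" -> "text",
    "int" -> "int",
    "bigint" -> "int",
    "float" -> "number",
    "double" -> "number",
    "boolean" -> "boolean",
    "timestamp" -> "date",
    "date" -> "date"
  )

  // Matches one level of array, e.g. "array<string>" captures "string".
  private val ArrayOf = "array<(.+)>".r

  def toWeaviateType(sparkType: String): Option[String] = sparkType match {
    case ArrayOf(inner) => scalarTypes.get(inner).map(_ + "[]")
    case scalar         => scalarTypes.get(scalar)
  }

  /** Infers Weaviate properties from (column name, Spark type) pairs,
    * leaving out the optional vector column. */
  def inferProperties(columns: Seq[(String, String)], vectorColumn: Option[String]): Seq[(String, String)] =
    columns
      .filterNot { case (name, _) => vectorColumn.contains(name) }
      .map { case (name, sparkType) =>
        name -> toWeaviateType(sparkType).getOrElse(
          throw new IllegalArgumentException(s"No Weaviate type for column '$name' of Spark type $sparkType"))
      }
}
```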

Example where Weaviate does the vectorizing:

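// No "vector" option is set, so Weaviate's vectorizer computes the vectors
df.write.format("io.weaviate.spark.DataSource").option("url", "http://localhost:8080").save()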

Example where a DataFrame column already holds the vectors (the "vector" option gives that column's name):

val weaviateOptions = Map("vector" -> "vector_column_name")
ds.write
  .format("io.weaviate.spark.DataSource")
  .option("url", "http://localhost:8080")
  .options(weaviateOptions)
  .save()
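How a single row might be turned into an object when the "vector" option is set can be sketched without Spark. The row is modelled as a `Map[String, Any]`; the configured column is removed from the properties and converted to `Float`s, and when the option is absent no vector is attached so Weaviate's vectorizer can compute one. `VectorColumnSketch`, `WeaviateObject` and `toWeaviateObject` are hypothetical names, and the sketch does not settle the open question of whether passing both text and vectors should fail.

```scala
// Hypothetical row-to-object conversion; not an existing connector API.
object VectorColumnSketch {
  final case class WeaviateObject(
      className: String,
      properties: Map[String, Any],
      vector: Option[Seq[Float]] // None: leave vectorization to Weaviate
  )

  def toWeaviateObject(className: String, row: Map[String, Any], options: Map[String, String]): WeaviateObject =
    options.get("vector") match {
      case None =>
        // No vector column configured: send only the properties.
        WeaviateObject(className, row, None)
      case Some(vectorColumn) =>
        val vector = row.get(vectorColumn) match {
          case Some(values: Seq[_]) =>
            values.map {
              case n: Number => n.floatValue()
              case other     => throw new IllegalArgumentException(s"Non-numeric vector element: $other")
            }
          case other =>
            throw new IllegalArgumentException(s"Column '$vectorColumn' does not hold a vector: $other")
        }
        // The vector column itself is not stored as a property.
        WeaviateObject(className, row - vectorColumn, Some(vector))
    }
}
```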
