docs/modules/index/index.md (+67 -1: 67 additions & 1 deletion)
@@ -64,4 +64,70 @@ The Index Module provides a vector indexing system for Django applications. It e
```

5. Build your indexes with `manage.py rebuild_indexes`
-6. Query your index with `MyIndex().search("query")`
+6. Query your index with `MyIndex().search_sources("query")`

## Querying Indexes

When indexes are built, source objects are often chunked into many separate Documents before they are embedded and inserted into the index.

This means that a query to the underlying index can return multiple Documents from the same source object. For example, if you have a `Book` Django model with a long summary to be embedded, searching the index might return many Documents from the same Book.

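To make the duplication concrete, here is a small self-contained sketch. It does not use the Index Module at all: `Chunk`, `chunk_text`, and `keyword_search` are hypothetical stand-ins, and a simple keyword count takes the place of real vector similarity. It only illustrates how one chunked source can occupy several of the top results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for an indexed Document."""
    source_id: int
    text: str


def chunk_text(source_id: int, text: str, words_per_chunk: int) -> list[Chunk]:
    """Split text into fixed-size word windows, one Chunk per window."""
    words = text.split()
    return [
        Chunk(source_id, " ".join(words[i:i + words_per_chunk]))
        for i in range(0, len(words), words_per_chunk)
    ]


def keyword_search(chunks: list[Chunk], term: str, limit: int) -> list[Chunk]:
    """Toy replacement for vector search: rank chunks by occurrences of `term`."""
    scored = [(c.text.lower().split().count(term.lower()), c) for c in chunks]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in index order
    return [chunk for _, chunk in matches[:limit]]


# Two "Book" summaries, keyed by a made-up primary key.
summaries = {
    1: "whales whales ocean whales ship captain whales sea",
    2: "a quiet village and one story about whales",
}
index = [c for sid, text in summaries.items() for c in chunk_text(sid, text, 4)]

results = keyword_search(index, "whales", limit=3)
source_ids = [c.source_id for c in results]
print(source_ids)  # [1, 1, 2]: book 1 takes two of the three slots
```
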
This can be fine in some cases: in RAG applications, the most relevant chunks are usually what you want, even if they all come from the same source.

In other cases, such as finding similar content, this behaviour can be a hindrance.

To solve this, Vector Indexes provide two query methods depending on your needs:

### Document Search

```python
MyVectorIndex().search_documents("Similar to this")
```

`search_documents` returns a queryset-like interface over Document objects. If the underlying vector provider returns multiple Documents from the same source object, these will all be returned.

This is useful for RAG-like applications where the most relevant chunks are important.

### Source Search

```python
MyVectorIndex().search_sources("Similar to this")
```

When using the `search_sources` method, a Vector Index will attempt to map results from the index back to their original source objects. In the `Book` example, with `ModelSource(model=Book)`, this method returns a queryset-like interface over `Book` instances.

Because the underlying storage provider is likely to return multiple Documents for the same source object, this method overfetches Documents to try to ensure that enough distinct source objects are returned for your query.

This overfetching behaviour can be customised:

```python
MyVectorIndex().search_sources(
    "Similar to this",
    overfetch_multiplier=4,
    max_overfetch_iterations=3,
)
```

Where:

- `overfetch_multiplier` defines how many multiples of the requested limit are retrieved from the index. For example, if you request 5 results with an `overfetch_multiplier` of 4, 20 Documents are retrieved from the index internally, and the top 5 unique sources among them are returned.
- `max_overfetch_iterations` defines the maximum number of times the underlying search is repeated to collect enough unique source objects. If the initial search doesn't return enough unique objects, it is repeated with an increasing number of items, up to `max_overfetch_iterations` times.

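The interaction between the two parameters can be sketched in plain Python. This is an illustration of the general overfetch-and-deduplicate idea, not the module's actual implementation: `unique_sources` and `fetch` are hypothetical names, and two details are assumptions, namely that the fetch size grows linearly with each attempt and that the first search counts as one iteration.

```python
from typing import Callable, Hashable, Sequence

# (source_key, score) pairs, best match first.
Hit = tuple[Hashable, float]


def unique_sources(
    fetch: Callable[[int], Sequence[Hit]],
    limit: int,
    overfetch_multiplier: int,
    max_overfetch_iterations: int,
) -> list[Hashable]:
    """Return up to `limit` distinct source keys, overfetching as needed."""
    seen: list[Hashable] = []
    for iteration in range(1, max_overfetch_iterations + 1):
        # Assumption: each retry asks for proportionally more hits.
        fetch_size = limit * overfetch_multiplier * iteration
        hits = fetch(fetch_size)
        # De-duplicate by source key while preserving rank order.
        seen = list(dict.fromkeys(key for key, _ in hits))
        if len(seen) >= limit or len(hits) < fetch_size:
            break  # enough distinct sources, or the index has no more hits
    return seen[:limit]


# Toy ranked results in which one book dominates the top hits.
ranked = [
    ("book-1", 0.9), ("book-1", 0.8), ("book-1", 0.7),
    ("book-2", 0.6), ("book-1", 0.5), ("book-3", 0.4),
]
calls: list[int] = []


def fetch(n: int) -> Sequence[Hit]:
    calls.append(n)  # record how many hits each search asked for
    return ranked[:n]


top = unique_sources(fetch, limit=3, overfetch_multiplier=1, max_overfetch_iterations=3)
print(top)    # ['book-1', 'book-2', 'book-3']
print(calls)  # [3, 6]: the first search found one source, so it was retried
```

With `limit=5` and `overfetch_multiplier=4`, the first search in this sketch would request 20 hits, matching the arithmetic described above.
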
### Converting Between Result Types

You can convert between result types on an existing queryset:

```python
# Start with document search, convert to sources
docs = MyVectorIndex().search_documents("query")
sources = docs.as_sources()

# Start with source search, convert to documents
sources = MyVectorIndex().search_sources("query")
docs = sources.as_documents()
```

This can be useful in RAG applications where you want to use `Documents` for building context, but then present source objects to users as the 'Sources referenced'.
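
As a rough illustration of that pattern, the sketch below builds a prompt context from chunk text and a de-duplicated source list from the same results. It uses plain Python stand-ins rather than the module's result types: `RetrievedChunk`, `build_context`, and `sources_referenced` are hypothetical names, and the fields on `RetrievedChunk` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """Hypothetical stand-in for a Document returned by a search."""
    source_title: str
    content: str


def build_context(chunks: list[RetrievedChunk]) -> str:
    """Join chunk text, most relevant first, into one prompt context string."""
    return "\n\n".join(chunk.content for chunk in chunks)


def sources_referenced(chunks: list[RetrievedChunk]) -> list[str]:
    """List each source once, in order of first appearance."""
    return list(dict.fromkeys(chunk.source_title for chunk in chunks))


chunks = [
    RetrievedChunk("Moby-Dick", "Ishmael joins the crew of the Pequod."),
    RetrievedChunk("Moby-Dick", "Ahab reveals his hunt for the white whale."),
    RetrievedChunk("Treasure Island", "Jim Hawkins finds a map in the sea chest."),
]

context = build_context(chunks)       # every chunk feeds the prompt
sources = sources_referenced(chunks)  # each book is listed once for the user
print(sources)  # ['Moby-Dick', 'Treasure Island']
```
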