feat: implement boolean search parser for dandiset queries #2631
Add SearchParser class supporting advanced search syntax, including boolean operators (AND, OR, NOT), quoted exact phrases, and grouping with parentheses.
The parser constructs Django Q objects from search queries, replacing the previous simple word-splitting approach with a more powerful query language that enables complex filtering expressions.
This would be accompanied by a new page in docs.dandiarchive.org. Something like:
DANDI Archive Search Guide
The DANDI Archive search supports boolean operators and phrase matching to help you find exactly the datasets you need. This guide explains how to construct effective search queries.
Simple Searches
Just type what you're looking for. The search will find datasets that contain all of your terms:
This finds datasets that mention both "hippocampus" and "mouse" anywhere in their metadata.
Exact Phrases
Put quotes around phrases you want to match exactly:
This finds only datasets that contain this exact phrase, not datasets that just happen to mention "two", "photon", "calcium", and "imaging" separately.
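The difference between matching all terms and matching an exact phrase can be sketched in plain Python. This is only an illustration: the helper names `matches_all_terms` and `matches_phrase` are hypothetical, the sample texts are made up, and case-insensitive substring matching is an assumption rather than something this PR states.

```python
def matches_all_terms(text, terms):
    """True if every term occurs somewhere in the text (assumed case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def matches_phrase(text, phrase):
    """True only if the words occur together, in order, as one phrase."""
    return phrase.lower() in text.lower()


# Made-up metadata snippets, for illustration only.
scattered = "Two mice were imaged; photon counts and calcium levels were logged during imaging."
exact = "Two photon calcium imaging of mouse cortex."

terms = ["two", "photon", "calcium", "imaging"]
phrase = "two photon calcium imaging"

print(matches_all_terms(scattered, terms))  # every word is present somewhere
print(matches_phrase(scattered, phrase))    # but never as one contiguous phrase
print(matches_phrase(exact, phrase))
```

The first text satisfies the unquoted query but not the quoted one, which is the distinction the quotes are meant to express.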
Boolean Operators
OR - Find Either Term
Use OR when you want datasets that match any of several alternatives:
This finds datasets about mice, datasets about rats, or datasets about both.
This finds datasets studying any of these three brain regions.
NOT - Exclude Terms
Use NOT to exclude datasets containing certain terms:
This finds calcium imaging datasets but excludes any that mention anesthesia.
To exclude multiple terms, you can either chain them:
Or group them with OR:
Both approaches exclude datasets mentioning either "lesion" or "drug".
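The equivalence of the chained and grouped forms is De Morgan's law, and it can be checked exhaustively. In the sketch below, the query shapes shown in the comments are inferred from the description above, and the function names are hypothetical.

```python
from itertools import product


def chained(has_lesion, has_drug):
    # Shape: NOT lesion NOT drug (implicit AND between the two NOT clauses)
    return (not has_lesion) and (not has_drug)


def grouped(has_lesion, has_drug):
    # Shape: NOT (lesion OR drug)
    return not (has_lesion or has_drug)


# Compare both forms over every combination of "mentions lesion" / "mentions drug".
equivalent = all(
    chained(a, b) == grouped(a, b)
    for a, b in product([False, True], repeat=2)
)
print(equivalent)
```

Both forms keep a dataset only when it mentions neither term, so choosing between them is a matter of readability.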
AND - Require All Terms
You rarely need to type AND because it's automatic. These two queries are identical:
Both find datasets that mention all three terms.
Grouping with Parentheses
Use parentheses to control how operators combine:
This finds hippocampus datasets from either mice or rats.
This finds calcium imaging datasets from hippocampus or cortex, excluding lesion studies.
You can nest parentheses for complex queries:
This finds hippocampus datasets from mice or rats, excluding those involving lesions or anesthesia.
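Before parentheses and phrases can be interpreted, the query has to be split into tokens rather than just words. The following is a minimal tokenizer sketch, not the SearchParser implementation from this PR: the `tokenize` function, the token kind names, and the assumption that operators are recognized only in uppercase are all hypothetical.

```python
import re

# Hypothetical token pattern: a quoted phrase, a parenthesis, or a bare word.
TOKEN_RE = re.compile(r'"([^"]*)"|([()])|([^\s()"]+)')

# Assumption: operators are only recognized when written in uppercase.
OPERATORS = {"AND", "OR", "NOT"}


def tokenize(query):
    """Split a query string into (kind, value) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(query):
        phrase, paren, word = match.groups()
        if phrase is not None:
            # A quoted phrase stays together as a single token.
            tokens.append(("PHRASE", phrase))
        elif paren is not None:
            tokens.append(("LPAREN" if paren == "(" else "RPAREN", paren))
        elif word in OPERATORS:
            tokens.append((word, word))
        else:
            tokens.append(("TERM", word))
    return tokens


for token in tokenize('"two photon" (mouse OR rat) NOT lesion'):
    print(token)
```

Unlike plain whitespace splitting, this keeps the quoted phrase intact and separates parentheses from the words they touch.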
Operator Precedence
When you don't use parentheses, operators are evaluated in this order:
1. NOT (highest priority)
2. AND (implicit between adjacent terms)
3. OR (lowest priority)
This means:
Is interpreted as:
To get what you probably meant, use parentheses:
Best practices:
When combining OR with other operators, use parentheses to make your intent clear.
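The precedence rules described in this guide (NOT, then implicit AND, then OR) can be sketched as a small recursive-descent parser. This is an illustrative sketch only: it returns nested tuples rather than Django Q objects, the `parse` function and its tuple format are hypothetical, and the example queries are chosen for illustration rather than taken from the PR.

```python
import re

# Quoted phrase, parenthesis, or bare word (simplified for this sketch).
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def parse(query):
    """Parse a query into nested tuples such as ("OR", left, right)."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():  # AND may be written out or left implicit
        node = parse_not()
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # highest precedence
        if peek() == "NOT":
            advance()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        token = peek()
        if token is None or token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {token!r}")
        advance()
        if token == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        return token.strip('"')

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return tree


print(parse("mouse OR rat hippocampus"))
print(parse("(mouse OR rat) hippocampus"))
```

Without parentheses, the implicit AND groups "rat" with "hippocampus" before OR is applied; with parentheses, the OR group is evaluated first and "hippocampus" is required in both cases.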