@bendichter bendichter commented Nov 4, 2025

Add SearchParser class supporting advanced search syntax including:

  • Boolean operators (AND, OR, NOT) for combining search terms
  • Quoted phrases for exact matching
  • Parentheses for grouping and precedence control
  • Implicit AND between adjacent search terms

The parser constructs Django Q objects from search queries, replacing the previous simple word-splitting approach with a more powerful query language that enables complex filtering expressions.

This would be accompanied by a new page on docs.dandiarchive.org, along these lines:

DANDI Archive Search Guide

The DANDI Archive search supports boolean operators and phrase matching to help you find exactly the datasets you need. This guide explains how to construct effective search queries.

Simple Searches

Just type what you're looking for. The search will find datasets that contain all of your terms:

hippocampus mouse

This finds datasets that mention both "hippocampus" and "mouse" anywhere in their metadata.

Exact Phrases

Put quotes around phrases you want to match exactly:

"two-photon calcium imaging"

This finds only datasets that contain this exact phrase, not datasets that just happen to mention "two", "photon", "calcium", and "imaging" separately.
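For reviewers, here is a minimal sketch of how a tokenizer could keep a quoted phrase together as one token. The regular expression, the token kinds, and the `tokenize` name are illustrative assumptions, not the code in this PR.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}

# Hypothetical token pattern: a quoted phrase, a single parenthesis,
# or a run of characters that are not whitespace, parentheses, or quotes.
TOKEN_RE = re.compile(r'"(?P<PHRASE>[^"]*)"|(?P<PAREN>[()])|(?P<WORD>[^\s()"]+)')


def tokenize(query):
    """Split a query into (kind, value) pairs, keeping quoted phrases whole."""
    tokens = []
    for match in TOKEN_RE.finditer(query):
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "WORD":
            # Uppercase AND / OR / NOT are operators; any other word is a term.
            kind = "OP" if value in OPERATORS else "TERM"
        tokens.append((kind, value))
    return tokens


print(tokenize('"two-photon calcium imaging" mouse'))
# [('PHRASE', 'two-photon calcium imaging'), ('TERM', 'mouse')]
```

Because the phrase is captured without its quotes and tagged separately, a later stage can match it as a single string rather than as four independent words.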

Boolean Operators

OR - Find Either Term

Use OR when you want datasets that match any of several alternatives:

mouse OR rat

This finds datasets about mice, datasets about rats, or datasets about both.

hippocampus OR cortex OR amygdala

This finds datasets studying any of these three brain regions.

NOT - Exclude Terms

Use NOT to exclude datasets containing certain terms:

calcium imaging NOT anesthesia

This finds datasets that mention both "calcium" and "imaging" but excludes any that mention "anesthesia".

To exclude multiple terms, you can either chain them:

hippocampus NOT lesion NOT drug

Or group them with OR:

hippocampus NOT (lesion OR drug)

Both approaches exclude datasets mentioning either "lesion" or "drug".
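The equivalence of the two forms is De Morgan's law: NOT a AND NOT b equals NOT (a OR b). The sketch below checks this on a few made-up metadata strings. The `matches` helper and the sample records are hypothetical, and plain substring matching is an assumption about how terms are compared.

```python
def matches(text, term):
    # Assumption: a term matches when it appears as a substring of the text.
    return term in text


records = [
    "hippocampus recording",
    "hippocampus lesion study",
    "hippocampus drug trial",
    "hippocampus lesion and drug study",
    "cortex recording",
]

# hippocampus NOT lesion NOT drug
chained = [
    r for r in records
    if matches(r, "hippocampus")
    and not matches(r, "lesion")
    and not matches(r, "drug")
]

# hippocampus NOT (lesion OR drug)
grouped = [
    r for r in records
    if matches(r, "hippocampus")
    and not (matches(r, "lesion") or matches(r, "drug"))
]

print(chained == grouped)  # True
print(chained)             # ['hippocampus recording']
```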

AND - Require All Terms

You rarely need to type AND because adjacent terms are combined with AND automatically. These two queries are equivalent:

mouse hippocampus electrophysiology
mouse AND hippocampus AND electrophysiology

Both find datasets that mention all three terms.
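One way to think about implicit AND is as a normalization step that inserts an explicit AND wherever one operand ends and the next begins. The sketch below works on a plain list of string tokens. The `insert_implicit_and` name and the token representation are assumptions for illustration, and the PR's parser may handle adjacency differently.

```python
OPERATORS = {"AND", "OR", "NOT"}


def insert_implicit_and(tokens):
    """Return a copy of tokens with AND made explicit between adjacent operands."""
    result = []
    for token in tokens:
        if result:
            prev = result[-1]
            # The previous token ends an operand if it is a term or ")".
            prev_ends_operand = prev == ")" or (prev not in OPERATORS and prev != "(")
            # The current token starts an operand if it is a term, "(" or NOT.
            starts_operand = token in ("(", "NOT") or (
                token not in OPERATORS and token != ")"
            )
            if prev_ends_operand and starts_operand:
                result.append("AND")
        result.append(token)
    return result


implicit = insert_implicit_and(["mouse", "hippocampus", "electrophysiology"])
explicit = insert_implicit_and(
    ["mouse", "AND", "hippocampus", "AND", "electrophysiology"]
)
print(implicit == explicit)  # True
```

After normalization both queries produce the same token list, which is why they return the same results.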

Grouping with Parentheses

Use parentheses to control how operators combine:

(mouse OR rat) hippocampus

This finds hippocampus datasets from either mice or rats.

"calcium imaging" (hippocampus OR cortex) NOT lesion

This finds calcium imaging datasets from hippocampus or cortex, excluding lesion studies.

You can nest parentheses for complex queries:

((mouse OR rat) AND hippocampus) NOT (lesion OR anesthesia)

This finds hippocampus datasets from mice or rats, excluding those involving lesions or anesthesia.
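To make the nesting concrete, the sketch below evaluates a hand-built expression tree for this query against a few sample strings. Nested tuples stand in for the Django Q objects the parser actually builds. The `evaluate` function, the tuple layout, and substring matching are assumptions for illustration only.

```python
def evaluate(node, text):
    """Evaluate a nested (operator, operand, ...) tuple against one text."""
    if isinstance(node, str):
        # Leaf: a search term, matched as a substring (an assumption).
        return node in text
    op = node[0]
    if op == "NOT":
        return not evaluate(node[1], text)
    if op == "AND":
        return all(evaluate(child, text) for child in node[1:])
    if op == "OR":
        return any(evaluate(child, text) for child in node[1:])
    raise ValueError(f"unknown operator: {op!r}")


# ((mouse OR rat) AND hippocampus) NOT (lesion OR anesthesia)
query = (
    "AND",
    ("AND", ("OR", "mouse", "rat"), "hippocampus"),
    ("NOT", ("OR", "lesion", "anesthesia")),
)

print(evaluate(query, "rat hippocampus recordings"))      # True
print(evaluate(query, "mouse hippocampus lesion study"))  # False
print(evaluate(query, "mouse cortex recordings"))         # False
```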

Operator Precedence

When you don't use parentheses, operators are evaluated in this order:

  1. NOT (highest priority)
  2. AND (implicit between adjacent terms)
  3. OR (lowest priority)

Because AND binds more tightly than OR, the query

mouse OR rat hippocampus

is interpreted as:

mouse OR (rat AND hippocampus)

To get what you probably meant, use parentheses:

(mouse OR rat) hippocampus
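The precedence rules above map naturally onto a recursive-descent parser with one function per precedence level. The following is a minimal sketch of that technique, not the SearchParser in this PR: it returns nested tuples instead of Django Q objects, all names are hypothetical, and it simply raises `ValueError` on malformed input.

```python
import re


def tokenize(query):
    # Quoted phrases, single parentheses, or runs of other non-space characters.
    return re.findall(r'"[^"]*"|[()]|[^\s()"]+', query)


class Parser:
    def __init__(self, query):
        self.tokens = tokenize(query)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_or(self):
        # Lowest precedence: and_expr (OR and_expr)*
        node = self.parse_and()
        while self.peek() == "OR":
            self.pos += 1
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):
        # Middle precedence: not_expr ([AND] not_expr)*, where AND is optional.
        node = self.parse_not()
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.pos += 1
            node = ("AND", node, self.parse_not())
        return node

    def parse_not(self):
        # Highest precedence: NOT applies to the single operand that follows.
        if self.peek() == "NOT":
            self.pos += 1
            return ("NOT", self.parse_not())
        return self.parse_atom()

    def parse_atom(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        if token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {token!r}")
        self.pos += 1
        if token == "(":
            node = self.parse_or()
            if self.peek() != ")":
                raise ValueError("missing closing parenthesis")
            self.pos += 1
            return node
        return token.strip('"')


def parse(query):
    parser = Parser(query)
    node = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token: {parser.peek()!r}")
    return node


print(parse("mouse OR rat hippocampus"))
# ('OR', 'mouse', ('AND', 'rat', 'hippocampus'))
print(parse("(mouse OR rat) hippocampus"))
# ('AND', ('OR', 'mouse', 'rat'), 'hippocampus')
```

Each level calls the next-tighter level for its operands, so OR only ever sees operands that have already been fully combined by AND and NOT.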

Best practices:

  1. When mixing OR with other operators, use parentheses to make your intent clear.
  2. Consider different word forms: searching for "hippocamp" matches both "hippocampus" and "hippocampal".
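The second tip relies on terms matching partial words. A minimal sketch, assuming case-insensitive substring matching (the `term_matches` helper is hypothetical, and the actual lookup used by the archive is not stated here):

```python
def term_matches(term, text):
    # Assumption: a term matches anywhere inside the text, ignoring case.
    return term.lower() in text.lower()


print(term_matches("hippocamp", "Hippocampal place cells"))       # True
print(term_matches("hippocamp", "Recordings from hippocampus"))   # True
print(term_matches("hippocampus", "Hippocampal place cells"))     # False
```

Under that assumption, the shorter stem finds both word forms, while the full word "hippocampus" misses "hippocampal".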

@bendichter bendichter requested a review from jjnesbitt November 4, 2025 20:27
@bendichter bendichter marked this pull request as draft November 4, 2025 20:30
…kahead

- Update SearchParser to properly detect end of input by looking ahead
  for non-operator content before continuing to parse terms
- Fix handling of operator-only queries (e.g., 'AND OR NOT') to correctly
  return negation of empty Q object instead of empty Q
- Add position saving/restoration logic to check for meaningful content
  ahead without consuming tokens
- Update test expectations and documentation to reflect correct behavior
  when NOT operator has no following term

This resolves edge cases where the parser would incorrectly continue
parsing when only operators or closing parentheses remained in the input.
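The save-and-restore lookahead described in this commit could look roughly like the sketch below. The `TokenStream` class and its method names are hypothetical. Treating operators and closing parentheses as "not meaningful" follows the wording of the commit message, and everything else is an assumption.

```python
OPERATORS = {"AND", "OR", "NOT"}


class TokenStream:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def has_content_ahead(self):
        """Report whether any term remains, without consuming tokens."""
        saved = self.pos  # remember where we were
        try:
            while self.pos < len(self.tokens):
                token = self.advance()
                if token not in OPERATORS and token != ")":
                    return True
            return False
        finally:
            self.pos = saved  # always rewind, whatever we found


stream = TokenStream(["mouse", "AND", "OR", "NOT"])
stream.advance()                   # consume "mouse"
print(stream.has_content_ahead())  # False: only operators remain
print(stream.pos)                  # 1: position is unchanged
```

Restoring the position in a `finally` block keeps the check side-effect free, so the caller can decide whether to keep parsing terms or stop.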
@bendichter bendichter marked this pull request as ready for review November 4, 2025 22:47