RFC: On-Demand Collection Loading via loadSubset #676
Stoked for this, sounds really great. One thing that I may be missing: will the current “eager” mode with the built-in collections still be supported, e.g. the Electric one?
If collections can load data in subsets, could this unlock a sort of SSR with TanStack DB?
Hi, a question. First off, I love this! Thank you. I wonder if you could clarify: under this approach, I would provide my own loader function which would accept this "where" argument, and I could write a function that, say, responds to certain parameters by fetching from an entirely different API? E.g. I might fetch cards from a REST API when I want them one at a time, or one language at a time, but then I might also want "most recent from my friends, weighted by my preferences" or something like that, which might be an entirely different API. (On the other thread where I requested the ability to basically just cat different react-query caches, this was an important reason why; I sometimes have vastly different fetching strategies.) I love to hear that each segment will have its own preload, subscriber, and loading state. Note that useShape does not have the same kind of subscription management that we know and love around here; see the note on abort controllers.
Hey Kyle, this RFC is great. I don't really have anything to add to the API/implementation plan. However, one thing that would be great for adoption of TanStack DB is some transitional solution until this RFC has been discussed enough and the implementation starts (which will probably be a ton of work as well). Transitional in the sense of: how can we achieve some parts of what this RFC wants to achieve with what we currently have, even if it means using janky workarounds? The benefit would be that projects can start building on top of TanStack DB right now to get the benefits it already offers (greenfield projects that have a half-life of at least 5 years). From the RFC, I would say the important pieces that would suffice for a lot of applications are:
If there is some way that this can be done in a not-perfect-but-still-ok-ish way today, I would say it'd be worthwhile to write dedicated documentation for it (with obvious disclaimers that this is WIP and subject to change). I wouldn't mind chiming in here and diving deep, but I am not sure whether it would even be a reasonable attempt with what's currently available. Let me know what you (or other maintainers) think about this :)
Summary
Now released as part of DB 0.5 — https://x.com/tan_stack/status/1988638050662895789 & https://tanstack.com/blog/tanstack-db-0.5-query-driven-sync
TanStack DB collections currently perform full dataset synchronization before becoming ready, limiting their ability to scale to large datasets. This RFC introduces on-demand subset loading through an optional loadSubset function that collections can provide, enabling efficient pagination, filtered loading, and progressive sync patterns. Collections without loadSubset remain in eager mode with full sync behavior, while collections implementing loadSubset can load specific data subsets based on live query predicates, allowing immediate rendering with synchronous data access when subsets are already loaded.
Background
TanStack DB collections materialize subsets of database data in memory on the client. Collections manage data synchronization through a sync function provided in CollectionConfig, which is called once when the collection is created.
The current sync function signature accepts parameters for managing collection state:
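The signature itself did not survive in this copy of the RFC. As a rough sketch only, the shape below is assembled from the names the RFC mentions (write(), commit(), markReady(), a cleanup function); begin(), the ChangeMessage shape, and the runSync harness are assumptions made so the example runs on its own, not the library's actual definitions.

```typescript
// Hypothetical, simplified types; the real TanStack DB definitions may differ.
type ChangeMessage<T> = { type: "insert" | "update" | "delete"; value: T };
type CleanupFn = () => void;

interface SyncParams<T> {
  begin: () => void;                          // assumed: opens a write batch
  write: (message: ChangeMessage<T>) => void; // stages one change
  commit: () => void;                         // applies the staged batch
  markReady: () => void;                      // flips the collection to 'ready'
}

type SyncFn<T> = (params: SyncParams<T>) => void | CleanupFn;

interface Todo { id: number; title: string }

// An eager sync function: writes the whole dataset, then marks the collection ready.
const eagerTodoSync: SyncFn<Todo> = ({ begin, write, commit, markReady }) => {
  const allTodos: Todo[] = [
    { id: 1, title: "write RFC" },
    { id: 2, title: "prototype loadSubset" },
  ];
  begin();
  for (const todo of allTodos) write({ type: "insert", value: todo });
  commit();
  markReady();
  return () => { /* unsubscribe from the backend here */ };
};

// Tiny stand-in for a collection so the sketch can be exercised in isolation.
function runSync<T>(sync: SyncFn<T>) {
  const rows: T[] = [];
  let pending: T[] = [];
  const state = { status: "loading" as "loading" | "ready" };
  const cleanup = sync({
    begin: () => { pending = []; },
    write: (message) => { if (message.type === "insert") pending.push(message.value); },
    commit: () => { rows.push(...pending); pending = []; },
    markReady: () => { state.status = "ready"; },
  });
  return { rows, state, cleanup };
}
```

Calling runSync(eagerTodoSync) leaves both rows in the stand-in store and the status at 'ready', which is the eager behavior the rest of this section describes.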
Collections track active subscriptions through reference counting. When the first subscription is created via subscribeChanges(), the collection calls startSync() to begin data synchronization. The collection transitions through loading states until markReady() is called, at which point collection.status === 'ready' and subscriptions receive the initial state. When the last subscription is removed and gcTime has elapsed, any cleanup function is called.
Live queries execute through useLiveQuery(), which subscribes to collections and returns data synchronously when available or enters a loading state when waiting for data. The preload() method allows awaiting collection readiness before rendering.
QueryCollection integrates TanStack Query by calling a queryFn with query parameters and managing the resulting data as a collection. In the current implementation, the queryFn is called once with the full query definition.
Electric collections sync data from Postgres via HTTP Shape subscriptions, materializing the shape log into collection state. Shapes support filtering via WHERE clauses, ordering, and column selection. Other collection types like Trailbase, RxDB, and similar systems follow comparable patterns of syncing data from backend sources into local collections.
Problem
The current eager-only synchronization model creates five significant limitations that prevent TanStack DB from scaling to real-world application data sizes:
1. Cannot scale to large datasets
Collections must sync their entire dataset before collection.status === 'ready', making them impractical for large tables. A collection of millions of posts cannot reasonably sync everything to the client. Applications need the ability to load only the subset required for the current view (such as the first 20 posts ordered by creation date) without waiting for complete dataset synchronization.
2. Predicates don't influence initial sync
Subscription predicates cannot affect initial synchronization because startSync() executes before predicates are available. Even subscriptions with narrow predicates like where(status === 'active').limit(10) trigger full collection sync before returning any data.
3. No progressive loading patterns
Some use cases benefit from both immediate subset loading and background full syncing. An infinite scroll feed might want to immediately load the first page while syncing the full dataset in the background for offline support. The current model forces a binary choice between immediate readiness (local-only collections) or full sync (eager mode).
4. Pagination cannot load from backend
The useLiveInfiniteQuery prototype demonstrates this limitation clearly. The current implementation can only paginate through data already present in the collection. When a user clicks "load more," the hook slices the next page from already-synced data. True infinite loading requires fetching additional pages from the backend on demand, but collections lack the API to support this pattern.
5. Inefficient for live query collections
Live query collections subscribe to all source collections, forcing each source to perform full synchronization even when the query only needs a small subset. For example, take a query such as from({ post: postsCollection }).where(({ post }) => eq(post.userId, '123')).limit(10), the form quoted later in this RFC.
This query needs 10 posts for a specific user, but postsCollection syncs its entire dataset before the query can execute.
The lack of on-demand loading prevents TanStack DB from handling application data at realistic scale across all collection types, whether Electric, QueryCollection, Trailbase, RxDB, or others.
Proposal
Sync Function Return Type
The sync function in CollectionConfig will be extended to optionally return a SyncConfigRes object.
Collections can return:
- void: Eager mode with no cleanup
- CleanupFn: Eager mode with cleanup
- SyncConfigRes: On-demand mode if loadSubset is provided, otherwise eager mode
Sync Modes
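To make the link between the sync return value and the resulting mode concrete, here is a minimal sketch. Only the names SyncConfigRes, CleanupFn, and loadSubset come from the RFC; the cleanup field, the option shape, and the resolveSyncMode helper are hypothetical and exist purely for illustration.

```typescript
type CleanupFn = () => void;
type LoadSubsetFn = (options: { limit?: number }) => true | Promise<void>;

// Assumed shape: the RFC names SyncConfigRes and loadSubset; "cleanup" is a guess.
interface SyncConfigRes {
  cleanup?: CleanupFn;
  loadSubset?: LoadSubsetFn;
}

type SyncResult = void | CleanupFn | SyncConfigRes;
type SyncMode = "eager" | "on-demand";

// Hypothetical helper: on-demand only when an object with loadSubset is returned.
function resolveSyncMode(result: SyncResult): SyncMode {
  if (typeof result === "object" && result !== null && typeof result.loadSubset === "function") {
    return "on-demand";
  }
  return "eager"; // void, a bare cleanup function, or a config without loadSubset
}
```

Under these assumptions, existing sync functions that return nothing or a cleanup function keep resolving to eager mode, which matches the backward-compatibility goal stated later in the RFC.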
Collections operate in one of two modes based on their sync function return value:
Eager Mode (current behavior, default):
- No loadSubset function is returned
- markReady() is called only after the entire dataset is synced
- collection.status === 'ready' after full sync completes
- preload() waits for full initial sync to complete
On-Demand Mode:
- Returns a loadSubset function in SyncConfigRes
- Data is loaded through loadSubset()
- preload() is typically a no-op since collections don't perform background syncing
Collections can implement additional loading patterns on top of on-demand mode. A common pattern is progressive mode, where the collection returns a loadSubset function (making it on-demand) but also initiates a background full sync:
- loadSubset() calls can abort and return early since data is already present
- collection.status === 'ready' after full background sync completes
- preload() waits for full background sync to complete
Collections will only expose syncMode configuration if they support both eager and on-demand modes. Otherwise, they default to their single supported mode.
LoadSubset Function Contract
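As a rough illustration of the loadSubset contract, the sketch below returns true when a request was already satisfied and a Promise otherwise. The option names where, orderBy, and limit appear in the RFC text, but their representation here (an equality filter and a field name) is a simplification, and Post, backend, queryBackend, and createPostLoader are hypothetical stand-ins. A real collection would write rows through write() and commit() rather than a Map.

```typescript
// Simplified option shape; the RFC's predicate representation is richer.
interface LoadSubsetOptions<T> {
  where?: Partial<T>;   // equality filter only, for brevity
  orderBy?: keyof T;
  limit?: number;
}
type LoadSubsetResult = true | Promise<void>;

interface Post { id: number; status: "active" | "archived" }

// Fake backend table of 100 posts, alternating between the two statuses.
const backend: Post[] = Array.from({ length: 100 }, (_, i): Post => ({
  id: i + 1,
  status: i % 2 === 0 ? "active" : "archived",
}));

function queryBackend(options: LoadSubsetOptions<Post>): Promise<Post[]> {
  const where = options.where ?? {};
  const fields = Object.keys(where) as (keyof Post)[];
  const rows = backend.filter((row) => fields.every((f) => row[f] === where[f]));
  const orderBy = options.orderBy;
  if (orderBy !== undefined) {
    rows.sort((a, b) =>
      String(a[orderBy]).localeCompare(String(b[orderBy]), undefined, { numeric: true }));
  }
  return Promise.resolve(rows.slice(0, options.limit));
}

function createPostLoader(store: Map<number, Post>) {
  // Naive tracking: a request counts as satisfied only if the identical
  // options object (same key order) was loaded before.
  const satisfied = new Set<string>();
  return (options: LoadSubsetOptions<Post>): LoadSubsetResult => {
    const key = JSON.stringify(options);
    if (satisfied.has(key)) return true; // data present: caller proceeds synchronously
    return queryBackend(options).then((rows) => {
      for (const row of rows) store.set(row.id, row);
      satisfied.add(key);
    });
  };
}
```

The first call for a given subset returns a Promise; repeating the same call afterwards returns true, so a subscriber could skip its loading state entirely.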
When a collection provides a loadSubset function, it accepts a LoadSubsetOptions object (the examples elsewhere in this RFC use where, orderBy, and limit).
The function returns:
- true: Data is already present in the collection, subscription can proceed synchronously
- Promise<void>: Data is being loaded, resolves when subset is available
The loadSubset function is responsible for:
- Writing the loaded data to the collection with write() and commit()
- Returning true for immediate access or a Promise that resolves when loading completes
Live Query Integration
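To make the subscription flow for live queries concrete, here is a minimal sketch under stated assumptions: SourceCollection, LiveQueryState, and subscribeWithSubsets are hypothetical names, and the real subscription system tracks far more state. It only shows how a mix of true and Promise results could decide between the synchronous path and a loading state.

```typescript
type LoadSubsetResult = true | Promise<void>;

interface SourceCollection {
  // Optional: eager collections do not provide loadSubset.
  loadSubset?: (options: { limit?: number }) => LoadSubsetResult;
}

interface LiveQueryState { isLoading: boolean }

function subscribeWithSubsets(
  sources: SourceCollection[],
  options: { limit?: number },
  onReady: () => void,
): LiveQueryState {
  const pending: Promise<void>[] = [];
  for (const source of sources) {
    const result = source.loadSubset?.(options);
    // undefined (no loadSubset) and true both mean "nothing to wait for".
    if (result !== undefined && result !== true) pending.push(result);
  }
  const state: LiveQueryState = { isLoading: pending.length > 0 };
  if (pending.length === 0) {
    onReady(); // fully synchronous path: no loading state is ever observed
    return state;
  }
  void Promise.all(pending).then(() => {
    state.isLoading = false;
    onReady();
  });
  return state;
}
```

When every source answers true (or has no loadSubset at all), onReady runs before subscribeWithSubsets returns; otherwise the caller sees isLoading until all promises resolve. Error handling is deliberately left out of this sketch.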
Live queries interact with loadSubset through the subscription lifecycle:
- When useLiveQuery() creates a subscription with predicates, the subscription system calls loadSubset() if the collection provides one
- The live query stays in a loading state until the loadSubset() promises resolve
- If all loadSubset() calls return true, the live query executes completely synchronously with no loading state
For pagination via useLiveInfiniteQuery:
- fetchNextPage() updates the subscription's limit predicate
- This triggers a new loadSubset() call with the increased limit
For joins in live query collections:
- The join calls loadSubset() on the right collection with predicates like where: inArray(ref('userId'), ['abc', 'xyz'])
Synchronous Data Access
The true return value enables flicker-free rendering when navigating between components:
- One component loads a subset, e.g. status === 'active' limited to 20
- When another component requests the same subset, loadSubset() returns true immediately
- useLiveQuery() executes completely synchronously in the same render
Collection-Specific Implementations
Electric Collections:
- When loadSubset() is called, compare requested predicate against loaded ranges
QueryCollection:
- Pass LoadSubsetOptions into TanStack Query's meta object
- Generate the queryKey dynamically based on predicate parameters
- Return true if non-stale data covers the predicate
- Call queryFn with predicate parameters in meta for new data fetching
Live Query Collections:
- Call the source collections' loadSubset() functions with translated predicates
- For from({ post: postsCollection }).where(({ post }) => eq(post.userId, '123')).limit(10), call postsCollection.loadSubset({ where: eq(ref('userId'), '123'), limit: 10 })
- For joins, call loadSubset() on right-side collection
Predicate Deduplication
A predicate deduplication library will provide set operations on PredicateObject instances to help collections track loaded subsets.
Collections use these tools to:
- Determine whether loadSubset() requests are already satisfied by loaded data
The exact mechanics of predicate set operations are outside the scope of this RFC and will be documented separately. Collections are free to implement their own tracking mechanisms based on their specific requirements (e.g., staleness policies, cache eviction strategies).
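Since the RFC leaves the set operations unspecified, the following is only a toy sketch of the "is this request already satisfied?" check, not the planned library. It assumes predicates are plain equality maps, that all requests share one ordering, and that a limited load can only satisfy requests with the identical filter; Equalities, LoadedSubset, whereCovers, and isCovered are hypothetical names.

```typescript
type Equalities = Record<string, string | number>;

interface LoadedSubset {
  where: Equalities; // conjunction of field === value conditions
  limit?: number;    // undefined means "everything matching where"
}

// `wider` matches a superset of rows when each of its conditions also appears in `narrower`.
function whereCovers(wider: Equalities, narrower: Equalities): boolean {
  return Object.keys(wider).every((field) => wider[field] === narrower[field]);
}

function isCovered(loaded: LoadedSubset[], request: LoadedSubset): boolean {
  return loaded.some((subset) => {
    if (subset.limit === undefined) {
      // An unlimited load covers any request with an equal or narrower filter.
      return whereCovers(subset.where, request.where);
    }
    // A limited load is only trusted for the same filter and an equal or smaller limit
    // (assuming the same ordering was used for both requests).
    const sameWhere =
      whereCovers(subset.where, request.where) && whereCovers(request.where, subset.where);
    return sameWhere && request.limit !== undefined && request.limit <= subset.limit;
  });
}
```

A collection could consult a check like this at the top of loadSubset() and return true when it succeeds. Staleness and eviction policies, which the RFC leaves to each collection, are not modeled here.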
Error Handling
When loadSubset() encounters an error:
- The collection calls setError() to record the error in collection state
- The loadSubset() promise rejects with the error
- The error is exposed to the application via useLiveQuery().error
- Errors thrown synchronously at the loadSubset() call site are also caught and recorded
Retry logic is collection-dependent. Collections may implement automatic retries, exponential backoff, or leave retry decisions to the application layer.
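The error path above can be sketched roughly as follows. setError() is named in the RFC, while ErrorSink, guardedLoadSubset, and withBackoff are hypothetical helpers; the backoff schedule (base delay doubled per attempt) is just one possible retry policy.

```typescript
type LoadSubsetResult = true | Promise<void>;

interface ErrorSink { setError: (error: Error) => void }

function toError(value: unknown): Error {
  return value instanceof Error ? value : new Error(String(value));
}

// Wraps a loadSubset call so both async rejections and synchronous throws
// are recorded via setError() and surfaced as a rejected promise.
function guardedLoadSubset(load: () => LoadSubsetResult, sink: ErrorSink): LoadSubsetResult {
  try {
    const result = load();
    if (result === true) return true;
    return result.catch((reason: unknown) => {
      const error = toError(reason);
      sink.setError(error);
      throw error; // keep the promise rejected for the caller
    });
  } catch (reason) {
    const error = toError(reason);
    sink.setError(error);
    return Promise.reject(error);
  }
}

// Optional retry policy: exponential backoff with a fixed number of retries.
async function withBackoff(
  attempt: () => Promise<void>,
  retries: number,
  baseDelayMs: number,
): Promise<void> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (reason) {
      if (i >= retries) throw reason;
      await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}
```

A collection that wants automatic retries could wrap its fetch in withBackoff inside load; one that leaves retries to the application would use guardedLoadSubset alone.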
Preload Behavior
The preload() method behavior varies by sync mode:
Eager Mode:
- preload() calls startSync() and waits for collection.status === 'ready'
On-Demand Mode:
- preload() is typically a no-op since collections don't perform background syncing
- Use await liveQuery.preload() instead to ensure specific query data is loaded
Progressive Mode (collection-implemented):
- preload() waits for background full sync to complete
Areas Requiring Prototyping
Pagination State Tracking: How collections track pagination state (cursors, offsets, keyset values) across multiple subscriptions with different pagination requirements will be determined through implementation. Collections must match new loadSubset() calls against previous calls to determine what data to fetch, but the exact mechanism for tracking this state needs prototyping.
Definition of Success
This proposal succeeds if it enables the following outcomes:
1. Large Dataset Support
Collections can handle tables with millions of rows without requiring full synchronization. A live query displaying the first 20 posts from a 10-million-row table loads only those 20 rows and transitions to ready state in under 2 seconds (network-dependent).
2. Predicate-Driven Loading
Subscription predicates directly control initial data loading. A live query with where(({ post }) => eq(post.status, 'active')).limit(10) loads exactly 10 active records, not the entire collection. The collection's loadSubset() function is called with the subscription's exact predicate parameters.
3. Synchronous Data Access
When navigating between components that request identical or already-loaded subsets, useLiveQuery() returns data synchronously with zero loading state flicker. loadSubset() returns true for 100% of navigation cases where data is already present.
4. True Infinite Scroll
useLiveInfiniteQuery can fetch additional pages from the backend by calling fetchNextPage(). Each call increases the limit predicate, triggers loadSubset(), and fetches only the incremental data not already loaded. A 1000-row feed loads in chunks of 20 rows on demand rather than syncing all 1000 rows upfront.
5. Efficient Query Collections
Live queries over multiple source collections only trigger subset loading on those sources. A query from({ post: postsCollection }).where(({ post }) => eq(post.userId, '123')).limit(10) loads 10 posts and only the related users, not the full posts and users collections. Query execution time is proportional to result set size, not source collection size.
6. Progressive Mode Viability
Collections can implement progressive loading patterns where subset requests resolve immediately while background full sync continues. An infinite scroll feed displays the first page in under 1 second while the full dataset syncs for offline support over the next 30 seconds.
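A minimal sketch of the progressive pattern, assuming a collection that can fetch either a page or the full dataset: createProgressiveSync, fetchPage, fetchAll, and getStatus are hypothetical names, and a Map stands in for the collection's write()/commit() path.

```typescript
interface Item { id: number }
type LoadSubsetResult = true | Promise<void>;

function createProgressiveSync(
  fetchPage: (limit: number) => Promise<Item[]>,
  fetchAll: () => Promise<Item[]>,
) {
  const store = new Map<number, Item>();
  const state = { status: "loading" as "loading" | "ready" };
  const write = (items: Item[]): void => {
    for (const item of items) store.set(item.id, item);
  };

  // The background full sync starts immediately and flips status when done.
  const fullSync = fetchAll().then((items) => {
    write(items);
    state.status = "ready";
  });

  const loadSubset = (options: { limit: number }): LoadSubsetResult => {
    if (state.status === "ready") return true; // everything is already local
    return fetchPage(options.limit).then(write); // fast path for the first page
  };

  return {
    store,
    getStatus: () => state.status,
    loadSubset,
    preload: () => fullSync, // preload waits for the background full sync
  };
}
```

The first page becomes available as soon as fetchPage resolves, while getStatus() keeps reporting 'loading' until the background sync finishes; after that, loadSubset() short-circuits to true.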
7. Backward Compatibility
Existing collections continue working without changes. Collections without loadSubset() functions maintain current eager-mode behavior. No breaking changes to collection API or live query usage patterns.
8. QueryCollection Integration
QueryCollection successfully integrates LoadSubsetOptions into TanStack Query's queryFn calls. Developers can access meta.where, meta.orderBy, and meta.limit to construct backend API requests. Pagination continuations correctly pass previous cursors to subsequent queryFn calls.
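As a hedged illustration of the QueryCollection point, a queryFn could turn the predicate parameters in meta into a backend request along these lines. The SubsetMeta shape, the /api/posts endpoint, the query-string format, and the injected Fetcher are all assumptions; the RFC only states that meta.where, meta.orderBy, and meta.limit are available. Cursor handling for pagination continuations is not shown.

```typescript
// Assumed, simplified shape of the predicate parameters placed in meta.
interface SubsetMeta {
  where?: Record<string, string | number>;
  orderBy?: { field: string; direction: "asc" | "desc" };
  limit?: number;
}

function buildQueryString(meta: SubsetMeta): string {
  const parts: string[] = [];
  for (const [field, value] of Object.entries(meta.where ?? {})) {
    parts.push(`${encodeURIComponent(field)}=${encodeURIComponent(String(value))}`);
  }
  if (meta.orderBy) {
    parts.push(`sort=${encodeURIComponent(meta.orderBy.field)}:${meta.orderBy.direction}`);
  }
  if (meta.limit !== undefined) parts.push(`limit=${meta.limit}`);
  return parts.join("&");
}

// The fetcher is injected so the sketch does not depend on a real network call.
type Fetcher = (url: string) => Promise<unknown[]>;

function makeQueryFn(fetcher: Fetcher) {
  // Mirrors the part of TanStack Query's queryFn context that carries meta.
  return ({ meta }: { meta?: SubsetMeta }): Promise<unknown[]> =>
    fetcher(`/api/posts?${buildQueryString(meta ?? {})}`);
}
```

With meta set to { where: { status: 'active' }, limit: 10 }, this queryFn would request /api/posts?status=active&limit=10 from the hypothetical endpoint.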