@@ -10,7 +10,7 @@ Gaggle supports configuration via environment variables to customize its behavio
 
 - **Description**: Directory path for caching downloaded Kaggle datasets
 - **Type**: String (path)
-- **Default**: `$XDG_CACHE_HOME/gaggle_cache` (typically `~/.cache/gaggle_cache`)
+- **Default**: `$XDG_CACHE_HOME/gaggle` (typically `~/.cache/gaggle`)
 - **Example**:
 ```bash
 export GAGGLE_CACHE_DIR="/var/cache/gaggle"
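The new default follows the XDG Base Directory convention. Below is a minimal sketch of the lookup order, assuming `GAGGLE_CACHE_DIR` overrides everything and that an unset or empty `XDG_CACHE_HOME` falls back to `$HOME/.cache`, as the XDG specification prescribes. The function `resolve_gaggle_cache_dir` is hypothetical and not part of Gaggle.

```bash
# Hypothetical helper: resolve the cache directory in the assumed lookup order.
resolve_gaggle_cache_dir() {
  if [ -n "${GAGGLE_CACHE_DIR:-}" ]; then
    # An explicit override always wins.
    printf '%s\n' "$GAGGLE_CACHE_DIR"
  elif [ -n "${XDG_CACHE_HOME:-}" ]; then
    # New documented default: $XDG_CACHE_HOME/gaggle
    printf '%s\n' "$XDG_CACHE_HOME/gaggle"
  else
    # XDG fallback when XDG_CACHE_HOME is unset or empty
    printf '%s\n' "$HOME/.cache/gaggle"
  fi
}

# Run in a subshell so the unset does not leak; prints /home/alice/.cache/gaggle
(unset GAGGLE_CACHE_DIR XDG_CACHE_HOME; HOME=/home/alice resolve_gaggle_cache_dir)
```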
@@ -77,38 +77,34 @@ Gaggle supports configuration via environment variables to customize its behavio
 - **Description**: Number of retry attempts after the initial try
 - **Type**: Integer
 - **Default**: `3`
-- **GAGGLE_HTTP_RETRY_DELAY_MS**
-- **Description**: Initial backoff delay in milliseconds
-- **Type**: Integer (ms)
-- **Default**: `1000`
-- **GAGGLE_HTTP_RETRY_MAX_DELAY_MS**
-- **Description**: Maximum backoff delay cap in milliseconds
-- **Type**: Integer (ms)
-- **Default**: `30000`
+- **GAGGLE_HTTP_RETRY_DELAY**
+- **Description**: Initial backoff delay in seconds
+- **Type**: Float or integer (seconds)
+- **Default**: `1`
+- **GAGGLE_HTTP_RETRY_MAX_DELAY**
+- **Description**: Maximum backoff delay cap in seconds
+- **Type**: Float or integer (seconds)
+- **Default**: `30`

-These controls enable exponential backoff with cap across metadata/search/download requests.
+These controls enable exponential backoff with cap across metadata/search/download requests.

 #### Download Coordination

 When multiple queries attempt to download the same dataset concurrently, Gaggle coordinates using an in-process lock.
 These settings control the wait behavior when a download is already in progress.

-- **GAGGLE_DOWNLOAD_WAIT_TIMEOUT_MS**
-- **Description**: Maximum time a waiting request will block for a concurrent download to finish
-- **Type**: Integer (milliseconds)
-- **Default**: `30000` (30 seconds)
+- **GAGGLE_DOWNLOAD_WAIT_TIMEOUT**
+- **Description**: Maximum time a waiting request will block (seconds)
+- **Type**: Float or integer (seconds)
+- **Default**: `30`
 - **Example**:
 ```bash
-export GAGGLE_DOWNLOAD_WAIT_TIMEOUT_MS=600000 # 10 minutes
-```
-- **GAGGLE_DOWNLOAD_WAIT_POLL_MS**
-- **Description**: Polling interval while waiting on another download
-- **Type**: Integer (milliseconds)
-- **Default**: `100`
-- **Example**:
-```bash
-export GAGGLE_DOWNLOAD_WAIT_POLL_MS=250
+export GAGGLE_DOWNLOAD_WAIT_TIMEOUT=600 # 10 minutes
 ```
+- **GAGGLE_DOWNLOAD_WAIT_POLL**
+- **Description**: Polling interval while waiting (seconds)
+- **Type**: Float or integer (seconds)
+- **Default**: `0.1`

 #### Logging Configuration
 
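The two wait settings can be read as a simple poll loop: check, sleep for the poll interval, and give up once the timeout is spent. The bash sketch below illustrates that reading. It assumes the wait amounts to a fixed number of polls derived from the two values; the lock file and the `wait_for_download` function are hypothetical stand-ins, since Gaggle's actual lock is in-process.

```bash
# Hypothetical helper: wait while another download holds a lock, polling every
# GAGGLE_DOWNLOAD_WAIT_POLL seconds for at most GAGGLE_DOWNLOAD_WAIT_TIMEOUT seconds.
# A lock file stands in for Gaggle's in-process lock (assumption for illustration).
wait_for_download() {
  local lock_file=$1
  local timeout=${GAGGLE_DOWNLOAD_WAIT_TIMEOUT:-30}
  local poll=${GAGGLE_DOWNLOAD_WAIT_POLL:-0.1}
  local max_polls i

  # Number of poll intervals that fit in the timeout, rounded up; awk does the
  # float division that plain bash arithmetic cannot.
  max_polls=$(awk -v t="$timeout" -v p="$poll" \
    'BEGIN { n = int(t / p); if (n * p < t) n++; print n }')

  for ((i = 0; i < max_polls; i++)); do
    [ -e "$lock_file" ] || return 0  # the other download finished
    sleep "$poll"
  done

  # Final check after the last sleep; a nonzero status means the wait timed out.
  [ ! -e "$lock_file" ]
}

# Usage: wait_for_download /tmp/dataset.lock || echo "timed out" >&2
```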
@@ -144,9 +140,10 @@ These settings control the wait behavior when a download is already in progress.
 - **Type**: Boolean (`1`, `true`, `yes`, `on` to enable)
 - **Default**: `false`
 - **Effects**:
-- gaggle_download(...) fails if the dataset isn’t cached.
-- Version checks use cached `.downloaded` metadata when available; otherwise return "unknown".
-- Search and metadata calls will still attempt network; consider avoiding them in offline mode.
+- `gaggle_download(...)` fails if the dataset isn’t cached.
+- `gaggle_version_info` reports `latest_version` as "unknown" if no cache metadata exists.
+- `gaggle_is_current` and other version checks use cached `.downloaded` metadata when available.
+- `gaggle_search` and `gaggle_info` also fail fast in offline mode (no network attempts).
 - **Example**:
 ```bash
 export GAGGLE_OFFLINE=1
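A minimal sketch of checking the documented enable values from a shell script, for example to skip network-dependent steps before starting DuckDB. The helper `gaggle_offline_enabled` is hypothetical, and it assumes exact, case-sensitive matching of `1`, `true`, `yes`, and `on`, which the hunk does not specify.

```bash
# Hypothetical helper: true when GAGGLE_OFFLINE holds one of the documented
# enable values. Exact, case-sensitive matching is an assumption.
gaggle_offline_enabled() {
  case "${GAGGLE_OFFLINE:-}" in
    1 | true | yes | on) return 0 ;;
    *) return 1 ;;  # unset or any other value keeps the default (false)
  esac
}

# Usage: decide before running anything that needs the network.
if gaggle_offline_enabled; then
  echo "offline: only cached datasets are available"
else
  echo "online"
fi
```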
@@ -185,9 +182,9 @@ export GAGGLE_CACHE_DIR="/var/lib/gaggle/cache"
 export GAGGLE_CACHE_SIZE_LIMIT_MB=51200 # 50GB
 export GAGGLE_HTTP_TIMEOUT=120 # 2 minutes
 export GAGGLE_HTTP_RETRY_ATTEMPTS=5 # Retry up to 5 times
-export GAGGLE_HTTP_RETRY_DELAY_MS=2000 # 2 second initial delay
-export GAGGLE_HTTP_RETRY_MAX_DELAY_MS=30000 # Cap backoff at 30s
-export GAGGLE_LOG_LEVEL=WARN # Production logging (planned)
+export GAGGLE_HTTP_RETRY_DELAY=2 # 2 second initial delay
+export GAGGLE_HTTP_RETRY_MAX_DELAY=30 # Cap backoff at 30s
+export GAGGLE_LOG_LEVEL=WARN # Production logging

 ## Set Kaggle credentials
 export KAGGLE_USERNAME="your-username"
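With the production values in this hunk, the retry waits can be estimated ahead of time. This sketch assumes the delay doubles after every retry and is clamped to the maximum, with no jitter; the diff only says "exponential backoff with cap", so the factor of 2 and the `backoff_schedule` helper are assumptions.

```bash
# Hypothetical helper: print the wait before each retry, assuming the delay
# doubles from GAGGLE_HTTP_RETRY_DELAY and is clamped to GAGGLE_HTTP_RETRY_MAX_DELAY.
backoff_schedule() {
  local attempts=${GAGGLE_HTTP_RETRY_ATTEMPTS:-3}    # retries after the initial try
  local delay=${GAGGLE_HTTP_RETRY_DELAY:-1}          # seconds
  local max_delay=${GAGGLE_HTTP_RETRY_MAX_DELAY:-30} # seconds

  # awk handles the fractional seconds that bash arithmetic cannot.
  awk -v n="$attempts" -v d="$delay" -v m="$max_delay" 'BEGIN {
    for (i = 0; i < n; i++) {
      pause = d * 2 ^ i          # exponential growth (factor 2 is assumed)
      if (pause > m) pause = m   # cap
      printf "retry %d: wait %gs\n", i + 1, pause
    }
  }'
}

# Production values from the hunk above.
GAGGLE_HTTP_RETRY_ATTEMPTS=5 GAGGLE_HTTP_RETRY_DELAY=2 GAGGLE_HTTP_RETRY_MAX_DELAY=30 backoff_schedule
```

Under these assumptions the five retries wait 2, 4, 8, 16, and 30 seconds, the last one clamped from 32.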
@@ -202,10 +199,10 @@ export KAGGLE_KEY="your-api-key"
 ```bash
 ## Development setup with verbose logging
 export GAGGLE_CACHE_DIR="./dev-cache"
-export GAGGLE_LOG_LEVEL=DEBUG ## Detailed debug logs (planned)
+export GAGGLE_LOG_LEVEL=DEBUG ## Detailed debug logs
 export GAGGLE_HTTP_TIMEOUT=10 ## Shorter timeout for dev
 export GAGGLE_HTTP_RETRY_ATTEMPTS=1 ## Fail fast in development
-export GAGGLE_HTTP_RETRY_DELAY_MS=250 ## Quick retry
+export GAGGLE_HTTP_RETRY_DELAY=0.25 ## Quick retry (250ms)

 ## Run DuckDB
 ./build/release/duckdb
@@ -217,8 +214,8 @@ export GAGGLE_HTTP_RETRY_DELAY_MS=250 ## Quick retry
 ## Configuration for slow or unreliable networks
 export GAGGLE_HTTP_TIMEOUT=300 ## 5 minute timeout
 export GAGGLE_HTTP_RETRY_ATTEMPTS=10 ## Many retries
-export GAGGLE_HTTP_RETRY_DELAY_MS=5000 ## 5 second initial delay
-export GAGGLE_HTTP_RETRY_MAX_DELAY_MS=60000 ## Cap at 60s
+export GAGGLE_HTTP_RETRY_DELAY=5 ## 5 second initial delay
+export GAGGLE_HTTP_RETRY_MAX_DELAY=60 ## Cap at 60s

 ./build/release/duckdb
 ```
@@ -230,10 +227,11 @@ export GAGGLE_HTTP_RETRY_MAX_DELAY_MS=60000 ## Cap at 60s
 export GAGGLE_OFFLINE=1

 # Attempt to download a dataset (will fail if not cached)
-gaggle download username/dataset-name
+SELECT gaggle_download('username/dataset-name');

-# Querying metadata or searching will still attempt network access
-gaggle info username/dataset-name
+# Querying metadata or searching will fail fast in offline mode
+SELECT gaggle_info('username/dataset-name');
+SELECT gaggle_search('keyword', 1, 10);
 ```

 ### Configuration Verification
@@ -253,21 +251,27 @@ SELECT gaggle_search('housing', 1, 10);
 
 -- Get dataset metadata
 SELECT gaggle_info('username/dataset-name');
+
+-- Retrieve last error string (or NULL if none)
+SELECT gaggle_last_error();
 ```

 ### Retry Policy Details

 Gaggle implements retries with exponential backoff for HTTP requests. The number of attempts, initial delay, and
 maximum delay can be tuned with the environment variables above.

-### Logging Levels (planned)
+### Logging Levels

-Detailed logging control via `GAGGLE_LOG_LEVEL` is planned but not yet implemented.
+Detailed logging control via `GAGGLE_LOG_LEVEL` is implemented.

-### Notes
+### Units

-- Cache directory and HTTP timeout are checked at runtime. Changing `GAGGLE_CACHE_DIR` or `GAGGLE_HTTP_TIMEOUT` takes
-  effect for subsequent operations in the same process.
-- Kaggle credentials can be provided via environment variables, config file, or the `gaggle_set_credentials()` SQL
-  function.
-- Invalid values fall back to sensible defaults.
+- Storage sizes are reported in megabytes (MB) throughout the API and SQL functions.
+- Timeouts and retry delays are configured in seconds via environment variables with clean names (no unit suffixes). For example: `GAGGLE_HTTP_RETRY_DELAY=1.5`.
+
+```sql
+-- Example cache info (note size is in MB only)
+SELECT gaggle_cache_info();
+-- {"path":"...","size_mb":42,"limit_mb":102400,"usage_percent":0,"is_soft_limit":true,"type":"local"}
+```
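Because the renamed variables take seconds, values written for the retired `*_MS` variables need dividing by 1000; the development example in this diff turns `GAGGLE_HTTP_RETRY_DELAY_MS=250` into `GAGGLE_HTTP_RETRY_DELAY=0.25`. The sketch below validates a seconds value and converts legacy milliseconds. The helpers `read_seconds` and `ms_to_seconds` are hypothetical, and falling back to a default on malformed input is an assumption borrowed from the removed "Invalid values fall back to sensible defaults" note; the diff does not say whether the seconds-based parsing keeps that behavior.

```bash
# Hypothetical helper: print a seconds value (integer or decimal) read from the
# variable named in $1, or the default in $2 if it is unset or malformed.
read_seconds() {
  local name=$1 default=$2
  local value=${!name:-}   # indirect expansion: the value of the named variable
  if [[ $value =~ ^[0-9]+([.][0-9]+)?$ ]]; then
    printf '%s\n' "$value"
  else
    printf '%s\n' "$default"
  fi
}

# Hypothetical helper: convert a value written for a retired *_MS variable to seconds.
ms_to_seconds() {
  awk -v ms="$1" 'BEGIN { printf "%g\n", ms / 1000 }'
}

GAGGLE_HTTP_RETRY_DELAY=1.5
read_seconds GAGGLE_HTTP_RETRY_DELAY 1   # prints 1.5
ms_to_seconds 600000                     # prints 600 (old GAGGLE_DOWNLOAD_WAIT_TIMEOUT_MS example)
```

The strict pattern rejects negative numbers and values with units such as `250ms`; whether Gaggle itself accepts those is not stated in the diff.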