@@ -10,15 +10,14 @@ It outlines features to be implemented and their current status.
 
 * **Authentication**
     * [x] Set Kaggle API credentials programmatically.
-    * [x] Support environment variables (using `KAGGLE_USERNAME` and `KAGGLE_KEY`).
-    * [x] Support `~/.kaggle/kaggle.json file`.
+    * [x] Support environment variables for authentication (`KAGGLE_USERNAME` and `KAGGLE_KEY`).
+    * [x] Support reading credentials from `~/.kaggle/kaggle.json file`.
 * **Dataset Operations**
-    * [x] Search for datasets.
+    * [x] Search for datasets on Kaggle.
     * [x] Download datasets from Kaggle.
     * [x] List files in a dataset.
     * [x] Get dataset metadata.
-    * [ ] Upload datasets to Kaggle.
-    * [ ] Delete datasets from Kaggle.
+    * [ ] Upload DuckDB tables to Kaggle.
 
 ### 2. Caching and Storage
 
@@ -28,7 +27,6 @@ It outlines features to be implemented and their current status.
     * [x] Get cache information (size and storage location).
     * [ ] Set cache size limit.
     * [ ] Cache expiration policies.
-    * [ ] Support for partial file downloads and resumes.
 * **Storage**
     * [x] Store datasets in configurable directory.
     * [ ] Support for cloud storage backends (S3, GCS, and Azure).
@@ -37,24 +35,22 @@ It outlines features to be implemented and their current status.
 
 * **File Format Support**
     * [x] CSV and TSV file reading.
-    * [x] JSON file reading.
     * [x] Parquet file reading.
+    * [x] JSON file reading.
     * [ ] Excel and XLSX file reading.
-* **Direct Query Integration**
+* **Querying Datasets**
     * [x] Replacement scan for `kaggle:` URLs.
-    * [ ] Direct SQL queries on remote datasets without full download (true streaming).
-    * [ ] Streaming data from Kaggle without caching.
     * [ ] Virtual table support for lazy loading.
 
 ### 4. Performance and Concurrency
 
 * **Concurrency Control**
     * [x] Thread-safe credential storage.
     * [x] Thread-safe cache access.
-    * [ ] Concurrent dataset downloads.
+    * [x] Concurrent dataset downloads (with per-dataset serialization to prevent race conditions).
 * **Network Optimization**
     * [x] Configurable HTTP timeouts.
-    * [ ] Retry logic with backoff (configurable attempts/delay; planned).
+    * [x] Retry logic with backoff for failed requests.
 * **Caching Strategy**
     * [ ] Incremental cache updates.
     * [ ] Background cache synchronization.
@@ -67,7 +63,7 @@ It outlines features to be implemented and their current status.
     * [x] Clear error messages for `NULL` inputs.
     * [ ] Detailed error codes for programmatic error handling.
 * **Resilience**
-    * [ ] Automatic retry on network failures (planned with backoff settings).
+    * [x] Automatic retry on network failures.
     * [ ] Graceful degradation when Kaggle API is unavailable.
     * [ ] Local-only mode for cached datasets.
 
@@ -87,4 +83,3 @@ It outlines features to be implemented and their current status.
 * **Distribution**
     * [ ] Pre-compiled extension binaries for Linux, macOS, and Windows.
     * [ ] Submission to the DuckDB Community Extensions repository.
-    * [ ] Docker image with Gaggle pre-installed.