Commit d231ba6

The second revision (#2)
1 parent 01b3d68 commit d231ba6

34 files changed (+465 lines, -319 lines)

CMakeLists.txt

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,6 @@
 cmake_minimum_required(VERSION 3.5)
 
+
 set(CORROSION_VERBOSE_OUTPUT ON)
 
 # We need C++17 for std::filesystem on all platforms
@@ -143,6 +144,10 @@ message(STATUS "OS_ARCH: ${OS_ARCH} (orig='${_GAGGLE_ORIG_OS_ARCH}')")
 message(STATUS "DUCKDB_PLATFORM: ${DUCKDB_PLATFORM}")
 message(STATUS "Rust_CARGO_TARGET: ${Rust_CARGO_TARGET}")
 
+
+# ==============================================================================
+# Corrosion (Rust integration)
+# ==============================================================================
 include(FetchContent)
 FetchContent_Declare(
     Corrosion

README.md

Lines changed: 5 additions & 5 deletions
@@ -30,7 +30,7 @@ updates, etc.
 This workflow can quickly become complex, especially when working with multiple datasets or when datasets are updated
 frequently.
 Gaggle tries to help simplify this process by hiding the complexity and letting you work with datasets directly inside
-an analytical database like DuckDB that can handle fast queries.
+DuckDB, which allows you to run fast analytical queries on the data.
 
 In essence, Gaggle makes DuckDB into a SQL-enabled frontend for Kaggle datasets.
 
@@ -39,9 +39,9 @@ In essence, Gaggle makes DuckDB into a SQL-enabled frontend for Kaggle datasets.
 - Provides a simple API to interact with Kaggle datasets from DuckDB
 - Allows you to search, download, and read datasets from Kaggle
 - Supports datasets that contain CSV, Parquet, JSON, and XLSX files
-- Configurable and has built-in caching of downloaded datasets
-- Thread-safe, fast, and has a low memory footprint
 - Supports dataset updates and versioning
+- Configurable and has built-in caching support to avoid re-downloading
+- Thread-safe, fast, and has a low memory footprint
 
 See the [ROADMAP.md](ROADMAP.md) for the list of implemented and planned features.
 
@@ -103,7 +103,7 @@ select *
 from gaggle_ls('habedi/flickr-8k-dataset-clean') limit 5;
 
 -- Read a Parquet file from local cache using a prepared statement
--- (DuckDB doesn't support subquery in function arguments, so we use a prepared statement)
+-- (DuckDB doesn't allow the use of subqueries in function arguments, so we use a prepared statement)
 prepare rp as select * from read_parquet(?) limit 10;
 execute rp(gaggle_file_path('habedi/flickr-8k-dataset-clean', 'flickr8k.parquet'));
 
@@ -118,7 +118,7 @@ select gaggle_cache_info();
 select gaggle_is_current('habedi/flickr-8k-dataset-clean');
 ```
 
-[![Simple Demo 1](https://asciinema.org/a/745806.svg)](https://asciinema.org/a/745806)
+[![Simple Demo 1](https://asciinema.org/a/do6g8xv1G5tkRc4e3bExbNYwZ.svg)](https://asciinema.org/a/do6g8xv1G5tkRc4e3bExbNYwZ)
 
 ---
 
ROADMAP.md

Lines changed: 9 additions & 11 deletions
@@ -72,21 +72,19 @@ It outlines features to be implemented and their current status.
 ### 6. Documentation and Distribution
 
 * **Documentation**
-    * [x] API reference in README.md.
-    * [x] Usage examples (see `docs/examples/`).
-    * [ ] Tutorial documentation.
-    * [ ] FAQ section.
-    * [ ] Troubleshooting guide.
+    * [x] API reference (see `docs/README.md`).
+    * [x] Usage examples (see the files in `docs/examples/`).
+    * [x] Other documentation, such as the list of errors (see the `docs/` directory).
 * **Testing**
-    * [x] Unit tests for core modules (Rust).
-    * [x] SQL integration tests (DuckDB shell).
-    * [x] End-to-end integration tests with mocked HTTP (basic coverage).
+    * [x] Unit tests for core (Rust) modules.
+    * [x] SQL integration tests (run in the DuckDB shell).
+    * [x] End-to-end integration tests with mocked HTTP.
     * [ ] Performance benchmarks.
 * **Distribution**
-    * [ ] Pre-compiled extension binaries for Linux, macOS, and Windows.
-    * [ ] Submission to the DuckDB Community Extensions repository.
+    * [x] Built binaries for Linux, macOS, and Windows (AMD64 and ARM64).
+    * [x] Submission to DuckDB's community extensions repository.
 
 ### 7. Observability
 
 * **Logging**
-    * [x] Structured logging via `tracing` with `GAGGLE_LOG_LEVEL`.
+    * [x] Structured logging (configurable via the `GAGGLE_LOG_LEVEL` environment variable).
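
The Logging entry in the ROADMAP.md diff says that structured logging is configured through the `GAGGLE_LOG_LEVEL` environment variable. The following is a rough, std-only Rust sketch of how such a variable might be read. It is not Gaggle's code. It skips the `tracing` crate entirely and shows only the parsing step. The accepted level names (`error` through `trace`, case-insensitive), the `info` fallback, and the names `LogLevel`, `parse_level`, and `log_level_from_env` are all hypothetical.

```rust
use std::env;

/// Hypothetical log levels. The names mirror the usual `tracing` levels,
/// but which values Gaggle actually accepts is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parses a level name, ignoring surrounding whitespace and letter case.
/// Returns None for names this sketch does not recognize.
fn parse_level(raw: &str) -> Option<LogLevel> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "error" => Some(LogLevel::Error),
        "warn" => Some(LogLevel::Warn),
        "info" => Some(LogLevel::Info),
        "debug" => Some(LogLevel::Debug),
        "trace" => Some(LogLevel::Trace),
        _ => None,
    }
}

/// Reads GAGGLE_LOG_LEVEL and falls back to an assumed default of `Info`
/// when the variable is unset, not valid Unicode, or unrecognized.
fn log_level_from_env() -> LogLevel {
    env::var("GAGGLE_LOG_LEVEL")
        .ok()
        .and_then(|v| parse_level(&v))
        .unwrap_or(LogLevel::Info)
}

fn main() {
    println!("effective log level: {:?}", log_level_from_env());
}
```

For example, running with `GAGGLE_LOG_LEVEL=DEBUG` would make this sketch print `effective log level: Debug`. An unset or unrecognized value falls back to `Info`, which keeps a typo in the variable from disabling logging.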
