Client side SQLite with remote file
This is a follow-up to the earlier SQLite example. The app we built previously relies on downloading the whole database file in advance, which imposes a practical limit on how large the database can be. We are going to overcome that limit with the help of HTTP range requests.
We start out by making a copy of the original example. The first change we need to make is moving the data retrieval code to a worker thread. Because data requested by SQLite will no longer be available immediately, the code must wait for it, and blocking waits are not permitted in the main thread.
Open vite.config.js and add the multithreaded option to rollup-plugin-zigar:
zigar({ optimize: 'ReleaseSmall', multithreaded: true }),
Add the following HTTP headers to the dev server options to enable shared memory:
server: {
headers: {
'Cross-Origin-Opener-Policy': 'same-origin',
'Cross-Origin-Embedder-Policy': 'require-corp',
}
},
You'll need to do the same when you deploy the app. If your hosting provider does not let you add HTTP headers (e.g. GitHub Pages), then you're out of luck.
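For hosts that do support custom headers, the configuration is usually a small static file. As a hypothetical example, on a host that uses a Netlify-style `_headers` file, it might look like this (the path pattern is an assumption; check your provider's documentation):

```
/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
```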
If the database's extension had been changed to .txt, rename it back to .db and restore the assetsInclude setting:
assetsInclude: [ '**/*.db' ],
As of Zig 0.14.1, the standard library still does not properly support multithreading in WebAssembly. It needs to be patched in multiple places. If you haven't done so already, run the following command to apply the patch file:
npx rollup-plugin-zigar patch
Rename sqlite.zig to worker.zig. Open it and remove the @"meta(zigar)" declaration (it won't be used since the file is no longer the root).
Create a new sqlite.zig:
const std = @import("std");
const wasm_allocator = std.heap.wasm_allocator;
const zigar = @import("zigar");
const worker = @import("./worker.zig");
var work_queue: zigar.thread.WorkQueue(worker) = .{};
pub fn startup() !void {
try work_queue.init(.{
.allocator = wasm_allocator,
.n_jobs = 1,
});
}
pub fn shutdown(promise: zigar.function.Promise(void)) void {
work_queue.deinitAsync(promise);
}
pub const openDb = work_queue.promisify(worker.openDb);
pub const closeDb = work_queue.promisify(worker.closeDb);
pub const findAlbums = work_queue.promisify(worker.findAlbums);
pub const getTracks = work_queue.promisify(worker.getTracks);
pub const @"meta(zigar)" = struct {
pub fn isFieldString(comptime T: type, comptime _: std.meta.FieldEnum(T)) bool {
return true;
}
pub fn isDeclPlain(comptime T: type, comptime _: std.meta.DeclEnum(T)) bool {
return true;
}
};
Now our functions return promises instead of the actual results. We need to update our JavaScript to reflect that. Open src/App.jsx, add await in front of the call to openDb(), and replace the call to bytes() with blob():
let data
const dataPromise = (async () => {
const resp = await fetch(chinook)
data = await resp.blob()
await openDb('/db.sqlite3')
})()
We're switching from a Uint8Array to a Blob in order to test async file access. To read part of the blob, Zigar has to wait for the promise returned by its arrayBuffer() method to be fulfilled.
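As a standalone illustration (not part of the app's code), this is roughly what reading a byte range from a Blob looks like, and why it forces an asynchronous wait:

```javascript
// Sketch: reading a byte range from a Blob is inherently asynchronous,
// which is why SQLite's reads must run off the main thread.
async function readRange(blob, offset, len) {
  // slice() is cheap; the actual copy happens in arrayBuffer(),
  // which returns a promise that must be awaited
  const buffer = await blob.slice(offset, offset + len).arrayBuffer()
  return new Uint8Array(buffer)
}

const blob = new Blob([new Uint8Array([10, 20, 30, 40, 50])])
const chunk = await readRange(blob, 1, 3) // bytes 20, 30, 40
```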
In the useEffect handlers, change the call to findAlbums():
findAlbums(deferredSearchString || '%').then(setAlbums)
And getTracks():
getTracks(selectedAlbumId).then(setTracks)
Finally, add a call to startup() at the top:
startup()
Now the app is ready to run again:
npm run dev
It should work as before.
In the original app, we had to make sure that no query is performed until the database is downloaded. This made the start-up sequence a bit awkward. Now that SQLite is capable of waiting, we can clean up our code a bit.
We can get rid of the state variable ready. dataPromise is also no longer needed. We can make our open listener async instead:
__zigar.on('open', async (evt) => {
if (evt.path.endsWith('.sqlite3')) {
if (!data) {
const resp = await fetch(chinook)
data = await resp.blob()
}
return data
} else {
return false
}
})
We can move the call to openDb() into a useEffect hook:
useEffect(() => {
openDb('/db.sqlite3')
return () => closeDb()
}, [])
The original plan for this tutorial was to use an existing npmjs package to handle the data retrieval. That turned out to be unworkable due to a check for the Accept-Ranges header, which is frequently missing even when a server is capable of handling range requests.
So we have to roll our own solution. It's simple enough: instead of mimicking a blob, we're going to create a class that implements Zigar's [file interface](File interface). Create a new file web-file.js and paste in the following:
export class WebFile {
static async create(url) {
const resp = await fetch(url, { method: 'HEAD' });
const len = resp.headers.get("content-length");
if (!len) {
throw new Error('Missing Content-Length header');
}
const self = new WebFile();
self.url = url;
self.size = parseInt(len);
self.pos = 0;
return self;
}
async pread(len, offset) {
const end = offset + len;
const resp = await fetch(this.url, {
headers: {
Range: `bytes=${offset}-${end - 1}`,
},
});
if (!resp.headers.get('content-range')) {
throw new Error('Missing range header');
}
const buffer = await resp.arrayBuffer();
return new Uint8Array(buffer);
}
async read(len) {
const chunk = await this.pread(len, this.pos);
this.pos += chunk.length;
return chunk;
}
tell() {
return this.pos;
}
seek(offset, whence) {
let pos = -1;
switch (whence) {
case 0: pos = offset; break;
case 1: pos = this.pos + offset; break;
case 2: pos = this.size + offset; break;
}
if (!(pos >= 0 && pos <= this.size)) throw new Error('Invalid argument');
this.pos = pos;
return pos;
}
}
The code above is largely self-explanatory, I believe. We first make a HEAD request to the server to obtain the file's size. As pread calls come in, they're translated into Fetch API calls with a Range header.
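The seek() method follows the usual POSIX whence convention (0 = from the start, 1 = from the current position, 2 = from the end). A standalone sketch of the same arithmetic:

```javascript
// Illustration of the whence convention used by WebFile.seek():
// 0 = SEEK_SET (absolute), 1 = SEEK_CUR (relative), 2 = SEEK_END (from end)
function resolveSeek(pos, size, offset, whence) {
  let next = -1
  switch (whence) {
    case 0: next = offset; break        // from start of file
    case 1: next = pos + offset; break  // from current position
    case 2: next = size + offset; break // from end of file
  }
  if (!(next >= 0 && next <= size)) throw new Error('Invalid argument')
  return next
}

// e.g. with a 100-byte file positioned at byte 40:
resolveSeek(40, 100, 10, 1) // → 50
resolveSeek(40, 100, -4, 2) // → 96
```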
Import the class in App.jsx and use it in the open listener:
__zigar.on('open', async (evt) => {
if (evt.path.endsWith('.sqlite3')) {
return data ??= WebFile.create(chinook)
} else {
return false
}
})
Return to the browser. Open the development console and switch to the Network tab. When you reload the page, you should see a large number of small requests:

Retrieving the file in such small chunks is obviously inefficient. Luckily, we can easily fix this by increasing the database's page size.
Install DB Browser for SQLite if you don't have it on your computer
already. Open src/assets/chinook.db and switch to the Edit Pragmas tab. You'll see that the
page size is only 1K. Set it to 16K and press Save:

Or you can use the CLI tool sqlite3 to run the following commands:
PRAGMA page_size = 16384;
VACUUM;
After the change, the number of requests drops significantly:
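To see why the page size matters, consider the rough arithmetic. SQLite fetches one page per read, so the worst-case request count scales inversely with the page size (the file size below is illustrative):

```javascript
// Rough, illustrative arithmetic: one HTTP range request per page read,
// so the worst-case request count is the number of pages in the file.
function worstCaseRequests(fileSize, pageSize) {
  return Math.ceil(fileSize / pageSize)
}

const fileSize = 1_261_568 // illustrative database size, in bytes
worstCaseRequests(fileSize, 1024)  // 1 KiB pages → 1232 requests
worstCaseRequests(fileSize, 16384) // 16 KiB pages → 77 requests
```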

If you examine the network traffic, you'll notice that SQLite regularly reads a small region near the beginning of the file:

So if we cache this part of the file, we can eliminate these superfluous requests. Instead of making a HEAD request initially, we're going to perform a partial GET:
static async create(url, prefetch = 16384) {
const resp = await fetch(url, {
headers: {
Range: `bytes=0-${prefetch - 1}`,
},
});
const data = await resp.bytes();
let size;
if (resp.status === 206) {
const range = resp.headers.get("content-range");
size = parseInt(range.slice(range.indexOf('/') + 1)); // e.g. "bytes 0-16383/1261568"
} else {
size = data.length;
}
const self = new WebFile();
self.url = url;
self.size = size;
self.pos = 0;
self.cache = data;
return self;
}
Then in pread(), we check if the call can be satisfied with the data that was fetched:
async pread(len, offset) {
const end = offset + len;
if (end <= this.cache.length) {
return this.cache.slice(offset, end);
}
// ...
}
As a bonus, our code can now handle the situation where the server chooses to send the whole file (status code 200 instead of 206).
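The cache check itself is simple: a request is served locally only when the whole requested range lies inside the prefetched data. A standalone sketch of that decision:

```javascript
// Sketch of the cache check added to pread(): serve from the prefetched
// bytes only if the entire range [offset, offset + len) fits inside them;
// otherwise fall through to a network request.
function servedFromCache(cacheLength, offset, len) {
  return offset + len <= cacheLength
}

// with a 16384-byte prefetch:
servedFromCache(16384, 0, 4096)     // true: entirely within the cache
servedFromCache(16384, 16000, 1024) // false: overlaps the end, goes to network
```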
One mistake we made in the original example was neglecting to list our Zig functions as dependencies of our useEffect hooks. Without them, the hooks aren't rerun when these functions change. We'll fix that now:
useEffect(() => {
openDb('/db.sqlite3')
return () => closeDb()
}, [ openDb, closeDb ])
useEffect(() => {
findAlbums(deferredSearchString || '%').then(setAlbums)
}, [ deferredSearchString, findAlbums ])
useEffect(() => {
if (selectedAlbumId !== undefined) {
getTracks(selectedAlbumId).then(setTracks)
} else {
setTracks([])
}
}, [ selectedAlbumId, getTracks ])
The Chinook database is pretty small. Let us now try a larger sample file. Download northwind.db from this GitHub project. Move it into src/assets and change its page size to 64K. The resulting file should be around 25 MB.
Rework the app so that it uses the following queries:
SELECT a.CustomerID, a.CompanyName, a.Region
FROM Customers a
WHERE a.CompanyName LIKE '%' || ? || '%'
ORDER BY a.CompanyName
SELECT a.OrderID, a.OrderDate, SUM(b.UnitPrice * b.Quantity * (1 - b.Discount)) as Total FROM Orders a
INNER JOIN "Order Details" b ON a.OrderID = b.OrderID
WHERE CustomerID = ?
GROUP BY a.OrderDate
One searches for companies by name. The other returns the orders from a given company, calculating the total of each by summing the prices of the products involved.
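The per-order total computed by the second query follows the usual Northwind formula: each line item contributes UnitPrice * Quantity * (1 - Discount). The same arithmetic in JavaScript, with hypothetical line items:

```javascript
// Illustration of the SUM() expression from the second query:
// each line item contributes unitPrice * quantity * (1 - discount)
function orderTotal(details) {
  return details.reduce(
    (sum, d) => sum + d.unitPrice * d.quantity * (1 - d.discount), 0)
}

// hypothetical line items for one order:
orderTotal([
  { unitPrice: 10, quantity: 4, discount: 0 },    // contributes 40
  { unitPrice: 20, quantity: 2, discount: 0.25 }, // contributes 30
]) // → 70
```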
Once you have everything working again, you'll discover that the app is essentially unusable. The second query is simply too slow. If we look at the database, we see one potential problem: the total lack of indices:

So we ask SQLite itself what indices it wants:
$ sqlite3 northwind.db
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
sqlite> .expert
sqlite> SELECT a.OrderID, a.OrderDate, SUM(b.UnitPrice * b.Quantity * (1 - b.Discount)) as Total FROM Orders a
INNER JOIN "Order Details" b ON a.OrderID = b.OrderID
WHERE CustomerID = ?
GROUP BY a.OrderDate;
CREATE INDEX Orders_idx_0a898c2d ON Orders(CustomerID, OrderDate);
SEARCH a USING COVERING INDEX Orders_idx_0a898c2d (CustomerID=?)
SEARCH b USING INDEX sqlite_autoindex_Order Details_1 (OrderID=?)
sqlite> .expert
sqlite> SELECT a.CustomerID, a.CompanyName, a.Region
FROM Customers a
WHERE a.CompanyName LIKE '%' || ? || '%'
ORDER BY a.CompanyName;
CREATE INDEX Customers_idx_a2d7b248 ON Customers(CompanyName);
SCAN a USING INDEX Customers_idx_a2d7b248
After adding the indices, performance seems to improve a little. The second query still takes an agonizingly long time to run though. Here SQLite is aggregating thousands of rows. It simply must read nearly all parts of the file. The only way to cut execution times down to a tolerable level is to limit the rows returned to something like 5.
You can see the web app in action
here and
here. For some reason Cloudflare's servers refuse to honor range requests for files with a .db extension. That's why the databases were given a .jpg extension instead.
You can find the complete source code for this example here and here.
Using HTTP range requests to handle a database on the client side seems to be a viable solution. Performance is okay as long as we avoid queries that aggregate information from many rows. For something like a CMS or a product catalog, it's adequate.
One feature of SQLite that can be very useful for a web app is full-text search. We'll explore its use in a future example.