
@myloginid

Title: HDFS content-server signer + YAML config; select signer for HDFS/WebHDFS

Why

  • Enable Delta Sharing for tables stored on HDFS, without cloud object storage, by returning short‑lived signed URLs that point at an external Content Server, which streams the bytes from HDFS. The provider stays stateless, and the data stays in HDFS instead of being copied to another store.

Summary of changes (delta-sharing submodule)

  • New signer
    • server/src/main/scala/io/delta/sharing/server/common/HdfsFileSigner.scala: signs short-lived Ed25519 JWTs and returns /get?token=... URLs that point at the Content Server.
  • Select signer for HDFS
    • standalone/internal/DeltaSharedTable.scala: selects HdfsFileSigner when the table's data path resolves to org.apache.hadoop.hdfs.DistributedFileSystem, org.apache.hadoop.hdfs.web.WebHdfsFileSystem, or org.apache.hadoop.hdfs.web.SWebHdfsFileSystem.
    • kernel/internal/DeltaSharedTableKernel.scala: applies the same selection logic on the Kernel code path.
  • YAML configuration (optional, overrides env)
    • server/src/main/scala/io/delta/sharing/server/config/ServerConfig.scala: add hdfsSigner with fields:
      • contentServerBase: base URL for the Content Server (e.g., https://content.example.com).
      • signingPrivateKeyFile: PEM Ed25519 private key path.
      • audience (optional): JWT aud to embed/enforce.
      • kid (optional): key id for rotation.
    • server/src/main/scala/io/delta/sharing/server/DeltaSharingService.scala: configures HdfsFileSigner from the YAML hdfsSigner block when present; otherwise falls back to environment variables / -D system properties.
    • config/delta-sharing-server.yaml.sample: adds a commented hdfsSigner: block.
  • Build
    • build.sbt: add org.bitbucket.b_c:jose4j for Ed25519 JWT signing.
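A minimal sketch of what the new signer bullet describes: mint an Ed25519-signed JWT and wrap it in a /get?token=... URL. This is not the PR's HdfsFileSigner. It uses the JDK's built-in Ed25519 support (Java 15+) instead of jose4j, and the object name HdfsTokenSketch and the "path" claim are hypothetical.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets.{US_ASCII, UTF_8}
import java.security.{PrivateKey, Signature}
import java.util.Base64

// Sketch only: builds a compact JWS (alg "EdDSA") with the JDK's Ed25519 provider
// rather than jose4j. The "path" claim name is a hypothetical choice.
object HdfsTokenSketch {
  private val b64url = Base64.getUrlEncoder.withoutPadding()

  // Minimal JSON string quoting; enough for paths, audiences and key ids
  // that contain no control characters.
  private def jsonStr(s: String): String = {
    require(!s.exists(_ < ' '), "control characters are not handled in this sketch")
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
  }

  def sign(
      key: PrivateKey,
      hdfsPath: String,
      nowEpochSeconds: Long,
      ttlSeconds: Long,
      audience: Option[String],
      kid: Option[String]): String = {
    // Header: algorithm, type, and the optional key id used for rotation.
    val header = (Seq("\"alg\":\"EdDSA\"", "\"typ\":\"JWT\"") ++
      kid.map(k => "\"kid\":" + jsonStr(k))).mkString("{", ",", "}")
    // Claims: which file, when issued, when it expires, and the optional audience.
    val claims = (Seq(
      "\"path\":" + jsonStr(hdfsPath),
      "\"iat\":" + nowEpochSeconds,
      "\"exp\":" + (nowEpochSeconds + ttlSeconds)) ++
      audience.map(a => "\"aud\":" + jsonStr(a))).mkString("{", ",", "}")
    val signingInput =
      b64url.encodeToString(header.getBytes(UTF_8)) + "." +
        b64url.encodeToString(claims.getBytes(UTF_8))
    val signer = Signature.getInstance("Ed25519")
    signer.initSign(key)
    signer.update(signingInput.getBytes(US_ASCII))
    signingInput + "." + b64url.encodeToString(signer.sign())
  }

  // URL handed to Delta Sharing clients; the Content Server checks the token on /get.
  def signedUrl(contentServerBase: String, token: String): String =
    contentServerBase.stripSuffix("/") + "/get?token=" + URLEncoder.encode(token, UTF_8)
}
```

Putting the whole token in one query parameter keeps the provider stateless: the Content Server needs only the matching public key to check a URL.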
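The file-system-based signer selection can be sketched as follows. To keep the example free of a Hadoop dependency, it compares the FileSystem class name as a string. SignerSelectionSketch and its choice names are hypothetical. Unlike an isInstanceOf check on the FileSystem instance, this exact-name match does not cover subclasses.

```scala
// Sketch only: picks a signer from the fully qualified class name of the Hadoop
// FileSystem behind the table path (for example fs.getClass.getName).
object SignerSelectionSketch {
  sealed trait SignerChoice
  case object HdfsContentServerSigner extends SignerChoice // new /get?token=... URLs
  case object ExistingCloudSigner extends SignerChoice     // unchanged S3/ABFS/GCS path

  // The three HDFS-family file systems listed in this PR.
  private val hdfsFileSystems: Set[String] = Set(
    "org.apache.hadoop.hdfs.DistributedFileSystem",
    "org.apache.hadoop.hdfs.web.WebHdfsFileSystem",
    "org.apache.hadoop.hdfs.web.SWebHdfsFileSystem")

  def select(fileSystemClassName: String): SignerChoice =
    if (hdfsFileSystems.contains(fileSystemClassName)) HdfsContentServerSigner
    else ExistingCloudSigner
}
```

Defaulting to the existing signer means any file system not on the list keeps today's behavior.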
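The precedence where YAML overrides env could look like the sketch below. It assumes the hdfsSigner block overrides the fallback as a whole rather than field by field. The fallback key names, such as HDFS_SIGNER_CONTENT_SERVER_BASE, are hypothetical and not necessarily the names the PR reads.

```scala
// Sketch only: the YAML hdfsSigner block wins; otherwise settings come from a
// fallback source. All fallback key names below are hypothetical.
final case class HdfsSignerSettings(
    contentServerBase: String,
    signingPrivateKeyFile: String,
    audience: Option[String] = None,
    kid: Option[String] = None)

object HdfsSignerSettingsSketch {
  // fromYaml: the parsed hdfsSigner block, if the YAML file has one.
  // fallback: looks a key up in the environment and/or -D system properties.
  def resolve(
      fromYaml: Option[HdfsSignerSettings],
      fallback: String => Option[String]): Option[HdfsSignerSettings] =
    fromYaml.orElse {
      // Both required fields must be present for the fallback to produce settings.
      for {
        base <- fallback("HDFS_SIGNER_CONTENT_SERVER_BASE")
        keyFile <- fallback("HDFS_SIGNER_PRIVATE_KEY_FILE")
      } yield HdfsSignerSettings(base, keyFile,
        fallback("HDFS_SIGNER_AUDIENCE"), fallback("HDFS_SIGNER_KID"))
    }

  // One possible fallback order: -D system property first, then environment variable.
  def sysPropThenEnv(key: String): Option[String] =
    sys.props.get(key).orElse(sys.env.get(key))
}
```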

Behavior & compatibility

  • Cloud storage paths (S3/ABFS/GCS) are unchanged and keep their existing signers.
  • HDFS/WebHDFS tables receive signed URLs for a separate Content Server, which validates each token and streams the bytes with Range support.
  • YAML is optional; environment variables remain supported and backwards compatible.

Content Server (reference implementation in companion repo)

  • Implements /get?token=... endpoint with:
    • Ed25519 verification, optional audience enforcement.
    • Range: bytes=... handling; returns 206 + Content-Range.
    • WebHDFS backend and per-connection throttling.
    • Example YAML: config/content-server.yaml.sample.
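The Range handling can be sketched for the single-range case: parse Range: bytes=..., clamp it to the file length, and produce the Content-Range value (bytes first-last/complete-length, as in RFC 9110) that accompanies a 206. RangeSketch is hypothetical. The reference Content Server may behave differently, for example on multi-range requests, which this sketch rejects.

```scala
// Sketch only: single-range parsing of "Range: bytes=..." and the matching
// Content-Range value for a 206 response. Multi-range requests are not handled.
object RangeSketch {
  final case class ByteRange(start: Long, endInclusive: Long) {
    def length: Long = endInclusive - start + 1
    def contentRange(totalLength: Long): String = s"bytes $start-$endInclusive/$totalLength"
  }

  // Pattern matches are anchored; at most 18 digits so toLong cannot overflow.
  private val SingleRange = """bytes=(\d{0,18})-(\d{0,18})""".r

  /** Some(range) for a satisfiable single range; None otherwise (caller picks 416 or 200). */
  def parse(header: String, fileLength: Long): Option[ByteRange] =
    header.trim match {
      case SingleRange("", "") => None
      case SingleRange("", suffix) =>              // bytes=-N: the last N bytes
        val n = suffix.toLong
        if (n == 0 || fileLength == 0) None
        else Some(ByteRange(math.max(0L, fileLength - n), fileLength - 1))
      case SingleRange(first, last) =>             // bytes=A-B or bytes=A-
        val start = first.toLong
        val end = if (last.isEmpty) fileLength - 1 else math.min(last.toLong, fileLength - 1)
        if (start >= fileLength || end < start) None else Some(ByteRange(start, end))
      case _ => None                               // malformed or multi-range
    }
}
```

With a 5000-byte file, the manual test's bytes=0-1023 maps to a 1024-byte body and Content-Range: bytes 0-1023/5000.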

Security

  • Tokens are Ed25519-signed JWTs; a short TTL (e.g., 5–15 minutes) is recommended.
  • aud and kid are optional. To rotate keys, add the new public key to the Content Server; kid identifies which key signed a token.
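The verification side, including kid-based key lookup for rotation, might look like the following. TokenCheckSketch, the defaultKid fallback, and the error strings are hypothetical. The regex claim extraction is a shortcut that works only for flat JSON from a trusted signer; a real Content Server should use a JWT/JSON library.

```scala
import java.nio.charset.StandardCharsets.{US_ASCII, UTF_8}
import java.security.{PublicKey, Signature}
import java.util.Base64
import scala.util.Try

// Sketch only: checks a Content Server might run on /get?token=... before streaming.
// Claims are read with a naive regex, not a general JSON parser.
object TokenCheckSketch {
  private val b64urlDecoder = Base64.getUrlDecoder

  private def strField(json: String, name: String): Option[String] =
    ("\"" + name + "\":\"([^\"]*)\"").r.findFirstMatchIn(json).map(_.group(1))

  private def numField(json: String, name: String): Option[Long] =
    ("\"" + name + "\":(\\d{1,18})").r.findFirstMatchIn(json).map(_.group(1).toLong)

  /** Right(payload JSON) if every check passes, otherwise Left(reason). */
  def check(
      token: String,
      publicKeysByKid: Map[String, PublicKey], // old and new keys during rotation
      defaultKid: String,                      // used when the header carries no kid
      expectedAudience: Option[String],
      nowEpochSeconds: Long): Either[String, String] =
    token.split('.') match {
      case Array(h, p, s) =>
        val decoded = Try((
          new String(b64urlDecoder.decode(h), UTF_8),
          new String(b64urlDecoder.decode(p), UTF_8),
          b64urlDecoder.decode(s))).toOption
        decoded match {
          case None => Left("malformed base64url")
          case Some((header, payload, sig)) =>
            val kid = strField(header, "kid").getOrElse(defaultKid)
            publicKeysByKid.get(kid) match {
              case None => Left(s"unknown kid $kid")
              case Some(key) =>
                val verifier = Signature.getInstance("Ed25519")
                verifier.initVerify(key)
                // The signature covers the encoded header and payload exactly as received.
                verifier.update((h + "." + p).getBytes(US_ASCII))
                if (!strField(header, "alg").contains("EdDSA")) Left("unexpected alg")
                else if (!Try(verifier.verify(sig)).getOrElse(false)) Left("bad signature")
                else if (!numField(payload, "exp").exists(_ > nowEpochSeconds)) Left("expired or missing exp")
                else if (expectedAudience.exists(a => !strField(payload, "aud").contains(a))) Left("audience mismatch")
                else Right(payload)
            }
        }
      case _ => Left("not a compact JWS")
    }
}
```

During rotation the map holds both the old and the new public key, so tokens minted before the switch still verify until they expire.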

Testing

  • Builds server and runs existing suites.
  • Manual verification:
    1. Start the Content Server with the public key and the WebHDFS base URL.
    2. Start the provider with a table on HDFS and hdfsSigner configured (YAML or env).
    3. Run a Delta Sharing query and check that the returned file URLs have the form /get?token=....
    4. Fetch one returned URL with curl -H 'Range: bytes=0-1023' and expect 206 Partial Content.

Docs

  • The sample YAML includes an hdfsSigner block whose comments explain each field and give operational guidance.

Notes

  • All new code paths are gated by HDFS/WebHDFS FS detection. No changes to S3/ABFS/GCS flows.
