docs: update jobs.md with socket mode architecture and speed improvements #69763
base: docs-architecture-for-speed
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -15,28 +15,97 @@ Generally, there are 2 types of workload pods: | |||||||||||||||||||
|
|
||||||||||||||||||||
| ## Airbyte Middleware and Bookkeeping Containers | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Inside any connector operation pod, a special airbyte controlled container will run alongside the connector container(s) to process and interpret the results as well as perform necessary side effects. | ||||||||||||||||||||
| Inside any connector operation pod, a special Airbyte-controlled container runs alongside the connector container(s) to process and interpret results and perform necessary side effects. | ||||||||||||||||||||
|
|
||||||||||||||||||||
| There are two types of middleware containers: | ||||||||||||||||||||
| * The Container Orchestrator | ||||||||||||||||||||
| * The Connector Sidecar | ||||||||||||||||||||
| * The Container Orchestrator (legacy mode) | ||||||||||||||||||||
| * The Bookkeeper (socket mode) | ||||||||||||||||||||
| * The Connector Sidecar (for CHECK, DISCOVER, SPEC operations) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| #### Container Orchestrator | ||||||||||||||||||||
| ### Replication Architecture Modes | ||||||||||||||||||||
|
|
||||||||||||||||||||
| An airbyte controlled container that sits between the source and destination connector containers inside a Replication Pod. | ||||||||||||||||||||
| Airbyte supports two architecture modes for replication (sync) jobs, with the platform automatically selecting the optimal mode based on connector capabilities and connection configuration. | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Responsibilities: | ||||||||||||||||||||
| * Hosts middleware capabilities such as scrubbing PPI, aggregating stats, transforming data, and checkpointing progress. | ||||||||||||||||||||
| #### Socket Mode (Bookkeeper) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Socket mode is Airbyte's high-performance architecture that enables 4-10x faster data movement compared to legacy mode. In this mode, data flows directly from source to destination via Unix domain sockets, while control messages (logs, state, statistics) flow through the Bookkeeper via standard I/O. | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Architecture:** | ||||||||||||||||||||
| ``` | ||||||||||||||||||||
| Source ────→ Unix Socket Files → Destination (direct data transfer) | ||||||||||||||||||||
| │ │ | ||||||||||||||||||||
| └─────────→ Bookkeeper ←───────────┘ | ||||||||||||||||||||
| (control messages, state, logs via STDIO) | ||||||||||||||||||||
| ``` | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Bookkeeper Responsibilities:** | ||||||||||||||||||||
| * Processes control messages from source and destination via STDIO | ||||||||||||||||||||
| * Persists state messages and statistics | ||||||||||||||||||||
| * Handles heartbeating and job lifecycle management | ||||||||||||||||||||
| * Lightweight resource footprint (1 CPU, 1024Mi memory) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Performance Benefits:** | ||||||||||||||||||||
| * **Parallel Processing**: Multiple Unix domain sockets enable concurrent data streams | ||||||||||||||||||||
| * **Binary Serialization**: Protocol Buffers provide efficient data encoding and strong type safety | ||||||||||||||||||||
| * **Lower Latency**: Eliminates STDIO buffering delays | ||||||||||||||||||||
| * **Higher Throughput**: Direct socket communication reduces overhead | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Socket Count:** The number of sockets is determined by `min(source_cpu_limit, destination_cpu_limit) * 2`, allowing parallel data transfer. For example, connectors with 4 CPU limits will use 8 sockets. | ||||||||||||||||||||
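The socket-count formula above can be sketched as a small helper. This is an illustration only: the function name and the whole-number CPU-limit parameters are assumptions made for the example, not the platform's actual code, and fractional CPU limits are not handled.

```python
def socket_count(source_cpu_limit: int, destination_cpu_limit: int) -> int:
    """Hypothetical helper mirroring min(source_cpu_limit, destination_cpu_limit) * 2."""
    # The smaller of the two CPU limits bounds the parallelism; each CPU gets two sockets.
    return min(source_cpu_limit, destination_cpu_limit) * 2


print(socket_count(4, 4))  # 8, matching the example in the text
print(socket_count(2, 4))  # 4, because the smaller limit wins
```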
|
|
||||||||||||||||||||
| #### Legacy Mode (Container Orchestrator) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Legacy mode uses the traditional STDIO-based architecture where all data flows through the Container Orchestrator. | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Architecture:** | ||||||||||||||||||||
| ``` | ||||||||||||||||||||
| Source → STDIO → Container Orchestrator → STDIO → Destination | ||||||||||||||||||||
| ``` | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Container Orchestrator Responsibilities:** | ||||||||||||||||||||
| * Sits between source and destination connector containers | ||||||||||||||||||||
| * Hosts middleware capabilities such as scrubbing PII, aggregating stats, transforming data, and checkpointing progress | ||||||||||||||||||||
| * Interprets and records connector operation results | ||||||||||||||||||||
| * Handles miscellaneous side effects (e.g. logging, auth token refresh flows, etc. ) | ||||||||||||||||||||
| * Handles miscellaneous side effects (logging, auth token refresh flows, etc.) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| #### Architecture Selection | ||||||||||||||||||||
|
|
||||||||||||||||||||
| The platform automatically determines which mode to use based on several factors: | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Socket mode is used when ALL conditions are met:** | ||||||||||||||||||||
| 1. Not a file transfer operation | ||||||||||||||||||||
| 2. Not a reset operation | ||||||||||||||||||||
| 3. Both source and destination declare IPC capabilities in their metadata | ||||||||||||||||||||
| 4. No hashed fields or mappers configured in the connection | ||||||||||||||||||||
| 5. Matching data channel versions between source and destination | ||||||||||||||||||||
| 6. Both connectors support socket transport | ||||||||||||||||||||
| 7. Compatible serialization format exists (PROTOBUF preferred, JSONL fallback) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Legacy mode is used when:** | ||||||||||||||||||||
| * Any of the above conditions is not met | ||||||||||||||||||||
| * The `ForceRunStdioMode` feature flag is enabled | ||||||||||||||||||||
| * IPC options are missing or incompatible | ||||||||||||||||||||
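The selection rules listed above can be read as one predicate over the connection and both connectors. The sketch below is a hedged illustration: `ConnectorIpc`, `pick_format`, `select_mode`, and every field name are hypothetical and are not taken from the Airbyte codebase. Only the conditions themselves and the `ForceRunStdioMode` flag come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ConnectorIpc:
    """Hypothetical summary of the IPC options a connector declares in its metadata."""
    declares_ipc: bool = True
    data_channel_version: str = "1"
    supports_socket: bool = True
    formats: FrozenSet[str] = frozenset({"PROTOBUF", "JSONL"})


def pick_format(source: ConnectorIpc, destination: ConnectorIpc) -> Optional[str]:
    """Return the shared serialization format, preferring PROTOBUF over JSONL."""
    common = source.formats & destination.formats
    for fmt in ("PROTOBUF", "JSONL"):
        if fmt in common:
            return fmt
    return None


def select_mode(
    source: ConnectorIpc,
    destination: ConnectorIpc,
    *,
    is_file_transfer: bool = False,
    is_reset: bool = False,
    has_hashed_fields_or_mappers: bool = False,
    force_run_stdio_mode: bool = False,  # stands in for the ForceRunStdioMode flag
) -> str:
    """Return "socket" only when every listed condition holds, else "legacy"."""
    if force_run_stdio_mode:
        return "legacy"
    socket_ok = all([
        not is_file_transfer,
        not is_reset,
        source.declares_ipc and destination.declares_ipc,
        not has_hashed_fields_or_mappers,
        source.data_channel_version == destination.data_channel_version,
        source.supports_socket and destination.supports_socket,
        pick_format(source, destination) is not None,
    ])
    return "socket" if socket_ok else "legacy"


print(select_mode(ConnectorIpc(), ConnectorIpc()))
print(select_mode(ConnectorIpc(), ConnectorIpc(), is_reset=True))
```

Checking the feature flag first keeps the override independent of connector metadata, which is one plausible reading of the legacy-mode list.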
|
|
||||||||||||||||||||
| #### State Management in Socket Mode | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Socket mode introduces enhanced state management to support parallel processing and ensure data consistency: | ||||||||||||||||||||
|
|
||||||||||||||||||||
| **Partition Identifiers:** Each record and state message includes a `partition_id` (a random alphanumeric string) that links records to their corresponding checkpoint state. This enables the destination to verify that all records from a partition have been received before committing the state. | ||||||||||||||||||||
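One way to picture the partition check described above is a per-partition record counter on the destination side. This is a minimal sketch under a stated assumption: it supposes that each state message carries the number of records expected for its partition. The text does not spell out that field, and the `PartitionTracker` class and its methods are hypothetical.

```python
from collections import defaultdict


class PartitionTracker:
    """Hypothetical destination-side counter of records seen per partition_id."""

    def __init__(self) -> None:
        self._received = defaultdict(int)

    def on_record(self, partition_id: str) -> None:
        # Records may arrive on any socket; only the partition_id matters here.
        self._received[partition_id] += 1

    def is_complete(self, partition_id: str, expected_records: int) -> bool:
        # expected_records is assumed to come from the matching state message.
        return self._received.get(partition_id, 0) >= expected_records


tracker = PartitionTracker()
for _ in range(3):
    tracker.on_record("a1b2c3")
print(tracker.is_complete("a1b2c3", expected_records=3))  # True
print(tracker.is_complete("zzz999", expected_records=1))  # False
```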
|
|
||||||||||||||||||||
| **State Ordering:** State messages include an incrementing `id` field to maintain proper ordering. Since states can arrive on any socket in any order due to parallel processing, the destination uses these IDs to commit states in the correct sequence, ensuring resumability if a sync fails. | ||||||||||||||||||||
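The ordering rule above amounts to buffering states that arrive early and committing them strictly by `id`. The sketch below assumes IDs start at 1 and increase by 1 without gaps. Neither detail is confirmed by the text, and `OrderedStateCommitter` is a hypothetical name.

```python
class OrderedStateCommitter:
    """Hypothetical buffer that commits states in ascending id order."""

    def __init__(self, first_id: int = 1) -> None:
        self._next_id = first_id   # id of the next state allowed to commit
        self._pending = {}         # states that arrived ahead of their turn
        self.committed = []        # commit log, in order

    def on_state(self, state_id: int, state: str) -> None:
        self._pending[state_id] = state
        # Drain every state that is now contiguous with the last commit.
        while self._next_id in self._pending:
            self.committed.append(self._pending.pop(self._next_id))
            self._next_id += 1


committer = OrderedStateCommitter()
committer.on_state(2, "state-2")   # arrives early, so it is held back
committer.on_state(1, "state-1")   # unblocks both states
print(committer.committed)         # ['state-1', 'state-2']
```

Because nothing past a missing ID is committed, a failed sync would resume from the last contiguous state, which matches the resumability goal stated in the text.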
|
|
||||||||||||||||||||
| **Dual State Emission:** In socket mode, state messages are sent to both: | ||||||||||||||||||||
| * The destination via socket (for record count verification and ordering) | ||||||||||||||||||||
| * The Bookkeeper via STDIO (for persistence and platform tracking) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| This dual emission ensures both the destination and platform maintain consistent state information throughout the sync. | ||||||||||||||||||||
|
|
||||||||||||||||||||
| #### Connector Sidecar | ||||||||||||||||||||
| ### Connector Sidecar | ||||||||||||||||||||
|
|
||||||||||||||||||||
| An airbyte controlled container that reads the output of a connector container inside a Connector Pod (CHECK, DISCOVER, SPEC). | ||||||||||||||||||||
| An Airbyte-controlled container that reads the output of a connector container inside a Connector Pod for non-replication operations (CHECK, DISCOVER, SPEC). | ||||||||||||||||||||
|
|
||||||||||||||||||||
| Responsibilities: | ||||||||||||||||||||
| **Responsibilities:** | ||||||||||||||||||||
| * Interprets and records connector operation results | ||||||||||||||||||||
| * Handles miscellaneous side effects (e.g. logging, auth token refresh flows, etc. ) | ||||||||||||||||||||
| * Handles miscellaneous side effects (logging, auth token refresh flows, etc.) | ||||||||||||||||||||
|
|
||||||||||||||||||||
|
|
||||||||||||||||||||
| ## Workload launching architecture | ||||||||||||||||||||
|
|
||||||||||||||||||||
🚫 [vale] reported by reviewdog 🐶
[Google.OptionalPlurals] Don't use plurals in parentheses such as in 'container(s)'.