Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
93 changes: 81 additions & 12 deletions docs/platform/understanding-airbyte/jobs.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,28 +15,97 @@ Generally, there are 2 types of workload pods:

## Airbyte Middleware and Bookkeeping Containers

Inside any connector operation pod, a special airbyte controlled container will run alongside the connector container(s) to process and interpret the results as well as perform necessary side effects.
Inside any connector operation pod, a special Airbyte-controlled container runs alongside the connector container(s) to process and interpret results and perform necessary side effects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Google.OptionalPlurals] Don't use plurals in parentheses such as in 'container(s)'.


There are two types of middleware containers:
* The Container Orchestrator
* The Connector Sidecar
* The Container Orchestrator (legacy mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "* The Container Orchestrator (..."]

* The Bookkeeper (socket mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

* The Connector Sidecar (for CHECK, DISCOVER, SPEC operations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Comment on lines +21 to +23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
* The Container Orchestrator (legacy mode)
* The Bookkeeper (socket mode)
* The Connector Sidecar (for CHECK, DISCOVER, SPEC operations)
- The Container Orchestrator (legacy mode)
- The Bookkeeper (socket mode)
- The Connector Sidecar (for CHECK, DISCOVER, SPEC operations)


#### Container Orchestrator
### Replication Architecture Modes

An airbyte controlled container that sits between the source and destination connector containers inside a Replication Pod.
Airbyte supports two architecture modes for replication (sync) jobs, with the platform automatically selecting the optimal mode based on connector capabilities and connection configuration.

Responsibilities:
* Hosts middleware capabilities such as scrubbing PPI, aggregating stats, transforming data, and checkpointing progress.
#### Socket Mode (Bookkeeper)

Socket mode is Airbyte's high-performance architecture that enables 4-10x faster data movement compared to legacy mode. In this mode, data flows directly from source to destination via Unix domain sockets, while control messages (logs, state, statistics) flow through the Bookkeeper via standard I/O.

**Architecture:**
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
```
```

Source ────→ Unix Socket Files → Destination (direct data transfer)
│ │
└─────────→ Bookkeeper ←───────────┘
(control messages, state, logs via STDIO)
```

**Bookkeeper Responsibilities:**
* Processes control messages from source and destination via STDIO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "* Processes control messages f..."]

* Persists state messages and statistics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

* Handles heartbeating and job lifecycle management
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'heartbeating'?

* Lightweight resource footprint (1 CPU, 1024Mi memory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Comment on lines +42 to +45
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
* Processes control messages from source and destination via STDIO
* Persists state messages and statistics
* Handles heartbeating and job lifecycle management
* Lightweight resource footprint (1 CPU, 1024Mi memory)
- Processes control messages from source and destination via STDIO
- Persists state messages and statistics
- Handles heartbeating and job lifecycle management
- Lightweight resource footprint (1 CPU, 1024Mi memory)


**Performance Benefits:**
* **Parallel Processing**: Multiple Unix domain sockets enable concurrent data streams
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "* Parallel Processing: Mul..."]

* **Binary Serialization**: Protocol Buffers provide efficient data encoding and strong type safety
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

* **Lower Latency**: Eliminates STDIO buffering delays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

* **Higher Throughput**: Direct socket communication reduces overhead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Comment on lines +48 to +51
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
* **Parallel Processing**: Multiple Unix domain sockets enable concurrent data streams
* **Binary Serialization**: Protocol Buffers provide efficient data encoding and strong type safety
* **Lower Latency**: Eliminates STDIO buffering delays
* **Higher Throughput**: Direct socket communication reduces overhead
- **Parallel Processing**: Multiple Unix domain sockets enable concurrent data streams
- **Binary Serialization**: Protocol Buffers provide efficient data encoding and strong type safety
- **Lower Latency**: Eliminates STDIO buffering delays
- **Higher Throughput**: Direct socket communication reduces overhead


**Socket Count:** The number of sockets is determined by `min(source_cpu_limit, destination_cpu_limit) * 2`, allowing parallel data transfer. For example, connectors with 4 CPU limits will use 8 sockets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [vale] reported by reviewdog 🐶
[Google.Colons] ': T' should be in lowercase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [vale] reported by reviewdog 🐶
[Google.Will] Avoid using 'will'.


#### Legacy Mode (Container Orchestrator)

Legacy mode uses the traditional STDIO-based architecture where all data flows through the Container Orchestrator.

**Architecture:**
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
```
```

Source → STDIO → Container Orchestrator → STDIO → Destination
```

**Container Orchestrator Responsibilities:**
* Sits between source and destination connector containers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "* Sits between source and dest..."]

* Hosts middleware capabilities such as scrubbing PII, aggregating stats, transforming data, and checkpointing progress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'middleware'?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'checkpointing'?

* Interprets and records connector operation results
* Handles miscellaneous side effects (e.g. logging, auth token refresh flows, etc. )
* Handles miscellaneous side effects (logging, auth token refresh flows, etc.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Comment on lines +65 to +68
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
* Sits between source and destination connector containers
* Hosts middleware capabilities such as scrubbing PII, aggregating stats, transforming data, and checkpointing progress
* Interprets and records connector operation results
* Handles miscellaneous side effects (e.g. logging, auth token refresh flows, etc. )
* Handles miscellaneous side effects (logging, auth token refresh flows, etc.)
- Sits between source and destination connector containers
- Hosts middleware capabilities such as scrubbing PII, aggregating stats, transforming data, and checkpointing progress
- Interprets and records connector operation results
- Handles miscellaneous side effects (logging, auth token refresh flows, etc.)


#### Architecture Selection

The platform automatically determines which mode to use based on several factors:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [vale] reported by reviewdog 🐶
[write-good.Weasel] 'several' is a weasel word!


**Socket mode is used when ALL conditions are met:**
1. Not a file transfer operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "1. Not a file transfer operati..."]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
1. Not a file transfer operation
1. Not a file transfer operation

2. Not a reset operation
3. Both source and destination declare IPC capabilities in their metadata
4. No hashed fields or mappers configured in the connection
5. Matching data channel versions between source and destination
6. Both connectors support socket transport
7. Compatible serialization format exists (PROTOBUF preferred, JSONL fallback)

**Legacy mode is used when:**
* Any of the above conditions are not met
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "* Any of the above conditions ..."]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [vale] reported by reviewdog 🐶
[Google.WordList] Use 'preceding' instead of 'above'.

* The `ForceRunStdioMode` feature flag is enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

* IPC options are missing or incompatible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Comment on lines +84 to +86
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
* Any of the above conditions are not met
* The `ForceRunStdioMode` feature flag is enabled
* IPC options are missing or incompatible
- Any of the above conditions are not met
- The `ForceRunStdioMode` feature flag is enabled
- IPC options are missing or incompatible


#### State Management in Socket Mode

Socket mode introduces enhanced state management to support parallel processing and ensure data consistency:

**Partition Identifiers:** Each record and state message includes a `partition_id` (a random alphanumeric string) that links records to their corresponding checkpoint state. This enables the destination to verify that all records from a partition have been received before committing the state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [vale] reported by reviewdog 🐶
[Google.Colons] ': E' should be in lowercase.


**State Ordering:** State messages include an incrementing `id` field to maintain proper ordering. Since states can arrive on any socket in any order due to parallel processing, the destination uses these IDs to commit states in the correct sequence, ensuring resumability if a sync fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [vale] reported by reviewdog 🐶
[Google.Colons] ': S' should be in lowercase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [vale] reported by reviewdog 🐶
[Vale.Spelling] Did you really mean 'resumability'?


**Dual State Emission:** In socket mode, state messages are sent to both:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [vale] reported by reviewdog 🐶
[Google.Colons] ': I' should be in lowercase.

* The destination via socket (for record count verification and ordering)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "* The destination via socket (..."]

* The Bookkeeper via STDIO (for persistence and platform tracking)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint] reported by reviewdog 🐶
MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk]

Comment on lines +97 to +98
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
* The destination via socket (for record count verification and ordering)
* The Bookkeeper via STDIO (for persistence and platform tracking)
- The destination via socket (for record count verification and ordering)
- The Bookkeeper via STDIO (for persistence and platform tracking)


This dual emission ensures both the destination and platform maintain consistent state information throughout the sync.

#### Connector Sidecar
### Connector Sidecar

An airbyte controlled container that reads the output of a connector container inside a Connector Pod (CHECK, DISCOVER, SPEC).
An Airbyte-controlled container that reads the output of a connector container inside a Connector Pod for non-replication operations (CHECK, DISCOVER, SPEC).

Responsibilities:
**Responsibilities:**
* Interprets and records connector operation results
* Handles miscellaneous side effects (e.g. logging, auth token refresh flows, etc. )
* Handles miscellaneous side effects (logging, auth token refresh flows, etc.)

Comment on lines 107 to 109
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change
* Interprets and records connector operation results
* Handles miscellaneous side effects (e.g. logging, auth token refresh flows, etc. )
* Handles miscellaneous side effects (logging, auth token refresh flows, etc.)
- Interprets and records connector operation results
- Handles miscellaneous side effects (logging, auth token refresh flows, etc.)


## Workload launching architecture
Expand Down
Loading