Commit 6ec7406

Add Spark 4.0 debugging examples
- StreamingRateExample: Streaming app using rate source for continuous debugging
- SimpleBatchExample: Simple batch app for basic debugging scenarios
- Comprehensive README with setup and usage instructions
- Gradle and SBT build configurations for IDE support
- Examples allow setting breakpoints in connector code during execution
1 parent ce8c148 commit 6ec7406

7 files changed: +577 -0 lines changed

spark-4.0/examples/README.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
# Spark 4.0 ClickHouse Connector Examples

This directory contains example applications for debugging and testing the ClickHouse connector with Spark 4.0.

## Prerequisites

1. **ClickHouse Server Running**
   ```bash
   # Using Docker
   docker run -d --name clickhouse-server \
     -p 8123:8123 -p 9000:9000 \
     --ulimit nofile=262144:262144 \
     clickhouse/clickhouse-server
   ```

2. **Build the Connector**
   ```bash
   # Run from the repository root (adjust the path to your checkout)
   cd /path/to/spark-clickhouse-connector
   ./gradlew -Dspark_binary_version=4.0 -Dscala_binary_version=2.13 :clickhouse-spark-4.0_2.13:build
   ```

## Running Examples in IDE (for Debugging)

### IntelliJ IDEA / VS Code with Metals

1. **Import the project** as a Gradle project

2. **Set up Run Configuration**:
   - Main class: `examples.StreamingRateExample` or `examples.SimpleBatchExample`
   - VM options:
     ```
     -Dspark_binary_version=4.0
     -Dscala_binary_version=2.13
     ```
   - Working directory: `spark-4.0/examples`
   - Classpath: Include `clickhouse-spark-4.0_2.13` module

3. **Set Breakpoints** in connector code:
   - Write path: `com.clickhouse.spark.write.ClickHouseWriter`
   - Read path: `com.clickhouse.spark.read.ClickHouseReader`
   - Catalog operations: `com.clickhouse.spark.ClickHouseCatalog`

4. **Run in Debug Mode** and step through the connector code

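When an example is launched through `spark-submit` rather than from the IDE, breakpoints can still be hit by attaching the IDE's remote debugger to the driver JVM. The sketch below only builds the standard JDWP agent string. The helper name `jdwp_opts` and the default port 5005 are illustrative choices, not part of this commit.

```bash
# Build a JDWP agent option for attaching a remote debugger to the driver JVM.
# jdwp_opts and its defaults are hypothetical helpers, not part of the connector.
#   $1: port the JVM listens on for the debugger (default 5005)
#   $2: "y" to pause the JVM until a debugger attaches, "n" to start immediately
jdwp_opts() {
  local port="${1:-5005}"
  local suspend="${2:-y}"
  echo "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}
```

One way to use it is `spark-submit --driver-java-options "$(jdwp_opts)" ...`, followed by a "Remote JVM Debug" run configuration in the IDE that points at the same port. With `--master 'local[*]'` the tasks run inside the driver JVM, so breakpoints on the write and read paths should be reachable from that single attachment.
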
## Examples

### 1. SimpleBatchExample

A straightforward batch processing example that:
- Creates sample employee data
- Writes to ClickHouse
- Reads back and performs aggregations

**Good for debugging**:
- Table creation logic
- Batch write operations
- Read operations
- Schema inference

**Run**:
```bash
# spark-submit needs a compiled application JAR, not a .scala source file.
# path/to/examples.jar is a placeholder for the JAR built from this directory.
spark-submit \
  --class examples.SimpleBatchExample \
  --master 'local[*]' \
  --jars clickhouse-spark-runtime-4.0_2.13.jar \
  path/to/examples.jar
```

### 2. StreamingRateExample

A streaming application that:
- Uses Spark's rate source (generates synthetic data)
- Enriches data with multiple columns
- Writes to ClickHouse in micro-batches every 5 seconds
- Generates 10 rows per second

**Good for debugging**:
- Streaming write operations
- Micro-batch processing
- Continuous data ingestion
- Performance under load

**Run**:
```bash
# spark-submit needs a compiled application JAR, not a .scala source file.
# path/to/examples.jar is a placeholder for the JAR built from this directory.
spark-submit \
  --class examples.StreamingRateExample \
  --master 'local[*]' \
  --jars clickhouse-spark-runtime-4.0_2.13.jar \
  path/to/examples.jar
```

**Monitor the stream**:
```sql
-- In ClickHouse client
SELECT count(*) FROM default.streaming_events;

SELECT
    event_type,
    count(*) as cnt,
    avg(metric_value) as avg_metric
FROM default.streaming_events
GROUP BY event_type;

SELECT
    toStartOfMinute(event_time) as minute,
    count(*) as events_per_minute
FROM default.streaming_events
GROUP BY minute
ORDER BY minute DESC
LIMIT 10;
```

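The same checks can be scripted from the shell through ClickHouse's HTTP interface on port 8123, which accepts a SQL statement in the request body. The following is a minimal sketch, assuming the default Docker setup from the prerequisites. `ch_query`, `watch_row_count`, and the `CLICKHOUSE_URL` variable are hypothetical names introduced here for illustration.

```bash
# Run one SQL statement through the ClickHouse HTTP interface and print the result.
# CLICKHOUSE_URL is a hypothetical override; it defaults to the local Docker server.
ch_query() {
  local sql="$1"
  local url="${CLICKHOUSE_URL:-http://localhost:8123}"
  # The HTTP interface reads the statement from the POST body.
  curl -fsS "${url}/" --data-binary "${sql}"
}

# Print the row count of a table several times, pausing between samples.
#   $1: table name (default default.streaming_events)
#   $2: number of samples (default 5)
#   $3: pause between samples in seconds (default 5)
watch_row_count() {
  local table="${1:-default.streaming_events}"
  local samples="${2:-5}"
  local pause="${3:-5}"
  local i=1
  local count
  while [ "$i" -le "$samples" ]; do
    # Stop with a non-zero status if the query fails.
    count="$(ch_query "SELECT count() FROM ${table}")" || return 1
    echo "sample ${i}: ${count} rows"
    i=$((i + 1))
    sleep "${pause}"
  done
}
```

For example, `watch_row_count default.streaming_events 6 5` samples the table for about half a minute. At 10 rows per second and a 5-second trigger, each micro-batch should add roughly 50 rows, so a count that stops growing suggests the stream has stalled.
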
## Debugging Tips

### Enable Debug Logging

Add to your SparkSession configuration:
```scala
.config("spark.sql.catalog.clickhouse.option.log.level", "DEBUG")
```

Or set log level programmatically:
```scala
spark.sparkContext.setLogLevel("DEBUG")
```

### Useful ClickHouse Queries

```sql
-- Check table structure
DESCRIBE TABLE default.streaming_events;

-- Check table engine and settings
SHOW CREATE TABLE default.streaming_events;

-- Monitor inserts
SELECT
    table,
    sum(rows) as total_rows,
    sum(bytes) as total_bytes
FROM system.parts
WHERE database = 'default' AND table IN ('streaming_events', 'employees')
GROUP BY table;

-- Check recent parts
SELECT
    partition,
    name,
    rows,
    bytes_on_disk,
    modification_time
FROM system.parts
WHERE database = 'default' AND table = 'streaming_events'
ORDER BY modification_time DESC
LIMIT 10;
```

## Troubleshooting

### Connection Issues

If you see connection errors:
```bash
# Verify ClickHouse is accessible; a healthy server replies "Ok."
curl http://localhost:8123/ping
```

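A freshly started container may need a few seconds before it accepts connections, so a single failed ping is not always conclusive. The sketch below retries the same `/ping` check. The function name `wait_for_clickhouse` and its default of 30 attempts one second apart are assumptions made for illustration.

```bash
# Poll the ClickHouse HTTP /ping endpoint until it answers or the attempts run out.
# wait_for_clickhouse is a hypothetical helper, not part of the connector.
#   $1: base URL (default http://localhost:8123)
#   $2: maximum number of attempts (default 30)
#   $3: delay between attempts in seconds (default 1)
wait_for_clickhouse() {
  local url="${1:-http://localhost:8123}"
  local attempts="${2:-30}"
  local delay="${3:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output.
    if curl -fs "${url}/ping" >/dev/null 2>&1; then
      echo "ClickHouse is ready after ${i} attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  echo "ClickHouse did not respond at ${url}/ping after ${attempts} attempt(s)" >&2
  return 1
}
```

Calling `wait_for_clickhouse` before launching an example returns 0 once the server answers and 1 if it never does, which makes it easy to chain with `&&`.
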
### Clean Up

```sql
-- Drop tables
DROP TABLE IF EXISTS default.streaming_events;
DROP TABLE IF EXISTS default.employees;
```

```bash
# Remove checkpoint directory
rm -rf /tmp/clickhouse-streaming-checkpoint
```

spark-4.0/examples/SimpleBatchExample.scala

Whitespace-only changes.

spark-4.0/examples/StreamingRateExample.scala

Whitespace-only changes.

spark-4.0/examples/build.gradle

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

plugins {
    id "scala"
    id "idea"
}

dependencies {
    implementation "org.scala-lang:scala-library:${scala_version}"

    // Spark APIs are needed at runtime when running from IDE/Gradle
    implementation "org.apache.spark:spark-sql_${scala_binary_version}:${spark_40_version}"
    implementation "org.apache.spark:spark-streaming_${scala_binary_version}:${spark_40_version}"

    // Use connector project classes, and include shaded runtime on classpath
    implementation project(":clickhouse-spark-4.0_${scala_binary_version}")
    runtimeOnly project(":clickhouse-spark-runtime-4.0_${scala_binary_version}")
}

sourceCompatibility = JavaVersion.VERSION_17

tasks.withType(ScalaCompile).configureEach {
    scalaCompileOptions.additionalParameters = [
        "-deprecation",
        "-feature"
    ]
}

// Application convenience tasks
tasks.register("runStreaming", JavaExec) {
    group = "application"
    description = "Run StreamingRateExample"
    classpath = sourceSets.main.runtimeClasspath
    mainClass = "examples.StreamingRateExample"
    jvmArgs "-Dspark_binary_version=4.0", "-Dscala_binary_version=2.13"
}

tasks.register("runBatch", JavaExec) {
    group = "application"
    description = "Run SimpleBatchExample"
    classpath = sourceSets.main.runtimeClasspath
    mainClass = "examples.SimpleBatchExample"
    jvmArgs "-Dspark_binary_version=4.0", "-Dscala_binary_version=2.13"
}

spark-4.0/examples/build.sbt

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

name := "clickhouse-spark-examples"

version := "1.0"

scalaVersion := "2.13.8"

val sparkVersion = "4.0.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
)

// For local development, include the connector from the parent project
Compile / unmanagedClasspath ++= {
  val base = baseDirectory.value / ".." / "clickhouse-spark" / "build" / "classes"
  Seq(
    Attributed.blank(base / "scala" / "main"),
    Attributed.blank(base / "java" / "main")
  )
}
