Commit 7f00753

Spark 4 support (#452)
* Copy Spark 3.5 module to Spark 4.0 directory structure
  - Copied spark-3.5/clickhouse-spark to spark-4.0/clickhouse-spark
  - Copied spark-3.5/clickhouse-spark-it to spark-4.0/clickhouse-spark-it
  - Copied build.gradle with module name references updated to 4.0
  - This is an exact copy of Spark 3.5 code before API adaptations
  - No functional changes, only directory structure and module naming

* Add Spark 4.0 API compatibility changes
  - Updated AnalysisException constructor to require errorClass and messageParameters
  - Updated ArrowUtils.toArrowSchema to include largeVarTypes parameter (false for compatibility)
  - These are breaking API changes in Spark 4.0 that require code updates

* Update Gradle configuration for Spark 4.0 support
  - Added Spark 4.0.1 version configuration to gradle.properties
  - Updated ANTLR version to 4.13.1 to match Spark 4.0 dependencies
  - Added Spark 4.0 examples module to settings.gradle
  - Updated build.gradle with Spark 4.0 compatibility settings

* Fix test infrastructure trait linearization for Spark 4.0
  - Added BeforeAndAfterAll trait to ClickHouseSingleMixIn
  - Fixed trait linearization order: BeforeAndAfterAll before ForAllTestContainer
  - Added logging for container lifecycle debugging
  - Resolves Spark 4.0 test compilation issues with trait conflicts

* Add Spark 4.0 to CI workflows
  - Added Spark 4.0 with Scala 2.13 to build-and-test workflow
  - Added Spark 4.0 to check-license and style workflows with Java 17
  - Added Spark 4.0 to cloud and tpcds workflows
  - Excluded Spark 4.0 + Scala 2.12 combinations (not supported)
  - Configured Java 17 requirement for Spark 4.0 builds

* Add Spark 4.0 debugging examples
  - StreamingRateExample: Streaming app using rate source for continuous debugging
  - SimpleBatchExample: Simple batch app for basic debugging scenarios
  - Comprehensive README with setup and usage instructions
  - Gradle and SBT build configurations for IDE support
  - Examples allow setting breakpoints in connector code during execution

* Replace println with proper SLF4J logging in test fixtures
  - Added Logging trait to ClickHouseSingleMixIn
  - Replaced all println statements with log.info() calls
  - Provides structured, filterable logging for test execution
  - Maintains test output visibility while enabling proper log management

* Test all Spark versions with both Java 8 and Java 17
  - Updated build-and-test.yml: Added Java 8 & 17 matrix to run-tests-with-specific-clickhouse
  - Updated cloud.yml: Test all Spark 3.x versions with both Java 8 & 17
  - Updated tpcds.yml: Test all Spark 3.x versions with both Java 8 & 17
  - Spark 4.0 continues to use Java 17 only (requires Java 11+)
  - Updated artifact names to include Java version for better debugging

* Fix CI: Remove Java 8 from run-tests-with-specific-clickhouse

  ANTLR 4.13.2 (required for Spark 4.0) needs Java 11+ to compile. The run-tests job already handles Java 8 testing for Spark 3.x versions. This job now only uses Java 17 to avoid ANTLR compilation errors.

* Fix comments and update dependency versions
  - Fix CI comments: Spark 4.0 requires Java 17+, not Java 11+
  - Update commons-codec to 1.17.2 (align with Spark 4.0.1)
  - Update scalatest to 3.2.19 (align with Spark 4.0.1)
  - Update flexmark to 0.64.8 (latest stable)

* Combine CI test jobs into comprehensive matrix

  Merged run-tests and run-tests-with-specific-clickhouse into a single job that tests all combinations of:
  - ClickHouse versions: 25.3, 25.6, 25.7, latest
  - Java versions: 8, 17
  - Scala versions: 2.12, 2.13
  - Spark versions: 3.3, 3.4, 3.5, 4.0

  With proper exclusions:
  - Spark 4.0 only supports Scala 2.13
  - Spark 4.0 requires Java 17+

  Total: 52 test jobs (down from 64 with exclusions)
  - Spark 3.x: 16 jobs each (4 CH × 2 Java × 2 Scala)
  - Spark 4.0: 4 jobs (4 CH × Java 17 × Scala 2.13)

* Update ClickHouse test versions to latest releases

  Changed from: 25.3, 25.6, 25.7, latest
  Changed to: 25.6, 25.7, 25.8, 25.9, latest

  This adds testing for newer ClickHouse versions (25.8, 25.9) while maintaining coverage of recent stable releases. Total test jobs increases from 52 to 65 (5 CH versions instead of 4).

* Revert flexmark to 0.62.2 for Java 8 compatibility

  Flexmark 0.64.8 requires Java 11+ which breaks tests running with Java 8. Version 0.62.2 is the last version compatible with Java 8.

* Handle missing Scala version properties gracefully

  Use findProperty with fallback values in build.gradle instead of getProperty. This allows us to omit spark_40_scala_212_version from gradle.properties since Spark 4.0 doesn't support Scala 2.12. If the property is missing, it falls back to sensible defaults:
  - scala_212_version: 2.12.18
  - scala_213_version: 2.13.8

1 parent 32da21c commit 7f00753

77 files changed (+9147 -56 lines changed)


.github/workflows/build-and-test.yml

Lines changed: 13 additions & 30 deletions
@@ -21,9 +21,19 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
+        clickhouse: [ 25.6, 25.7, 25.8, 25.9, latest ]
         java: [ 8, 17 ]
-        scala: [ 2.12, 2.13 ]
-        spark: [ 3.3, 3.4, 3.5 ]
+        scala: [ '2.12', '2.13' ]
+        spark: [ '3.3', '3.4', '3.5', '4.0' ]
+        exclude:
+          # Spark 4.0 only supports Scala 2.13
+          - spark: '4.0'
+            scala: '2.12'
+          # Spark 4.0 requires Java 17+
+          - spark: '4.0'
+            java: 8
+    env:
+      CLICKHOUSE_IMAGE: clickhouse/clickhouse-server:${{ matrix.clickhouse }}
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-java@v4
@@ -40,34 +50,7 @@ jobs:
         if: failure()
         uses: actions/upload-artifact@v4
         with:
-          name: log-java-${{ matrix.java }}-spark-${{ matrix.spark }}-scala-${{ matrix.scala }}
-          path: |
-            **/build/unit-tests.log
-            log/**
-
-  run-tests-with-specific-clickhouse:
-    runs-on: ubuntu-22.04
-    strategy:
-      fail-fast: false
-      matrix:
-        clickhouse: [ 25.3, 25.6, 25.7, latest ]
-    env:
-      CLICKHOUSE_IMAGE: clickhouse/clickhouse-server:${{ matrix.clickhouse }}
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-java@v4
-        with:
-          distribution: zulu
-          java-version: 8
-          cache: gradle
-      - run: >-
-          ./gradlew clean test --no-daemon --refresh-dependencies
-          -PmavenCentralMirror=https://maven-central.storage-download.googleapis.com/maven2/
-      - name: Upload logs
-        if: failure()
-        uses: actions/upload-artifact@v4
-        with:
-          name: log-clickhouse-${{ matrix.clickhouse }}
+          name: log-ch-${{ matrix.clickhouse }}-java-${{ matrix.java }}-spark-${{ matrix.spark }}-scala-${{ matrix.scala }}
           path: |
             **/build/unit-tests.log
             log/**
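
The job counts quoted in the commit message (52, then 65) can be checked by expanding the matrix by hand. The following is a minimal sketch in Scala, not part of the commit: `MatrixSketch`, `expand`, `applyExcludes`, and `jobs` are hypothetical names, and the model assumes only the documented GitHub Actions rule that an `exclude` entry removes every combination agreeing with all of its keys.

```scala
object MatrixSketch {
  // One matrix combination: axis name -> value.
  type Combo = Map[String, String]

  // Cartesian product of all axes.
  def expand(axes: List[(String, List[String])]): List[Combo] =
    axes.foldLeft(List(Map.empty[String, String])) { case (acc, (axis, values)) =>
      for (combo <- acc; v <- values) yield combo + (axis -> v)
    }

  // A combination is dropped when it agrees with every key of some exclude entry.
  def applyExcludes(combos: List[Combo], excludes: List[Combo]): List[Combo] =
    combos.filterNot { c =>
      excludes.exists(_.forall { case (k, v) => c.get(k).contains(v) })
    }

  // The two exclusions from build-and-test.yml.
  val excludes: List[Combo] = List(
    Map("spark" -> "4.0", "scala" -> "2.12"),
    Map("spark" -> "4.0", "java" -> "8")
  )

  // Expanded job list for a given set of ClickHouse versions.
  def jobs(clickhouse: List[String]): List[Combo] =
    applyExcludes(
      expand(List(
        "clickhouse" -> clickhouse,
        "java" -> List("8", "17"),
        "scala" -> List("2.12", "2.13"),
        "spark" -> List("3.3", "3.4", "3.5", "4.0")
      )),
      excludes
    )

  def main(args: Array[String]): Unit = {
    println(jobs(List("25.3", "25.6", "25.7", "latest")).size)         // 52
    println(jobs(List("25.6", "25.7", "25.8", "25.9", "latest")).size) // 65
  }
}
```

With five ClickHouse versions the full product is 80 combinations; the two exclusions drop 15 of the 20 Spark 4.0 combinations, which leaves 65.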

.github/workflows/check-license.yml

Lines changed: 11 additions & 2 deletions
@@ -29,13 +29,22 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        spark: [ 3.3, 3.4, 3.5 ]
+        spark: [ "3.3", "3.4", "3.5", "4.0" ]
+        include:
+          - spark: "3.3"
+            java: 8
+          - spark: "3.4"
+            java: 8
+          - spark: "3.5"
+            java: 8
+          - spark: "4.0"
+            java: 17
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-java@v4
         with:
           distribution: zulu
-          java-version: 8
+          java-version: ${{ matrix.java }}
       - run: >-
           ./gradlew rat --no-daemon
           -Dspark_binary_version=${{ matrix.spark }}

.github/workflows/cloud.yml

Lines changed: 12 additions & 4 deletions
@@ -34,8 +34,16 @@ jobs:
       max-parallel: 1
       fail-fast: false
       matrix:
-        spark: [ 3.3, 3.4, 3.5 ]
-        scala: [ 2.12, 2.13 ]
+        spark: [ '3.3', '3.4', '3.5', '4.0' ]
+        scala: [ '2.12', '2.13' ]
+        java: [ 8, 17 ]
+        exclude:
+          # Spark 4.0 only supports Scala 2.13
+          - spark: '4.0'
+            scala: '2.12'
+          # Spark 4.0 requires Java 17+
+          - spark: '4.0'
+            java: 8
     env:
       CLICKHOUSE_CLOUD_HOST: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_HOST_SMT }}
       CLICKHOUSE_CLOUD_PASSWORD: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_PASSWORD_SMT }}
@@ -44,7 +52,7 @@ jobs:
       - uses: actions/setup-java@v4
         with:
           distribution: zulu
-          java-version: 8
+          java-version: ${{ matrix.java }}
           cache: gradle
       - name: Wake up ClickHouse Cloud instance
         env:
@@ -80,7 +88,7 @@ jobs:
         if: failure()
         uses: actions/upload-artifact@v4
         with:
-          name: log-clickhouse-cloud-spark-${{ matrix.spark }}-scala-${{ matrix.scala }}
+          name: log-clickhouse-cloud-spark-${{ matrix.spark }}-scala-${{ matrix.scala }}-java-${{ matrix.java }}
           path: |
             **/build/unit-tests.log
             log/**

.github/workflows/style.yml

Lines changed: 11 additions & 2 deletions
@@ -30,13 +30,22 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        spark: [ 3.3, 3.4, 3.5 ]
+        spark: [ "3.3", "3.4", "3.5", "4.0" ]
+        include:
+          - spark: "3.3"
+            java: 8
+          - spark: "3.4"
+            java: 8
+          - spark: "3.5"
+            java: 8
+          - spark: "4.0"
+            java: 17
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-java@v4
         with:
           distribution: zulu
-          java-version: 8
+          java-version: ${{ matrix.java }}
           cache: gradle
       - run: >-
           ./gradlew spotlessCheck --no-daemon --refresh-dependencies

.github/workflows/tpcds.yml

Lines changed: 12 additions & 4 deletions
@@ -30,14 +30,22 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        spark: [ 3.3, 3.4, 3.5 ]
-        scala: [ 2.12, 2.13 ]
+        spark: [ '3.3', '3.4', '3.5', '4.0' ]
+        scala: [ '2.12', '2.13' ]
+        java: [ 8, 17 ]
+        exclude:
+          # Spark 4.0 only supports Scala 2.13
+          - spark: '4.0'
+            scala: '2.12'
+          # Spark 4.0 requires Java 17+
+          - spark: '4.0'
+            java: 8
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-java@v4
         with:
           distribution: zulu
-          java-version: 8
+          java-version: ${{ matrix.java }}
           cache: gradle
       - run: >-
           ./gradlew clean slowTest --no-daemon --refresh-dependencies
@@ -48,7 +56,7 @@ jobs:
         if: failure()
         uses: actions/upload-artifact@v4
         with:
-          name: log-tpcds-spark-${{ matrix.spark }}-scala-${{ matrix.scala }}
+          name: log-tpcds-spark-${{ matrix.spark }}-scala-${{ matrix.scala }}-java-${{ matrix.java }}
           path: |
             **/build/unit-tests.log
             log/**

build.gradle

Lines changed: 14 additions & 8 deletions
@@ -49,8 +49,8 @@ project.ext {
     spark_prefix = "spark_${spark_binary_version.replace('.', '')}"
     scala_prefix = "scala_${scala_binary_version.replace('.', '')}"
 
-    scala_212_version = project.getProperty("${spark_prefix}_scala_212_version")
-    scala_213_version = project.getProperty("${spark_prefix}_scala_213_version")
+    scala_212_version = project.findProperty("${spark_prefix}_scala_212_version") ?: "2.12.18"
+    scala_213_version = project.findProperty("${spark_prefix}_scala_213_version") ?: "2.13.8"
     scala_version = project.getProperty("${scala_prefix}_version")
 
     antlr_version = project.getProperty("${spark_prefix}_antlr_version")
@@ -106,7 +106,11 @@ allprojects {
 subprojects {
     apply plugin: "scala"
     apply plugin: "java-library"
-    apply plugin: "org.scoverage"
+    // Disable scoverage when running Metals' bloopInstall to avoid plugin resolution issues
+    def isBloopInstall = gradle.startParameter.taskNames.any { it.contains('bloopInstall') }
+    if (!project.hasProperty('disableScoverage') && !isBloopInstall) {
+        apply plugin: "org.scoverage"
+    }
     apply plugin: "com.diffplug.spotless"
     apply plugin: "com.github.maiflai.scalatest"
 
@@ -168,11 +172,13 @@ subprojects {
         }
     }
 
-    scoverage {
-        scoverageVersion = "2.0.11"
-        reportDir.set(file("${rootProject.buildDir}/reports/scoverage"))
-        highlighting.set(false)
-        minimumRate.set(0.0)
+    if (plugins.hasPlugin('org.scoverage')) {
+        scoverage {
+            scoverageVersion = "2.0.11"
+            reportDir.set(file("${rootProject.buildDir}/reports/scoverage"))
+            highlighting.set(false)
+            minimumRate.set(0.0)
+        }
     }
 
     spotless {
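
The switch from `getProperty` to `findProperty(...) ?: default` in the first hunk can be illustrated outside Gradle. The sketch below is a Scala analogue under stated assumptions, not the build logic itself: `PropertyFallbackSketch`, `props`, `sparkPrefix`, and `scalaVersion` are hypothetical names, and `props` holds only a small subset of the keys from gradle.properties in this commit.

```scala
object PropertyFallbackSketch {
  // Subset of gradle.properties from this commit.
  // spark_40_scala_212_version is intentionally absent.
  val props: Map[String, String] = Map(
    "spark_35_scala_212_version" -> "2.12.18",
    "spark_35_scala_213_version" -> "2.13.8",
    "spark_40_scala_213_version" -> "2.13.8"
  )

  // Mirrors "spark_${spark_binary_version.replace('.', '')}" from build.gradle.
  def sparkPrefix(sparkBinaryVersion: String): String =
    "spark_" + sparkBinaryVersion.replace(".", "")

  // Analogue of `project.findProperty(key) ?: default`: a missing key
  // (and, like Groovy's Elvis operator, an empty value) yields the default.
  def scalaVersion(sparkBinaryVersion: String, scalaSuffix: String, default: String): String = {
    val key = s"${sparkPrefix(sparkBinaryVersion)}_scala_${scalaSuffix}_version"
    props.get(key).filter(_.nonEmpty).getOrElse(default)
  }

  def main(args: Array[String]): Unit = {
    println(scalaVersion("4.0", "212", "2.12.18")) // key missing, default is used
    println(scalaVersion("4.0", "213", "2.13.8"))  // key present in props
  }
}
```

The practical difference is that a strict lookup fails the build when a key is absent, whereas the fallback lets the Spark 4.0 configuration omit the Scala 2.12 entry entirely.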

clickhouse-core/src/testFixtures/scala/com/clickhouse/spark/base/ClickHouseSingleMixIn.scala

Lines changed: 24 additions & 1 deletion
@@ -17,13 +17,18 @@ package com.clickhouse.spark.base
 import com.clickhouse.spark.Utils
 import com.clickhouse.data.ClickHouseVersion
 import com.dimafeng.testcontainers.{ForAllTestContainer, JdbcDatabaseContainer, SingleContainer}
+import org.scalatest.BeforeAndAfterAll
 import org.scalatest.funsuite.AnyFunSuite
+import org.slf4j.LoggerFactory
 import org.testcontainers.containers.ClickHouseContainer
 import org.testcontainers.utility.{DockerImageName, MountableFile}
 import java.nio.file.{Path, Paths}
 import scala.collection.JavaConverters._
 
-trait ClickHouseSingleMixIn extends AnyFunSuite with ForAllTestContainer with ClickHouseProvider {
+trait ClickHouseSingleMixIn extends AnyFunSuite with BeforeAndAfterAll with ForAllTestContainer
+    with ClickHouseProvider {
+
+  private val logger = LoggerFactory.getLogger(getClass)
   // format: off
   private val CLICKHOUSE_IMAGE: String = Utils.load("CLICKHOUSE_IMAGE", "clickhouse/clickhouse-server:23.8")
   private val CLICKHOUSE_USER: String = Utils.load("CLICKHOUSE_USER", "default")
@@ -34,6 +39,8 @@ trait ClickHouseSingleMixIn extends AnyFunSuite with ForAllTestContainer with Cl
   private val CLICKHOUSE_TPC_PORT = 9000
   // format: on
 
+  logger.info(s"Initializing with ClickHouse image: $CLICKHOUSE_IMAGE")
+
   override val clickhouseVersion: ClickHouseVersion = ClickHouseVersion.of(CLICKHOUSE_IMAGE.split(":").last)
 
   protected val rootProjectDir: Path = {
@@ -80,4 +87,20 @@ trait ClickHouseSingleMixIn extends AnyFunSuite with ForAllTestContainer with Cl
   override def clickhousePassword: String = CLICKHOUSE_PASSWORD
   override def clickhouseDatabase: String = CLICKHOUSE_DB
   override def isSslEnabled: Boolean = false
+
+  override def beforeAll(): Unit = {
+    val startTime = System.currentTimeMillis()
+    logger.info(s"Starting ClickHouse container: $CLICKHOUSE_IMAGE")
+    super.beforeAll() // This starts the container and makes mappedPort available
+    val duration = System.currentTimeMillis() - startTime
+    logger.info(
+      s"ClickHouse container started in ${duration}ms at ${container.host}:${container.mappedPort(CLICKHOUSE_HTTP_PORT)}"
+    )
+  }
+
+  override def afterAll(): Unit = {
+    logger.info("Stopping ClickHouse container")
+    super.afterAll()
+    logger.info("ClickHouse container stopped")
+  }
 }
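
The commit message stresses the mix-in order (BeforeAndAfterAll before ForAllTestContainer). How that order affects `super` calls can be seen in a dependency-free sketch. The traits below (`Suite`, `Hooks`, `ContainerLifecycle`) are hypothetical stand-ins, not the real ScalaTest or testcontainers-scala traits, and the sketch assumes only that both mix-ins wrap `run` in the usual stackable style.

```scala
import scala.collection.mutable.ListBuffer

object LinearizationSketch {
  // Hypothetical base suite that records what happens, in order.
  trait Suite {
    val events: ListBuffer[String] = ListBuffer.empty
    def run(): Unit = { events += "tests"; () }
  }

  // Stand-in for a before/after hook trait.
  trait Hooks extends Suite {
    def beforeAll(): Unit = ()
    def afterAll(): Unit = ()
    override def run(): Unit = {
      beforeAll()
      try super.run() finally afterAll()
    }
  }

  // Stand-in for a trait that manages a container around the whole suite.
  trait ContainerLifecycle extends Suite {
    override def run(): Unit = {
      events += "container-start"
      try super.run()
      finally { events += "container-stop"; () }
    }
  }

  // Same shape as `AnyFunSuite with BeforeAndAfterAll with ForAllTestContainer`:
  // the right-most trait is outermost, and its super.run() reaches Hooks.run().
  class Example extends Suite with Hooks with ContainerLifecycle {
    override def beforeAll(): Unit = { events += "beforeAll"; () }
    override def afterAll(): Unit = { events += "afterAll"; () }
  }

  def trace(): List[String] = {
    val suite = new Example
    suite.run()
    suite.events.toList
  }

  def main(args: Array[String]): Unit =
    println(trace().mkString(" -> "))
}
```

In this model the container step wraps the hooks because `ContainerLifecycle` is mixed in last; swapping the two traits would run `beforeAll` before the container step. Whether the real traits behave identically depends on their implementations, which this sketch does not reproduce.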

gradle.properties

Lines changed: 10 additions & 5 deletions
@@ -16,10 +16,10 @@ mavenCentralMirror=https://repo1.maven.org/maven2/
 mavenSnapshotsRepo=https://central.sonatype.com/repository/maven-snapshots/
 mavenReleasesRepo=https://s01.oss.sonatype.org/service/local/staging/deploy/maven2/
 
-systemProp.scala_binary_version=2.12
+systemProp.scala_binary_version=2.13
 systemProp.known_scala_binary_versions=2.12,2.13
-systemProp.spark_binary_version=3.5
-systemProp.known_spark_binary_versions=3.3,3.4,3.5
+systemProp.spark_binary_version=4.0
+systemProp.known_spark_binary_versions=3.3,3.4,3.5,4.0
 
 group=com.clickhouse.spark
 
@@ -29,6 +29,7 @@ clickhouse_client_v2_version=0.9.4
 spark_33_version=3.3.4
 spark_34_version=3.4.2
 spark_35_version=3.5.1
+spark_40_version=4.0.1
 
 spark_33_scala_212_version=2.12.15
 spark_34_scala_212_version=2.12.17
@@ -37,22 +38,26 @@ spark_35_scala_212_version=2.12.18
 spark_33_scala_213_version=2.13.8
 spark_34_scala_213_version=2.13.8
 spark_35_scala_213_version=2.13.8
+spark_40_scala_213_version=2.13.8
 
 spark_33_antlr_version=4.8
 spark_34_antlr_version=4.9.3
 spark_35_antlr_version=4.9.3
+spark_40_antlr_version=4.13.1
 
 spark_33_jackson_version=2.13.4
 spark_34_jackson_version=2.14.2
 spark_35_jackson_version=2.15.2
+spark_40_jackson_version=2.17.0
 
 spark_33_slf4j_version=1.7.32
 spark_34_slf4j_version=2.0.6
 spark_35_slf4j_version=2.0.7
+spark_40_slf4j_version=2.0.7
 
 # Align with Apache Spark, and don't bundle them in release jar.
 commons_lang3_version=3.12.0
-commons_codec_version=1.16.0
+commons_codec_version=1.17.2
 
 # javax annotations removed in jdk 11
 # fix build error with jakarta annotations
@@ -61,5 +66,5 @@ jakarta_annotation_api_version=1.3.5
 # Test only
 kyuubi_version=1.9.2
 testcontainers_scala_version=0.41.2
-scalatest_version=3.2.16
+scalatest_version=3.2.19
 flexmark_version=0.62.2

settings.gradle

Lines changed: 5 additions & 0 deletions
@@ -42,3 +42,8 @@ project(":clickhouse-spark-runtime-${spark_binary_version}_$scala_binary_version
 include ":clickhouse-spark-it-${spark_binary_version}_$scala_binary_version"
 project(":clickhouse-spark-it-${spark_binary_version}_$scala_binary_version").projectDir = file("spark-${spark_binary_version}/clickhouse-spark-it")
 project(":clickhouse-spark-it-${spark_binary_version}_$scala_binary_version").name = "clickhouse-spark-it-${spark_binary_version}_$scala_binary_version"
+
+// Examples module for running/debugging sample apps in IDE
+include ":clickhouse-examples-${spark_binary_version}_$scala_binary_version"
+project(":clickhouse-examples-${spark_binary_version}_$scala_binary_version").projectDir = file("spark-${spark_binary_version}/examples")
+project(":clickhouse-examples-${spark_binary_version}_$scala_binary_version").name = "clickhouse-examples-${spark_binary_version}_$scala_binary_version"
