Update java client version #428

mzitnik · 2025-10-26T13:49:05Z

Summary

Checklist

Delete items not relevant to your PR:

Unit and integration tests covering the common scenarios were added
A human-readable description of the changes was provided to include in CHANGELOG
For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

# Conflicts: # clickhouse-core/src/main/scala/com/clickhouse/spark/client/NodeClient.scala

…ot support

windsurf-bot

Other comments (20)

spark-3.5/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala (71-72) The compression codec parameter has been commented out in the class definition, but there's still a reference to it in the commented code on line 72. Either remove the commented code entirely or update the implementation to handle compression with the new API.
clickhouse-core/src/main/scala/com/clickhouse/spark/client/NodeClient.scala (176-183) The syncInsert method creates an empty ByteArrayInputStream (`is`) to pass to the deserializer after a successful insert, but this stream contains no data. The deserializer will likely fail when trying to process this empty stream. You should use the actual response data from the InsertResponse instead.
gradle.properties (16-16) The new Maven snapshots repository URL `https://central.sonatype.com/repository/maven-snapshots/` appears to be incorrect. This is a web UI URL, not a repository URL that can be used for dependency resolution.
The correct URL format should be something like https://s01.oss.sonatype.org/content/repositories/snapshots/ (the previous value) or https://oss.sonatype.org/content/repositories/snapshots/.
```
mavenSnapshotsRepo=https://s01.oss.sonatype.org/content/repositories/snapshots/
```
spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala (44-44) The compression codec parameter is commented out on line 44, but it seems the code might still need this parameter. The call to `nodeClient.queryAndCheck()` on line 72 has a comment suggesting the codec parameter was removed, but this might cause issues if compression is still needed elsewhere in the code.
spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala (74-74) The `totalBlocksRead` method now returns a hardcoded `0L` instead of getting the actual blocks read from the response. This will cause inaccurate metrics reporting. If the new API version doesn't support this metric, consider finding an alternative way to track blocks read or document this limitation.
spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/format/ClickHouseBinaryReader.scala (109-109) Converting a String to bytes without specifying an encoding could lead to platform-dependent results. Consider checking if the value is already a byte array first, or specify an explicit encoding:
```
      case BinaryType => value match {
        case bytes: Array[Byte] => bytes
        case str: String => str.getBytes("UTF-8")
        case _ => value.toString.getBytes("UTF-8")
      }
```
build.gradle (248-250) The PR changes the main dependency from `clickhouse-jdbc` to `client-v2` in the clickhouse-core project, but `clickhouse-jdbc` is still being used in the clickhouse-core-it project. This could lead to version conflicts or unexpected behavior if both libraries have different APIs or behaviors. Consider updating the test dependency to use the same client library as the main code.
spark-3.5/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/format/ClickHouseBinaryReader.scala (109-109) Casting a String to bytes for BinaryType might cause encoding issues if the value is already binary data. Consider checking if the value is already a byte array first.
gradle.properties (26-27) You're changing from a stable release version (`0.6.3`) to a SNAPSHOT version (`0.9.2-SNAPSHOT`) of the ClickHouse JDBC driver. Using SNAPSHOT dependencies in production code can lead to build instability and unpredictable behavior, as these versions can change without notice.
Is there a specific reason to use a SNAPSHOT instead of a stable release? If you need features from 0.9.x, consider waiting for a stable release or at least document why a SNAPSHOT is necessary.
clickhouse-core/src/main/scala/com/clickhouse/spark/client/NodeClient.scala (175-179) There's a redundant creation of ByteArrayInputStream. You read all bytes into `payload` and then create a new ByteArrayInputStream with the same payload. You can simplify this by using the payload array directly in a single ByteArrayInputStream.
spark-3.4/clickhouse-spark/src/main/scala/com/clickhouse/spark/write/ClickHouseWriter.scala (254-254) The comment `// , codec` doesn't provide any useful information. If this parameter was removed due to an API change in the Java client, it would be better to add a more descriptive comment explaining why the parameter was removed.
spark-3.4/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala (44-44) The compression codec parameter has been removed from the queryAndCheck method call, but the codec field is just commented out rather than properly removed. If compression is no longer needed with the updated Java client, consider removing the field entirely rather than commenting it out.
clickhouse-core/src/main/scala/com/clickhouse/spark/client/NodeClient.scala (161-165) The `readAllBytes` implementation is inefficient as it reads one byte at a time and builds an array using Stream. Consider using Java's built-in `InputStream.readAllBytes()` method (available since Java 9) or a more efficient buffered reading approach.
spark-3.4/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/format/ClickHouseBinaryReader.scala (109-109) Casting a String directly to bytes for BinaryType might cause encoding issues if the value is already binary data. Consider checking if the value is already a byte array first.
spark-3.5/clickhouse-spark/src/main/scala/com/clickhouse/spark/write/ClickHouseWriter.scala (256-256) The commented out `codec` parameter should be removed since it's no longer needed with the updated Java client version.
spark-3.5/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/format/ClickHouseBinaryReader.scala (89-89) This commented out code can be removed as it's not needed - the proper check is implemented on the next line.
spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala (76-76) The method to get bytes read has changed from `resp.getSummary.getReadBytes` to `resp.getReadBytes()`, but there's still a commented reference to the old method. Consider removing the comment to avoid confusion for future developers.
spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/format/ClickHouseBinaryReader.scala (89-89) This commented out code should be removed as it's redundant with the active implementation on the next line that uses `isInstanceOf`.
spark-3.4/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/format/ClickHouseBinaryReader.scala (89-89) This commented out line should be removed as it's not needed and the implementation is already using `isInstanceOf` in the next line.
```
      case IntegerType if value.isInstanceOf[java.lang.Long] =>
```
spark-3.4/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala (71-71) There's an inline comment `// , codec` on line 71 that appears to be a reminder of a removed parameter. Consider removing this comment or replacing it with a more descriptive explanation of why the parameter was removed.

💡 To request another review, post a new comment with "/windsurf-review".

windsurf-bot · 2025-10-29T05:45:47Z

spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/write/ClickHouseWriter.scala

+      // , codec
+      client.syncInsertOutputJSONEachRow(database, table, format, new ByteArrayInputStream(data)) match {


There's a commented-out parameter and explanatory comment that should be removed before merging. If the codec parameter is no longer needed in the method call due to API changes, we should just remove it cleanly without leaving commented code.

windsurf-bot · 2025-10-29T05:45:47Z

spark-3.4/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala

+  lazy val resp: QueryResponse = nodeClient.queryAndCheck(scanQuery, format)

-  def totalBlocksRead: Long = resp.getSummary.getStatistics.getBlocks
+  def totalBlocksRead: Long = 0L // resp.getSummary.getStatistics.getBlocks


The totalBlocksRead method now returns a hardcoded 0L instead of getting the actual value from the response. This will cause the BLOCKS_READ metric to always report 0, which could be misleading for monitoring and debugging purposes.

windsurf-bot · 2025-10-29T05:45:47Z

spark-3.5/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala

+  lazy val resp: QueryResponse = nodeClient.queryAndCheck(scanQuery, format)

-  def totalBlocksRead: Long = resp.getSummary.getStatistics.getBlocks
+  def totalBlocksRead: Long = 0L // resp.getSummary.getStatistics.getBlocks


The totalBlocksRead method now returns a hardcoded 0L instead of getting the actual block count from the response. This will cause inaccurate metrics reporting. Consider updating this method to retrieve the block count from the new QueryResponse API.

spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala

* Wake up ClickHouse Cloud instance before tests (#429) * fix: Handle FixedString as plain text in JSON reader for all Spark versions Problem: ClickHouse returns FixedString as plain text in JSON format, but the connector was trying to decode it as Base64, causing InvalidFormatException. Solution: Use pattern matching with guard to check if the JSON node is textual. - If textual (FixedString): decode as UTF-8 bytes - If not textual (true binary): decode as Base64 Applied to Spark 3.3, 3.4, and 3.5. --------- Co-authored-by: Bentsi Leviav <[email protected]> Co-authored-by: Shimon Steinitz <[email protected]>

* Wake up ClickHouse Cloud instance before tests (#429) * feat: Add comprehensive read test coverage for Spark 3.3, 3.4, and 3.5 Add shared test trait ClickHouseReaderTestBase with 48 test scenarios covering: - All primitive types (Boolean, Byte, Short, Int, Long, Float, Double) - Large integers (UInt64, Int128, UInt128, Int256, UInt256) - Decimals (Decimal32, Decimal64, Decimal128) - Date/Time types (Date, Date32, DateTime, DateTime32, DateTime64) - String types (String, UUID, FixedString) - Enums (Enum8, Enum16) - IP addresses (IPv4, IPv6) - JSON data - Collections (Arrays, Maps) - Edge cases (empty strings, long strings, empty arrays, nullable variants) Test suites for Binary and JSON read formats. Test results: 96 tests per Spark version (288 total) - Binary format: 47/48 passing - JSON format: 47/48 passing - Overall: 94/96 passing per version (98% pass rate) Remaining failures are known bugs with fixes on separate branches. * feat: Add comprehensive write test coverage for Spark 3.3, 3.4, and 3.5 Add shared test trait ClickHouseWriterTestBase with 17 test scenarios covering: - Primitive types (Boolean, Byte, Short, Int, Long, Float, Double) - Decimal types - String types (regular and empty strings) - Date and Timestamp types - Collections (Arrays and Maps, including empty variants) - Nullable variants Test suites for JSON and Arrow write formats. Note: Binary write format is not supported (only JSON and Arrow). Test results: 34 tests per Spark version (102 total) - JSON format: 17/17 passing (100%) - Arrow format: 17/17 passing (100%) - Overall: 34/34 passing per version (100% pass rate) Known behavior: Boolean values write as BooleanType but read back as ShortType (0/1) due to ClickHouse storing Boolean as UInt8. * style: Apply spotless formatting * style: Apply spotless formatting for Spark 3.3 and 3.4 Remove trailing whitespace from test files to pass CI spotless checks. * fix: Change write format from binary to arrow in BinaryReaderSuite The 'binary' write format option doesn't exist. Changed to 'arrow' which is a valid write format option. Applied to Spark 3.3, 3.4, and 3.5. * test: Add nullable tests for ShortType, IntegerType, and LongType Added missing nullable variant tests to ensure comprehensive coverage: - decode ShortType - nullable with null values (Nullable(Int16)) - decode IntegerType - nullable with null values (Nullable(Int32)) - decode LongType - nullable with null values (Nullable(Int64)) These tests verify that nullable primitive types correctly handle NULL values in both Binary and JSON read formats. Applied to Spark 3.3, 3.4, and 3.5. Total tests per Spark version: 51 (was 48) Total across all versions: 153 (was 144) * Refactor ClickHouseReaderTestBase: Add nullable tests and organize alphabetically - Add missing nullable test cases for: Date32, Decimal32, Decimal128, UInt16, UUID, DateTime64 - Organize all 69 tests alphabetically by data type for better maintainability - Ensure comprehensive coverage with both nullable and non-nullable variants for all data types - Apply changes consistently across Spark 3.3, 3.4, and 3.5 * ci: Skip cloud tests on forks where secrets are unavailable Add repository check to cloud workflow to prevent failures on forks that don't have access to ClickHouse Cloud secrets. Tests will still run on the main repository where secrets are properly configured. * Refactor and enhance Reader/Writer tests for all Spark versions - Add BooleanType tests to Reader (2 tests) with format-aware assertions - Add 6 new tests to Writer: nested arrays, arrays with nullable elements, multiple Decimal precisions (18,4 and 38,10), Map with nullable values, and StructType - Reorder all tests lexicographically for better organization - Writer tests increased from 17 to 33 tests - Reader tests increased from 69 to 71 tests - Remove section header comments for cleaner code - Apply changes to all Spark versions: 3.3, 3.4, and 3.5 - All tests now properly sorted alphabetically by data type and variant * style: Apply spotless formatting to Reader/Writer tests --------- Co-authored-by: Bentsi Leviav <[email protected]> Co-authored-by: Shimon Steinitz <[email protected]>

- Fix DecimalType: Handle both BigInteger (Int256/UInt256) and BigDecimal (Decimal types) - Fix ArrayType: Direct call to BinaryStreamReader.ArrayValue.getArrayOfObjects() - Fix StringType: Handle UUID, InetAddress, and EnumValue types - Fix DateType: Handle both LocalDate and ZonedDateTime - Fix MapType: Handle all util.Map implementations Removed reflection and defensive pattern matching for better performance. All 34 Binary Reader test failures are now fixed (71/71 tests passing). Fixes compatibility with new Java client API in update-java-client-version branch.

- Add Decimal(18,4) test with 0.001 tolerance for JSON/Arrow formats - Documents precision limitation for decimals with >15-17 significant digits - Uses tolerance-based assertions to account for observed precision loss - Binary format preserves full precision (already tested in Binary Reader suite) - All 278 tests passing

…sion

CLAassistant · 2025-11-09T14:26:34Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ mzitnik
✅ ShimonSte
❌ Shimon Steinitz

Shimon Steinitz seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

- Convert mutable.ArraySeq to Array in ClickHouseJsonReader to ensure immutable collections - Add test workaround for Spark's Row.getSeq behavior in Scala 2.13 - Fix Spotless formatting: remove trailing whitespace in ClickHouseBinaryReader - Applied to all Spark versions: 3.3, 3.4, 3.5

mzitnik added 10 commits September 30, 2025 13:46

Upgrade Java Client to V2 syncQuery & syncInsert

6023292

Refactor to use the new client v2 api

e5e84d6

Add timeout to query operation

42b8130

Clean NodeClient

04a0f20

Change binary reader

7424653

Update client version

69c8157

Fix project to use snapshots

9550fc9

Merge branch 'main' into update-java-client-version

595ee08

# Conflicts: # clickhouse-core/src/main/scala/com/clickhouse/spark/client/NodeClient.scala

merge with main

54222aa

run spotlessScalaApply and implement readAllBytes since java 8 does n…

5a388f2

…ot support

mzitnik marked this pull request as ready for review October 29, 2025 05:41

mzitnik requested a review from BentsiLeviav October 29, 2025 05:41

windsurf-bot bot reviewed Oct 29, 2025

View reviewed changes

BentsiLeviav reviewed Oct 29, 2025

View reviewed changes

spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala Show resolved Hide resolved

BentsiLeviav reviewed Oct 29, 2025

View reviewed changes

spark-3.3/clickhouse-spark/src/main/scala/com/clickhouse/spark/read/ClickHouseReader.scala Show resolved Hide resolved

Remove unneeded remarks

fba180d

BentsiLeviav approved these changes Oct 30, 2025

View reviewed changes

mzitnik added 2 commits November 2, 2025 08:31

Chanage to client version 0.9.3

c138afb

Update socket timeout in new client

6677a1b

BentsiLeviav self-requested a review November 2, 2025 09:52

mzitnik and others added 10 commits November 2, 2025 16:54

Change max connections to 20

2c63484

ConnectTimeout to 1200000

5e4fe1e

Add 3 sec to sleep

79a2e56

Setting a new setConnectionRequestTimeout for experiment

4aebe3c

spotlessScalaApply fix

3d24e80

Merge fix/binary-reader-java-client-types into update-java-client-ver…

b3face6

…sion

Simplify build-and-test workflow trigger to run on all pushes

8beb2bc

ShimonSte force-pushed the update-java-client-version branch from 7ea939c to 8beb2bc Compare November 9, 2025 14:29

ShimonSte force-pushed the update-java-client-version branch from 982511d to 2687b1f Compare November 10, 2025 09:13

Update java version to 0.9.4

18b4fcb

mzitnik force-pushed the update-java-client-version branch from 98365d4 to 18b4fcb Compare November 13, 2025 09:45

mzitnik added 5 commits November 13, 2025 15:31

Enable compression

1e056f0

add logging TPCDSClusterSuite & change client buffers

d35b624

Change InputStream read code

5dfbb77

Remove hard coded settings for experiments

fe3548f

Clean log from insert method

724785f

BentsiLeviav approved these changes Nov 16, 2025

View reviewed changes

BentsiLeviav merged commit 32da21c into main Nov 16, 2025
40 of 52 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update java client version #428

Update java client version #428

Uh oh!

mzitnik commented Oct 26, 2025

Uh oh!

windsurf-bot bot left a comment

Uh oh!

windsurf-bot bot Oct 29, 2025

Uh oh!

windsurf-bot bot Oct 29, 2025

Uh oh!

windsurf-bot bot Oct 29, 2025

Uh oh!

Uh oh!

Uh oh!

CLAassistant commented Nov 9, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		// , codec
		client.syncInsertOutputJSONEachRow(database, table, format, new ByteArrayInputStream(data)) match {

Update java client version #428

Update java client version #428

Uh oh!

Conversation

mzitnik commented Oct 26, 2025

Summary

Checklist

Uh oh!

windsurf-bot bot left a comment

Choose a reason for hiding this comment

Uh oh!

windsurf-bot bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

windsurf-bot bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

windsurf-bot bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

CLAassistant commented Nov 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CLAassistant commented Nov 9, 2025 •

edited

Loading