@yujun777
Contributor

@yujun777 yujun777 commented Nov 19, 2025

What problem does this PR solve?

Sampling uses table.getRowCount() as rowsCount. However, table.getRowCount() may be stale because it depends on the BE's report, so rowsCount < ndv can occur.

If 10 * rowsCount < ndv, the analyze SQL fails.
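
A minimal sketch of this failure mode, written in Python rather than the FE's Java. The function name `is_col_stats_valid` and the sample values are hypothetical; only the `10 * rowsCount < ndv` condition comes from the description above.

```python
def is_col_stats_valid(rows_count: int, ndv: int) -> bool:
    """Hypothetical check: stats are rejected when 10 * rows_count < ndv."""
    return not (10 * rows_count < ndv)


# Hypothetical values: the cached row count lags behind the data actually scanned.
stale_rows_count = 1    # what a stale table.getRowCount() might return
actual_rows_count = 16  # what count(1) over the same data would return
ndv = 16                # distinct values found by the analyze SQL

stale_ok = is_col_stats_valid(stale_rows_count, ndv)    # 10 * 1 < 16, so rejected
actual_ok = is_col_stats_valid(actual_rows_count, ndv)  # 10 * 16 >= 16, so accepted
print(stale_ok, actual_ok)
```

With the stale value the stats are rejected even though the collected NDV is correct; with a row count taken from the same scan they pass.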

As a result, the regression test statistics/analyze_stats.groovy is unstable and fails with the following error:

Exception:
java.sql.SQLException: errCode = 2, detailMessage = Failed to analyze following columns:[id] Reasons: java.lang.RuntimeException: ColStatsData is invalid, skip analyzing. ('1763112020393--1-id',0,1763112019723,1763112020393,-1,'id',null,1,16,0,'1','201',64,'2025-11-14 17:41:14','105 :0.06 ;104 :0.06 ;103 :0.06 ;102 :0.06 ;101 :0.06 ;10 :0.06 ;9 :0.06 ;8 :0.06 ;7 :0.06 ;6 :0.06')
  at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
  at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
  at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
  at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:371)
  at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
  at org.apache.doris.regression.util.JdbcUtils$_executeToList_closure1.doCall(JdbcUtils.groovy:47)
  at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
  at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
  at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)

So when sampling ends up scanning the whole table, we use count(1) as rowsCount.

Note that this replacement does not increase the execution cost, because the statistics SQL already contains count(1).

What's more, sampling already runs a SQL statement that collects the column's min and max; that statement now collects the count as well.

We then use that count in place of table.getRowCount().
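
The idea can be sketched with an in-memory SQLite table standing in for Doris. This is an assumption made purely for illustration: the table `t`, the column `id`, and the stale value are hypothetical, and this is not the SQL that the FE generates. One statement returns min, max, and the row count together, so no extra scan is needed, and the count is consistent with the NDV computed over the same data.

```python
import sqlite3

# In-memory table with 16 distinct ids (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 17)])

# Hypothetical stale row count, standing in for table.getRowCount().
stale_row_count = 1

# The statement that already collects min and max also returns the count.
min_id, max_id, row_count = conn.execute(
    "SELECT MIN(id), MAX(id), COUNT(1) FROM t"
).fetchone()
ndv = conn.execute("SELECT COUNT(DISTINCT id) FROM t").fetchone()[0]

# The stale value trips the 10 * rowsCount < ndv condition; the fresh count does not.
stale_fails = 10 * stale_row_count < ndv
fresh_fails = 10 * row_count < ndv
print(min_id, max_id, row_count, ndv, stale_fails, fresh_fails)
conn.close()
```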

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Nov 19, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34374 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ee5b8de6a586dcbad6877d67fbe47497a8c7e16b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	17615	5053	4867	4867
q2	2062	325	209	209
q3	10229	1295	718	718
q4	10225	933	373	373
q5	7529	2344	2352	2344
q6	186	174	137	137
q7	898	737	605	605
q8	9344	1360	1139	1139
q9	6982	5409	5388	5388
q10	6817	2237	1848	1848
q11	495	299	285	285
q12	345	363	241	241
q13	17787	3619	3001	3001
q14	236	229	212	212
q15	565	500	500	500
q16	1030	1025	947	947
q17	577	861	355	355
q18	7569	7192	7167	7167
q19	1098	955	547	547
q20	354	343	230	230
q21	3696	3108	2277	2277
q22	1072	1029	984	984
Total cold run time: 106711 ms
Total hot run time: 34374 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
q1	4892	4885	4907	4885
q2	308	383	327	327
q3	2150	2645	2322	2322
q4	1391	1777	1322	1322
q5	4192	4354	4721	4354
q6	229	179	140	140
q7	2038	1973	1835	1835
q8	2601	2604	2564	2564
q9	7606	7521	7555	7521
q10	3029	3381	2813	2813
q11	596	515	501	501
q12	702	774	640	640
q13	3486	4147	3239	3239
q14	291	298	279	279
q15	551	486	483	483
q16	1038	1104	1079	1079
q17	1231	1593	1393	1393
q18	7933	7580	7501	7501
q19	783	780	902	780
q20	2029	2016	1926	1926
q21	5079	4455	4327	4327
q22	1093	1039	974	974
Total cold run time: 53248 ms
Total hot run time: 51205 ms

@doris-robot

TPC-DS: Total hot run time: 188035 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ee5b8de6a586dcbad6877d67fbe47497a8c7e16b, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	min hot (ms)
query1	1037	410	389	389
query2	6552	1674	1678	1674
query3	6784	231	232	231
query4	25779	23393	22595	22595
query5	4919	679	532	532
query6	360	263	244	244
query7	4655	512	311	311
query8	319	277	259	259
query9	8694	2922	2935	2922
query10	514	361	320	320
query11	15933	15023	14778	14778
query12	202	126	148	126
query13	1683	579	438	438
query14	12465	9173	9067	9067
query15	242	190	176	176
query16	7700	667	515	515
query17	1588	764	621	621
query18	2028	428	325	325
query19	218	197	178	178
query20	129	129	123	123
query21	220	133	116	116
query22	4020	4240	4185	4185
query23	33856	33113	33178	33113
query24	7778	2360	2363	2360
query25	685	596	495	495
query26	1223	281	174	174
query27	2704	511	372	372
query28	4382	2272	2241	2241
query29	832	674	544	544
query30	302	234	203	203
query31	936	793	726	726
query32	99	87	88	87
query33	597	405	366	366
query34	790	865	524	524
query35	838	854	778	778
query36	964	994	912	912
query37	134	122	100	100
query38	3479	3620	3431	3431
query39	1533	1455	1407	1407
query40	244	138	143	138
query41	70	69	66	66
query42	131	121	120	120
query43	474	482	454	454
query44	1261	789	806	789
query45	194	189	176	176
query46	895	1008	649	649
query47	1794	1823	1757	1757
query48	416	429	339	339
query49	792	512	436	436
query50	647	688	410	410
query51	3909	3908	3843	3843
query52	119	124	114	114
query53	247	265	203	203
query54	350	340	317	317
query55	96	97	91	91
query56	368	366	357	357
query57	1194	1193	1103	1103
query58	371	298	288	288
query59	2554	2689	2570	2570
query60	383	368	351	351
query61	166	163	166	163
query62	784	708	658	658
query63	232	194	194	194
query64	4440	1184	899	899
query65	4075	3923	3977	3923
query66	1109	441	365	365
query67	15388	15136	14876	14876
query68	8477	954	628	628
query69	507	354	306	306
query70	1315	1333	1264	1264
query71	524	355	325	325
query72	6032	4948	4907	4907
query73	700	596	364	364
query74	9241	9145	8656	8656
query75	4093	3231	2796	2796
query76	3817	1135	761	761
query77	813	398	339	339
query78	9411	10209	8868	8868
query79	2670	869	627	627
query80	687	584	527	527
query81	482	252	231	231
query82	654	164	136	136
query83	287	267	247	247
query84	298	118	95	95
query85	909	486	456	456
query86	358	326	325	325
query87	3673	3703	3656	3656
query88	3264	2196	2235	2196
query89	391	332	295	295
query90	2055	240	228	228
query91	170	164	133	133
query92	87	79	76	76
query93	1170	988	687	687
query94	711	438	332	332
query95	421	336	335	335
query96	494	589	277	277
query97	2982	2924	2870	2870
query98	258	226	221	221
query99	1420	1406	1264	1264
Total cold run time: 278247 ms
Total hot run time: 188035 ms

@doris-robot

ClickBench: Total hot run time: 27.27 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ee5b8de6a586dcbad6877d67fbe47497a8c7e16b, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.05
query2	0.08	0.04	0.04
query3	0.25	0.08	0.08
query4	1.61	0.12	0.11
query5	0.29	0.25	0.25
query6	1.20	0.65	0.63
query7	0.03	0.02	0.03
query8	0.06	0.04	0.05
query9	0.57	0.53	0.52
query10	0.57	0.57	0.57
query11	0.17	0.11	0.11
query12	0.15	0.12	0.11
query13	0.62	0.60	0.61
query14	1.00	1.00	1.00
query15	0.84	0.82	0.83
query16	0.39	0.38	0.40
query17	1.01	1.02	1.01
query18	0.22	0.20	0.21
query19	1.87	1.88	1.87
query20	0.02	0.01	0.01
query21	15.45	0.20	0.13
query22	5.01	0.07	0.05
query23	15.69	0.26	0.11
query24	2.22	1.30	0.31
query25	0.07	0.07	0.06
query26	0.14	0.13	0.13
query27	0.06	0.06	0.05
query28	3.27	1.15	0.95
query29	12.56	3.83	3.19
query30	0.29	0.14	0.12
query31	2.82	0.58	0.38
query32	3.24	0.54	0.47
query33	3.03	2.99	3.05
query34	15.86	5.10	4.54
query35	4.57	4.51	4.62
query36	0.67	0.51	0.49
query37	0.09	0.07	0.07
query38	0.07	0.04	0.04
query39	0.03	0.03	0.03
query40	0.18	0.14	0.14
query41	0.09	0.04	0.04
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 96.5 s
Total hot run time: 27.27 s

@yujun777
Contributor Author

run feut

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@yujun777
Contributor Author

run p0

@yujun777
Contributor Author

run feut

Jibing-Li
Jibing-Li previously approved these changes Nov 20, 2025
@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Nov 20, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yujun777
Contributor Author

run buildall

@github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Nov 20, 2025
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

