[opt](staticstis) use count(1) for rowCount when scan full table #58153
run buildall |
TPC-H: Total hot run time: 34374 ms |
TPC-DS: Total hot run time: 188035 ms |
ClickBench: Total hot run time: 27.27 s |
|
run feut |
FE Regression Coverage ReportIncrement line coverage |
|
run p0 |
|
run feut |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
run buildall |
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
When sampling, the analyze job uses table.getRowCount() as rowsCount. That value can be stale because it depends on the BE's report, so rowsCount < ndv can occur.
If 10 * rowsCount < ndv, the analyze SQL fails.
This makes the regression test statistics/analyze_stats.groovy unstable and causes it to fail with an error.
So when sampling scans the whole table, we use count(1) as rowsCount.
Note that this replacement does not increase the execution cost, because the statistics SQL already contains count(1). What's more, sampling already collects the min and max of the column, so the same SQL also collects the count.
We then use that count in place of table.getRowCount().
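
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `RowCountSketch` and the methods `chooseRowCount` and `failsNdvCheck` are hypothetical names, and the only logic taken from the description is "use count(1) when the whole table is scanned" and the `10 * rowsCount < ndv` failure condition.

```java
// Hypothetical sketch; names are illustrative and do not come from the Doris code base.
final class RowCountSketch {
    private RowCountSketch() {}

    // When the sample scans the full table, trust the count(1) returned by the
    // statistics SQL; otherwise fall back to the (possibly stale) reported row count.
    static long chooseRowCount(boolean scanFullTable, long reportedRowCount, long countFromSql) {
        return scanFullTable ? countFromSql : reportedRowCount;
    }

    // The condition from the description under which the analyze SQL fails.
    static boolean failsNdvCheck(long rowCount, long ndv) {
        return 10 * rowCount < ndv;
    }

    public static void main(String[] args) {
        long staleReported = 5;   // stale value, as if from the BE report
        long counted = 1000;      // value of count(1) over the whole table
        long ndv = 800;

        // Using the stale row count trips the check: 10 * 5 < 800.
        System.out.println(failsNdvCheck(chooseRowCount(false, staleReported, counted), ndv));
        // Using count(1) on a full scan does not: 10 * 1000 >= 800.
        System.out.println(failsNdvCheck(chooseRowCount(true, staleReported, counted), ndv));
    }
}
```

With a full-table scan the counted value is exact, so ndv can never exceed it and the check cannot fail because of a stale report.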
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)