fix(export): CSV export for actor's query fails with postgres error #41879
Conversation
```diff
 )
-distinct_ids = cursor.fetchall()
+distinct_ids = []
+batch_size = 10000
```
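For context, a minimal sketch of the batched fetch this diff implies — assuming a psycopg2-style `cursor` on which the query has already been executed; the loop and row shape are assumptions, not the PR's actual code:

```python
# Sketch only: drain the cursor in fixed-size batches instead of one fetchall().
distinct_ids = []
batch_size = 10000
while True:
    batch = cursor.fetchmany(batch_size)  # returns [] once the result set is exhausted
    if not batch:
        break
    distinct_ids.extend(row[0] for row in batch)  # assumes distinct_id is column 0
```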
I think we might be able to increase this batch_size more aggressively?
1 file reviewed, no comments
Thank you for your PR @lshaowei18. Logs in production show that these queries are timing out, so batching the query makes sense to me.

Logs from: grafana.prod-us.posthog.dev/goto/Q7vrSJivg?orgId=1
Discovered in this ticket: posthoghelp.zendesk.com/agent/tickets/42850
Thanks for checking the logs! Hmmm, I was looking into `cursor.fetchmany`. My mild concern is that with psycopg2's default client-side cursor, `execute()` already pulls the whole result set into client memory, so `fetchmany` would only change how we iterate over it rather than actually reduce memory usage. In that case the approach in #41767 of using multiple `execute` calls might be the better fix. Just thinking out loud here, let me know if you have any thoughts :)
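The concern matches psycopg2's documented behavior: a default (client-side) cursor materializes the entire result set in client memory during `execute()`, and `fetchmany` only controls how you iterate over rows that are already fetched. Actually streaming rows requires a named (server-side) cursor. A hedged sketch, where `connection`, the table, and the column names are assumptions:

```python
# Sketch: a named cursor keeps the result set on the postgres side and streams
# it in itersize-row chunks, so iteration genuinely bounds client memory.
with connection.cursor(name="distinct_ids_export") as cursor:  # hypothetical name
    cursor.itersize = 10000  # rows fetched per network round trip
    cursor.execute(
        "SELECT distinct_id FROM posthog_persondistinctid WHERE person_id = ANY(%s)",
        (person_ids,),
    )
    distinct_ids = [row[0] for row in cursor]
```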
@lshaowei18 yeah, I think you're right. My assumption was that `fetchmany` would pull rows from the server in batches, but that doesn't hold for a client-side cursor. Would you like to update this to the batched execute way instead? I'll approve it if you do.
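One possible reading of "the batched execute way" — an interpretation, not necessarily what #41767 actually does — is to issue several bounded queries so that neither postgres nor Python ever holds the full result at once. Table and column names below are assumptions:

```python
# Sketch: page through the rows with LIMIT/OFFSET across separate execute() calls.
distinct_ids = []
batch_size = 10000
offset = 0
while True:
    cursor.execute(
        "SELECT distinct_id FROM posthog_persondistinctid"
        " WHERE person_id = ANY(%s) ORDER BY id LIMIT %s OFFSET %s",
        (person_ids, batch_size, offset),
    )
    rows = cursor.fetchall()
    distinct_ids.extend(row[0] for row in rows)
    if len(rows) < batch_size:  # a short page means we've reached the end
        break
    offset += batch_size
```

Keyset pagination (`WHERE id > last_seen_id ORDER BY id LIMIT %s`) would avoid OFFSET's cost of re-scanning skipped rows, at the price of tracking the last id between queries.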
Thanks for investigating; I learned something new today :) I have updated the PR: 23f4258. Please feel free to take over or close this since the solution is very similar to your PR and you already have test coverage :)
Problem
Closes #41845
Changes
My suspicion is that the `cursor.fetchall` for `distinct_ids` is fetching too much data in one go, causing the db connection(?) to run out of memory.

This approach might work for now, but eventually the Python array will likely hit a memory error.
As mentioned in #22519 (comment), I think we should consider not returning the `distinct_ids` for CSV exports, or limiting the number of `distinct_ids` that we return. Listing the possible solutions I can think of:

- Don't return `distinct_ids` for CSV exports at all.
- Cap the number of `distinct_ids` returned per person (rough sketch below).
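If the capping option were pursued, the bound could be applied in SQL so the export never pulls an oversized list per person. A rough sketch, with the table/column names and the cap of 10 as assumptions:

```python
# Sketch: keep at most N distinct_ids per person via a window function.
cursor.execute(
    """
    SELECT person_id, distinct_id
    FROM (
        SELECT person_id, distinct_id,
               ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY id) AS rn
        FROM posthog_persondistinctid
        WHERE person_id = ANY(%s)
    ) ranked
    WHERE rn <= %s
    """,
    (person_ids, 10),  # 10 is an arbitrary example cap
)
```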
How did you test this code?
I tested that the export still works, but I only have 200+ persons in my local environment, so it's definitely not tested at the volume that triggers this error.
Does anyone have an idea how I can easily set up demo data to test this kind of volume, or write unit/integration tests for this?