
Conversation

@lshaowei18 (Contributor) commented Nov 21, 2025

Problem

Closes #41845

Changes

My suspicion is that the cursor.fetchall for distinct_ids fetches too much data in one go, causing the DB connection (?) to run out of memory.

This approach might work for now, but eventually the Python list will likely hit a memory error anyway.

As mentioned in #22519 (comment), I think we should consider not returning the distinct_ids for CSV exports, or limiting the number of distinct_ids that we return.

The possible solutions I can think of:

  1. Don't include distinct ids in person modal CSV exports
  2. Don't include distinct ids for queries that return more than a certain number of actors
  3. Include distinct ids, but cap the total number of distinct ids we return (see the sketch after this list)
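
For option 3, a minimal sketch of what a cap could look like. This is illustrative only: the table/column names, `person_ids`, `cursor`, and the constant are assumptions, not code from this PR.

```python
# Hypothetical overall cap on returned distinct_ids (option 3 above).
# Table and column names are assumed for illustration; `cursor` and
# `person_ids` are assumed to exist in the surrounding code.
MAX_TOTAL_DISTINCT_IDS = 100_000

cursor.execute(
    """
    SELECT person_id, distinct_id
    FROM posthog_persondistinctid
    WHERE person_id = ANY(%(person_ids)s)
    LIMIT %(cap)s
    """,
    {"person_ids": person_ids, "cap": MAX_TOTAL_DISTINCT_IDS},
)
distinct_ids = cursor.fetchall()
```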

How did you test this code?

I tested that the export still works, but I only have 200+ persons locally, so this definitely isn't testing at a realistic scale.

Does anyone have an idea how I could easily set up demo data to test this kind of volume, or write unit/integration tests for it?
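
One way to seed enough local volume might be bulk-creating persons and distinct ids directly. This is a rough, hedged sketch: it assumes the `Person` / `PersonDistinctId` Django models keep their usual `team` / `person` / `distinct_id` fields, so treat it as a starting point rather than working code.

```python
# Rough sketch for seeding local test volume. Field names are assumed from the
# posthog Django models; adjust to the actual schema before using.
from posthog.models import Person, PersonDistinctId


def seed_people(team, n_people=100_000, ids_per_person=3, chunk=1_000):
    for start in range(0, n_people, chunk):
        # Create persons in chunks so a single bulk_create stays small.
        people = Person.objects.bulk_create(
            Person(team=team) for _ in range(min(chunk, n_people - start))
        )
        # Attach a few distinct ids to each person; uuid keeps them unique.
        PersonDistinctId.objects.bulk_create(
            PersonDistinctId(team=team, person=p, distinct_id=f"{p.uuid}-{j}")
            for p in people
            for j in range(ids_per_person)
        )
```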

 )
-distinct_ids = cursor.fetchall()
+distinct_ids = []
+batch_size = 10000
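
The hunk above only shows a fragment of the change; below is a minimal sketch of the fetchmany-style loop it appears to set up. Only `distinct_ids`, `batch_size`, and the existing `cursor` come from the hunk; the loop itself is assumed.

```python
# Assumed shape of the batched fetch: replace one fetchall() with repeated
# fetchmany() calls, pulling rows from the driver batch_size at a time
# (though, as discussed later in this thread, a client-side cursor may have
# already buffered the full result by the time execute() returns).
distinct_ids = []
batch_size = 10000

while True:
    batch = cursor.fetchmany(batch_size)
    if not batch:
        break
    distinct_ids.extend(batch)
```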
@lshaowei18 (Contributor Author) commented on the diff:
I think we might be able to increase this batch_size more aggressively?

@greptile-apps greptile-apps bot left a comment: 1 file reviewed, no comments

@andyzzhao andyzzhao requested a review from a team November 21, 2025 09:57
@andyzzhao (Contributor) commented Nov 21, 2025

Thank you for your PR @lshaowei18. Logs in production show that these queries are timing out, so batching the query makes sense to me.

Logs:
Traceback (most recent call last):
  File "/python-runtime/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/opentelemetry/instrumentation/psycopg/__init__.py", line 367, in execute
    return _cursor_tracer.traced_execution(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/opentelemetry/instrumentation/dbapi/__init__.py", line 593, in traced_execution
    return query_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
psycopg.errors.ProtocolViolation: query timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/python-runtime/lib/python3.12/site-packages/rest_framework/views.py", line 512, in dispatch
    response = handler(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/posthog/api/monitoring.py", line 44, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/code/posthog/api/query.py", line 144, in create
    result = process_query_model(
             ^^^^^^^^^^^^^^^^^^^^
  File "/code/posthog/api/services/query.py", line 197, in process_query_model
    result = query_runner.run(
             ^^^^^^^^^^^^^^^^^
  File "/code/posthog/hogql_queries/query_runner.py", line 1164, in run
    query_result = self.calculate()
                   ^^^^^^^^^^^^^^^^
  File "/code/posthog/hogql_queries/query_runner.py", line 1463, in calculate
    response = self._calculate()
               ^^^^^^^^^^^^^^^^^
  File "/code/posthog/hogql_queries/actors_query_runner.py", line 194, in _calculate
    return self._calculate_internal()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/posthog/hogql_queries/actors_query_runner.py", line 148, in _calculate_internal
    actors_lookup = self.strategy.get_actors(actor_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/posthog/hogql_queries/actor_strategies.py", line 71, in get_actors
    cursor.execute(
  File "/python-runtime/lib/python3.12/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/python-runtime/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/opentelemetry/instrumentation/psycopg/__init__.py", line 367, in execute
    return _cursor_tracer.traced_execution(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/opentelemetry/instrumentation/dbapi/__init__.py", line 593, in traced_execution
    return query_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python-runtime/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
django.db.utils.OperationalError: query timeout

From: grafana.prod-us.posthog.dev/goto/Q7vrSJivg?orgId=1

Discovered in this ticket: posthoghelp.zendesk.com/agent/tickets/42850

@lshaowei18 (Contributor Author) replied:

> Thank you for your PR @lshaowei18. Logs in production show that these queries are timing out, so batching the query makes sense to me.
>
> Logs

Thanks for checking the logs!

Hmm, I was looking into fetchmany, and it seems there are client-side and server-side cursors; I haven't quite wrapped my head around it yet. https://medium.com/dev-bits/understanding-postgresql-cursors-with-python-ebc3da591fe7

My mild concern is that cursor.execute will try to load all the rows into the cursor, which may still cause the queries to time out.

In that case, the approach in #41767 of using multiple cursor.execute calls would make more sense :0

Just thinking out loud here, let me know if you have any thoughts :)
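
For reference, a rough illustration of the client- vs server-side cursor distinction mentioned above, using psycopg 3. The connection string and table name are placeholders, not from this PR.

```python
import psycopg

# Placeholder connection string and table, purely to illustrate the distinction.
with psycopg.connect("dbname=posthog") as conn:
    # Client-side cursor (the default): execute() pulls the entire result set
    # into client memory, so fetchmany() only pages through rows that are
    # already buffered locally.
    with conn.cursor() as cur:
        cur.execute("SELECT distinct_id FROM posthog_persondistinctid")
        first_batch = cur.fetchmany(10000)

    # Server-side (named) cursor: the result stays on the server and rows are
    # streamed to the client in chunks of itersize as you iterate.
    with conn.cursor(name="distinct_id_export") as cur:
        cur.itersize = 10000
        cur.execute("SELECT distinct_id FROM posthog_persondistinctid")
        for (distinct_id,) in cur:
            pass  # placeholder: process one row at a time
```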

@andyzzhao (Contributor) commented Nov 21, 2025

> My mild concern is that cursor.execute will try to load all the rows into the cursor, which may still cause the queries to time out.

@lshaowei18 yeah, I think you're right. My assumption was that fetchmany would do multiple queries but it seems like that's not the case.

https://github.com/psycopg/psycopg/blob/b9d533beb5d847ef6837fbd4a011f67730225ffd/psycopg/psycopg/cursor.py#L229-L246

Would you like to update this to the batched execute way instead? I'll approve it if you do.
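
A minimal sketch of what the "batched execute" approach could look like. The query, `person_ids`, `cursor`, and the chunk size are assumptions for illustration, not the exact code in #41767 or this PR.

```python
# Assumed sketch: issue one bounded query per chunk of person ids instead of a
# single large query, so no individual execute() has to materialize (or time
# out on) the full result set. `cursor` and `person_ids` are assumed to exist
# in the surrounding function.
distinct_ids = []
batch_size = 10000

for i in range(0, len(person_ids), batch_size):
    chunk = person_ids[i : i + batch_size]
    cursor.execute(
        "SELECT person_id, distinct_id FROM posthog_persondistinctid "
        "WHERE person_id = ANY(%(ids)s)",
        {"ids": chunk},
    )
    distinct_ids.extend(cursor.fetchall())
```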

@lshaowei18 (Contributor Author) replied:

> My mild concern is that cursor.execute will try to load all the rows into the cursor, which may still cause the queries to time out.
>
> @lshaowei18 yeah, I think you're right. My assumption was that fetchmany would do multiple queries but it seems like that's not the case.
>
> https://github.com/psycopg/psycopg/blob/b9d533beb5d847ef6837fbd4a011f67730225ffd/psycopg/psycopg/cursor.py#L229-L246
>
> Would you like to update this to the batched execute way instead? I'll approve it if you do.

Thanks for investigating; I learned something new today :)

I have updated the PR: 23f4258

Please feel free to take over or close this, since the solution is very similar to your PR and you already have test coverage :)

@andyzzhao andyzzhao closed this Nov 24, 2025


Development

Successfully merging this pull request may close these issues.

Bug: CSV export fails with "server closed the connection unexpectedly" for large person lists
