Full query result being loaded into RAM despite ON_DEMAND_RECORD_COUNT=1000 #5797

mac-chaffee · 2023-01-30T23:01:25Z

Describe the bug

When running something like select * from bigtable on a table with many rows, RAM usage of pgAdmin baloons to the size of the table size. This happens on the "query tool", which is supposed to fetch and display only 1000 rows at a time, which makes me think this is a bug.

To Reproduce

Steps to reproduce the behavior:

Create a table with lots of rows, say 2GB in size
Constantly record RAM usage of the pgAdmin server process with a high frequency
In the query tool, run select * from bigtable

Observe that RAM usage spiked about 2GB in size while the query was executing, then dropped back down.

Expected behavior

RAM usage should remain fairly low, about the size it would take to hold 1000 rows (give or take). You can see this expected behavior by doing the same query in plain psql after running \set FETCH_COUNT 1000.

Error message

If RAM usage was large enough to invoke the OOMKiller, user will see an error like "Error communicating with the pgAdmin server process".

Desktop (please complete the following information):

OS: Linux
Version: 6.17
Mode: Desktop
Package type: DEB

The text was updated successfully, but these errors were encountered:

mac-chaffee · 2023-02-04T15:09:58Z

I'm thinking the issue may be related to the decision not to use server-side cursors here: #947

My understanding is that both sync and async queries load the whole result into RAM. The only difference is that async queries don't block when they start doing that. So async queries give you the increased responsiveness, but not reduced RAM usage.

mac-chaffee · 2023-02-04T15:13:26Z

Also wanted to add: while testing I was noticing memory leaking. I'd run a large query then a small query, but the RAM usage of the large query would stick around, even after closing the query tool. But it wasn't super consistent

adityatoshniwal · 2023-02-06T05:05:37Z

Hi @mac-chaffee,
Can please share how did check the RAM usage of the large query? Screenshots will help.

mac-chaffee · 2023-02-06T13:05:37Z

@adityatoshniwal I ran top -p $(pgrep -f pgAdmin4.py) -d 0.2 to measure RAM usage on Linux. On macOS, ActivityMonitor can be used if you configure a faster refresh rate (I think). Not sure about Windows

mac-chaffee · 2023-02-07T23:30:44Z

If running pgAdmin in the official container, just top -d 0.2 will work.

I also think I know why I saw memory-leaky issues. Without server-side cursors, every time you run a query, the full result is loaded into RAM, just in the background. This is controlled by StartRunningQuery. But since the transaction object is designed to out-live the request cycle, running a large query means you have a large transaction objects still occupying RAM even if the user has closed the query tool. So server-side cursors should fix that issue as well.

(Running a different query after running a large query does seem to free up the RAM usage of the large query, so I take back what I said earlier. I only see memory leak when closing the query tool.)

adityatoshniwal · 2023-02-08T04:42:43Z

@mac-chaffee From what I can see, server side cursors are already used. The RAM usage must be because of UI storing the rows. Need to dig more.

mac-chaffee · 2023-02-08T15:00:08Z

Here's a screenshot of the devtools after I run a large query (~6 million rows, ~2GB) with SERVER_MODE=False:

Note that the initial request (/sqleditor/query_tool/start/) takes 4.39 seconds but the reponse size is tiny. While that request is running, I see the RAM usage quickly increase to the full size of the table on one of the gunicorn workers:

Mem: 345007488K used, 1238222416K free, 25181496K shrd, 8308K buff, 199792396K cached
CPU:   2% usr   0% sys   0% nic  97% idle   0% io   0% irq   0% sirq
Load average: 3.54 3.78 3.76 5/4683 115
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   99    12 root     S    2190m   0%  28   0% {gunicorn} /venv/bin/python3 /venv/bin/gunicorn --limit-request-line 8190 --timeout 86400 --bind [::]:80 -w 1 --threads 25 --access-
   12     1 root     S    27936   0%  28   0% {gunicorn} /venv/bin/python3 /venv/bin/gunicorn --limit-request-line 8190 --timeout 86400 --bind [::]:80 -w 1 --threads 25 --access-

After that first request returns, the content is fetched via /sqleditor/poll/. That returns right away (since the rows are all present on the pgAdmin server side) and the response size is only 141KB. I can confirm that the response was exactly 1000 rows.

At this point, top is still showing the exact same amount of RAM usage: ~2GB. Even closing the query tool entirely does not free up that RAM.

Server cursors are supported here:

pgadmin4/web/pgadmin/utils/driver/psycopg2/connection.py

Line 630 in cf94860

def __cursor(self, server_cursor=False):

But that function is never called with server_cursor=True. You'd need to pass that parameter here:

pgadmin4/web/pgadmin/utils/driver/psycopg2/connection.py

Lines 1024 to 1040 in cf94860

    
               def execute_async(self, query, params=None, formatted_exception_msg=True): 
        
                   """ 
        
                   This function executes the given query asynchronously and returns 
        
                   result. 
        
                   Args: 
        
                       query: SQL query to run. 
        
                       params: extra parameters to the function 
        
                       formatted_exception_msg: if True then function return the 
        
                       formatted exception message 
        
                   """ 
        
                   # Convert the params based on python_encoding 
        
                   params = self.escape_params_sqlascii(params) 
        
                   self.__async_cursor = None 
        
                   status, cur = self.__cursor()

Note the drawbacks mentioned in the other ticket I linked above, not sure if this is still applicable or not: https://www.postgresql.org/message-id/flat/CAFiP3vxaM9ixpmaM04JAkTT1ONwmnMPwUCbK%2BkzCN%2Br6F%2B7-ZA%40mail.gmail.com#CAFiP3vxaM9ixpmaM04JAkTT1ONwmnMPwUCbK+kzCN+r6F+7-ZA@mail.gmail.com

mac-chaffee · 2023-02-08T15:39:01Z

I tried just setting server_cursor=True and got this error:

2023-02-08 15:32:50,920: ERROR    pgadmin:    asynchronous connections cannot produce named cursors                                                                               
Traceback (most recent call last):                                                                                                                                                
  File "/pgadmin4/pgadmin/utils/driver/psycopg2/connection.py", line 694, in __cursor                                                                                             
    cur = self.conn.cursor(                                                                                                                                                       
psycopg2.ProgrammingError: asynchronous connections cannot produce named cursors                                                                                                  
2023-02-08 15:32:50,937: ERROR    pgadmin:    Failed to create cursor for psycopg2 connection with error message for the server#1:omop_covid:                                     
asynchronous connections cannot produce named cursors                                                                                                                             
2023-02-08 15:32:50,938: ERROR    pgadmin:    'NoneType' object has no attribute 'execute'                                                                                        
Traceback (most recent call last):                                                                                                                                                
  File "/venv/lib/python3.10/site-packages/flask/app.py", line 1517, in full_dispatch_request                                                                                     
    rv = self.dispatch_request()                                                                                                                                                  
  File "/venv/lib/python3.10/site-packages/flask/app.py", line 1503, in dispatch_request                                                                                          
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)                                                                                                  
  File "/venv/lib/python3.10/site-packages/flask_login/utils.py", line 290, in decorated_view                                                                                     
    return current_app.ensure_sync(func)(*args, **kwargs)                                                                                                                         
  File "/pgadmin4/pgadmin/tools/sqleditor/__init__.py", line 831, in start_query_tool                                                                                             
    return StartRunningQuery(blueprint, current_app.logger).execute(                                                                                                              
  File "/pgadmin4/pgadmin/tools/sqleditor/utils/start_running_query.py", line 88, in execute                                                                                      
    result, status = self.__execute_query(                                                                                                                                        
  File "/pgadmin4/pgadmin/tools/sqleditor/utils/start_running_query.py", line 139, in __execute_query                                                                             
    status, result = conn.execute_async(sql)                                                                                                                                      
  File "/pgadmin4/pgadmin/utils/driver/psycopg2/connection.py", line 1086, in execute_async                                                                                       
    cur.execute(query, params)                                                                                                                                                    
AttributeError: 'NoneType' object has no attribute 'execute'

Seems you can't mix async queries and server-side cursors together.

EDIT: If you also set self.async_ = 0, I get ERROR: can't use a named cursor outside of transactions.

mac-chaffee · 2023-02-08T15:58:23Z

It may also be good to take inspiration from psql itself. It pretty much always fetches the whole result synchronously except for when FETCH_COUNT is set or for a few other edge cases where it uses the equivalent of server-side cursors: https://github.com/postgres/postgres/blob/03023a2664f8950ad522385ff75ce004bc932a7c/src/bin/psql/common.c#L1108-L1122

…arge volumes of data. pgadmin-org#5797

mac-chaffee added the Bug label Jan 30, 2023

akshay-joshi added this to Not Refined Feb 1, 2023

pravesh-sharma added EDB Sprint 135 EDB Sprint 136 and removed EDB Sprint 135 labels Feb 3, 2023

pravesh-sharma assigned khushboovashi Feb 3, 2023

pravesh-sharma removed this from Not Refined Feb 3, 2023

pravesh-sharma added this to Ready for Sprint Bugs Feb 3, 2023

akshay-joshi added EDB Sprint 137 and removed EDB Sprint 136 labels Feb 15, 2023

akshay-joshi added this to Current Sprint (195) Mar 1, 2023

akshay-joshi removed this from Ready for Sprint Bugs Mar 1, 2023

akshay-joshi moved this to 🆕 New in Current Sprint (195) Mar 1, 2023

akshay-joshi added EDB Sprint 138 and removed EDB Sprint 137 labels Mar 15, 2023

khushboovashi moved this from 🆕 New to 🏗 In Progress in Current Sprint (195) Mar 27, 2023

akshay-joshi added EDB Sprint 139 and removed EDB Sprint 138 labels Mar 29, 2023

khushboovashi moved this from 🏗 In Progress to 🆕 New in Current Sprint (195) Apr 3, 2023

akshay-joshi added EDB Sprint 140 and removed EDB Sprint 139 labels Apr 12, 2023

akshay-joshi added the EDB Sprint 183 label Dec 18, 2024

akshay-joshi added EDB Sprint 184 and removed EDB Sprint 183 labels Jan 2, 2025

khushboovashi moved this from 🆕 New to 🏗 In Progress in Current Sprint (195) Jan 7, 2025

khushboovashi moved this from 🏗 In Progress to 🆕 New in Current Sprint (195) Jan 15, 2025

akshay-joshi added EDB Sprint 185 and removed EDB Sprint 184 labels Jan 15, 2025

akshay-joshi added EDB Sprint 186 and removed EDB Sprint 185 labels Jan 29, 2025

khushboovashi mentioned this issue Jan 31, 2025

pgadmin is OOMKilled by kubernetes when querying big-ish tables (6.5 million rows) #8402

Open

khushboovashi moved this from 🆕 New to 🏗 In Progress in Current Sprint (195) Feb 3, 2025

akshay-joshi added EDB Sprint 187 and removed EDB Sprint 186 labels Feb 12, 2025

akshay-joshi added EDB Sprint 188 and removed EDB Sprint 187 labels Feb 26, 2025

akshay-joshi added EDB Sprint 189 and removed EDB Sprint 188 labels Mar 12, 2025

akshay-joshi added EDB Sprint 190 and removed EDB Sprint 189 labels Mar 26, 2025

akshay-joshi added EDB Sprint 191 and removed EDB Sprint 190 labels Apr 9, 2025

akshay-joshi added EDB Sprint 192 and removed EDB Sprint 191 labels Apr 23, 2025

akshay-joshi added EDB Sprint 193 and removed EDB Sprint 192 labels May 7, 2025

akshay-joshi added EDB Sprint 194 and removed EDB Sprint 193 labels May 21, 2025

akshay-joshi added EDB Sprint 195 and removed EDB Sprint 194 labels Jun 4, 2025

khushboovashi added a commit to khushboovashi/pgadmin4 that referenced this issue Jun 9, 2025

Implement a server-side cursor to improve performance when fetching l…

8d35c53

…arge volumes of data. pgadmin-org#5797

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Full query result being loaded into RAM despite ON_DEMAND_RECORD_COUNT=1000 #5797

Full query result being loaded into RAM despite ON_DEMAND_RECORD_COUNT=1000 #5797

mac-chaffee commented Jan 30, 2023 •

edited

Loading

mac-chaffee commented Feb 4, 2023

Uh oh!

mac-chaffee commented Feb 4, 2023

Uh oh!

adityatoshniwal commented Feb 6, 2023

Uh oh!

mac-chaffee commented Feb 6, 2023

Uh oh!

mac-chaffee commented Feb 7, 2023

Uh oh!

adityatoshniwal commented Feb 8, 2023

Uh oh!

mac-chaffee commented Feb 8, 2023 •

edited

Loading

Uh oh!

mac-chaffee commented Feb 8, 2023 •

edited

Loading

Uh oh!

mac-chaffee commented Feb 8, 2023

Uh oh!

Full query result being loaded into RAM despite ON_DEMAND_RECORD_COUNT=1000 #5797

Full query result being loaded into RAM despite ON_DEMAND_RECORD_COUNT=1000 #5797

Comments

mac-chaffee commented Jan 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

mac-chaffee commented Feb 4, 2023

Uh oh!

mac-chaffee commented Feb 4, 2023

Uh oh!

adityatoshniwal commented Feb 6, 2023

Uh oh!

mac-chaffee commented Feb 6, 2023

Uh oh!

mac-chaffee commented Feb 7, 2023

Uh oh!

adityatoshniwal commented Feb 8, 2023

Uh oh!

mac-chaffee commented Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mac-chaffee commented Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mac-chaffee commented Feb 8, 2023

Uh oh!

mac-chaffee commented Jan 30, 2023 •

edited

Loading

mac-chaffee commented Feb 8, 2023 •

edited

Loading

mac-chaffee commented Feb 8, 2023 •

edited

Loading