Improve performance issue for DELETE command #185

Merged 1 commit on Nov 2, 2015

zzet
Contributor

@zzet zzet commented Oct 29, 2015

Before

                                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on log_hierarchies  (cost=4403.03..6923802.33 rows=169553156 width=6) (actual time=52728.567..52728.567 rows=0 loops=1)
   ->  Seq Scan on log_hierarchies  (cost=4403.03..6923802.33 rows=169553156 width=6) (actual time=52728.564..52728.564 rows=0 loops=1)
         Filter: ((hashed SubPlan 1) OR (descendant_id = 1022566))
         Rows Removed by Filter: 338996939
         SubPlan 1
           ->  Unique  (cost=0.57..4403.03 rows=1 width=4) (actual time=0.085..0.085 rows=0 loops=1)
                 ->  Index Only Scan using log_anc_desc_idx on log_hierarchies log_hierarchies_1  (cost=0.57..4396.98 rows=2420 width=4) (actual time=0.085..0.085 rows=0 loops=1)
                       Index Cond: (ancestor_id = 1022566)
                       Heap Fetches: 0
 Planning time: 0.223 ms
 Execution time: 52728.657 ms
(11 rows)

After

                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on log_hierarchies  (cost=1.15..5006.48 rows=11962 width=34) (actual time=11.178..11.178 rows=0 loops=1)
   ->  Nested Loop  (cost=1.15..5006.48 rows=11962 width=34) (actual time=0.126..10.466 rows=561 loops=1)
         ->  Subquery Scan on "ANY_subquery"  (cost=0.57..4403.04 rows=1 width=32) (actual time=0.077..0.172 rows=33 loops=1)
               ->  Unique  (cost=0.57..4403.03 rows=1 width=4) (actual time=0.050..0.111 rows=33 loops=1)
                     ->  Index Only Scan using log_anc_desc_idx on log_hierarchies log_hierarchies_1  (cost=0.57..4396.98 rows=2420 width=4) (actual time=0.048..0.080 rows=33 loops=1)
                           Index Cond: (ancestor_id = 1022533)
                           Heap Fetches: 33
         ->  Index Scan using log_desc_idx on log_hierarchies  (cost=0.57..483.82 rows=11962 width=10) (actual time=0.024..0.306 rows=17 loops=33)
               Index Cond: (descendant_id = "ANY_subquery".descendant_id)
 Planning time: 0.290 ms
 Execution time: 11.246 ms
(11 rows)

This commit fixes the problem caused by the OR in the condition. With the OR in place, PostgreSQL can't use the indexes and falls back to a full sequential scan.

We can run two fast queries instead, because both run inside one transaction.
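As a rough illustration of why the split is safe (not the code from this PR), the sketch below builds a toy closure table and checks that one DELETE with the top-level OR removes the same rows as two DELETEs run in a single transaction. The query shape is inferred from the filter in the "Before" plan. SQLite stands in for PostgreSQL here, and the sample rows and function names are assumptions made for the example.

```python
import sqlite3


def make_db():
    """Toy closure table for the chain 1 -> 2 -> 3 plus an unrelated node 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_hierarchies ("
        "ancestor_id INTEGER, descendant_id INTEGER, generations INTEGER)"
    )
    rows = [
        (1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 4, 0),  # self references
        (1, 2, 1), (2, 3, 1),                        # parent -> child
        (1, 3, 2),                                   # grandparent -> grandchild
    ]
    conn.executemany("INSERT INTO log_hierarchies VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def remaining(conn):
    """Return the surviving (ancestor_id, descendant_id) pairs, sorted."""
    query = "SELECT ancestor_id, descendant_id FROM log_hierarchies"
    return sorted(conn.execute(query).fetchall())


def delete_with_or(conn, node_id):
    """One statement with a top-level OR (the shape of the slow 'Before' plan)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            DELETE FROM log_hierarchies
            WHERE descendant_id IN (
                SELECT DISTINCT descendant_id
                FROM log_hierarchies
                WHERE ancestor_id = ?
            ) OR descendant_id = ?
            """,
            (node_id, node_id),
        )


def delete_in_two_steps(conn, node_id):
    """Two statements, each with a single indexable condition, in one transaction."""
    with conn:
        conn.execute(
            """
            DELETE FROM log_hierarchies
            WHERE descendant_id IN (
                SELECT DISTINCT descendant_id
                FROM log_hierarchies
                WHERE ancestor_id = ?
            )
            """,
            (node_id,),
        )
        conn.execute(
            "DELETE FROM log_hierarchies WHERE descendant_id = ?",
            (node_id,),
        )
```

Calling either function with node 2 on a fresh `make_db()` connection should leave only the self references of nodes 1 and 4. Running both statements in one transaction is what keeps the two-step form atomic.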

@zzet
Contributor Author

zzet commented Oct 29, 2015

similar PR: #96

@seuros
Member

seuros commented Oct 29, 2015

Awesome work, @zzet.

@seuros
Member

seuros commented Oct 29, 2015

I wonder why Travis didn't pick up the PR.

ping @mceachen

@seuros seuros closed this Oct 29, 2015
@seuros seuros reopened this Oct 29, 2015
@zzet zzet force-pushed the improve_delete_performance branch from c8c6182 to 34ae69d Compare October 29, 2015 12:03
@zzet
Contributor Author

zzet commented Oct 29, 2015

@seuros I updated the commit:

explain analyze SELECT DISTINCT descendant_id
  FROM (SELECT descendant_id
    FROM "log_hierarchies"
    WHERE ancestor_id = 1022510 or descendant_id = 1022510
  ) AS x;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=53382.86..53382.88 rows=2 width=4) (actual time=1.008..1.011 rows=22 loops=1)
   Group Key: log_hierarchies.descendant_id
   ->  Bitmap Heap Scan on log_hierarchies  (cost=354.58..53347.42 rows=14179 width=4) (actual time=0.218..0.661 rows=1708 loops=1)
         Recheck Cond: ((ancestor_id = 1022510) OR (descendant_id = 1022510))
         Heap Blocks: exact=65
         ->  BitmapOr  (cost=354.58..354.58 rows=14179 width=0) (actual time=0.202..0.202 rows=0 loops=1)
               ->  Bitmap Index Scan on log_anc_desc_idx  (cost=0.00..86.72 rows=2420 width=0) (actual time=0.045..0.045 rows=56 loops=1)
                     Index Cond: (ancestor_id = 1022510)
               ->  Bitmap Index Scan on log_desc_idx  (cost=0.00..260.76 rows=11759 width=0) (actual time=0.152..0.152 rows=1687 loops=1)
                     Index Cond: (descendant_id = 1022510)
 Planning time: 0.181 ms
 Execution time: 1.067 ms
(12 rows)

Now it is a single query.
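A minimal sketch of how the SELECT above could feed a single DELETE, with the OR moved inside the subquery so each branch can be matched by its own index (the BitmapOr in the plan). This is an assumption about the final statement's shape, not a copy of the merged commit. SQLite stands in for PostgreSQL, and the sample rows and the `delete_subtree` helper are hypothetical.

```python
import sqlite3

# The DISTINCT subquery from the comment above, wrapped in a DELETE.
SINGLE_QUERY_DELETE = """
DELETE FROM log_hierarchies
WHERE descendant_id IN (
    SELECT DISTINCT descendant_id
    FROM (SELECT descendant_id
          FROM log_hierarchies
          WHERE ancestor_id = :id OR descendant_id = :id
    ) AS x
)
"""


def build_closure_table():
    """Toy closure table for the chain 1 -> 2 -> 3 plus an unrelated node 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_hierarchies ("
        "ancestor_id INTEGER, descendant_id INTEGER, generations INTEGER)"
    )
    rows = [
        (1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 4, 0),
        (1, 2, 1), (2, 3, 1), (1, 3, 2),
    ]
    conn.executemany("INSERT INTO log_hierarchies VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def delete_subtree(conn, node_id):
    """Delete every hierarchy row that points at node_id or its descendants."""
    with conn:  # one statement, one transaction
        cursor = conn.execute(SINGLE_QUERY_DELETE, {"id": node_id})
    return cursor.rowcount


def remaining(conn):
    """Return the surviving (ancestor_id, descendant_id) pairs, sorted."""
    query = "SELECT ancestor_id, descendant_id FROM log_hierarchies"
    return sorted(conn.execute(query).fetchall())
```

On this toy data, `delete_subtree(conn, 2)` removes the rows for nodes 2 and 3 and leaves nodes 1 and 4 untouched. Whether PostgreSQL picks the bitmap plan shown above depends on the indexes and table statistics, so checking with `EXPLAIN ANALYZE` on real data is still the deciding step.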

@mceachen mceachen changed the title from "Improve perfomance issue for DELETE command" to "Improve performance issue for DELETE command" on Nov 2, 2015
@mceachen
Collaborator

mceachen commented Nov 2, 2015

Thanks so much for the PR! Love the detail, especially the benchmark.
@martin-schmidt fixed the build, and I just merged that fix into master. I'll cut 6.0.0 as soon as this PR builds green on master.

@mceachen mceachen merged commit 34ae69d into ClosureTree:master Nov 2, 2015