Re: Fix for PL/Python slow input arrays traversal issue

Lists: pgsql-hackers
From: Alexey Grishchenko <agrishchenko(at)pivotal(dot)io>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexey Grishchenko <programmerag(at)gmail(dot)com>
Subject: Fix for PL/Python slow input arrays traversal issue
Date: 2016-07-28 12:55:30
Message-ID: CAH38_tkwA5qgLV8zPN1OpPzhtkNKQb30n3xq-2NR9jUfv3qwHA@mail.gmail.com

Hi,

The following issue exists in PL/Python: when a function takes an array as
an input parameter, processing an array of fixed-size elements that contains
null values is many times slower than processing the same array without
nulls. Here is an example:

-- Function

create or replace function test(a int8[]) returns int8 as $BODY$
return sum([x for x in a if x is not None])
$BODY$ language plpythonu volatile;

pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,100000) as a
pl_regression(# ) as q;
test
------------
5000050000
(1 row)

Time: 22.248 ms
pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,100000) as a
pl_regression(# union all
pl_regression(# select null::int8 as a
pl_regression(# ) as q;
test
------------
5000050000
(1 row)

Time: 7179.921 ms

As you can see, a single null in the array introduces a roughly 320x
slowdown. The reason is the following:
The original implementation calls array_ref for each element of the array.
Each call to array_ref in turn calls array_seek. array_seek has a shortcut
for arrays of fixed-size elements with no nulls. But if the array's elements
are not fixed-size, or if the array contains nulls, each call to array_seek
computes the offset of the Kth element by walking from the first element.
This makes the whole traversal O(N^2), resulting in high processing time for
arrays of non-fixed-size elements and arrays with nulls.
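
As a rough illustration of why per-element seeking is quadratic, here is a
simplified Python model. It is not the PostgreSQL C code: the function names
and the list-of-booleans null bitmap are assumptions made for the sketch. It
only counts how many bitmap entries are scanned when every element is located
by walking from the start.

```python
# Simplified model, not PostgreSQL code: values are stored packed, with no
# slot reserved for nulls, so locating element k requires walking the null
# bitmap from the first element.

def seek_offset(nulls, k, elem_size=8):
    """Return (byte offset of element k, number of bitmap entries scanned)."""
    offset = 0
    steps = 0
    for i in range(k):
        steps += 1
        if not nulls[i]:
            offset += elem_size  # only non-null elements occupy storage
    return offset, steps


def total_steps_per_element_seek(nulls):
    """Total bitmap entries scanned when each element is fetched by index."""
    return sum(seek_offset(nulls, k)[1] for k in range(len(nulls)))


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        nulls = [False] * n + [True]  # one trailing null, as in the test above
        print(n, total_steps_per_element_seek(nulls))
```

In this model the scanned count is N*(N-1)/2 for N elements, so doubling the
array length roughly quadruples the work.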

The fix I propose applies the same logic that array_out uses for efficient
array traversal: it keeps a pointer to the offset of the last fetched
element, which results in a dramatic performance improvement for the
affected cases. With this implementation, arrays of fixed-size elements
without nulls, arrays of fixed-size elements with nulls, and arrays of
variable-size elements are all processed at the same speed. Here is the test
after this fix is applied:

pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,100000) as a
pl_regression(# ) as q;
test
------------
5000050000
(1 row)

Time: 21.056 ms
pl_regression=# select test(array_agg(a)::int8[])
pl_regression-# from (
pl_regression(# select generate_series(1,100000) as a
pl_regression(# union all
pl_regression(# select null::int8 as a
pl_regression(# ) as q;
test
------------
5000050000
(1 row)

Time: 22.839 ms
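
The traversal idea behind the fix, carrying the current offset forward
instead of recomputing it for every element, can be sketched in Python as
follows. This is only an illustration under assumed data structures (a packed
bytes buffer plus a list of null flags); pack_int8 and iter_packed_int8 are
hypothetical helper names and do not appear in the patch.

```python
import struct

# Simplified model, not the patch itself: int8 values are packed back to back
# in little-endian order, and nulls occupy no space in the data buffer.

def pack_int8(values):
    """Build (data, nulls) from a Python list that may contain None."""
    data = b"".join(struct.pack("<q", v) for v in values if v is not None)
    nulls = [v is None for v in values]
    return data, nulls


def iter_packed_int8(data, nulls):
    """Yield every element in one pass, remembering the current offset."""
    offset = 0
    for is_null in nulls:
        if is_null:
            yield None  # nulls consume no bytes, so the offset stays put
        else:
            (value,) = struct.unpack_from("<q", data, offset)
            yield value
            offset += 8  # advance past the element just read


if __name__ == "__main__":
    data, nulls = pack_int8(list(range(1, 11)) + [None])
    print(sum(x for x in iter_packed_int8(data, nulls) if x is not None))
```

Each element is visited once, so the cost is linear in the array length
whether or not nulls are present.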

--
Best regards,
Alexey Grishchenko

Attachment Content-Type Size
0001-Fix-for-PL-Python-slow-input-arrays-traversal-issue.patch application/octet-stream 3.1 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexey Grishchenko <programmerag(at)gmail(dot)com>
Subject: Re: Fix for PL/Python slow input arrays traversal issue
Date: 2016-09-10 05:44:20
Message-ID: [email protected]

This entry should be closed, because this patch is part of another patch.

The new status of this patch is: Waiting on Author


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexey Grishchenko <programmerag(at)gmail(dot)com>
Subject: Re: Fix for PL/Python slow input arrays traversal issue
Date: 2016-09-18 13:21:23
Message-ID: [email protected]

Pavel,

I will pick these up.


From: Dave Cramer <davecramer(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexey Grishchenko <programmerag(at)gmail(dot)com>
Subject: Re: Fix for PL/Python slow input arrays traversal issue
Date: 2016-09-19 19:16:19
Message-ID: [email protected]

Yes, this should be closed as it is contained in https://commitfest.postgresql.org/10/697/