Quick Links

Re: Applying logical replication changes by more than one process

Lists:	pgsql-hackers

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Applying logical replication changes by more than one process
Date:	2016-03-19 10:46:00
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I am trying to use logical replication mechanism in implementation of PostgreSQL multimaster and faced with one conceptual problem.
Originally logical replication was intended to support asynchronous replication. In this case applying changes by single process should not be a bottleneck.
But if we are using distributed transaction manager to provide global consistency, then applying transaction by one worker leads to very bad performance and what is worser: cause unintended serialization of transactions, which is not taken in account by
distributed deadlock detection algorithm and so can cause
undetected deadlocks.

So I have implemented pool of background workers which can apply transactions concurrently.
It works and shows acceptable performance. But now I am thinking about HA and tracking origin LSNs which are needed to correctly specify slot position in case of recovery. And there is a problem: as far as I understand to correctly record origin LSN in WAL
and advance slot position it is necessary to setup session
using replorigin_session_setup. It is not so convenient in case of using pool of background workers, because we have to setup session for each commit.
But the main problem is that for each slot session can be associated only with one process:

else if (curstate->acquired_by != 0)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
errmsg("replication identifier %d is already active for PID %d",
curstate->roident, curstate->acquired_by)));
}

Which once again means that there can be only one process applying changes.

To provide correct state of replication node it is necessary to enforce that each logical replication record is replayed exactly once: we should not loose some change or try to apply it twice. So operation of recording original LSN position in WAL and
adjusting slot should be atomic. And during recovery we should restore slot current position based on the origin values extracted from WAL. I wonder if it can be done using current logical replication mechanism when changes of each slot are applied by
more than one process? Or the only alternative is to write/read origin LSNs in WAL myself, for example using custom WAL records?

Thanks in advance!

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 12:10:08
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 19/03/16 11:46, Konstantin Knizhnik wrote:
> Hi,
>
> I am trying to use logical replication mechanism in implementation of
> PostgreSQL multimaster and faced with one conceptual problem.
> Originally logical replication was intended to support asynchronous
> replication. In this case applying changes by single process should not
> be a bottleneck.
> But if we are using distributed transaction manager to provide global
> consistency, then applying transaction by one worker leads to very bad
> performance and what is worser: cause unintended serialization of
> transactions, which is not taken in account by distributed deadlock
> detection algorithm and so can cause
> undetected deadlocks.
>
> So I have implemented pool of background workers which can apply
> transactions concurrently.
> It works and shows acceptable performance. But now I am thinking about
> HA and tracking origin LSNs which are needed to correctly specify slot
> position in case of recovery. And there is a problem: as far as I
> understand to correctly record origin LSN in WAL and advance slot
> position it is necessary to setup session
> using replorigin_session_setup. It is not so convenient in case of using
> pool of background workers, because we have to setup session for each
> commit.
> But the main problem is that for each slot session can be associated
> only with one process:
>
> else if (curstate->acquired_by != 0)
> {
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_IN_USE),
> errmsg("replication identifier %d is already active for
> PID %d",
> curstate->roident, curstate->acquired_by)));
> }
>
> Which once again means that there can be only one process applying changes.
>

That's not true, all it means is that you can do
replorigin_session_setup for same origin only in one process but you
don't need to have it setup for session to update it, the
replorigin_advance() works just fine.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 12:10:15
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 12:44:29
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 21.03.2016 15:10, Petr Jelinek wrote:
> Hi,
>
> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>> Hi,
>>
>> I am trying to use logical replication mechanism in implementation of
>> PostgreSQL multimaster and faced with one conceptual problem.
>> Originally logical replication was intended to support asynchronous
>> replication. In this case applying changes by single process should not
>> be a bottleneck.
>> But if we are using distributed transaction manager to provide global
>> consistency, then applying transaction by one worker leads to very bad
>> performance and what is worser: cause unintended serialization of
>> transactions, which is not taken in account by distributed deadlock
>> detection algorithm and so can cause
>> undetected deadlocks.
>>
>> So I have implemented pool of background workers which can apply
>> transactions concurrently.
>> It works and shows acceptable performance. But now I am thinking about
>> HA and tracking origin LSNs which are needed to correctly specify slot
>> position in case of recovery. And there is a problem: as far as I
>> understand to correctly record origin LSN in WAL and advance slot
>> position it is necessary to setup session
>> using replorigin_session_setup. It is not so convenient in case of using
>> pool of background workers, because we have to setup session for each
>> commit.
>> But the main problem is that for each slot session can be associated
>> only with one process:
>>
>> else if (curstate->acquired_by != 0)
>> {
>> ereport(ERROR,
>> (errcode(ERRCODE_OBJECT_IN_USE),
>> errmsg("replication identifier %d is already active for
>> PID %d",
>> curstate->roident, curstate->acquired_by)));
>> }
>>
>> Which once again means that there can be only one process applying
>> changes.
>>
>
> That's not true, all it means is that you can do
> replorigin_session_setup for same origin only in one process but you
> don't need to have it setup for session to update it, the
> replorigin_advance() works just fine.

But RecordTransactionCommit is using replorigin_session_advance, not
replorigin_advance.
And replorigin_session_advance requires that session was setup:

void
replorigin_session_advance(XLogRecPtr remote_commit, XLogRecPtr
local_commit)
{
Assert(session_replication_state != NULL);
}

"session_replication_state" is private variable which is set by
replorigin_session_setup.
But attempt to call replorigin_session_setup from multiple process cause
above error.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 13:08:54
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 21/03/16 13:44, Konstantin Knizhnik wrote:
>
>
> On 21.03.2016 15:10, Petr Jelinek wrote:
>> Hi,
>>
>> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>>> Hi,
>>>
>>> I am trying to use logical replication mechanism in implementation of
>>> PostgreSQL multimaster and faced with one conceptual problem.
>>> Originally logical replication was intended to support asynchronous
>>> replication. In this case applying changes by single process should not
>>> be a bottleneck.
>>> But if we are using distributed transaction manager to provide global
>>> consistency, then applying transaction by one worker leads to very bad
>>> performance and what is worser: cause unintended serialization of
>>> transactions, which is not taken in account by distributed deadlock
>>> detection algorithm and so can cause
>>> undetected deadlocks.
>>>
>>> So I have implemented pool of background workers which can apply
>>> transactions concurrently.
>>> It works and shows acceptable performance. But now I am thinking about
>>> HA and tracking origin LSNs which are needed to correctly specify slot
>>> position in case of recovery. And there is a problem: as far as I
>>> understand to correctly record origin LSN in WAL and advance slot
>>> position it is necessary to setup session
>>> using replorigin_session_setup. It is not so convenient in case of using
>>> pool of background workers, because we have to setup session for each
>>> commit.
>>> But the main problem is that for each slot session can be associated
>>> only with one process:
>>>
>>> else if (curstate->acquired_by != 0)
>>> {
>>> ereport(ERROR,
>>> (errcode(ERRCODE_OBJECT_IN_USE),
>>> errmsg("replication identifier %d is already active for
>>> PID %d",
>>> curstate->roident, curstate->acquired_by)));
>>> }
>>>
>>> Which once again means that there can be only one process applying
>>> changes.
>>>
>>
>> That's not true, all it means is that you can do
>> replorigin_session_setup for same origin only in one process but you
>> don't need to have it setup for session to update it, the
>> replorigin_advance() works just fine.
>
> But RecordTransactionCommit is using replorigin_session_advance, not
> replorigin_advance.

Only when the origin is actually setup for the current session. You need
to call the replorigin_advance yourself from your apply code.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>,Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>,PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 13:15:39
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On March 21, 2016 2:08:54 PM GMT+01:00, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>On 21/03/16 13:44, Konstantin Knizhnik wrote:
>>
>>
>> On 21.03.2016 15:10, Petr Jelinek wrote:
>>> Hi,
>>>
>>> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>>>> Hi,
>>>>
>>>> I am trying to use logical replication mechanism in implementation
>of
>>>> PostgreSQL multimaster and faced with one conceptual problem.
>>>> Originally logical replication was intended to support asynchronous
>>>> replication. In this case applying changes by single process should
>not
>>>> be a bottleneck.
>>>> But if we are using distributed transaction manager to provide
>global
>>>> consistency, then applying transaction by one worker leads to very
>bad
>>>> performance and what is worser: cause unintended serialization of
>>>> transactions, which is not taken in account by distributed deadlock
>>>> detection algorithm and so can cause
>>>> undetected deadlocks.
>>>>
>>>> So I have implemented pool of background workers which can apply
>>>> transactions concurrently.
>>>> It works and shows acceptable performance. But now I am thinking
>about
>>>> HA and tracking origin LSNs which are needed to correctly specify
>slot
>>>> position in case of recovery. And there is a problem: as far as I
>>>> understand to correctly record origin LSN in WAL and advance slot
>>>> position it is necessary to setup session
>>>> using replorigin_session_setup. It is not so convenient in case of
>using
>>>> pool of background workers, because we have to setup session for
>each
>>>> commit.
>>>> But the main problem is that for each slot session can be
>associated
>>>> only with one process:
>>>>
>>>> else if (curstate->acquired_by != 0)
>>>> {
>>>> ereport(ERROR,
>>>> (errcode(ERRCODE_OBJECT_IN_USE),
>>>> errmsg("replication identifier %d is already active
>for
>>>> PID %d",
>>>> curstate->roident, curstate->acquired_by)));
>>>> }
>>>>
>>>> Which once again means that there can be only one process applying
>>>> changes.
>>>>
>>>
>>> That's not true, all it means is that you can do
>>> replorigin_session_setup for same origin only in one process but you
>>> don't need to have it setup for session to update it, the
>>> replorigin_advance() works just fine.
>>
>> But RecordTransactionCommit is using replorigin_session_advance, not
>> replorigin_advance.
>
>Only when the origin is actually setup for the current session. You
>need
>to call the replorigin_advance yourself from your apply code.

That's problematic from a durability POV.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 13:18:27
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 21/03/16 14:15, Andres Freund wrote:
>
>
> On March 21, 2016 2:08:54 PM GMT+01:00, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> On 21/03/16 13:44, Konstantin Knizhnik wrote:
>>>
>>>
>>> On 21.03.2016 15:10, Petr Jelinek wrote:
>>>> Hi,
>>>>
>>>> On 19/03/16 11:46, Konstantin Knizhnik wrote:
>>>>> Hi,
>>>>>
>>>>> I am trying to use logical replication mechanism in implementation
>> of
>>>>> PostgreSQL multimaster and faced with one conceptual problem.
>>>>> Originally logical replication was intended to support asynchronous
>>>>> replication. In this case applying changes by single process should
>> not
>>>>> be a bottleneck.
>>>>> But if we are using distributed transaction manager to provide
>> global
>>>>> consistency, then applying transaction by one worker leads to very
>> bad
>>>>> performance and what is worser: cause unintended serialization of
>>>>> transactions, which is not taken in account by distributed deadlock
>>>>> detection algorithm and so can cause
>>>>> undetected deadlocks.
>>>>>
>>>>> So I have implemented pool of background workers which can apply
>>>>> transactions concurrently.
>>>>> It works and shows acceptable performance. But now I am thinking
>> about
>>>>> HA and tracking origin LSNs which are needed to correctly specify
>> slot
>>>>> position in case of recovery. And there is a problem: as far as I
>>>>> understand to correctly record origin LSN in WAL and advance slot
>>>>> position it is necessary to setup session
>>>>> using replorigin_session_setup. It is not so convenient in case of
>> using
>>>>> pool of background workers, because we have to setup session for
>> each
>>>>> commit.
>>>>> But the main problem is that for each slot session can be
>> associated
>>>>> only with one process:
>>>>>
>>>>> else if (curstate->acquired_by != 0)
>>>>> {
>>>>> ereport(ERROR,
>>>>> (errcode(ERRCODE_OBJECT_IN_USE),
>>>>> errmsg("replication identifier %d is already active
>> for
>>>>> PID %d",
>>>>> curstate->roident, curstate->acquired_by)));
>>>>> }
>>>>>
>>>>> Which once again means that there can be only one process applying
>>>>> changes.
>>>>>
>>>>
>>>> That's not true, all it means is that you can do
>>>> replorigin_session_setup for same origin only in one process but you
>>>> don't need to have it setup for session to update it, the
>>>> replorigin_advance() works just fine.
>>>
>>> But RecordTransactionCommit is using replorigin_session_advance, not
>>> replorigin_advance.
>>
>> Only when the origin is actually setup for the current session. You
>> need
>> to call the replorigin_advance yourself from your apply code.
>
> That's problematic from a durability POV.
>

Huh? How come?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 13:25:17
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
> On 21/03/16 14:15, Andres Freund wrote:
> >>Only when the origin is actually setup for the current session. You
> >>need
> >>to call the replorigin_advance yourself from your apply code.
> >
> >That's problematic from a durability POV.
> >
>
> Huh? How come?

If you use the session mechanism the replication progress is synced with
the apply process, even if there are crashes. Crash recovery updates the
progress. There's no such interlock with apply otherwise, and I don't
see how you can build one with reasonable effort.

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-21 13:30:19
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 21/03/16 14:25, Andres Freund wrote:
> On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
>> On 21/03/16 14:15, Andres Freund wrote:
>>>> Only when the origin is actually setup for the current session. You
>>>> need
>>>> to call the replorigin_advance yourself from your apply code.
>>>
>>> That's problematic from a durability POV.
>>>
>>
>> Huh? How come?
>
> If you use the session mechanism the replication progress is synced with
> the apply process, even if there are crashes. Crash recovery updates the
> progress. There's no such interlock with apply otherwise, and I don't
> see how you can build one with reasonable effort.
>

Ah you mean because with wal_log=true the origin advance is in different
WAL record than commit? OK yeah you might be one transaction behind
then, true.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-22 06:32:22
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mar 21, 2016, at 4:30 PM, Petr Jelinek wrote:

> On 21/03/16 14:25, Andres Freund wrote:
>> On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
>>> On 21/03/16 14:15, Andres Freund wrote:
>>>>> Only when the origin is actually setup for the current session. You
>>>>> need
>>>>> to call the replorigin_advance yourself from your apply code.
>>>>
>>>> That's problematic from a durability POV.
>>>>
>>>
>>> Huh? How come?
>>
>> If you use the session mechanism the replication progress is synced with
>> the apply process, even if there are crashes. Crash recovery updates the
>> progress. There's no such interlock with apply otherwise, and I don't
>> see how you can build one with reasonable effort.
>>
>
> Ah you mean because with wal_log=true the origin advance is in different WAL record than commit? OK yeah you might be one transaction behind then, true.

It actually means that we can not enforce database consistency. If we do replorigin_advance before commit and then crash happen, then we will loose some changes.
If we call replorigin_advance after commit but crash happen before, then some changes can be applied multiple times. For example we can insert some record twice (if there are no unique constraints).
Look likes the only working scenario is to setup replication session for each commit and use locking to prevent concurrent session setup for the same slot by multiple process, doesn't it?
I have tried it, fortunately it doesn't cause any noticeable performance degradation. But unfortunately can't consider such approach as elegant.
Why it is actually necessary to bind replication slot to process? Why it is not possible to have multiple concurrent sessions for the same slot?

Also I concern about using sequential search for slot location in replorigin_session_setup and many other functions - there is loop through all max_replication_slots.
It seems to be not a problem when number of slots is less than 10. For multimaster this assumption is true - even Oracle RAC rarely has two-digit number of nodes.
But if we want to perform sharding and use logical replication for providing redundancy, then number of nodes and slots can be essentially larger.
I didn't think much about such configuration - may be it possible to propose more efficient mechanism for replication in this case.

>
> --
> Petr Jelinek http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-22 07:10:21
Message-ID:	CAMsr+YFEnKi0mSrskFt7QHgwQzDGDKMi6frO693PEPPmWdozdA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22 March 2016 at 14:32, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
wrote:

> Ah you mean because with wal_log=true the origin advance is in different
> WAL record than commit? OK yeah you might be one transaction behind then,
> true.
>
>
> It actually means that we can not enforce database consistency. If we do replorigin_advance
> before commit and then crash happen, then we will loose some changes.
> If we call replorigin_advance after commit but crash happen before, then
> some changes can be applied multiple times. For example we can insert some
> record twice (if there are no unique constraints).
> Look likes the only working scenario is to setup replication session for
> each commit and use locking to prevent concurrent session setup for the
> same slot by multiple process, doesn't it?
>

Yes.

How would you expect it to work if you attempted to replorigin_advance
without a session? From multiple concurrent backends?

Parallel apply is complicated business. You have to make sure you apply
xacts in an order that's free from deadlocks and from insert/delete
anomalies - though you can at least detect those, ERROR that xact and all
subsequent ones, and retry. For progress tracking to be consistent and
correct you'd have to make sure you committed strictly in the same order as
upstream. Just before each commit you can set the origin LSN and advance
the replication origin, which will commit atomically along with the commit
it confirms. I don't really see the problem.

> I have tried it, fortunately it doesn't cause any noticeable performance
> degradation. But unfortunately can't consider such approach as elegant.
> Why it is actually necessary to bind replication slot to process? Why it
> is not possible to have multiple concurrent sessions for the same slot?
>

Especially since most slot changes LWLock- and/or spinlock-protected
already.

The client would have to manage replay confirmations appropriately so that
it doesn't confirm past the point where some other connection still needs
it.

We'd have to expose a "slot" column in pg_stat_replication and remove the
"pid" column from pg_replication_slots to handle the 1:n relationship
between slot clients and slots, and it'd be a pain to show which normal
user backends were using a slot. Not really sure how to handle that.

To actually make this useful would require a lot more though. A way to
request that replay start from a new LSN without a full
disconnect/reconnect each time. Client-side parallel consume/apply.
Inter-transaction ordering information so the client can work out a viable
xact apply order (possibly using SSI information per the discussion with
Kevin?). Etc.

I haven't really looked into this and I suspect there are some hairy areas
involved in replaying a slot from more than one client. The reason I'm
interested in it personally is for initial replica state setup as Oleksandr
prototyped and described earlier. We could attach to the slot's initial
snapshot then issue a new replication command that, given a table name or
oid, scans the table from the snapshot and passes each tuple to a new
callback (like, but not the same as, the insert callback) on the output
plugin.

That way clients could parallel-copy the initial state of the DB across the
same replication protocol they then consume new changes from, with no need
to make normal libpq connections and COPY initial state.

I'm interested in being able to do parallel receive of new changes from the
slot too, but suspect that'd be a bunch harder.

> Also I concern about using sequential search for slot location
> in replorigin_session_setup and many other functions - there is loop
> through all max_replication_slots.
> It seems to be not a problem when number of slots is less than 10. For
> multimaster this assumption is true - even Oracle RAC rarely has two-digit
> number of nodes.
> But if we want to perform sharding and use logical replication for
> providing redundancy, then number of nodes and slots can be essentially
> larger.
>

Sounds like premature optimisation. Deal with it if it comes up in profiles
in scale testing with 100 clients. I'll be surprised if it does.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-22 08:14:38
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22/03/16 07:32, konstantin knizhnik wrote:
>
> On Mar 21, 2016, at 4:30 PM, Petr Jelinek wrote:
>
>> On 21/03/16 14:25, Andres Freund wrote:
>>> On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
>>>> On 21/03/16 14:15, Andres Freund wrote:
>>>>>> Only when the origin is actually setup for the current session. You
>>>>>> need
>>>>>> to call the replorigin_advance yourself from your apply code.
>>>>>
>>>>> That's problematic from a durability POV.
>>>>>
>>>>
>>>> Huh? How come?
>>>
>>> If you use the session mechanism the replication progress is synced with
>>> the apply process, even if there are crashes. Crash recovery updates the
>>> progress. There's no such interlock with apply otherwise, and I don't
>>> see how you can build one with reasonable effort.
>>>
>>
>> Ah you mean because with wal_log=true the origin advance is in
>> different WAL record than commit? OK yeah you might be one transaction
>> behind then, true.
>
> It actually means that we can not enforce database consistency. If we do
> replorigin_advance before commit and then crash happen, then we will
> loose some changes.
> If we call replorigin_advance after commit but crash happen before, then
> some changes can be applied multiple times. For example we can insert
> some record twice (if there are no unique constraints).
> Look likes the only working scenario is to setup replication session for
> each commit and use locking to prevent concurrent session setup for the
> same slot by multiple process, doesn't it?

You can do that, or you can move the tracking to the receiving process
and spill the data to the disk (hurts IO obviously), or save the
progress to table (also hurts IO), or write patch which solves this (no
idea how though).

>
> Also I concern about using sequential search for slot location
> in replorigin_session_setup and many other functions - there is loop
> through all max_replication_slots.
> It seems to be not a problem when number of slots is less than 10. For
> multimaster this assumption is true - even Oracle RAC rarely has
> two-digit number of nodes.
> But if we want to perform sharding and use logical replication for
> providing redundancy, then number of nodes and slots can be essentially
> larger.
> I didn't think much about such configuration - may be it possible to
> propose more efficient mechanism for replication in this case.
>

And each slot means connection with logical decoding attached to it so
you don't really want to have thousands of those anyway. I think you'll
hit other problems faster than loop over slots becomes problem if you
plan to keep all of them active.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-22 08:21:53
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mar 22, 2016, at 10:10 AM, Craig Ringer wrote:

> On 22 March 2016 at 14:32, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>> Ah you mean because with wal_log=true the origin advance is in different WAL record than commit? OK yeah you might be one transaction behind then, true.
>
> It actually means that we can not enforce database consistency. If we do replorigin_advance before commit and then crash happen, then we will loose some changes.
> If we call replorigin_advance after commit but crash happen before, then some changes can be applied multiple times. For example we can insert some record twice (if there are no unique constraints).
> Look likes the only working scenario is to setup replication session for each commit and use locking to prevent concurrent session setup for the same slot by multiple process, doesn't it?
>
> Yes.
>
> How would you expect it to work if you attempted to replorigin_advance without a session? From multiple concurrent backends?

I would not work. But I wonder why I would need to call replorigin_advance without a session.
Please excuse me, I am not thinking now about the general case of using logical replication, I just focused on multimaster.
What I need is some efficient, durable and atomic mechanism for applying changes.
I see only two ways to provide atomicity:
1. Tracking of origins should be done inside xact as part of normal commit.
2. Use custom WAL records.

1) is easier now and it really works if I correctly synchronize access to slots. And surprisingly it even doesn't add substantial overhead.

>
> Parallel apply is complicated business. You have to make sure you apply xacts in an order that's free from deadlocks and from insert/delete anomalies - though you can at least detect those, ERROR that xact and all subsequent ones, and retry.

Well, this is exactly what our multimaster does. We do not try to enforce order of applying xacts. But we detect global deadlocks and use 2PC to provide data consistency.
So it is not task of logical replication, it is done by DTM overriding visibility checks and transaction commit protocol using XTM.

> For progress tracking to be consistent and correct you'd have to make sure you committed strictly in the same order as upstream. Just before each commit you can set the origin LSN and advance the replication origin, which will commit atomically along with the commit it confirms. I don't really see the problem.

Sorry, I do not completely understand you. What you mean by "will commit atomically along with the commit it confirms"? How this atomicity will be enforced?

>
> I have tried it, fortunately it doesn't cause any noticeable performance degradation. But unfortunately can't consider such approach as elegant.
> Why it is actually necessary to bind replication slot to process? Why it is not possible to have multiple concurrent sessions for the same slot?
>
> Especially since most slot changes LWLock- and/or spinlock-protected already.
>
> The client would have to manage replay confirmations appropriately so that it doesn't confirm past the point where some other connection still needs it.
>
> We'd have to expose a "slot" column in pg_stat_replication and remove the "pid" column from pg_replication_slots to handle the 1:n relationship between slot clients and slots, and it'd be a pain to show which normal user backends were using a slot. Not really sure how to handle that.
>
> To actually make this useful would require a lot more though. A way to request that replay start from a new LSN without a full disconnect/reconnect each time. Client-side parallel consume/apply. Inter-transaction ordering information so the client can work out a viable xact apply order (possibly using SSI information per the discussion with Kevin?). Etc.
>
> I haven't really looked into this and I suspect there are some hairy areas involved in replaying a slot from more than one client. The reason I'm interested in it personally is for initial replica state setup as Oleksandr prototyped and described earlier. We could attach to the slot's initial snapshot then issue a new replication command that, given a table name or oid, scans the table from the snapshot and passes each tuple to a new callback (like, but not the same as, the insert callback) on the output plugin.
>
> That way clients could parallel-copy the initial state of the DB across the same replication protocol they then consume new changes from, with no need to make normal libpq connections and COPY initial state.
>
> I'm interested in being able to do parallel receive of new changes from the slot too, but suspect that'd be a bunch harder.
>
>
> Also I concern about using sequential search for slot location in replorigin_session_setup and many other functions - there is loop through all max_replication_slots.
> It seems to be not a problem when number of slots is less than 10. For multimaster this assumption is true - even Oracle RAC rarely has two-digit number of nodes.
> But if we want to perform sharding and use logical replication for providing redundancy, then number of nodes and slots can be essentially larger.
>
> Sounds like premature optimisation. Deal with it if it comes up in profiles in scale testing with 100 clients. I'll be surprised if it does.
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

From:	konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Applying logical replication changes by more than one process
Date:	2016-03-22 08:32:16
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mar 22, 2016, at 11:14 AM, Petr Jelinek wrote:
>
> And each slot means connection with logical decoding attached to it so you don't really want to have thousands of those anyway. I think you'll hit other problems faster than loop over slots becomes problem if you plan to keep all of them active.

Assume that cluster have thousands of nodes and we use sharding to scatter data through cluster nodes.
But to provide HA we want to perform sharding with some level of redundancy, for example save the same record at 3 different nodes.
Once possible approach (pg_shard) is to execute the same query at three different shards.
But there is no warranty that result of execution will be the same at all nodes.
Alternative approach is to execute transaction at one node and then replicate it using logical replication to replicas.
So we do not perform logical replication to all 1000 nodes. Just to 2 of them. But each time it will be different pair of nodes. So we still need to have 1000 active replication slots.

May be logical replication can not be used at all in such scenario - I have not thought much about it yet. Our first step will be multimaster without sharding.

>
> --
> Petr Jelinek http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers