Quick Links

Re: pglogical - logical replication contrib module

Lists:	pgsql-hackers

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	pglogical - logical replication contrib module
Date:	2015-12-31 23:34:30
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I'd like to submit the replication solution which is based on the
pglogical_output [1] module (which is obviously needed for this to compile).

The pglogical contrib module provides extension which does the
master-slave logical replication based on the logical decoding.

The basic documentation is in README.md, I didn't bother making sgml
docs yet since I expect that there will be ongoing changes happening and
it's easier for me to update the markdown docs than sgml. I will do the
conversion once we start approaching committable state.

What it implements
- logical replication
- partial replication (replication sets)
- multiple sources for single subscriber
- origin filtering (so that if replication is setup both ways, there is
no cyclic replication)

It currently doesn't do multi-master or automatic DDL. I think DDL
should be relatively easy if somebody finishes the deparse extension as
the infrastructure for replicating arbitrary commands is present in this
patch.

It's rather large patch so I will just go very briefly over high level
overview of how it works, the details need to be discussed separately IMHO:
Catalogs:
- node - stores information about "nodes" (postgresql databases)
- node_interface - represents connection string(s) to nodes, we
separate interfaces to different catalog mainly to allow for setups
where different subscribers see different address of the provider server
- local_node - stores exactly one tuple which points to the nodes
catalog tuple that represents the local node of the current database
- subscription - represents subscription between two nodes, it
includes configuration of the subscription like replication set and
origin filters

Upstream:
- basically just implements the pglogical_output hooks according to
the catalogs

Downstream:
- several background workers
- supervisor is worker which manages all the other workers
- manager is per database worker which manages individual database
(node) and its subscriptions
- apply does the actual replication, one apply process is started per
subscription, connects to the walsender on the other side and applies
the changes received from there

[1] https://commitfest.postgresql.org/8/418/

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-pglogical-v1.patch.gz	application/gzip	93.9 KB

From:	"Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-04 16:46:30
Message-ID:	CACACo5T1s0Asr23ZXKONPcR+sT+yFztzAVSBqUW=22HhhOFpLg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 1, 2016 at 12:34 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> Hi,
>
> I'd like to submit the replication solution which is based on the
> pglogical_output [1] module (which is obviously needed for this to compile).
>

Hi,

Impressive stuff!

Apparently this depends on a newer, yet-to-be-published version of the
pglogical_output patch:

.../contrib/pglogical/pglogical_hooks.c: In function
‘pglogical_row_filter_hook’:
.../contrib/pglogical/pglogical_hooks.c:173:35: error: ‘struct
PGLogicalRowFilterArgs’ has no member named ‘change’
HeapTuple tup = &rowfilter_args->change->data.tp.newtuple->tuple;
^

It currently doesn't do multi-master or automatic DDL. I think DDL should
> be relatively easy if somebody finishes the deparse extension as the
> infrastructure for replicating arbitrary commands is present in this patch.
>

I wish could find the time to get back to this patch. I didn't check it in
quite a while...

+PGconn *
+pglogical_connect(const char *connstring, const char *connname)
+{
+ PGconn *conn;
+ StringInfoData dsn;
+
+ initStringInfo(&dsn);
+ appendStringInfo(&dsn,
+ "%s fallback_application_name='%s'",
+ connstring, connname);
+
+ conn = PQconnectdb(dsn.data);

This is prone to errors when connstring is specified in URI format. A
workaround is provided in this commit for
walreceiver: b3fc6727ce54a16ae9227bcccfebfa028ac5b16f

--
Alex

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	"Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-07 08:28:17
Message-ID:	CAMsr+YH-=kpPBEHoMsgS2nGtyWpXhSb=kv4dzK5vpUHdL-3Z+A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5 January 2016 at 00:46, Shulgin, Oleksandr <oleksandr(dot)shulgin(at)zalando(dot)de
> wrote:

> On Fri, Jan 1, 2016 at 12:34 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com>
> wrote:
>
>> Hi,
>>
>> I'd like to submit the replication solution which is based on the
>> pglogical_output [1] module (which is obviously needed for this to compile).
>>
>
> Hi,
>
> Impressive stuff!
>
> Apparently this depends on a newer, yet-to-be-published version of the
> pglogical_output patch:
>
> .../contrib/pglogical/pglogical_hooks.c: In function
> ‘pglogical_row_filter_hook’:
> .../contrib/pglogical/pglogical_hooks.c:173:35: error: ‘struct
> PGLogicalRowFilterArgs’ has no member named ‘change’
> HeapTuple tup = &rowfilter_args->change->data.tp.newtuple->tuple;
>

Good point. Looks like I forgot to push. Done now:

https://github.com/2ndquadrant/postgres/tree/dev/pglogical-output

https://github.com/2ndQuadrant/postgres/tree/pglogical-output-v5

I hadn't posted the rev to -hackers yet because I still have to finish
SGMLifying the docs before it can be a real candidate for inclusion. The
current docs have drifted a little as a result of that WIP. I'm not really
working for another week and a half though, so I might as well post the
current status as-is.

Still have to finish the docs conversion but that's the only remaining open
item.

Note that this patch has 9.4 support. I'd be pretty happy to be able to
retain that, mostly to avoid the need to carry a backported version as a
separately packaged extension, but I'm not sure what the general opinion
will be on that.

> +PGconn *
> +pglogical_connect(const char *connstring, const char *connname)
> +{
> + PGconn *conn;
> + StringInfoData dsn;
> +
> + initStringInfo(&dsn);
> + appendStringInfo(&dsn,
> + "%s fallback_application_name='%s'",
> + connstring, connname);
> +
> + conn = PQconnectdb(dsn.data);
>
> This is prone to errors when connstring is specified in URI format. A
> workaround is provided in this commit for
> walreceiver: b3fc6727ce54a16ae9227bcccfebfa028ac5b16f
>
>
Thanks for the heads-up there.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
pglogical-v5.patch	text/x-patch	251.2 KB

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	"Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-07 08:45:30
Message-ID:	CAMsr+YEwsyGpkSNbu9JUO3MgH1Aa6C5g40Se+z+NfqCjWFabSA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

... and this is why we don't post while jetlagged and tired.

The patch on the prior mail is the output plugin. Wrong thread, wrong
filename. It's the output plugin update needed for the pglogical downstream
in this thread.

Corrected post of v5 output plugin here:

http://www.postgresql.org/message-id/CAMsr+YEGtE8gYnpAo7=n=iMS9OLcc8oeMvMrh+9ki9WB5CykGA@mail.gmail.com

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Steve Singer <steve(at)ssinger(dot)info>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-09 02:43:40
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

make check gives me

for extra in ../../contrib/pglogical_output contrib/pglogical; do make
-C '../..'/$extra DESTDIR='/usr/local/src/postgresql'/tmp_install
install >>'/usr/local/src/postgresql'/tmp_install/log/install.log ||
exit; done
make[1]: *** ../../../../contrib/pglogical_output: No such file or
directory. Stop.
../../src/Makefile.global:325: recipe for target 'temp-install' failed
make: *** [temp-install] Error 2
ssinger(at)ssinger-laptop:/usr/local/src/postgresql/contrib/pglogical$

The attached patch fixes that but it then is creating the test database
'contrib_regression' not 'regression'
changing pglogical.provider_dsn = 'contrib_regression' still leaves me
with a lot of failures.

Attachment	Content-Type	Size
makefile.diff	text/x-diff	856 bytes

From:	Steve Singer <steve(at)ssinger(dot)info>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-09 18:30:45
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/31/2015 06:34 PM, Petr Jelinek wrote:
> Hi,
>
> I'd like to submit the replication solution which is based on the
> pglogical_output [1] module (which is obviously needed for this to
> compile).
>
> The pglogical contrib module provides extension which does the
> master-slave logical replication based on the logical decoding.
>
> The basic documentation is in README.md, I didn't bother making sgml
> docs yet since I expect that there will be ongoing changes happening
> and it's easier for me to update the markdown docs than sgml. I will
> do the conversion once we start approaching committable state.
I am going to send my comments/issues out in batches as I find them
instead of waiting till I look over everything.

I find this part of the documentation a bit unclear

+Once the provider node is setup, subscribers can be subscribed to it.
First the
+subscriber node must be created:
+
+ SELECT pglogical.create_node(
+ node_name := 'subscriber1',
+ dsn := 'host=thishost port=5432 dbname=db'
+ );
+

My initial reading was that I should execute this on the provider node.
Perhaps instead
-----------------
Once the provider node is setup you can then create subscriber nodes.
Create the subscriber nodes and
then execute the following commands on each subscriber node

create extension pglogical

select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost
dbname=db port=5432');

-------------------

Also the documentation for create_subscription talks about

+ - `synchronize_structure` - specifies if to synchronize structure from
+ provider to the subscriber, default true

I did the following

test2=# select pglogical.create_subscription(subscription_name:='default
sub',provider_dsn:='host=localhost dbname=test1 port=5436');
create_subscription
---------------------
247109879

Which then resulted in the following showing up in my PG log

LOG: worker process: pglogical apply 16542:247109879 (PID 4079) exited
with exit code 1
ERROR: replication slot name "pgl_test2_test1_default sub" contains
invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.
FATAL: could not send replication command "CREATE_REPLICATION_SLOT
"pgl_test2_test1_default sub" LOGICAL pglogical_output": status
PGRES_FATAL_ERROR: ERROR: replication slot name
"pgl_test2_test1_default sub" contains invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.

The create_subscription command should check if the subscription name is
valid (meets the rules that will be applied against the slot command).

I wondered how I could fix my mistake.

The docs say

+- `pglogical.pglogical_drop_subscription(subscription_name name,
ifexists bool)`
+ Disconnects the subscription and removes it from the catalog.
+

test2=# select pglogical.pglogical_drop_subscription('default sub', true);
ERROR: function pglogical.pglogical_drop_subscription(unknown, boolean)
does not exist

The command is actually called pglogical.drop_subscription the docs
should be fixed to show the actual command name

I then wanted to add a second table to my database. ('b').

select pglogical.replication_set_add_table('default','public.b',true);
replication_set_add_table
---------------------------
t
(1 row)

In my pglog I then got

LOG: starting sync of table public.b for subscriber defaultsub
ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b"
contains invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.
FATAL: could not send replication command "CREATE_REPLICATION_SLOT
"pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status
PGRES_FATAL_ERROR: ERROR: replication slot name
"pgl_test2_test1_defaultsub_public.b" contains invalid character
HINT: Replication slot names may only contain lower case letters,
numbers, and the underscore character.

I then did

test1=# select pglogical.replication_set_remove_table('default','public.b');
replication_set_remove_table
------------------------------
t
(1 row)

but my log still keep repeating the error, so I tried connecting to the
replica and did the same

test2=# select pglogical.replication_set_remove_table('default','public.b');
ERROR: replication set mapping -303842815:16726 not found

Is there any way to recover from this situation?

The documenation says I can drop a replication set, maybe that will let
replication continue.

+- `pglogical.delete_replication_set(set_name text)`
+ Removes the replication set.
+

select pglogical.delete_replication_set('default');
ERROR: function pglogical.delete_replication_set(unknown) does not exist
LINE 1: select pglogical.delete_replication_set('default');
^
HINT: No function matches the given name and argument types. You might
need to add explicit type casts.

The function is actually pglogical.drop_replication_set , the docs
should be updated.
(note that didn't fix my problem either but then dropping the
subscription did seem to work).

I then re-added the default set to the origin and resubscribed my replica

test2=# select
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
dbname=test1 port=5436');
create_subscription
---------------------
2974019075

I then saw a bunch of
LOG: worker process: pglogical apply 16542:2974019075 (PID 26778)
exited with exit code 1
ERROR: subscriber defaultsub initialization failed during
nonrecoverable step (s), please try the setup again
LOG: worker process: pglogical apply 16542:2974019075 (PID 26779)
exited with exit code 1

in the log but then those stopped and I see

test2=# select pglogical.show_subscription_status();
show_subscription_status

--------------------------------------------------------------------------------
--------------------------------------------------
(defaultsub,down,test1,"host=localhost dbname=test1
port=5436",pgl_test2_test1_
defaultsub,"{default,default_insert_only}",{all})
(1 row)

I'm not really sure what to do to 'recover' my cluster at this point so
I'll send this off and rebuild my cluster and start over.

From:	Steve Singer <steve(at)ssinger(dot)info>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-10 19:57:12
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/09/2016 01:30 PM, Steve Singer wrote:
> On 12/31/2015 06:34 PM, Petr Jelinek wrote:
>

> I'm not really sure what to do to 'recover' my cluster at this point
> so I'll send this off and rebuild my cluster and start over.
>
>

I had a setup test1--->test2 (with 2 tables in the default set)

I then created a third database (all three hosted on the same PG cluster)

In the third database (test3)
test3=# create extension pglogical;
CREATE EXTENSION
test3=# select pglogical.create_node(node_name:='test3',
dsn:='host=localhost dbname=test3 port=5436');
create_node
-------------
2001662995
(1 row)

test3=# select
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
dbname=test2 port=5436');
create_subscription
---------------------
2974019075

It copied the schema over but not the data (if I use test2 as the
provider_dsn then it does copy the data).

I then tried inserting a row into a table on test1. Things crashed and
after crash recovery I keep getting

2016-01-10 13:03:15 EST LOG: database system is ready to accept connections
2016-01-10 13:03:15 EST LOG: autovacuum launcher started
2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
2016-01-10 13:03:15 EST test2LOG: starting logical decoding for slot
"pgl_test3
_test2_defaultsub"
2016-01-10 13:03:15 EST test2DETAIL: streaming transactions committing
after 0/
18292D8, reading WAL from 0/18292D8
2016-01-10 13:03:15 EST test2LOG: logical decoding found consistent
point at 0/
18292D8
2016-01-10 13:03:15 EST test2DETAIL: Logical decoding will begin using
saved sn
apshot.
TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c",
Line: 1127)
2016-01-10 13:03:15 EST test2LOG: unexpected EOF on standby connection
2016-01-10 13:03:15 EST LOG: worker process: pglogical apply
17016:2974019075 (
PID 24746) was terminated by signal 6: Aborted

The stack trace is

#3 0x00000000007b83af in SearchCatCache (cache=0xe27d18, v1=15015784,
v2=v2(at)entry=0, v3=v3(at)entry=0, v4=v4(at)entry=0) at catcache.c:1127
#4 0x00000000007c503e in SearchSysCache (cacheId=cacheId(at)entry=47,
key1=<optimized out>, key2=key2(at)entry=0, key3=key3(at)entry=0,
key4=key4(at)entry=0) at syscache.c:981
#5 0x00000000006996d4 in replorigin_by_name (
roname=0xe51f30 "pgl_test2_test1_defaultsub",
missing_ok=missing_ok(at)entry=0 '\000') at origin.c:216
#6 0x00007fdb54a908d3 in handle_origin (s=0x7ffd873f6da0)
at pglogical_apply.c:235
#7 replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031
#8 apply_work (streamConn=streamConn(at)entry=0xe84fb0) at
pglogical_apply.c:1309
#9 0x00007fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>)
at pglogical_apply.c:1691
#10 0x0000000000674912 in StartBackgroundWorker () at bgworker.c:726
---Type <return> to continue, or q <return> to quit---
#11 0x000000000067f7e2 in do_start_bgworker (rw=0xe03890) at
postmaster.c:5501
#12 maybe_start_bgworker () at postmaster.c:5676
#13 0x0000000000680206 in sigusr1_handler
(postgres_signal_arg=<optimized out>)
at postmaster.c:4937
#14 <signal handler called>
#15 0x00007fdb54fa2293 in __select_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#16 0x0000000000468285 in ServerLoop () at postmaster.c:1648
#17 0x000000000068161e in PostmasterMain (argc=argc(at)entry=3,
argv=argv(at)entry=0xddede0) at postmaster.c:1292
#18 0x000000000046979d in main (argc=3, argv=0xddede0) at main.c:223

I tried dropping the subscription and re-adding it. I keep getting

2016-01-10 13:21:48 EST test1LOG: logical decoding found consistent
point at 0/1830080
2016-01-10 13:21:48 EST test1DETAIL: There are no running transactions.
2016-01-10 13:21:48 EST test1LOG: exported logical decoding snapshot:
"000004DE-1" with 0 transaction IDs
2016-01-10 13:21:48 EST test3ERROR: relation "a" already exists
2016-01-10 13:21:48 EST test3STATEMENT: CREATE TABLE a (
a integer NOT NULL,
b integer
);

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a
ssinger
pg_restore: [archiver (db)] could not execute query: ERROR: relation "a"
already exists
Command was: CREATE TABLE a (
a integer NOT NULL,
b integer
);

2016-01-10 13:21:48 EST ERROR: could not execute command
"/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data"
--exit-on-error -1 -d "host=localhost dbname=test3 port=5436"
"/tmp/pglogical-28079.dump""
2016-01-10 13:21:48 EST test1LOG: unexpected EOF on client connection
with an open transaction
2016-01-10 13:21:48 EST LOG: worker process: pglogical apply
17016:844915593 (PID 28079) exited with exit code 1
2016-01-10 13:21:48 EST ERROR: subscriber defaultsub4 initialization
failed during nonrecoverable step (s), please try the setup again

Which is probably also the cause of the error I reported yesterday (that
I tried creating a subscription without dropping the tables).
From a usability point of view I think we need a way of making this
errors available in the output of pglogical.show_subscription_status().

I asked to subscribe something through psql, even thought it is
asynchronous, if the async operation fails I should be able to learn
about the problem through psql. If I am writing a script to subscribe a
node it needs a way in my script of checking if the subscription has
failed and reporting the error.
My subscription script might not have easy access to the server log.

+- `pglogical.show_subscription_table(subscription_name name,
+ relation regclass)`
+ Shows synchronization status of a table.
+
+ Parameters:
+ - `subscription_name` - name of the existing subscription
+ - `relation` - name of existing table, optionally qualified
+

It isn't clear from the documentation what the output of this function
means, nor could I tell looking at it. Is this function just supposed
to tell us if a table is part of the replication set or if it is
'up-to-date'. It still reports 'synchornized' when a table is behind.

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Steve Singer <steve(at)ssinger(dot)info>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-15 17:01:14
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2016-01-09 19:30, Steve Singer wrote:\
> I am going to send my comments/issues out in batches as I find them
> instead of waiting till I look over everything.
>

Thanks for looking at this! Yes going in batches/steps makes sense, this
is huge patch.

>
> I find this part of the documentation a bit unclear
>
>
> +Once the provider node is setup, subscribers can be subscribed to it.
> First the
> +subscriber node must be created:
> +
> + SELECT pglogical.create_node(
> + node_name := 'subscriber1',
> + dsn := 'host=thishost port=5432 dbname=db'
> + );
> +
>
> My initial reading was that I should execute this on the provider node.
> Perhaps instead
> -----------------
> Once the provider node is setup you can then create subscriber nodes.
> Create the subscriber nodes and
> then execute the following commands on each subscriber node
>
> create extension pglogical
>
> select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost
> dbname=db port=5432');
>
> -------------------

Makes sense I guess, this is probably relic of how this internally
evolved (we used to have providers and subscribers before we merged them
into nodes).

>
> Also the documentation for create_subscription talks about
>
> + - `synchronize_structure` - specifies if to synchronize structure from
> + provider to the subscriber, default true
>

Not sure what's your comment on this.

>
>
> I did the following
>
> test2=# select pglogical.create_subscription(subscription_name:='default
> sub',provider_dsn:='host=localhost dbname=test1 port=5436');
> create_subscription
> ---------------------
> 247109879
>
>
> Which then resulted in the following showing up in my PG log
>
> LOG: worker process: pglogical apply 16542:247109879 (PID 4079) exited
> with exit code 1
> ERROR: replication slot name "pgl_test2_test1_default sub" contains
> invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
> FATAL: could not send replication command "CREATE_REPLICATION_SLOT
> "pgl_test2_test1_default sub" LOGICAL pglogical_output": status
> PGRES_FATAL_ERROR: ERROR: replication slot name
> "pgl_test2_test1_default sub" contains invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>
>
> The create_subscription command should check if the subscription name is
> valid (meets the rules that will be applied against the slot command).
>

Yes, fixed. Also added some other sensitization code since we also use
dbname in slot name and that can contain whatever.

> I wondered how I could fix my mistake.
>
> The docs say
>
> +- `pglogical.pglogical_drop_subscription(subscription_name name,
> ifexists bool)`
> + Disconnects the subscription and removes it from the catalog.
> +
>
> test2=# select pglogical.pglogical_drop_subscription('default sub', true);
> ERROR: function pglogical.pglogical_drop_subscription(unknown, boolean)
> does not exist
>
>
> The command is actually called pglogical.drop_subscription the docs
> should be fixed to show the actual command name
>

Yep, got this from other people as well, fixed.

>
> I then wanted to add a second table to my database. ('b').
>
> select pglogical.replication_set_add_table('default','public.b',true);
> replication_set_add_table
> ---------------------------
> t
> (1 row)
>
> In my pglog I then got
>
> LOG: starting sync of table public.b for subscriber defaultsub
> ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b"
> contains invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
> FATAL: could not send replication command "CREATE_REPLICATION_SLOT
> "pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status
> PGRES_FATAL_ERROR: ERROR: replication slot name
> "pgl_test2_test1_defaultsub_public.b" contains invalid character
> HINT: Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>

Right, needed the sensitization as well (I am actually using the hash
now as there is only 8 chars left anyway).

>
> I then did
>
> test1=# select
> pglogical.replication_set_remove_table('default','public.b');
> replication_set_remove_table
> ------------------------------
> t
> (1 row)
>
>
> but my log still keep repeating the error, so I tried connecting to the
> replica and did the same
>
> test2=# select
> pglogical.replication_set_remove_table('default','public.b');
> ERROR: replication set mapping -303842815:16726 not found
>
> Is there any way to recover from this situation?
>

Not really, there is no api yet to remove table from synchronization
process so you'd have to manually delete row from
pglogical.local_sync_status on subscriber, kill the sync process and
remove the slot. I will think about what would be good api to solve this.

> The documenation says I can drop a replication set, maybe that will let
> replication continue.
>
> +- `pglogical.delete_replication_set(set_name text)`
> + Removes the replication set.
> +
>
> select pglogical.delete_replication_set('default');
> ERROR: function pglogical.delete_replication_set(unknown) does not exist
> LINE 1: select pglogical.delete_replication_set('default');
> ^
> HINT: No function matches the given name and argument types. You might
> need to add explicit type casts.
>
> The function is actually pglogical.drop_replication_set , the docs
> should be updated.
> (note that didn't fix my problem either but then dropping the
> subscription did seem to work).

Yeah it doesn't as the problem is on subscriber, replication sets only
affect provider. And fixed the docs.

>
>
>
> I then re-added the default set to the origin and resubscribed my replica
>
>
>
> test2=# select
> pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
> dbname=test1 port=5436');
> create_subscription
> ---------------------
> 2974019075
>
>
> I then saw a bunch of
> LOG: worker process: pglogical apply 16542:2974019075 (PID 26778)
> exited with exit code 1
> ERROR: subscriber defaultsub initialization failed during
> nonrecoverable step (s), please try the setup again
> LOG: worker process: pglogical apply 16542:2974019075 (PID 26779)
> exited with exit code 1
>
> in the log but then those stopped and I see
>
> test2=# select pglogical.show_subscription_status();
> show_subscription_status
>
> --------------------------------------------------------------------------------
>
> --------------------------------------------------
> (defaultsub,down,test1,"host=localhost dbname=test1
> port=5436",pgl_test2_test1_
> defaultsub,"{default,default_insert_only}",{all})
> (1 row)
>
>
> I'm not really sure what to do to 'recover' my cluster at this point so
> I'll send this off and rebuild my cluster and start over.
>

I think the problem here is that you resubscribed with
syncrhonize_structure := true while the conflicting structure already
existed, that option only works correctly when there is no conflicting
structure (we don't try to make diffs or anything, just dump/restore).
Recovering should be drop the uninitialized subscription and create new
one where you don't synchronize structure.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Steve Singer <steve(at)ssinger(dot)info>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-15 17:07:46
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2016-01-10 20:57, Steve Singer wrote:
> On 01/09/2016 01:30 PM, Steve Singer wrote:
>> On 12/31/2015 06:34 PM, Petr Jelinek wrote:
>>
>
>> I'm not really sure what to do to 'recover' my cluster at this point
>> so I'll send this off and rebuild my cluster and start over.
>>
>>
>
> I had a setup test1--->test2 (with 2 tables in the default set)
>
> I then created a third database (all three hosted on the same PG cluster)
>
> In the third database (test3)
> test3=# create extension pglogical;
> CREATE EXTENSION
> test3=# select pglogical.create_node(node_name:='test3',
> dsn:='host=localhost dbname=test3 port=5436');
> create_node
> -------------
> 2001662995
> (1 row)
>
> test3=# select
> pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
> dbname=test2 port=5436');
> create_subscription
> ---------------------
> 2974019075
>
> It copied the schema over but not the data (if I use test2 as the
> provider_dsn then it does copy the data).

Yes, because you probably don't have any replication sets defined there.
That's by design, replication sets are defined per provider and their
definition is not replicated. This seems to be the only sane way to
actually support merging data from multiple provider nodes. I guess this
could be documented better, but cascading is something that's still WIP.

>
> I then tried inserting a row into a table on test1. Things crashed and
> after crash recovery I keep getting
>
> 2016-01-10 13:03:15 EST LOG: database system is ready to accept
> connections
> 2016-01-10 13:03:15 EST LOG: autovacuum launcher started
> 2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
> 2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
> 2016-01-10 13:03:15 EST test2LOG: starting logical decoding for slot
> "pgl_test3
> _test2_defaultsub"
> 2016-01-10 13:03:15 EST test2DETAIL: streaming transactions committing
> after 0/
> 18292D8, reading WAL from 0/18292D8
> 2016-01-10 13:03:15 EST test2LO
> I asked to subscribe something through psql, even thought it is
> asynchronous, if the async operation fails I should be able to learn
> about the problem through psql. If I am writing a script to subscribe a
> node it needs a way in my script of checking if the subscription has
> failed and reporting the error.
> My subscription script might not have easy access to the server log.
>G: logical decoding found consistent
> point at 0/
> 18292D8
> 2016-01-10 13:03:15 EST test2DETAIL: Logical decoding will begin using
> saved sn
> apshot.
> TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c",
> Line: 1127)
> 2016-01-10 13:03:15 EST test2LOG: unexpected EOF on standby connection
> 2016-01-10 13:03:15 EST LOG: worker process: pglogical apply
> 17016:2974019075 (
> PID 24746) was terminated by signal 6: Aborted
>
> The stack trace is
>
> #3 0x00000000007b83af in SearchCatCache (cache=0xe27d18, v1=15015784,
> v2=v2(at)entry=0, v3=v3(at)entry=0, v4=v4(at)entry=0) at catcache.c:1127
> #4 0x00000000007c503e in SearchSysCache (cacheId=cacheId(at)entry=47,
> key1=<optimized out>, key2=key2(at)entry=0, key3=key3(at)entry=0,
> key4=key4(at)entry=0) at syscache.c:981
> #5 0x00000000006996d4 in replorigin_by_name (
> roname=0xe51f30 "pgl_test2_test1_defaultsub",
> missing_ok=missing_ok(at)entry=0 '\000') at origin.c:216
> #6 0x00007fdb54a908d3 in handle_origin (s=0x7ffd873f6da0)
> at pglogical_apply.c:235
> #7 replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031
> #8 apply_work (streamConn=streamConn(at)entry=0xe84fb0) at
> pglogical_apply.c:1309
> #9 0x00007fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>)
> at pglogical_apply.c:1691
> #10 0x0000000000674912 in StartBackgroundWorker () at bgworker.c:726
> ---Type <return> to continue, or q <return> to quit---
> #11 0x000000000067f7e2 in do_start_bgworker (rw=0xe03890) at
> postmaster.c:5501
> #12 maybe_start_bgworker () at postmaster.c:5676
> #13 0x0000000000680206 in sigusr1_handler
> (postgres_signal_arg=<optimized out>)
> at postmaster.c:4937
> #14 <signal handler called>
> #15 0x00007fdb54fa2293 in __select_nocancel ()
> at ../sysdeps/unix/syscall-template.S:81
> #16 0x0000000000468285 in ServerLoop () at postmaster.c:1648
> #17 0x000000000068161e in PostmasterMain (argc=argc(at)entry=3,
> argv=argv(at)entry=0xddede0) at postmaster.c:1292
> #18 0x000000000046979d in main (argc=3, argv=0xddede0) at main.c:223
>
>

That's bug, fixed.

>
> I tried dropping the subscription and re-adding it. I keep getting
>
> 2016-01-10 13:21:48 EST test1LOG: logical decoding found consistent
> point at 0/1830080
> 2016-01-10 13:21:48 EST test1DETAIL: There are no running transactions.
> 2016-01-10 13:21:48 EST test1LOG: exported logical decoding snapshot:
> "000004DE-1" with 0 transaction IDs
> 2016-01-10 13:21:48 EST test3ERROR: relation "a" already exists
> 2016-01-10 13:21:48 EST test3STATEMENT: CREATE TABLE a (
> a integer NOT NULL,
> b integer
> );
>
>
>
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a
> ssinger
> pg_restore: [archiver (db)] could not execute query: ERROR: relation "a"
> already exists
> Command was: CREATE TABLE a (
> a integer NOT NULL,
> b integer
> );
>
>
>
> 2016-01-10 13:21:48 EST ERROR: could not execute command
> "/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data"
> --exit-on-error -1 -d "host=localhost dbname=test3 port=5436"
> "/tmp/pglogical-28079.dump""
> 2016-01-10 13:21:48 EST test1LOG: unexpected EOF on client connection
> with an open transaction
> 2016-01-10 13:21:48 EST LOG: worker process: pglogical apply
> 17016:844915593 (PID 28079) exited with exit code 1
> 2016-01-10 13:21:48 EST ERROR: subscriber defaultsub4 initialization
> failed during nonrecoverable step (s), please try the setup again
>
> Which is probably also the cause of the error I reported yesterday (that
> I tried creating a subscription without dropping the tables).
> From a usability point of view I think we need a way of making this
> errors available in the output of pglogical.show_subscription_status().
>

Yes it is same reason I explained in previous email, can be solved with
synchronize_structure := false in the create_subscription.

And yes the show_subscription_status should show the last error. I
didn't find good way to do that yet as some errors result in being
unable to write to database anymore so this needs to be done either via
libpq connection which is ugly or via shmem communication with another
process (the manager process seems like good candidate for this), but we
don't yet have infrastructure in pglogical to do this. This is
definitely on my TODO.

>
> +- `pglogical.show_subscription_table(subscription_name name,
> + relation regclass)`
> + Shows synchronization status of a table.
> +
> + Parameters:
> + - `subscription_name` - name of the existing subscription
> + - `relation` - name of existing table, optionally qualified
> +
>
> It isn't clear from the documentation what the output of this function
> means, nor could I tell looking at it. Is this function just supposed
> to tell us if a table is part of the replication set or if it is
> 'up-to-date'. It still reports 'synchornized' when a table is behind.
>

There are several statuses the table goes through, during the COPY it's
in synchronizing status, so next logical step seemed to be synchronized.
Maybe it should be renamed to 'replicating' instead as that's what it
actually means (table has finished synchronization and is now
replicating normally).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Steve Singer <steve(at)ssinger(dot)info>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-16 19:55:50
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/15/2016 12:07 PM, Petr Jelinek wrote:
> That's bug, fixed.
>

Can you posted an updated patch with whatever fixes you have so far made?

> There are several statuses the table goes through, during the COPY
> it's in synchronizing status, so next logical step seemed to be
> synchronized. Maybe it should be renamed to 'replicating' instead as
> that's what it actually means (table has finished synchronization and
> is now replicating normally).
>

I agree 'replicating' is clearer

From:	leo <dazhoufei(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-17 06:46:56
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

I also run into same problem and waiting for bug fix.
please update if new patch has published.

THX

--
View this message in context: http://postgresql.nabble.com/pglogical-logical-replication-contrib-module-tp5879755p5882564.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	leo <dazhoufei(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-18 04:30:02
Message-ID:	CAMsr+YEe3q3DG=oJpeN-v2EbVZKKUqqvdDRh0nJ-_=N9GPAWWw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 January 2016 at 14:46, leo <dazhoufei(at)gmail(dot)com> wrote:

> I also run into same problem and waiting for bug fix.
> please update if new patch has published.
>
>
There's a point release coming soon that'll incorporate these fixes and a
number of others. It'll be posted here in a few days.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Steve Singer <steve(at)ssinger(dot)info>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-23 03:17:40
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

This reply will covers a 10,000 foot level review of the feature (some of my other replies to the thread cover specifics that came up in testing and code level review will come later)

1) Do we want logical replication in core/contrib

10 year ago a popular feeling in the postgresql project was that replication didn't belong in core because there were too many different styles. People then went on to complain that it were too many replication projects to choose from and that they were hard to use and had lots of corner cases. The evolution of WAL based replication showed us how popular in-core replication is. Users like being able to use in-core features and our community process tends to produce better quality in-core solutions than external projects.

I am of the opinion that if we can come up with a solution that meets some common use cases then it would be good to have those features in core/contrib. At this stage I am not going to get into a discussion of a contrib extension versus built in as not an extension. I don't think a single replication solution
will ever meet all use-cases. I feel that the extensible infrastructure we have so far built for logical replication means that people who want to develop solutions for use-cases not covered will be in a good position. This doesn't mean we can't or shouldn't try to cover some use cases in core.

2) Does this patch provide a set of logical replication features that meet many popular use-cases

Below I will review some use-cases and try to assess how pglogical meets them.

** Streaming Postgresql Upgrade

pg_upgrade is great for many situations but sometimes you don't want an in place upgrade but you want a streaming upgrade. Possibly because you don't want application downtime but instead you just want to point your applications at the upgraded database server in a controlled manner. Othertimes you
might want an option of upgrading to a newer version of PG but maintain the option of having to rollback to the older version if things go badly.

I think pglogical should be able to handle this use case pretty well (assuming the source version of PG is actually new enough to include pglogical).
Support for replicating sequences would need to be added before this is as smooth but once sequence support was added I think this would work well.
I also don't see any reason why you couldn't replicate from 9.7 -> 9.6 thought since the wire format is abstracted from the internal representation. This is of course dependent not the application not doing anything that is inherently in-compatible between the two versions

** Query only replicas (with temp tables or additional indexes)

Sometimes you want a replica for long running or heavy queries. Requirements for temp tables, additional indexes or maybe the effect on vacuum means that our existing WAL based replicas are unsuitable.

I think pglogical should be able to handle this use case pretty well with the caveat being that your replica is an asynchronous replica and will always lag
the origin by some amount.

** Replicating a subset of tables into a different database

Sometimes you wan to replicate a handful of tables from one database to another database. Maybe the first database is the system of record for the data and the second database needs an up to date copy for querying.

Pglogical should meet this use case pretty well, it has flexible support for selecting which tables get replicated from which source. Pglogical doesn't have any facilities to rename the tables between the origin and replica but they could be added later.

** Sharding

Systems that do application level sharding (or even sharding with a fdw) often have non-sharded tables that need to be available on all shards for relational integrity or joins. Logical replication is one way to make sure that the replicated data gets to all the shards. Sharding systems also sometimes want
to take the data from individual shards and replicate it to a consolidation server for reporting purposes.

Pglogical seems to meet this use case, I guess you would have a designated origin for the shared data/global data that all shards would subscribe to
with a set containing the designated data. For the consolidation use case you would have the consolidation server subscribe to all shards

I am less clear about how someone would want DDL changes to work for these cases. The DDL support in the patch is pretty limited so I am not going to think much now about how we would want DDL to work.

** Schema changes involving rewriting big tables

Sometimes you have a DDL change on a large table that will involve a table rewrite and the best way of deploying the change is to make the DDL change
on a replicate then once it is finished promote the replica to the origin in some controlled fashion. This avoids having to lock the table on the origin
for hours.

pglogical seems to allow minor schema changes on the replica such as changing a type but it doesn't seem to allow a DO INSTEAD trigger on the replica. I don't think pglogical currently meets this use case particularly well

** Failover

WAL replication is probably a better choice for someone just looking for failover support from replication. Someone who is looking at pglogical for failover related use cases probably has one or more of the other uses cases I mentioned and wants a logical node to
take over for a failed origin. If a node fails you can take some of the remaining subscribers and have them resubscribe to one of the remaining nodes but there is no support for a) Figuring out which of the remaining nodes is most ahead b) Letting the subscribers figure out which updates from the old origin that are missing and getting them from a surviving node (they can truncate and re-copy the data but that might be very expensive)

I am not sure what would be involved in taking a streaming WAL replica and have it stand it replace the failed node.

Lack of replicating sequences would also make failing over to a pglogical replica awkward.

** Geographically distributed applications

Sometimes people have database in different geographical locations and they want to perform queries and writes locally but replicate all the data to all the other locations. This is a multi-master eventually consistent use case.

The lack of sequence support would be an issue for these use cases. I think you could also only configure the cluster in a fully connected grid (with forward_origins='none'). A lot of deployments you would want some amount of cascading and structure which isn't yet supported. I also suspect that managing a grid cluster with more than a handful of nodes will be unwieldy (particularly compared to some of the eventual consistent nosql alternatives)

The features BDR has that were removed for pglogical are probably really useful for this use-case (which I think was the original BDR use-case)

** Scaling across multiple machines

Sometimes people ask for replication systems that let them support more load than a single database server supports but with consistency.
Other use-case applies if you want 'eventually consistent' this use case is for situations where you want something other than eventual consistent.

I don't think pglogical is intended to address this.

3) Do we like the design of pglogical

I like the fact that background workers are used instead of external daemon processes
I like the fact that you can configure almost everything through SQL
I like that the output plugin is separate because this patch is big enough as it is and I can see uses for it other than pglogical.

The core abstractions are
* Nodes-, every database in the pglogical cluster is a node
* Sets - A collection of tables that behave similarly
* Subscriptions - A link between a provider and a replica. There can only be one subscription between a provider and replica

Metadata is not transferred between nodes. What I mean by this is that nodes don't have a global view of the cluster they know about their own subscriptions but nothing else. This is different than a system like slony where sl_node and sl_subscription contain a global view of your cluster state. Not sending metadata to all nodes in the cluster simplifies a bunch of things (you don't have to worry about sending metadata around and if a given piece of metadata is stale) but the downside is that I think the tooling to perform a lot of cluster reconfigure operations will need to be a lot smarter.

Petr, and Craig have you thought about how you might support getting the cluster back into a sane state after a node fails with minimal pain.
A lot of the reason why slony needs all this metadata is to support that kind of thing. I don't think we need this for the first version but it would be nice to know that the design could accommodate such a thing.

4) Do we like the syntax

I think the big debate about syntax is do we want functions or pure SQL (ie CREATE SUBSCRIPTION default1 provider_dsn=...). If we want this as an extension
then it needs to be functions. I support this decision I think the benefits of keeping pglogical as an extension is the right tradeoff. In a few releases we can always add SQL syntax if we want it to no longer be an extension

If I have any comments have on the names or arguments of individual functions I will send them later.

5) Is the maintenance burden of this patch too high

The contrib module is big, significantly bigger than most (all?) of the other contrib modules and that doesn't include the output plugin. I see a lot of potential use-cases that I think pglogical can (or will eventually) be able to handle and I think that justifies the maintenance burden. If others disagree they should speak up.

I am concerned about testing, I don't think the .sql based regression tests are going to adequately test a replication system that supports concurrent activity on different databases/servers. I remember hearing talk about a python based test suite that was rejected in another thread. Having perl tests that use -DBI has also been rejected.

The other day Tom made this comment as part of the 'Releasing in September' thread:

---
I do not think we should necessarily try to include every testing tool
in the core distribution. What is important is that they be readily
available: easy to find, easy to use, documented, portable.
-----

The standard make contrib check tests need to do some testing of pglogical but I think it will require testing much more thorough than what we can get with in core test tooling. I think this is a good case to look at test tooling that doesn't live in core.

Overall I am very impressed with pglogical and see a lot of potential

From:	leo <dazhoufei(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-26 12:33:13
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Steve Singer,

I find the pglogical package has updated, I reinstall the new RPM package
and test again. But I find the same error in subscription node after I run
pglogical.create_subscription command:
Error message:
< 2016-01-26 12:23:59.642 UTC >LOG: worker process: pglogical apply
19828:2377587811 (PID 12299) exited with exit code 1
< 2016-01-26 12:23:59.642 UTC >LOG: unregistering background worker
"pglogical apply 19828:2377587811"
< 2016-01-26 12:23:59.642 UTC >LOG: registering background worker
"pglogical apply 19828:2377587811"
< 2016-01-26 12:23:59.642 UTC >LOG: starting background worker process
"pglogical apply 19828:2377587811"
< 2016-01-26 12:23:59.643 UTC >ERROR: subscriber replicate_gis_data_from_11
initialization failed during nonrecoverable step (s), please try the setup
again
I also find the provide node has error message:
< 2016-01-26 04:16:51.173 UTC >LOG: exported logical decoding
snapshot: "0003F483-1" with 0 transaction IDs
< 2016-01-26 04:16:51.282 UTC >LOG: unexpected EOF on client connection
with an open transaction
< 2016-01-26 04:16:51.549 UTC >LOG: logical decoding found consistent point
at 4F/8CD1A090
< 2016-01-26 04:16:51.549 UTC >DETAIL: There are no running transactions.
< 2016-01-26 04:16:51.549 UTC >LOG: exported logical decoding snapshot:
"0003F484-1" with 0 transaction IDs
< 2016-01-26 04:16:51.675 UTC >LOG: unexpected EOF on client connection
with an open transaction
< 2016-01-26 04:16:51.968 UTC >LOG: logical decoding found consistent point
at 4F/8CD1A0F8
< 2016-01-26 04:16:51.968 UTC >DETAIL: There are no running transactions.
< 2016-01-26 04:16:51.968 UTC >LOG: exported logical decoding snapshot:
"0003F485-1" with 0 transaction IDs
< 2016-01-26 04:16:52.399 UTC >ERROR: schema "topology" already exists
< 2016-01-26 04:16:52.436 UTC >LOG: unexpected EOF on client connection
with an open transaction

I test pglogical according to README document. Could you tell me what
is wrong?

Thanks,
Leo

--
View this message in context: http://postgresql.nabble.com/pglogical-logical-replication-contrib-module-tp5879755p5884242.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	leo <dazhoufei(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-26 14:43:30
Message-ID:	CAMsr+YF0myoZpT1nCFLP8wPGzP+eR9YintHk2w=2NMc12z8hzg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 26 January 2016 at 20:33, leo <dazhoufei(at)gmail(dot)com> wrote:

> Hi Steve Singer,
>
> I find the pglogical package has updated, I reinstall the new RPM
> package
> and test again. But I find the same error in subscription node after I run
> pglogical.create_subscription command:
>
>
Please don't side-track threads about patch review and development with
requests for support of a released version.

This thread is about getting pglogical into PostgreSQL 9.6 core, rather
than the separately released product that's been released to support 9.4
and 9.5.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Steve Singer <steve(at)ssinger(dot)info>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-26 15:43:18
Message-ID:	CAMsr+YGTL6Vrrkidn6EKVTVt0Abofn-q+0SDYzk6aHVM5DmPxw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 23 January 2016 at 11:17, Steve Singer <steve(at)ssinger(dot)info> wrote:

>
> 2) Does this patch provide a set of logical replication features that meet
> many popular use-cases
>
> Below I will review some use-cases and try to assess how pglogical meets
> them.
>
> ** Streaming Postgresql Upgrade
>
> pg_upgrade is great for many situations but sometimes you don't want an in
> place upgrade but you want a streaming upgrade. Possibly because you don't
> want application downtime but instead you just want to point your
> applications at the upgraded database server in a controlled manner.
> Othertimes you
> might want an option of upgrading to a newer version of PG but maintain
> the option of having to rollback to the older version if things go badly.
>
> I think pglogical should be able to handle this use case pretty well
> (assuming the source version of PG is actually new enough to include
> pglogical).
>

Yep, it's designed significantly for that case. That's also why support
for 9.4 and 9.5 is maintained as a standalone extension, so you can get
data out of 9.4 and 9.5 easily (and for that matter, upgrade 9.4 to 9.5).

> Support for replicating sequences would need to be added before this is as
> smooth but once sequence support was added I think this would work well.
>

This will unfortunately have to be 9.6 only. We can work around it with
some limitations in a pglogical downstream in older versions, but I really
want to get time to write a v2 of the sequence decoding patch so I can get
that into 9.6.

> ** Query only replicas (with temp tables or additional indexes)
>
> Sometimes you want a replica for long running or heavy queries.
> Requirements for temp tables, additional indexes or maybe the effect on
> vacuum means that our existing WAL based replicas are unsuitable.
>
> I think pglogical should be able to handle this use case pretty well with
> the caveat being that your replica is an asynchronous replica and will
> always lag the origin by some amount.
>

You can actually run it as a synchronous replica too, with the usual
limitations that you can have only one synchronous standby at a time, etc.
Or should be able to - I haven't had a chance to write proper tests for
sync rep using pglogical yet.

Performance will currently hurt if you do big xacts. That's why we need
interleaved xact streaming support down the track.

> Pglogical doesn't have any facilities to rename the tables between the
> origin and replica but they could be added later.
>

Yep, we could do that with a hook. You couldn't use initial schema sync if
you did that, of course.

> ** Sharding
>
> Systems that do application level sharding (or even sharding with a fdw)
> often have non-sharded tables that need to be available on all shards for
> relational integrity or joins. Logical replication is one way to make
> sure that the replicated data gets to all the shards. Sharding systems
> also sometimes want
> to take the data from individual shards and replicate it to a
> consolidation server for reporting purposes.
>
> Pglogical seems to meet this use case, I guess you would have a designated
> origin for the shared data/global data that all shards would subscribe to
> with a set containing the designated data. For the consolidation use case
> you would have the consolidation server subscribe to all shards
>
> I am less clear about how someone would want DDL changes to work for these
> cases. The DDL support in the patch is pretty limited so I am not going to
> think much now about how we would want DDL to work.
>

DDL support is "version 2" material, basically.

9.5 has hooks that allow DDL deparsing to be implemented as an extension.
That extension needs to be finished off (there's some work-in-progress code
floating around from 9.5 dev) and needs to expose an API for other
extensions. Then pglogical can register hooks with the ddl deparse
extension and use that for DDL replication.

As we learned with BDR, though, DDL replication is *hard*.

For one thing PostgreSQL has global objects like users that we can't
currently capture DDL for, and then creates db-local objects that have
dependences on them. So you have to manually replicate the global objects
still. I can see some possible solutions for this, but nothing's really on
the horizon.

Additionally there a some operations that are a bit problematic for logical
replication. Full table rewrites being the main one - they clobber
replication origin information among other issues. We really need a way to
decode

ALTER TABLE blah ADD COLUMN fred integer NOT NULL DEFAULT 42;

BEGIN;
ALTER TABLE blah ADD COLUMN fred integer;
ALTER TABLE blah ALTER COLUMN fred DEFAULT 42;
UPDATE blah SET fred = 42;
ALTER TABLE blah ALTER COLUMN fred NOT NULL;
COMMIT;

which involves some "interesting" co-operation between DDL deparse and
logical replication. The mapping of the decoded full table rewrite to the
underlying table is a bit interesting; we just get a decode stream for a
synthetic table named "pg_temp_xxxx" where the xxxx is the table upstream
oid. A nicer API for that would be good.

** Schema changes involving rewriting big tables
>
> Sometimes you have a DDL change on a large table that will involve a table
> rewrite and the best way of deploying the change is to make the DDL change
> on a replicate then once it is finished promote the replica to the origin
> in some controlled fashion. This avoids having to lock the table on the
> origin for hours.
>
> pglogical seems to allow minor schema changes on the replica such as
> changing a type but it doesn't seem to allow a DO INSTEAD trigger on the
> replica. I don't think pglogical currently meets this use case
> particularly well
>

I'm not sure I fully understand that one.

> ** Failover
>
> WAL replication is probably a better choice for someone just looking for
> failover support from replication.

"Physical" replication as I've been trying to call it, since logical rep is
also WAL based.

I agree with you. It very definitely is.

I have a roadmap in mind for logical rep based failover. We need sequence
advance replication (or even better, sequence access mehods), an
upstream<->downstream LSN mapping and failover slots and logical decoding
of logical slots. A few bits and pieces.

Someone who is looking at pglogical for failover related use cases probably
> has one or more of the other uses cases I mentioned and wants a logical
> node to take over for a failed origin. If a node fails you can take some
> of the remaining subscribers and have them resubscribe to one of the
> remaining nodes but there is no support for a) Figuring out which of the
> remaining nodes is most ahead b) Letting the subscribers figure out which
> updates from the old origin that are missing and getting them from a
> surviving node (they can truncate and re-copy the data but that might be
> very expensive)
>

Yep. Failover slots are part of that picture, and the logical decoding of
slot positions + lsn map stuff carries on from it.

> ** Geographically distributed applications
>
> Sometimes people have database in different geographical locations and
> they want to perform queries and writes locally but replicate all the data
> to all the other locations. This is a multi-master eventually consistent
> use case.
>

Yep. That's what BDR aims for, and why the plan in 2ndQ is to rebuild BDR
around pglogical to continue the work of streaming BDR into core. You can
think of pglogical and pglogical_output as _parts of BDR_ that have been
extracted to submit into core, they've just been heavily polished up, made
much more general purpose, and had things that won't work in core yet
removed.

Hopefully we'll have full MM on top in time, but that can't all be done in
one release.

The lack of sequence support would be an issue for these use cases.

That's why we need sequence access methods. There's a patch for that in the
9.6 CF too.

> I think you could also only configure the cluster in a fully connected
> grid (with forward_origins='none'). A lot of deployments you would want
> some amount of cascading and structure which isn't yet supported. I also
> suspect that managing a grid cluster with more than a handful of nodes will
> be unwieldy (particularly compared to some of the eventual consistent nosql
> alternatives)
>

I envision a management layer on top for that, where pglogical forms an
underlying component.

The features BDR has that were removed for pglogical are probably really
> useful for this use-case (which I think was the original BDR use-case)
>

Yep. They were removed mainly because they can't work with core until some
other patches get in too. Also just to keep the first pglogical submission
vaguely practical and manageable.

> ** Scaling across multiple machines
>
> Sometimes people ask for replication systems that let them support more
> load than a single database server supports but with consistency. Other
> use-case applies if you want 'eventually consistent' this use case is for
> situations where you want something other than eventual consistent.
>
> I don't think pglogical is intended to address this.
>

Correct. That's more like postgres-XL, where you have a distributed lock
manager, distributed transaction manager, etc.

pglogical (or the output plugin at least) can form part of such a solution,
and there's an experiment being contemplated right now to use pglogical as
the data replication transport in postgres-XL. But it doesn't attempt to
provide a whole solution there, only one component.

> Metadata is not transferred between nodes. What I mean by this is that
> nodes don't have a global view of the cluster they know about their own
> subscriptions but nothing else. This is different than a system like slony
> where sl_node and sl_subscription contain a global view of your cluster
> state. Not sending metadata to all nodes in the cluster simplifies a bunch
> of things (you don't have to worry about sending metadata around and if a
> given piece of metadata is stale) but the downside is that I think the
> tooling to perform a lot of cluster reconfigure operations will need to be
> a lot smarter.
>

Yep. We're going to need a management layer on top for building and
monitoring non-trivial node graphs. Whether in core or not.

Petr and I found that trying to design a schema that could fit all use
cases while preserving a system-wide view of the node graph was
impractical, if not outright impossible. There are quite conflicting use
cases: mesh multi-master wants to see everything, whereas if you have three
upstreams feeding into a data aggregator that then replicates to other
nodes you don't particularly want the leaf nodes worrying about the
upstream origin servers.

> Petr, and Craig have you thought about how you might support getting the
> cluster back into a sane state after a node fails with minimal pain.
>

Yes.

There are really two approaches. One is having a physical standby where you
fail over to a streaming physical replica and your slot state on logical
slots is preserved. For that we need failover slots (per the patch to 9.6).

The other is to use logical failover, where there's a logical replica that
you can switch leaf nodes to point to. For that we need a way to record
slot advances on one node, send them on the wire and interpret them
usefully on another node. Hence the outlined support for logical decoding
of logical slot create/drop/update, and a lsn map. I haven't thought as
hard about this one yet.

There's another thing you need for multimaster/mesh systems where there's a
graph not a simple tree. That's the ability to lazily advance a slot so
that when a node fails you can find the peer that replayed the furthest in
that node's history and ask it to send you the changes from the lost node.
You have to be able to go back in time on the slot to the most recent point
you have a local copy of the other node's state. Turns out that's not hard,
you just delay advancing the slot. You also have to be able to replay it
again with a filter that sends you only that node's changes. That's also
not hard using replication origins. There are some hairy complexities when
it comes to multi-master conflict resolution though, where changes to data
come from more than one node. That's a "for later" problem.

> I am concerned about testing, I don't think the .sql based regression
> tests are going to adequately test a replication system that supports
> concurrent activity on different databases/servers.

I agree that we can't rely only on that.

This is part of a bigger picture in Pg where we just don't test multi-node
stuff. Failover, replication, etc is ignored in the tests. The TAP based
stuff looks to change that and I suspect we'd have to investigate whether
it's possible to build on top of that for more comprehensive testing.

Thanks again for the review work, I know it takes serious time and effort
and I appreciate it.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-01-27 04:14:26
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/31/2015 03:34 PM, Petr Jelinek wrote:
> Hi,
>
> I'd like to submit the replication solution which is based on the
> pglogical_output [1] module (which is obviously needed for this to
> compile).

This is fantastic! However, history presents itself here and PostgreSQL
in the past has not "blessed" a single solution for Replication.
Obviously that changed a bit with streaming replication but this is a
bit different than that. As I understand it, PgLogical is Logical
Replication (similar to Slony and Londiste). I wouldn't be surprised
(although I don't know) if Slony were to start using some of the
pglogical_output module features in the future.

If we were to accept PgLogical into core, it will become the default
blessed solution for PostgreSQL. While that is great in some ways it is
a different direction than the project has taken in the past. Is this
what we want to do?

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.

From:	Steve Singer <steve(at)ssinger(dot)info>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-03 02:25:35
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: not tested
Documentation: tested, failed

Here is some more review

+- `pglogical.replication_set_add_table(set_name name, table_name regclass, synchronize boolean)`
+ Adds a table to replication set.
+
+ Parameters:
+ - `set_name` - name of the existing replication set
+ - `table_name` - name or OID of the table to be added to the set
+ - `synchronize` - if true, the table data is synchronized on all subscribers
+ which are subscribed to given replication set, default false
+

The argument to this function is actually named "relation" not "table_name" though we might want to update the function to name the argument table_name.

Also we don't explain what 'synchronize' means I first thought that a value of false would mean that existing data won't be copied but any new changes will be.
A value of false actually seems to mean that nothing will happen with the table until the synchronize function is manually called. We seem to be using the word 'synchronize' in different sense in different places I find it confusing (ie synchronize_data and syncronize_structure in create_subscription).

*** a/contrib/pglogical/pglogical_sync.c
--- b/contrib/pglogical/pglogical_sync.c
+ static void
+ dump_structure(PGLogicalSubscription *sub, const char *snapshot)
+ {
+ char pg_dump[MAXPGPATH];
+ uint32 version;
+ int res;
+ StringInfoData command;
+
+ if (find_other_exec_version(my_exec_path, PGDUMP_BINARY, &version, pg_dump))
+ elog(ERROR, "pglogical subscriber init failed to find pg_dump relative to binary %s",
+ my_exec_path);
+
+ if (version / 100 != PG_VERSION_NUM / 100)
+ elog(ERROR, "pglogical subscriber init found pg_dump with wrong major version %d.%d, expected %d.%d",
+ version / 100 / 100, version / 100 % 100,
+ PG_VERSION_NUM / 100 / 100, PG_VERSION_NUM / 100 % 100);
+
+ initStringInfo(&command);
+ #if PG_VERSION_NUM < 90500
+ appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -N pglogical_origin -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",
+ #else
+ appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",

1) I am not sure we can assume/require that the pg_dump binary be in the same location as the postgres binary. I don't know think we've ever required that client binaries (ie psql, pg_dump, pg_restore ...) be in the same directory as postgres. pg_upgrade does require this so maybe this isn't a problem in practice but I thought I'd point it out. Ideally wouldn't need to call an external program to get a schema dump but turning pg_dump into a library is beyond the scope of this patch.

2) I don't think we can hard-coded /tmp as the directory for the schema dump. I don't think will work on most windows systems and even on a unix system $TMPDIR might be set to something else. Maybe writing this into pgsql_tmp would be a better choice.

Furtherdown in
pglogical_sync_subscription(PGLogicalSubscription *sub)
+ switch (status)
+ {
+ /* Already synced, nothing to do except cleanup. */
+ case SYNC_STATUS_READY:
+ MemoryContextDelete(myctx);
+ return;
+ /* We can recover from crashes during these. */
+ case SYNC_STATUS_INIT:
+ case SYNC_STATUS_CATCHUP:
+ break;
+ default:
+ elog(ERROR,
+ "subscriber %s initialization failed during nonrecoverable step (%c), please try the setup again",
+ sub->name, status);
+ break;
+ }

I think the default case needs to do something to unregister the background worker. We already discussed trying to get the error message to a user in a better way either way there isn't any sense in this background worker being launched again if the error is nonrecoverable.

+
+ tables = copy_replication_sets_data(sub->origin_if->dsn,
+ sub->target_if->dsn,
+ snapshot,
+ sub->replication_sets);
+
+ /* Store info about all the synchronized tables. */
+ StartTransactionCommand();
+ foreach (lc, tables)

Shouldn't we be storing the info about the synchronized tables as part of the same transaction that does the sync?

I'll keeping going through the code as I have time. I think it is appropriate to move this to the next CF since the CF is past the end date and the patch has received some review. When you have an updated version of the patch post it, don't wait until March.

From:	Steve Singer <steve(at)ssinger(dot)info>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-03 02:31:54
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/26/2016 10:43 AM, Craig Ringer wrote:
> On 23 January 2016 at 11:17, Steve Singer <steve(at)ssinger(dot)info
> <mailto:steve(at)ssinger(dot)info>> wrote:
>

> ** Schema changes involving rewriting big tables
>
> Sometimes you have a DDL change on a large table that will involve
> a table rewrite and the best way of deploying the change is to
> make the DDL change
> on a replicate then once it is finished promote the replica to the
> origin in some controlled fashion. This avoids having to lock the
> table on the origin for hours.
>
> pglogical seems to allow minor schema changes on the replica such
> as changing a type but it doesn't seem to allow a DO INSTEAD
> trigger on the replica. I don't think pglogical currently meets
> this use case particularly well
>
>
> I'm not sure I fully understand that one.

Say you have a table A with column b

and the next version of your application you want to create a second
table B that has column B

create table B (b text);
insert into B select b from a;
alter table a drop column b;

but you want to do this on a replica because it is a very big table and
you want to minimize downtown.

You could have a trigger on the replica that performed updates on B.b
instead of A except triggers don't seem to get run on the replica.

> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

Steve

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-16 14:38:20
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 26, 2016 at 08:14:26PM -0800, Joshua Drake wrote:
> On 12/31/2015 03:34 PM, Petr Jelinek wrote:
> >Hi,
> >
> >I'd like to submit the replication solution which is based on the
> >pglogical_output [1] module (which is obviously needed for this to
> >compile).
>
> This is fantastic! However, history presents itself here and
> PostgreSQL in the past has not "blessed" a single solution for
> Replication. Obviously that changed a bit with streaming replication
> but this is a bit different than that. As I understand it, PgLogical
> is Logical Replication (similar to Slony and Londiste). I wouldn't
> be surprised (although I don't know) if Slony were to start using
> some of the pglogical_output module features in the future.
>
> If we were to accept PgLogical into core, it will become the default
> blessed solution for PostgreSQL. While that is great in some ways
> it is a different direction than the project has taken in the past.
> Is this what we want to do?

Replying late here, but I think with binary replication, we decided
that, assuming you were happy with the features provided, our streaming
binary replication solution was going to be the best and recommended way
of doing it.

I don't think we ever had that feeling with Slony or Londiste in that
there were so many limitations and so many different ways of
implementing logical replication that we never recommended a best way.

So, the question is, do we feel that PgLogical is best and recommended
way to do logical replication. If it is, then having it in core makes
sense.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +

From:	Oleg Bartunov <obartunov(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-16 16:54:54
Message-ID:	CAF4Au4zmspWiqq=nCFF5Shy-3foaDiwNt+Xk2s7TLa-83D=9Hg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 16, 2016 at 5:38 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, Jan 26, 2016 at 08:14:26PM -0800, Joshua Drake wrote:
> > On 12/31/2015 03:34 PM, Petr Jelinek wrote:
> > >Hi,
> > >
> > >I'd like to submit the replication solution which is based on the
> > >pglogical_output [1] module (which is obviously needed for this to
> > >compile).
> >
> > This is fantastic! However, history presents itself here and
> > PostgreSQL in the past has not "blessed" a single solution for
> > Replication. Obviously that changed a bit with streaming replication
> > but this is a bit different than that. As I understand it, PgLogical
> > is Logical Replication (similar to Slony and Londiste). I wouldn't
> > be surprised (although I don't know) if Slony were to start using
> > some of the pglogical_output module features in the future.
> >
> > If we were to accept PgLogical into core, it will become the default
> > blessed solution for PostgreSQL. While that is great in some ways
> > it is a different direction than the project has taken in the past.
> > Is this what we want to do?
>
> Replying late here, but I think with binary replication, we decided
> that, assuming you were happy with the features provided, our streaming
> binary replication solution was going to be the best and recommended way
> of doing it.
>
> I don't think we ever had that feeling with Slony or Londiste in that
> there were so many limitations and so many different ways of
> implementing logical replication that we never recommended a best way.
>
> So, the question is, do we feel that PgLogical is best and recommended
> way to do logical replication. If it is, then having it in core makes
> sense.
>

DDL support is what it's missed for now.

>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription +
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	obartunov(at)gmail(dot)com
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 01:33:56
Message-ID:	CAMsr+YHMDm0YjjZ63Z65mkHjv9hoRowO2PCozZXKT0cCQtVtZw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 February 2016 at 00:54, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:

>
> DDL support is what it's missed for now.
>
>
TBH, based on experience with DDL replication and deparse in BDR, it's
going to be missing for a while yet too, or at least not comprehensively
present without caveats or exceptions.

Some DDL operations don't translate well to a series of replicatable
actions. The case I hit the most is

ALTER TABLE mytable ADD COLUMN somecolumn sometype NOT NULL DEFAULT
some_function();

This is executed (simplified) by taking an ACCESS EXCLUSIVE lock, changing
the catalogs but not making the changes visible yet, rewriting the table,
and committing to make the rewritten table and the catalog changes visible.

That won't work well with logical replication. We currently capture DDL
with event triggers and log them to a table for later logical decoding and
replay - that's the "recognised" way. The trouble being that replaying that
statement will result in an unnecessary full table rewrite on the
downstream. Then we have to decode and send stream of changes to a table
called pg_temp_<oid_of_mytable>, truncate the copy of mytable on the
downstream that we just rewrote and apply those rows instead.

Of course all that only works sensibly if you have exactly one upstream and
the downstream copy of the table is treated as (or enforced as) read-only.

Improving this probably needs DDL deparse to be smarter. Rather than just
emitting something that can be reconstructed into the SQL text of the DDL
it needs to emit one or more steps that are semantically the same but allow
us to skip the rewrite. Along the lines of:

* ALTER TABLE mytable ADD COLUMN somecolumn sometype;
* ALTER TABLE mytable ALTER COLUMN somecolumn DEFAULT some_function();
* <wait for rewrite data for mytable>
* ALTER TABLE mytable ALTER COLUMN somecolumn NOT NULL;

Alternately the downstream would need a hook that lets it intercept and
prevent table rewrites caused by ALTER TABLE and similar. So it can instead
just do a truncate and wait for the new rows to come from the master.

Note that all this means the standby has to hold an ACCESS EXCLUSIVE lock
on the table during all of replay. That shouldn't be necessary, all we
really need is an EXCLUSIVE lock since concurrent SELECTs are fine. No idea
how to do that.

Deparse is also just horribly complicated to get right. There are so many
clauses and subclauses and variants of statements. Each of which must be
perfect.

Not everything has a simple and obvious mapping on the downstream side
either. TRUNCATE ... CASCADE is the obvious one. You do a cascade truncate
on the master - do you want that to replicate as a cascaded truncate on the
replica, or a truncate of only those tables that actually got truncated on
the master? If the replica has additional tables with FKs pointing at
tables replica the TRUNCATE would truncate those too if you replicate it as
CASCADE; if you don't the truncate will fail instead. Really, both are
probably wrong as far as the user is concerned, but we can't truncate just
the tables truncated on the master, ignore the FK relationships, and leave
dangling FK references either.

All this means that DDL replication is probably only going to make sense in
scenarios where there's exactly one master and the replica obeys some rules
like "don't create FKs pointing from non-replicated tables to tables
replicated from somewhere else". A concept we currently have no way to
express or enforce like we do persistent-to-UNLOGGED FKs.

Then there's global objects. Something as simple as:

CREATE ROLE fred;

CREATE TABLE blah(...) OWNER fred;

will break replication because we only see the CREATE TABLE, not the CREATE
ROLE. If we instead replayed the CREATE ROLE and there were multiple
connections between different DBs on an upstream and downstream apply would
fail on all but one. But we can't anyway since there's no way to capture
that CREATE ROLE from any DB except the one it was executed in, which might
not even be one of the ones doing replication.

I strongly suspect we'll need logical decoding to be made aware of such
global DDL and decode it from the WAL writes to the system catalogs. Which
will be fun - but at least modifications to the shared catalogs are a lot
simpler than the sort of gymnastics done by ALTER TABLE, etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 08:24:46
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Craig,

Thanks for your explanation. I have to agree with your arguments that in
general case replication of DDL statement using logical decoding seems
to be problematic. But we are mostly considering logical decoding in
quite limited context: replication between two identical Postgres
database nodes (multimaster).

Do you think that it in this case replication of DLL can be done as
sequence of low level operations with system catalog tables
including manipulation with locks? So in your example with ALTER TABLE
statement, can we correctly replicate it to other nodes
as request to set exclusive lock + some manipulations with catalog
tables and data table itself?
If so, instead of full support of DDL in logical decoding we can only:

1. Add option whether to include operations on system catalog tables in
logical replication or not.
2. Make it possible to replicate lock requests (can be useful not only
for DDLs)

I looked how DDL was implemented in BDR and did it in similar way in our
multimaster.
But it is awful: we need to have two different channels for propagating
changes. Additionally, in multimaster we want to enforce cluster wide
ACID. It certainly includes operations with metadata. It will be very
difficult to implement if replication of DML and DDL is done in two
different ways...

Let me ask one more question concerning logical replication: how
difficult it will be from your point of view to support two phase commit
in logical replication? Are there some principle problems?

Thanks in advance,
Konstantin

On 17.02.2016 04:33, Craig Ringer wrote:
> On 17 February 2016 at 00:54, Oleg Bartunov <obartunov(at)gmail(dot)com
> <mailto:obartunov(at)gmail(dot)com>> wrote:
>
>
> DDL support is what it's missed for now.
>
>
> TBH, based on experience with DDL replication and deparse in BDR, it's
> going to be missing for a while yet too, or at least not
> comprehensively present without caveats or exceptions.
>
>
> Some DDL operations don't translate well to a series of replicatable
> actions. The case I hit the most is
>
> ALTER TABLE mytable ADD COLUMN somecolumn sometype NOT NULL DEFAULT
> some_function();
>
> This is executed (simplified) by taking an ACCESS EXCLUSIVE lock,
> changing the catalogs but not making the changes visible yet,
> rewriting the table, and committing to make the rewritten table and
> the catalog changes visible.
>
> That won't work well with logical replication. We currently capture
> DDL with event triggers and log them to a table for later logical
> decoding and replay - that's the "recognised" way. The trouble being
> that replaying that statement will result in an unnecessary full table
> rewrite on the downstream. Then we have to decode and send stream of
> changes to a table called pg_temp_<oid_of_mytable>, truncate the copy
> of mytable on the downstream that we just rewrote and apply those rows
> instead.
>
> Of course all that only works sensibly if you have exactly one
> upstream and the downstream copy of the table is treated as (or
> enforced as) read-only.
>
> Improving this probably needs DDL deparse to be smarter. Rather than
> just emitting something that can be reconstructed into the SQL text of
> the DDL it needs to emit one or more steps that are semantically the
> same but allow us to skip the rewrite. Along the lines of:
>
> * ALTER TABLE mytable ADD COLUMN somecolumn sometype;
> * ALTER TABLE mytable ALTER COLUMN somecolumn DEFAULT some_function();
> * <wait for rewrite data for mytable>
> * ALTER TABLE mytable ALTER COLUMN somecolumn NOT NULL;
>
> Alternately the downstream would need a hook that lets it intercept
> and prevent table rewrites caused by ALTER TABLE and similar. So it
> can instead just do a truncate and wait for the new rows to come from
> the master.
>
> Note that all this means the standby has to hold an ACCESS EXCLUSIVE
> lock on the table during all of replay. That shouldn't be necessary,
> all we really need is an EXCLUSIVE lock since concurrent SELECTs are
> fine. No idea how to do that.
>
>
>
> Deparse is also just horribly complicated to get right. There are so
> many clauses and subclauses and variants of statements. Each of which
> must be perfect.
>
>
>
> Not everything has a simple and obvious mapping on the downstream side
> either. TRUNCATE ... CASCADE is the obvious one. You do a cascade
> truncate on the master - do you want that to replicate as a cascaded
> truncate on the replica, or a truncate of only those tables that
> actually got truncated on the master? If the replica has additional
> tables with FKs pointing at tables replica the TRUNCATE would truncate
> those too if you replicate it as CASCADE; if you don't the truncate
> will fail instead. Really, both are probably wrong as far as the user
> is concerned, but we can't truncate just the tables truncated on the
> master, ignore the FK relationships, and leave dangling FK references
> either.
>
>
> All this means that DDL replication is probably only going to make
> sense in scenarios where there's exactly one master and the replica
> obeys some rules like "don't create FKs pointing from non-replicated
> tables to tables replicated from somewhere else". A concept we
> currently have no way to express or enforce like we do
> persistent-to-UNLOGGED FKs.
>
>
>
> Then there's global objects. Something as simple as:
>
> CREATE ROLE fred;
>
> CREATE TABLE blah(...) OWNER fred;
>
> will break replication because we only see the CREATE TABLE, not the
> CREATE ROLE. If we instead replayed the CREATE ROLE and there were
> multiple connections between different DBs on an upstream and
> downstream apply would fail on all but one. But we can't anyway since
> there's no way to capture that CREATE ROLE from any DB except the one
> it was executed in, which might not even be one of the ones doing
> replication.
>
> I strongly suspect we'll need logical decoding to be made aware of
> such global DDL and decode it from the WAL writes to the system
> catalogs. Which will be fun - but at least modifications to the shared
> catalogs are a lot simpler than the sort of gymnastics done by ALTER
> TABLE, etc.
>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	obartunov(at)gmail(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 08:27:50
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2016-02-17 09:33:56 +0800, Craig Ringer wrote:
> Some DDL operations don't translate well to a series of replicatable
> actions. The case I hit the most is
>
> ALTER TABLE mytable ADD COLUMN somecolumn sometype NOT NULL DEFAULT
> some_function();
>
> This is executed (simplified) by taking an ACCESS EXCLUSIVE lock, changing
> the catalogs but not making the changes visible yet, rewriting the table,
> and committing to make the rewritten table and the catalog changes visible.
>
> That won't work well with logical replication.

FWIW, I think this is much less a fundamental, and more an
implementation issue. Falling back to just re-replicating the table, and
then optimizing a few common cases (only immutable DEFALUT/USING
involved) should be enough for a while.

Lets get the basics right, before reaching for the moon.

Greetings,

Andres Freund

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 09:16:38
Message-ID:	CAMsr+YE5DnMkWF7iRCN_zRLHYte90xZYbfHpf2rbK3nku6cFUA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 February 2016 at 16:24, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru
> wrote:

> Thanks for your explanation. I have to agree with your arguments that in
> general case replication of DDL statement using logical decoding seems to
> be problematic. But we are mostly considering logical decoding in quite
> limited context: replication between two identical Postgres database nodes
> (multimaster).
>

Yep, much like BDR. Where all this infrastructure came from and is/was
aimed at.

> Do you think that it in this case replication of DLL can be done as
> sequence of low level operations with system catalog tables
> including manipulation with locks?
>

No.

For one thing logical decoding doesn't see catalog tuple changes right now.
Though I imagine that could be changed easily enough.

More importantly - oids. You add a column to a table:

ALTER TABLE mytable ADD COLUMN mycolumn some_type UNIQUE NOT NULL DEFAULT
some_function()

This writes to catalogs including:

pg_attribute
pg_constraint
pg_index
pg_class (for the index relation)

... probably more. It also refers to pg_class (for the definition of
mytable), pg_type (definition of some_type), pg_proc (definition of
some_function), the b-tree operator class for some_type in pg_opclass, the
b-tree indexam in pg_am, ... more.

Everything is linked by oids, and the oids are all node local. You can't
just blindly re-use them. If "some_type" is hstore, the oid of hstore in
pg_type might be different on the upstream and downstream. The only
exception is the oids of built-in types and even then that's not guaranteed
across major versions.

So if you blindly replicate catalog row changes you'll get a horrible mess.
That's before considering a table's relfilenode, which is initially the
same as its oid, but subject to change if truncated or rewritten.

To even begin to do this half-sanely you'd have to maintain a mapping of
upstream object oids->names on the downstream, with invalidations
replicated from the upstream. That's only the beginning. There's handling
of extensions and lots more fun.

> So in your example with ALTER TABLE statement, can we correctly replicate
> it to other nodes
> as request to set exclusive lock + some manipulations with catalog tables
> and data table itself?
>

Nope. No hope, not unless "some manipulations with catalog tables and data
table its self" is a lot more comprehensive than I think you mean.

> 1. Add option whether to include operations on system catalog tables in
> logical replication or not.
>

I would like to have this anyway.

> 2. Make it possible to replicate lock requests (can be useful not only for
> DDLs)
>

I have no idea how you'd even begin to do that.

> I looked how DDL was implemented in BDR and did it in similar way in our
> multimaster.
> But it is awful: we need to have two different channels for propagating
> changes.
>

Yeah, it's not beautiful, but maybe you misunderstood something? The DDL is
written to a table, and that table's changes are replayed along with
everything else. It's consistent and keeps DDL changes as part of the xact
that performed them. Maybe you misunderstood how it works in BDR and missed
the indirection via a table?

> Additionally, in multimaster we want to enforce cluster wide ACID. It
> certainly includes operations with metadata. It will be very difficult to
> implement if replication of DML and DDL is done in two different ways...
>

That's pretty much why BDR does it this way, warts and all. Though it
doesn't offer cluster-wide ACID it does need atomic commit of xacts that
may contain DML, DDL, or some mix of the two.

> Let me ask one more question concerning logical replication: how difficult
> it will be from your point of view to support two phase commit in logical
> replication? Are there some principle problems?
>

I haven't looked closely yet. Andres will know more.

I very, very badly want to be able to decode 2PC prepared xacts myself.

The main issue I'm aware of is locking - specifically the inability to
impersonate another backend and treat locks held by that backend (which
might be a fake backend for a pg_prepared_xacts entry) as held by ourselves
for the purpose of being able to access relations, etc.

The work Robert is doing on group locking looks absolutely ideal for this,
but won't land before 9.7.

(Closely related, I also want to be able to hook into commit and transform
a normal COMMIT into a PREPARE TRANSACTION, <do some stuff>, COMMIT
PREPARED with the application that issued the commit none the wiser. This
will allow pessimistic 2PC-based conflict handling for must-succeed xacts
like those that do DDL).

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Oleg Bartunov <obartunov(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 09:25:06
Message-ID:	CAMsr+YGuZg+3LBhGtWftkb0T5O_aL4XGM2cs4hykFqojqtc7TQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 February 2016 at 16:27, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2016-02-17 09:33:56 +0800, Craig Ringer wrote:
> > Some DDL operations don't translate well to a series of replicatable
> > actions. The case I hit the most is
> >
> > ALTER TABLE mytable ADD COLUMN somecolumn sometype NOT NULL DEFAULT
> > some_function();
> >
> > This is executed (simplified) by taking an ACCESS EXCLUSIVE lock,
> changing
> > the catalogs but not making the changes visible yet, rewriting the table,
> > and committing to make the rewritten table and the catalog changes
> visible.
> >
> > That won't work well with logical replication.
>
> FWIW, I think this is much less a fundamental, and more an
> implementation issue. Falling back to just re-replicating the table

Do you mean taking a new schema dump from the upstream? Or just the table
data?

We already receive the table data in a pg_temp_<nnnn> virtual relation.
While it'd be nice to have a better way to map that to the relation being
rewritten without having to do string compares on table names all the time,
it works. If we do a low level truncate on the table *then* execute the DDL
on the empty table and finally rewrite it based on that stream as we
receive it that should work OK.

> Lets get the basics right, before reaching for the moon.
>

Yeah, it's got to be incremental. Though I do think we'll need to address
DDL affecting shared catalogs.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 10:39:47
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ok, what about the following plan:

1. Support custom WAL records (as far as I know 2ndQuadrant has such patch).
2. Add one more function to logical decoding allowing to deal with
custom records.

So the idea is that we somehow record DDL in WAL (for example using
executor hook),
then them are proceeded using logical decoding, calling special logical
deocding plugin function to handle this records.
For example we can store DDL in WAL just as SQL statements and so easily
replay them.

In this case DDL will be replicated using the same mechanism and through
the same channel as DML.

On 17.02.2016 12:16, Craig Ringer wrote:
> On 17 February 2016 at 16:24, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
> Thanks for your explanation. I have to agree with your arguments
> that in general case replication of DDL statement using logical
> decoding seems to be problematic. But we are mostly considering
> logical decoding in quite limited context: replication between two
> identical Postgres database nodes (multimaster).
>
>
> Yep, much like BDR. Where all this infrastructure came from and is/was
> aimed at.
>
> Do you think that it in this case replication of DLL can be done
> as sequence of low level operations with system catalog tables
> including manipulation with locks?
>
>
> No.
>
> For one thing logical decoding doesn't see catalog tuple changes right
> now. Though I imagine that could be changed easily enough.
>
> More importantly - oids. You add a column to a table:
>
> ALTER TABLE mytable ADD COLUMN mycolumn some_type UNIQUE NOT NULL
> DEFAULT some_function()
>
> This writes to catalogs including:
>
> pg_attribute
> pg_constraint
> pg_index
> pg_class (for the index relation)
>
> ... probably more. It also refers to pg_class (for the definition of
> mytable), pg_type (definition of some_type), pg_proc (definition of
> some_function), the b-tree operator class for some_type in pg_opclass,
> the b-tree indexam in pg_am, ... more.
>
> Everything is linked by oids, and the oids are all node local. You
> can't just blindly re-use them. If "some_type" is hstore, the oid of
> hstore in pg_type might be different on the upstream and downstream.
> The only exception is the oids of built-in types and even then that's
> not guaranteed across major versions.
>
> So if you blindly replicate catalog row changes you'll get a horrible
> mess. That's before considering a table's relfilenode, which is
> initially the same as its oid, but subject to change if truncated or
> rewritten.
>
> To even begin to do this half-sanely you'd have to maintain a mapping
> of upstream object oids->names on the downstream, with invalidations
> replicated from the upstream. That's only the beginning. There's
> handling of extensions and lots more fun.
>
> So in your example with ALTER TABLE statement, can we correctly
> replicate it to other nodes
> as request to set exclusive lock + some manipulations with catalog
> tables and data table itself?
>
>
> Nope. No hope, not unless "some manipulations with catalog tables and
> data table its self" is a lot more comprehensive than I think you mean.
>
> 1. Add option whether to include operations on system catalog
> tables in logical replication or not.
>
>
> I would like to have this anyway.
>
> 2. Make it possible to replicate lock requests (can be useful not
> only for DDLs)
>
>
> I have no idea how you'd even begin to do that.
>
> I looked how DDL was implemented in BDR and did it in similar way
> in our multimaster.
> But it is awful: we need to have two different channels for
> propagating changes.
>
>
> Yeah, it's not beautiful, but maybe you misunderstood something? The
> DDL is written to a table, and that table's changes are replayed along
> with everything else. It's consistent and keeps DDL changes as part of
> the xact that performed them. Maybe you misunderstood how it works in
> BDR and missed the indirection via a table?
>
> Additionally, in multimaster we want to enforce cluster wide ACID.
> It certainly includes operations with metadata. It will be very
> difficult to implement if replication of DML and DDL is done in
> two different ways...
>
>
> That's pretty much why BDR does it this way, warts and all. Though it
> doesn't offer cluster-wide ACID it does need atomic commit of xacts
> that may contain DML, DDL, or some mix of the two.
>
> Let me ask one more question concerning logical replication: how
> difficult it will be from your point of view to support two phase
> commit in logical replication? Are there some principle problems?
>
>
> I haven't looked closely yet. Andres will know more.
>
> I very, very badly want to be able to decode 2PC prepared xacts myself.
>
> The main issue I'm aware of is locking - specifically the inability to
> impersonate another backend and treat locks held by that backend
> (which might be a fake backend for a pg_prepared_xacts entry) as held
> by ourselves for the purpose of being able to access relations, etc.
>
> The work Robert is doing on group locking looks absolutely ideal for
> this, but won't land before 9.7.
>
> (Closely related, I also want to be able to hook into commit and
> transform a normal COMMIT into a PREPARE TRANSACTION, <do some stuff>,
> COMMIT PREPARED with the application that issued the commit none the
> wiser. This will allow pessimistic 2PC-based conflict handling for
> must-succeed xacts like those that do DDL).
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 12:58:45
Message-ID:	CAMsr+YG2qd9L+PQw+86H5sp2EB08fbDCRegiGT377SV8yf+44Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 February 2016 at 18:39, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru
> wrote:

> Ok, what about the following plan:
>
> 1. Support custom WAL records (as far as I know 2ndQuadrant has such
> patch).
> 2. Add one more function to logical decoding allowing to deal with custom
> records.
>
> So the idea is that we somehow record DDL in WAL (for example using
> executor hook),
> then them are proceeded using logical decoding, calling special logical
> deocding plugin function to handle this records.
> For example we can store DDL in WAL just as SQL statements and so easily
> replay them.
>
> In this case DDL will be replicated using the same mechanism and through
> the same channel as DML.
>
>

Sure, you can do that, but you don't need to.

Go read the relevant BDR code again, you've missed how it works.

When DDL is fired the registered event trigger captures the DDL and passes
it to DDL deparse to extract a normalized representation. It is inserted
into a queued ddl commands table in the BDR schema during the transaction
that performed the DDL.

Later, when that transaction is decoded by logical decoding we see an
insert into the queued ddl commands table and replicate that to the
client(s).

Clients see the inserts into the queued DDL commands table and special-case
them on replay. As well as executing the original insert they also execute
the DDL command that was inserted into the table. This happens at the same
point in the transaction as the original insert, which is when the DDL was
run. So it's all consistent.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	obartunov(at)gmail(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-02-17 21:06:42
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Craig Ringer wrote:

> Improving this probably needs DDL deparse to be smarter. Rather than just
> emitting something that can be reconstructed into the SQL text of the DDL
> it needs to emit one or more steps that are semantically the same but allow
> us to skip the rewrite. Along the lines of:
>
> * ALTER TABLE mytable ADD COLUMN somecolumn sometype;
> * ALTER TABLE mytable ALTER COLUMN somecolumn DEFAULT some_function();
> * <wait for rewrite data for mytable>
> * ALTER TABLE mytable ALTER COLUMN somecolumn NOT NULL;

Compared to the effort involved in getting the current DDL-event capture
stuff in event triggers, and writing the extension that creates the JSON
representation that expands to SQL commands, this seems easy to do.
What currently happens is that we get a list of ALTER TABLE subcommands
and then produce a single ALTER TABLE that covers them all. To fix this
problem we could mark those ALTER TABLE subcommands that require a table
rewrite, and mark them specially; emit a list of the ones before the
rewrite as one command, then emit some sort of token that indicates the
table rewrite, then emit a list of the ones after the rewrite. The
replay code can then "wait" for the rewrite to occur.

Since this is all in extension code, it's possible to tailor to the
needs you have. (This is the point where I'm extremely happy we ended
up creating the hooks and the new pseudo-type separately from the
extension containing the JSON-generating bits, instead of having it all
be a single piece.)

One slight pain point in the above is the handling of ALTER COLUMN / SET
NOT NULL. That one currently requires a table scan, which would be nice
to avoid, but I don't see any way to do that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Steve Singer <steve(at)ssinger(dot)info>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pglogical - logical replication contrib module
Date:	2016-03-01 00:19:40
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 03/02/16 03:25, Steve Singer wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, failed
> Implements feature: tested, failed
> Spec compliant: not tested
> Documentation: tested, failed
>
> Here is some more review
>
> +- `pglogical.replication_set_add_table(set_name name, table_name regclass, synchronize boolean)`
> + Adds a table to replication set.
> +
> + Parameters:
> + - `set_name` - name of the existing replication set
> + - `table_name` - name or OID of the table to be added to the set
> + - `synchronize` - if true, the table data is synchronized on all subscribers
> + which are subscribed to given replication set, default false
> +
>
> The argument to this function is actually named "relation" not "table_name" though we might want to update the function to name the argument table_name.
>
> Also we don't explain what 'synchronize' means I first thought that a value of false would mean that existing data won't be copied but any new changes will be.
> A value of false actually seems to mean that nothing will happen with the table until the synchronize function is manually called. We seem to be using the word 'synchronize' in different sense in different places I find it confusing (ie synchronize_data and syncronize_structure in create_subscription).
>

False should mean exactly what you thought it would, will have to look
what's the issue there. Obviously UPDATEs or DELETEs won't really do
anything when there are no data but INSERTs should be replicated even
with false.

But I agree we need to define sychronized better, as we discussed we
also want to change status to replicated instead of synchronized. I am
btw thinking that default value for synchronizing schema should be false
in the create_subsription.

>
>
> *** a/contrib/pglogical/pglogical_sync.c
> --- b/contrib/pglogical/pglogical_sync.c
> + static void
> + dump_structure(PGLogicalSubscription *sub, const char *snapshot)
> + {
> + char pg_dump[MAXPGPATH];
> + uint32 version;
> + int res;
> + StringInfoData command;
> +
> + if (find_other_exec_version(my_exec_path, PGDUMP_BINARY, &version, pg_dump))
> + elog(ERROR, "pglogical subscriber init failed to find pg_dump relative to binary %s",
> + my_exec_path);
> +
> + if (version / 100 != PG_VERSION_NUM / 100)
> + elog(ERROR, "pglogical subscriber init found pg_dump with wrong major version %d.%d, expected %d.%d",
> + version / 100 / 100, version / 100 % 100,
> + PG_VERSION_NUM / 100 / 100, PG_VERSION_NUM / 100 % 100);
> +
> + initStringInfo(&command);
> + #if PG_VERSION_NUM < 90500
> + appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -N pglogical_origin -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",
> + #else
> + appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",
>
> 1) I am not sure we can assume/require that the pg_dump binary be in the same location as the postgres binary. I don't know think we've ever required that client binaries (ie psql, pg_dump, pg_restore ...) be in the same directory as postgres. pg_upgrade does require this so maybe this isn't a problem in practice but I thought I'd point it out. Ideally wouldn't need to call an external program to get a schema dump but turning pg_dump into a library is beyond the scope of this patch.
>

Well for now I don't see that as big issue, especially given that the
pg_dump needs to be same version as the server. We can make it GUC if
needed but that's not something that seems problematic so far. I agree
ideal solution would be to have library but that's something that will
take much longer I am afraid.

>
> 2) I don't think we can hard-coded /tmp as the directory for the schema dump. I don't think will work on most windows systems and even on a unix system $TMPDIR might be set to something else. Maybe writing this into pgsql_tmp would be a better choice.
>

Yeah I turned that into GUC.

> Furtherdown in
> pglogical_sync_subscription(PGLogicalSubscription *sub)
> + switch (status)
> + {
> + /* Already synced, nothing to do except cleanup. */
> + case SYNC_STATUS_READY:
> + MemoryContextDelete(myctx);
> + return;
> + /* We can recover from crashes during these. */
> + case SYNC_STATUS_INIT:
> + case SYNC_STATUS_CATCHUP:
> + break;
> + default:
> + elog(ERROR,
> + "subscriber %s initialization failed during nonrecoverable step (%c), please try the setup again",
> + sub->name, status);
> + break;
> + }
>
> I think the default case needs to do something to unregister the background worker. We already discussed trying to get the error message to a user in a better way either way there isn't any sense in this background worker being launched again if the error is nonrecoverable.
>

Agreed, for this specific case we can actually pretty easily put the
error into some catalog and just disable the subscription.

>
> +
> + tables = copy_replication_sets_data(sub->origin_if->dsn,
> + sub->target_if->dsn,
> + snapshot,
> + sub->replication_sets);
> +
> + /* Store info about all the synchronized tables. */
> + StartTransactionCommand();
> + foreach (lc, tables)
>
> Shouldn't we be storing the info about the synchronized tables as part of the same transaction that does the sync?
>

Well that's complicated as we also have post copy stuff to do (creating
indexes and stuff), so far we wan to begin from beginning I think if the
table fails so we consider it unsynced until also post-data part is
done. But I think the initial sync needs a lot of work in general.

>
> I'll keeping going through the code as I have time. I think it is appropriate to move this to the next CF since the CF is past the end date and the patch has received some review. When you have an updated version of the patch post it, don't wait until March.
>

Sorry for not being very active in this thread, I really appreciate that
you take time to review this, I was just quite busy last few weeks (and
stolen laptop during business trip didn't help that much either). I
wasn't specifically waiting for March, but I have more WIP things
(privately) on this that I wanted to submit as a whole but not enough
time to get it to -hackers (one of those things is replica trigger
firing that you mentioned upthread). If you are interested I have the
"hackers preparation" branch at
https://github.com/2ndQuadrant/postgres/tree/dev/pglogical , it does not
have WIP stuff, mostly only things I am already happy with and it's what
I use for git format-patch for hackers submission.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services