Tom Lane [Wed, 16 Nov 2011 23:21:34 +0000 (18:21 -0500)]
Code review for range-types catalog entries.
Fix assorted infelicities, such as dependency on OIDs that aren't
hardwired, as well as outright misdeclaration of daterange_canonical(),
which resulted in crashes if you invoked it directly. Add some more
regression tests to try to catch similar mistakes in future.
Robert Haas [Wed, 16 Nov 2011 01:34:47 +0000 (20:34 -0500)]
Don't elide blank lines when accumulating psql command history.
This can change the meaning of queries, if the blank line happens to
occur in the middle of a quoted literal, as per complaint from Tomas Vondra.
Back-patch to all supported branches.
Tom Lane [Tue, 15 Nov 2011 20:47:51 +0000 (15:47 -0500)]
Improve caching in range type I/O functions.
Cache the element type's I/O info across calls, not only the range
type's info. In passing, also clean up hash_range a bit more.
Tom Lane [Tue, 15 Nov 2011 18:05:45 +0000 (13:05 -0500)]
Restructure function-internal caching in the range type code.
Move the responsibility for caching specialized information about range
types into the type cache, so that the catalog lookups only have to occur
once per session. Rearrange APIs a bit so that fn_extra caching is
actually effective in the GiST support code. (Use of OidFunctionCallN is
bad enough for performance in itself, but it also prevents the function
from exploiting fn_extra caching.)
The range I/O functions are still not very bright about caching repeated
lookups, but that seems like material for a separate patch.
Also, avoid unnecessary use of memcpy to fetch/store the range type OID and
flags, and don't use the full range_deserialize machinery when all we need
to see is the flags value.
Also fix API error in range_gist_penalty --- it was failing to set *penalty
for any case involving an empty range.
Tom Lane [Tue, 15 Nov 2011 02:42:04 +0000 (21:42 -0500)]
Fix alignment and toasting bugs in range types.
A range type whose element type has 'd' alignment must have 'd' alignment
itself, else there is no guarantee that the element value can be used
in-place. (Because range_deserialize uses att_align_pointer which forcibly
aligns the given pointer, violations of this rule did not lead to SIGBUS
but rather to garbage data being extracted, as in one of the added
regression test cases.)
Also, you can't put a toast pointer inside a range datum, since the
referenced value could disappear with the range datum still present.
For consistency with the handling of arrays and records, I also forced
decompression of in-line-compressed bound values. It would work to store
them as-is, but our policy is to avoid situations that might result in
double compression.
Add assorted regression tests for this, and bump catversion because of
fixes to built-in pg_type entries.
Also some marginal cleanup of inconsistent/unnecessary error checks.
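The garbage-data failure can be modeled with plain address arithmetic. The sketch below is a simplified, hypothetical model (the 8-byte header, the addresses and the helper names are made up; it is not the actual range_serialize/range_deserialize code). If the writer aligns the bound relative to the start of the datum while the reader aligns the absolute pointer, as att_align_pointer does, the two agree only when the datum itself starts on an 8-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to a multiple of align (a power of two), in the spirit of
 * PostgreSQL's TYPEALIGN macro; this helper is illustrative only. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Where a writer places an 8-byte-aligned bound if it aligns the offset
 * relative to the datum start, i.e. assumes the datum is 'd'-aligned. */
static uintptr_t writer_bound_addr(uintptr_t datum_start, uintptr_t header_len)
{
    return datum_start + align_up(header_len, 8);
}

/* Where a reader looks for the bound if it forcibly aligns the absolute
 * pointer it has reached, as att_align_pointer does. */
static uintptr_t reader_bound_addr(uintptr_t datum_start, uintptr_t header_len)
{
    return align_up(datum_start + header_len, 8);
}
```

With the datum at address 1000 both sides compute 1008; with the datum at 1004 (only 4-byte aligned) the writer uses 1012 but the reader reads from 1016, which is the "garbage data" symptom rather than a SIGBUS.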
Tom Lane [Tue, 15 Nov 2011 01:28:38 +0000 (20:28 -0500)]
Update oidjoins regression test to match git HEAD.
This is mostly to add some sanity checking for the pg_range catalog.
Tom Lane [Mon, 14 Nov 2011 20:34:39 +0000 (15:34 -0500)]
Return NULL instead of throwing error when desired bound is not available.
Change range_lower and range_upper to return NULL rather than throwing an
error when the input range is empty or the relevant bound is infinite. Per
discussion, throwing an error seems likely to be unduly hard to work with.
Also, this is more consistent with the behavior of the constructors, which
treat NULL as meaning an infinite bound.
Tom Lane [Mon, 14 Nov 2011 20:15:53 +0000 (15:15 -0500)]
Return FALSE instead of throwing error for comparisons with empty ranges.
Change range_before, range_after, range_adjacent to return false rather
than throwing an error when one or both input ranges are empty.
The original definition is unnecessarily difficult to use, and also can
result in undesirable planner failures since the planner could try to
compare an empty range to something else while deriving statistical
estimates. (This was, in fact, the cause of repeatable regression test
failures on buildfarm member jaguar, as well as intermittent failures
elsewhere.)
Also tweak rangetypes regression test to not drop all the objects it
creates, so that the final state of the regression database contains
some rangetype objects for pg_dump testing.
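The new rule for the positional operators amounts to an early "return false" on empty input. The sketch below uses a hypothetical integer range kept in canonical [lower, upper) form; it is not PostgreSQL's RangeType representation or its range_before() source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical integer range in canonical [lower, upper) form. */
typedef struct
{
    bool empty;
    bool lower_inf;   /* lower bound is -infinity */
    bool upper_inf;   /* upper bound is +infinity */
    int  lower;       /* inclusive */
    int  upper;       /* exclusive */
} IntRange;

/* "a is strictly left of b".  An empty input now yields false instead of
 * raising an error, so callers such as the planner can probe freely. */
static bool range_before(const IntRange *a, const IntRange *b)
{
    if (a->empty || b->empty)
        return false;
    if (a->upper_inf || b->lower_inf)
        return false;             /* an unbounded side is never strictly left */
    return a->upper <= b->lower;  /* a's upper bound is exclusive */
}
```

range_after and range_adjacent would take the same empty-input guard before comparing bounds.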
Tom Lane [Mon, 14 Nov 2011 18:59:34 +0000 (13:59 -0500)]
Fix copyright notices, other minor editing in new range-types code.
No functional changes in this commit (except I could not resist the
temptation to re-word a couple of error messages). This is just manual
cleanup after pgindent to make the code look reasonably like other PG
code, in preparation for more detailed code review to come.
Bruce Momjian [Mon, 14 Nov 2011 17:12:23 +0000 (12:12 -0500)]
Rerun pgindent with updated typedef list.
Bruce Momjian [Mon, 14 Nov 2011 17:08:48 +0000 (12:08 -0500)]
Run pgindent on range type files, per request from Tom.
Michael Meskes [Sun, 13 Nov 2011 12:46:45 +0000 (13:46 +0100)]
Applied patch by Zoltan to fix copy&paste bug in ecpg's sqlda handling.
Simon Riggs [Sun, 13 Nov 2011 09:00:57 +0000 (09:00 +0000)]
Wakeup WALWriter as needed for asynchronous commit performance.
Previously we waited for wal_writer_delay before flushing WAL. Now
we also wake WALWriter as soon as a WAL buffer page has filled.
Significant effect observed on performance of asynchronous commits
by Robert Haas, attributed to the ability to set hint bits on tuples
earlier and so reducing contention caused by clog lookups.
Tom Lane [Sat, 12 Nov 2011 23:49:09 +0000 (18:49 -0500)]
In plpgsql, allow foreign tables to define row types.
This seems to have been just an oversight in previous foreign-table work.
A quick grep didn't turn up any other places where RELKIND_FOREIGN_TABLE
was obviously omitted.
One change noted by Alexander Soudakov, the other by me.
Back-patch to 9.1.
Peter Eisentraut [Sat, 12 Nov 2011 15:03:10 +0000 (17:03 +0200)]
Add psql expanded auto mode
This adds the "auto" option to the \x command, which switches to the
expanded mode when the normal output would be wider than the screen.
reviewed by Noah Misch
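The decision behind "\x auto" can be sketched as a width comparison. This is a hypothetical helper, not psql's print code: it assumes a three-character " | " separator between columns, which simplifies psql's aligned format and ignores borders and headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: col_widths[i] is the widest cell (or header) of column i.
 * Returns true when the aligned output would not fit on the screen, in
 * which case expanded output is used instead. */
static bool use_expanded(const size_t *col_widths, size_t ncols,
                         size_t screen_width)
{
    size_t total = 0;

    for (size_t i = 0; i < ncols; i++)
        total += col_widths[i];
    if (ncols > 1)
        total += 3 * (ncols - 1);   /* assumed " | " between columns */

    return total > screen_width;
}
```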
Robert Haas [Sat, 12 Nov 2011 06:22:45 +0000 (01:22 -0500)]
Avoid retaining multiple relation locks in RangeVarGetRelid.
If it turns out we've locked the wrong OID, release the old lock. In
most cases, it's pretty harmless to retain the extra lock, but this
seems tidier and avoids using lock table slots unnecessarily.
Per discussion with Tom Lane.
Robert Haas [Sat, 12 Nov 2011 04:33:44 +0000 (23:33 -0500)]
Fix psql's \dd version check for operator families.
Report and patch by Josh Kupershmidt; comment revisions by me.
Tom Lane [Thu, 10 Nov 2011 23:36:49 +0000 (18:36 -0500)]
Throw nice error if server is too old to support psql's \ef or \sf command.
Previously, you'd get "function pg_catalog.pg_get_functiondef(integer) does
not exist", which is at best rather unprofessional-looking. Back-patch
to 8.4 where \ef was introduced.
Josh Kupershmidt
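The fix is a version test before issuing the query. A minimal sketch, assuming the integer version encoding returned by libpq's PQserverVersion() (80400 for 8.4.0); the function name and message wording are illustrative, not psql's actual code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical check: pg_get_functiondef() first appeared in 8.4, so older
 * servers get a readable message instead of "function does not exist". */
static bool server_supports_ef(int server_version, char *errbuf, size_t errlen)
{
    if (server_version >= 80400)
        return true;
    snprintf(errbuf, errlen,
             "The server (version %d.%d) does not support editing function source.",
             server_version / 10000, (server_version / 100) % 100);
    return false;
}
```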
Robert Haas [Thu, 10 Nov 2011 23:00:34 +0000 (18:00 -0500)]
Correct documentation for trace_userlocks.
Robert Haas [Thu, 10 Nov 2011 22:54:27 +0000 (17:54 -0500)]
Revert removal of trace_userlocks, because userlocks aren't gone.
This reverts commit
0180bd6180511875db046bf8ddcaa633a2952dfd.
contrib/userlock is gone, but user-level locking still exists,
and is exposed via the pg_advisory* family of functions.
Tom Lane [Thu, 10 Nov 2011 21:08:14 +0000 (16:08 -0500)]
Avoid platform-dependent infinite loop in pg_dump.
If malloc(0) returns NULL, the binary search in findSecLabels() will
probably go into an infinite loop when there are no security labels,
because NULL-1 is greater than NULL after wraparound.
(We've seen this pathology before ... I wonder whether there's a way to
detect the class of bugs automatically?)
Diagnosis and patch by Steve Singer, cosmetic adjustments by me
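One way to make this class of search immune to an empty (possibly NULL) array is to search over signed indices rather than pointers. The sketch below is hypothetical (the struct and function names are not pg_dump's) and only illustrates the technique, not Steve Singer's actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical label key, sorted by (classoid, objoid). */
typedef struct
{
    unsigned int classoid;
    unsigned int objoid;
} SecLabelKey;

/*
 * Index-based binary search.  A pointer version computing
 * "high = base + n - 1" evaluates NULL - 1 when base is NULL and n is 0;
 * on a flat address space that typically wraps to the top of memory, so
 * "low <= high" holds and the loop does not terminate normally.  With
 * signed indices, n == 0 gives high == -1 and the loop body never runs.
 */
static const SecLabelKey *
find_label(const SecLabelKey *items, int nitems,
           unsigned int classoid, unsigned int objoid)
{
    int low = 0;
    int high = nitems - 1;

    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        const SecLabelKey *m = &items[mid];

        if (classoid < m->classoid ||
            (classoid == m->classoid && objoid < m->objoid))
            high = mid - 1;
        else if (classoid > m->classoid ||
                 (classoid == m->classoid && objoid > m->objoid))
            low = mid + 1;
        else
            return m;
    }
    return NULL;
}
```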
Peter Eisentraut [Thu, 10 Nov 2011 18:52:54 +0000 (20:52 +0200)]
Fix server header file installation with vpath builds
Several server header files would not be installed in vpath builds
because they live in the build directory.
Bruce Momjian [Thu, 10 Nov 2011 18:00:44 +0000 (13:00 -0500)]
Document that PQclear() can handle a NULL res pointer just fine.
Backpatch to 9.1.
Mark Hills
Heikki Linnakangas [Thu, 10 Nov 2011 10:09:33 +0000 (12:09 +0200)]
Fix another bug in the redo of COPY batches.
I got alignment wrong in the redo routine. Spotted by redoing the log
generated by the copy regression test.
Peter Eisentraut [Wed, 9 Nov 2011 19:43:04 +0000 (21:43 +0200)]
Only install the extension files for the current Python major version
Heikki Linnakangas [Wed, 9 Nov 2011 19:24:26 +0000 (21:24 +0200)]
Fix bugs in the COPY heap-insert batching patch.
Forgot to call RestoreBkpBlocks() in the redo-function, as pointed out by
Simon Riggs. In redo of a regular heap insert, it's taken care of in
heap_redo(), but this new record type uses the heap2 RM, and heap2_redo()
does not take care of that for you.
Also, failed to reset the vmbuffer and all_visible_cleared local variables
after switching to a new buffer.
Peter Eisentraut [Wed, 9 Nov 2011 18:56:19 +0000 (20:56 +0200)]
Clean gettext-files file in clean target
It used to be cleaned in maintainer-clean, but that is inconsistent
with other cleaning of NLS files in nls-global.mk, and it's also wrong
overall, because it's not part of the distribution tarball, which is
the base definition of the maintainer-clean target.
Robert Haas [Wed, 9 Nov 2011 16:14:50 +0000 (11:14 -0500)]
Fix compiler warning.
Heikki Linnakangas [Wed, 9 Nov 2011 08:54:41 +0000 (10:54 +0200)]
In COPY, insert tuples to the heap in batches.
This greatly reduces the WAL volume, especially when the table is narrow.
The overhead of locking the heap page is also reduced. Reduced WAL traffic
also makes it scale a lot better, if you run multiple COPY processes at
the same time.
Tom Lane [Wed, 9 Nov 2011 05:13:37 +0000 (00:13 -0500)]
Tweak new regression test case for more portability.
Ensure that same index gets selected on 32-bit and 64-bit machines.
Per buildfarm results.
Tom Lane [Wed, 9 Nov 2011 04:05:14 +0000 (23:05 -0500)]
Fix random discrepancies between parallel_schedule and serial_schedule.
In particular, my previous patch expected the create_index test to run
before the inherit test; but this was only true in the serial schedule.
Rearrange this portion of the schedules to be more consistent.
Per buildfarm results.
Tom Lane [Wed, 9 Nov 2011 02:14:21 +0000 (21:14 -0500)]
Wrap appendrel member outputs in PlaceHolderVars in additional cases.
Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output
expressions appear non-constant and distinct from each other. This makes
the world safe for add_child_rel_equivalences to do what it does. Before,
it was possible for that function to add identical expressions to different
EquivalenceClasses, which logically should imply merging such ECs, which
would be wrong; or to improperly add a constant to an EquivalenceClass,
drastically changing its behavior. Per report from Teodor Sigaev.
The only currently known consequence of this bug is "MergeAppend child's
targetlist doesn't match MergeAppend" planner failures in 9.1 and later.
I am suspicious that there may be other failure modes that could affect
older release branches; but in the absence of any hard evidence, I'll
refrain from back-patching further than 9.1.
Heikki Linnakangas [Tue, 8 Nov 2011 20:39:43 +0000 (22:39 +0200)]
Make DatumGetInetP() unpack inet datums with a 1-byte header, and add
a new macro, DatumGetInetPP(), that does not. This brings these macros
in line with other DatumGet*P() macros.
Backpatch to 8.3, where 1-byte header varlenas were introduced.
Robert Haas [Tue, 8 Nov 2011 13:11:25 +0000 (08:11 -0500)]
Rewrite comment for slightly greater accuracy.
Per an observation from Thom Brown that the old version contained a typo.
Robert Haas [Tue, 8 Nov 2011 13:07:21 +0000 (08:07 -0500)]
Fix hstore regression tests.
This was an oversight in commit
b60653bc0b75b7f3b5dda0a2968a22129aafb2b2.
Also, fix a typo spotted by Thom Brown.
Heikki Linnakangas [Tue, 8 Nov 2011 07:40:37 +0000 (09:40 +0200)]
Adjust range type docs for some last-minute changes I made to the patch.
non_empty(anyrange) function was removed, empty(anyrange) was renamed to
isempty(anyrange), and !? operators were removed.
Peter Eisentraut [Tue, 8 Nov 2011 04:49:50 +0000 (06:49 +0200)]
-DLINUX_OOM_ADJ=0 should be in CPPFLAGS, not CFLAGS
Robert Haas [Tue, 8 Nov 2011 02:47:45 +0000 (21:47 -0500)]
Remove hstore's text => text operator.
Since PostgreSQL 9.0, we've emitted a warning message when an operator
named => is created, because the SQL standard now reserves that token
for another use. But we've also shipped such an operator with hstore.
Use of the function hstore(text, text) has been recommended in
preference to =>(text, text). Per discussion, it's now time to take
the next step and stop shipping the operator. This will allow us to
prohibit the use of => as an operator name in a future release if and
when we wish to support the SQL standard use of this token.
The release notes should mention this incompatibility.
Patch by me, reviewed by David Wheeler, Dimitri Fontaine and Tom Lane.
Robert Haas [Tue, 8 Nov 2011 02:39:40 +0000 (21:39 -0500)]
Make VACUUM avoid waiting for a cleanup lock, where possible.
In a regular VACUUM, it's OK to skip pages for which a cleanup lock
isn't immediately available; the next VACUUM will deal with them. If
we're scanning the entire relation to advance relfrozenxid, we might
need to wait, but only if there are tuples on the page that actually
require freezing. These changes should greatly reduce the incidence
of vacuum processes getting "stuck".
Simon Riggs and Robert Haas
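The per-page policy described above can be summarized as a small decision function. This is a hypothetical sketch of the rules as stated in the message, not the code in vacuumlazy.c; in particular "skip cleanup" glosses over the lighter-weight work VACUUM may still do on such a page.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    PAGE_PROCESS,        /* cleanup lock acquired: prune/vacuum normally */
    PAGE_SKIP_CLEANUP,   /* leave the cleanup for a later VACUUM */
    PAGE_WAIT_FOR_LOCK   /* must block until the cleanup lock is free */
} PageAction;

/* Hypothetical: decide what to do with one heap page. */
static PageAction
choose_page_action(bool got_cleanup_lock, bool scanning_all_for_frozenxid,
                   bool page_has_tuples_needing_freeze)
{
    if (got_cleanup_lock)
        return PAGE_PROCESS;
    if (!scanning_all_for_frozenxid)
        return PAGE_SKIP_CLEANUP;     /* the next VACUUM will deal with it */
    if (page_has_tuples_needing_freeze)
        return PAGE_WAIT_FOR_LOCK;    /* else relfrozenxid cannot advance */
    return PAGE_SKIP_CLEANUP;
}
```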
Robert Haas [Mon, 7 Nov 2011 17:27:26 +0000 (12:27 -0500)]
Minor grammar improvements.
Tom Lane [Mon, 7 Nov 2011 16:48:53 +0000 (11:48 -0500)]
Fix assorted bugs in contrib/unaccent's configuration file parsing.
Make it use t_isspace() to identify whitespace, rather than relying on
sscanf which is known to get it wrong on some platform/locale combinations.
Get rid of fixed-size buffers. Make it actually continue to parse the file
after ignoring a line with untranslatable characters, as was obviously
intended.
The first of these issues is per gripe from J Smith, though not exactly
either of his proposed patches.
Heikki Linnakangas [Mon, 7 Nov 2011 15:33:31 +0000 (17:33 +0200)]
Fix timestamp range subdiff functions, when using float datetimes.
Tom Lane [Mon, 7 Nov 2011 01:12:20 +0000 (20:12 -0500)]
On second thought, we'd better just drop these tests altogether.
Further experimentation reveals that my previous change didn't fix the
issue entirely: these tests would still fail at the spring-forward DST
transition. There doesn't seem to be any great value in testing this
specific issue for both timestamp and timestamptz, so just lose the
latter tests.
Tom Lane [Sun, 6 Nov 2011 23:20:26 +0000 (18:20 -0500)]
Un-break horology regression test.
Adjust ill-considered timezone-dependent tests added in commit
8a3d33c8e6c681d512f79af4a521ee0c02befcef so that they won't fail on DST
transition days. Per all-pink buildfarm.
Heikki Linnakangas [Sun, 6 Nov 2011 12:33:49 +0000 (14:33 +0200)]
Oops, forgot to fix the catversion when I committed the range types patch.
It was inadvertently changed to 201111111, which is a wrong date. Change it
to the current date, and remove the comment that was supposed to remind me to
fix it before committing.
Magnus Hagander [Sat, 5 Nov 2011 15:00:23 +0000 (16:00 +0100)]
Update regression tests for \d+ modification
Noted by Tom
Magnus Hagander [Sat, 5 Nov 2011 12:02:48 +0000 (13:02 +0100)]
Show statistics target for columns in \d+ on a table
Magnus Hagander [Sat, 5 Nov 2011 11:54:58 +0000 (12:54 +0100)]
Make psql \d on a sequence show the table/column owning it
Tom Lane [Sat, 5 Nov 2011 03:22:50 +0000 (23:22 -0400)]
Don't assume that a tuple's header size is unchanged during toasting.
This assumption can be wrong when the toaster is passed a raw on-disk
tuple, because the tuple might pre-date an ALTER TABLE ADD COLUMN operation
that added columns without rewriting the table. In such a case the tuple's
natts value is smaller than what we expect from the tuple descriptor, and
so its t_hoff value could be smaller too. In fact, the tuple might not
have a null bitmap at all, and yet our current opinion of it is that it
contains some trailing nulls.
In such a situation, toast_insert_or_update did the wrong thing, because
to save a few lines of code it would use the old t_hoff value as the offset
where heap_fill_tuple should start filling data. This did not leave enough
room for the new nulls bitmap, with the result that the first few bytes of
data could be overwritten with null flag bits, as in a recent report from
Hubert Depesz Lubaczewski.
The particular case reported requires ALTER TABLE ADD COLUMN followed by
CREATE TABLE AS SELECT * FROM ... or INSERT ... SELECT * FROM ..., and
further requires that there be some out-of-line toasted fields in one of
the tuples to be copied; else we'll not reach the troublesome code.
The problem can only manifest in this form in 8.4 and later, because
before commit
a77eaa6a95009a3441e0d475d1980259d45da072, CREATE TABLE AS or
INSERT/SELECT wouldn't result in raw disk tuples getting passed directly
to heap_insert --- there would always have been at least a junkfilter in
between, and that would reconstitute the tuple header with an up-to-date
t_natts and hence t_hoff. But I'm backpatching the tuptoaster change all
the way anyway, because I'm not convinced there are no older code paths
that present a similar risk.
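The arithmetic behind the bug: t_hoff depends on whether a null bitmap is present and on how many attributes it covers. The sketch assumes a typical 64-bit build (23-byte fixed header, MAXALIGN of 8); the constants and helper names are illustrative, not taken from htup.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for a typical 64-bit build; illustrative only. */
#define FIXED_HEADER_SIZE 23
#define MAX_ALIGNMENT     8

/* One null-bitmap bit per attribute. */
static size_t bitmap_len(int natts)
{
    return (size_t) (natts + 7) / 8;
}

/* Header size (t_hoff) a tuple needs for natts attributes. */
static size_t header_size(int natts, bool has_nulls)
{
    size_t len = FIXED_HEADER_SIZE;

    if (has_nulls)
        len += bitmap_len(natts);
    return (len + MAX_ALIGNMENT - 1) & ~(size_t) (MAX_ALIGNMENT - 1);
}
```

An on-disk tuple written with 8 columns and no nulls has a 24-byte header; once ADD COLUMN makes it a 9-column tuple with a trailing null, it needs 32 bytes, so reusing the old t_hoff as the data offset lets the new bitmap overwrite the first data bytes.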
Peter Eisentraut [Fri, 4 Nov 2011 20:01:35 +0000 (22:01 +0200)]
Fix archive_command example
The given archive_command example didn't use %p or %f, which wouldn't
really work in practice.
Peter Eisentraut [Fri, 4 Nov 2011 19:52:37 +0000 (21:52 +0200)]
Add note about using GNU tar warning options for base backups
Magnus Hagander [Fri, 4 Nov 2011 14:57:43 +0000 (15:57 +0100)]
Add missing space in comment
Robert Haas [Fri, 4 Nov 2011 14:40:52 +0000 (10:40 -0400)]
Silence bogus compiler warning.
Robert Haas [Fri, 4 Nov 2011 14:40:25 +0000 (10:40 -0400)]
Check the return value of getcwd(), instead of assuming success.
Kevin Grittner
Simon Riggs [Fri, 4 Nov 2011 09:37:17 +0000 (09:37 +0000)]
Move user functions related to WAL into xlogfuncs.c
Alvaro Herrera [Fri, 4 Nov 2011 01:54:58 +0000 (23:54 -0200)]
Unbreak isolationtester on Win32
I broke it in a previous commit because I neglected to install the
necessary incantations to have getopt() work on Windows.
Per red blots in buildfarm.
Tom Lane [Thu, 3 Nov 2011 23:17:48 +0000 (19:17 -0400)]
Fix bogus code in contrib/ tsearch dictionary examples.
Both dict_int and dict_xsyn were blithely assuming that whatever memory
palloc gives back will be pre-zeroed. This would typically work for
just about long enough to run their regression tests, and no longer :-(.
The pre-9.0 code in dict_xsyn was even lamer than that, as it would
happily give back a pointer to the result of palloc(0), encouraging
its caller to access off the end of memory. Again, this would just
barely fail to fail as long as memory contained nothing but zeroes.
Per a report from Rodrigo Hjort that code based on these examples
didn't work reliably.
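The underlying rule: a NULL-terminated result array must have its terminator written explicitly or come from zeroed memory (palloc0, not palloc). The sketch below uses calloc/malloc as stand-ins for palloc0/palloc and a hypothetical struct loosely modeled on TSLexeme; like PostgreSQL, it relies on all-bits-zero being a NULL pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lexeme entry; a NULL lexeme marks the end of the array. */
typedef struct
{
    char *lexeme;
    int   flags;
} Lexeme;

static Lexeme *make_result(const char *word)
{
    /* Two slots: the real entry plus the terminator.  Plain malloc (like
     * plain palloc) would leave res[1].lexeme holding whatever was there. */
    Lexeme *res = calloc(2, sizeof(Lexeme));

    if (res == NULL)
        return NULL;
    res[0].lexeme = malloc(strlen(word) + 1);
    if (res[0].lexeme == NULL)
    {
        free(res);
        return NULL;
    }
    strcpy(res[0].lexeme, word);
    return res;
}
```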
Tom Lane [Thu, 3 Nov 2011 22:47:28 +0000 (18:47 -0400)]
Improve comments for TSLexeme data structure.
Mostly, clean up long-ago pgindent damage.
Tom Lane [Thu, 3 Nov 2011 21:53:13 +0000 (17:53 -0400)]
Fix inline_set_returning_function() to allow multiple OUT parameters.
inline_set_returning_function failed to distinguish functions returning
generic RECORD (which require a column list in the RTE, as well as run-time
type checking) from those with multiple OUT parameters (which do not).
This prevented inlining from happening. Per complaint from Jay Levitt.
Back-patch to 8.4 where this capability was introduced.
Andrew Dunstan [Thu, 3 Nov 2011 20:29:41 +0000 (16:29 -0400)]
Role membership of superusers is only by explicit membership for HBA.
Document that this rule applies to 'samerole' as well as to named roles.
Per gripe from Tom Lane.
Bruce Momjian [Thu, 3 Nov 2011 17:56:56 +0000 (13:56 -0400)]
Adjust pg_upgrade "new database skip" code, e.g. 'postgres', to more
cleanly handle old/new database mismatches.
Alvaro Herrera [Thu, 3 Nov 2011 17:06:08 +0000 (15:06 -0200)]
Implement a dry-run mode for isolationtester
This mode prints out the permutations that would be run by the given
spec file, in the same format used by the permutation lines in spec
files. This helps in building new spec files.
Author: Alexander Shulgin, with some tweaks by me
Andrew Dunstan [Thu, 3 Nov 2011 16:45:02 +0000 (12:45 -0400)]
Do not treat a superuser as a member of every role for HBA purposes.
This makes it possible to use reject lines with group roles.
Andrew Dunstan, reviewed by Robert Haas.
Magnus Hagander [Thu, 3 Nov 2011 14:43:25 +0000 (15:43 +0100)]
Properly close replication connection in pg_receivexlog
Magnus Hagander [Thu, 3 Nov 2011 14:37:08 +0000 (15:37 +0100)]
Pre-pad WAL files when streaming transaction log
Instead of filling files as they appear, pre-pad the
WAL files received when streaming xlog the same way
that the server does. Data is streamed into a .partial
file which is then rename()d into place when it's complete,
but it will always be 16MB.
This also means that the starting position for pg_receivexlog
is now simply right after the last complete segment, and we
never need to deal with partial segments there.
Patch by me, review by Fujii Masao
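The pre-pad-then-rename pattern can be sketched with standard C I/O. The helper names and paths are hypothetical, and a real implementation such as pg_receivexlog would also fsync the file before renaming it; only the shape of the technique is shown.

```c
#include <stdio.h>

/* Hypothetical: create partial_path and fill it with seg_size zero bytes,
 * so the file has its final size before any streamed data arrives. */
static int prepad_segment(const char *partial_path, long seg_size)
{
    static const char zeroes[8192];   /* static storage is zero-initialized */
    FILE *f = fopen(partial_path, "wb");
    long written = 0;

    if (f == NULL)
        return -1;
    while (written < seg_size)
    {
        size_t chunk = sizeof(zeroes);

        if ((long) chunk > seg_size - written)
            chunk = (size_t) (seg_size - written);
        if (fwrite(zeroes, 1, chunk, f) != chunk)
        {
            fclose(f);
            return -1;
        }
        written += (long) chunk;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Once the segment is complete, move it into place under its final name. */
static int finish_segment(const char *partial_path, const char *final_path)
{
    return rename(partial_path, final_path) == 0 ? 0 : -1;
}
```

For a default-sized segment, prepad_segment would be called with 16L * 1024 * 1024.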
Heikki Linnakangas [Thu, 3 Nov 2011 11:16:28 +0000 (13:16 +0200)]
Support range data types.
Selectivity estimation functions are missing for some range type operators,
which is a TODO.
Jeff Davis
Simon Riggs [Thu, 3 Nov 2011 08:52:20 +0000 (08:52 +0000)]
Improve docs for timing and skipping of checkpoints
Greg Smith
Tom Lane [Thu, 3 Nov 2011 04:50:58 +0000 (00:50 -0400)]
Fix handling of PlaceHolderVars in nestloop parameter management.
If we use a PlaceHolderVar from the outer relation in an inner indexscan,
we need to reference the PlaceHolderVar as such as the value to be passed
in from the outer relation. The previous code effectively tried to
reconstruct the PHV from its component expression, which doesn't work since
(a) the Vars therein aren't necessarily bubbled up far enough, and (b) it
would be the wrong semantics anyway because of the possibility that the PHV
is supposed to have gone to null at some point before the current join.
Point (a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers.
Per report from Roger Niederland.
Tom Lane [Wed, 2 Nov 2011 23:35:48 +0000 (19:35 -0400)]
Avoid scanning nulls at the beginning of a btree index scan.
If we have an inequality key that constrains the other end of the index,
it doesn't directly help us in doing the initial positioning ... but it
does imply a NOT NULL constraint on the index column. If the index stores
nulls at this end, we can use the implied NOT NULL condition for initial
positioning, just as if it had been stated explicitly. This avoids wasting
time when there are a lot of nulls in the column. This is the reverse of
the examples given in bugs #6278 and #6283, which were about failing to
stop early when we encounter nulls at the end of the indexscan.
Tom Lane [Wed, 2 Nov 2011 21:53:49 +0000 (17:53 -0400)]
Fix btree stop-at-nulls logic properly.
As pointed out by Naoya Anzai, my previous try at this was a few bricks
shy of a load, because I had forgotten that the initial-positioning logic
might not try to skip over nulls at the end of the index the scan will
start from. We ought to fix that, because it represents an unnecessary
inefficiency, but first let's get the scan-stop logic back to a safe
state. With this patch, we preserve the performance benefit requested
in bug #6278 for the case of scanning forward into NULLs (in a NULLS
LAST index), but the reverse case of scanning backward across NULLs
when there's no suitable initial-positioning qual is still inefficient.
Simon Riggs [Wed, 2 Nov 2011 17:15:35 +0000 (17:15 +0000)]
Update more comments about checkpoints being done by bgwriter
Simon Riggs [Wed, 2 Nov 2011 15:26:33 +0000 (15:26 +0000)]
Reduce checkpoints and WAL traffic on low activity database server
Previously, we skipped a checkpoint if no WAL had been written since
last checkpoint, though this does not appear in user documentation.
As of now, we skip a checkpoint until we have written enough WAL to
switch to the next WAL file. This greatly reduces the
level of activity and number of WAL messages generated by a very
low activity server. This is safe because the purpose of a checkpoint
is to act as a starting place for a recovery, in case of crash.
This patch maintains minimal WAL volume for replay in case of crash,
thus maintaining very low crash recovery time.
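The new skip condition reduces to comparing WAL segment numbers. The sketch treats WAL positions as plain byte offsets and assumes the default 16MB segment size; the function name is hypothetical and this is not the code in xlog.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE (16u * 1024 * 1024)   /* assumed default segment size */

/* Hypothetical: a timed checkpoint is needed only once WAL insertion has
 * moved into a later segment than the one holding the last checkpoint. */
static bool checkpoint_needed(uint64_t last_checkpoint_pos, uint64_t insert_pos)
{
    return insert_pos / WAL_SEGMENT_SIZE > last_checkpoint_pos / WAL_SEGMENT_SIZE;
}
```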
Simon Riggs [Wed, 2 Nov 2011 14:25:01 +0000 (14:25 +0000)]
Refactor xlog.c to create src/backend/postmaster/startup.c
Startup process now has its own dedicated file, just like all other
special/background processes. Reduces role and size of xlog.c
Simon Riggs [Wed, 2 Nov 2011 08:54:56 +0000 (08:54 +0000)]
Derive oldestActiveXid at correct time for Hot Standby.
There was a timing window between when oldestActiveXid was derived
and when it should have been derived that only shows itself under
heavy load. Move code around to ensure correct timing of derivation.
No change to StartupSUBTRANS() code, which is where this failed.
Bug report by Chris Redekop
Simon Riggs [Wed, 2 Nov 2011 08:47:43 +0000 (08:47 +0000)]
Start Hot Standby faster when initial snapshot is incomplete.
If the initial snapshot had overflowed, we can now start as soon as the
latest snapshot is empty or not overflowed, or, as before, when the xmin
on the primary is higher than the xmax of our starting snapshot, which
proves we have full snapshot data.
Bug report by Chris Redekop
Simon Riggs [Wed, 2 Nov 2011 08:37:52 +0000 (08:37 +0000)]
Remove spurious entry from missed catch while patch juggling
Simon Riggs [Wed, 2 Nov 2011 08:07:44 +0000 (08:07 +0000)]
Fix timing of Startup CLOG and MultiXact during Hot Standby
Patch by me, bug report by Chris Redekop, analysis by Florian Pflug
Robert Haas [Wed, 2 Nov 2011 02:44:54 +0000 (22:44 -0400)]
Initialize myProcLocks queues just once, at postmaster startup.
In assert-enabled builds, we assert during the shutdown sequence that
the queues have been properly emptied, and during process startup that
we are inheriting empty queues. In non-assert enabled builds, we just
save a few cycles.
Tom Lane [Wed, 2 Nov 2011 02:13:11 +0000 (22:13 -0400)]
Preserve Var location information during flatten_join_alias_vars.
This allows us to give correct syntax error pointers when complaining
about ungrouped variables in a join query with aggregates or GROUP BY.
It's pretty much irrelevant for the planner's use of the function, though
perhaps it might aid debugging sometimes.
Tom Lane [Tue, 1 Nov 2011 23:48:37 +0000 (19:48 -0400)]
Fix race condition with toast table access from a stale syscache entry.
If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.
The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.
The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it. (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field. We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.) An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.
Back-patch to all supported versions. The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.
Peter Eisentraut [Tue, 1 Nov 2011 19:50:00 +0000 (21:50 +0200)]
Clean up whitespace and indentation in parser and scanner files
These are not touched by pgindent, so clean them up a bit manually.
Simon Riggs [Tue, 1 Nov 2011 18:48:47 +0000 (18:48 +0000)]
Comment changes to show bgwriter no longer performs checkpoints.
Simon Riggs [Tue, 1 Nov 2011 18:38:27 +0000 (18:38 +0000)]
Have checkpointer send stats once each processing loop.
Noted by Fujii Masao
Bruce Momjian [Tue, 1 Nov 2011 18:33:51 +0000 (14:33 -0400)]
Update pg_upgrade comment on missing 'postgres' database.
Simon Riggs [Tue, 1 Nov 2011 18:07:29 +0000 (18:07 +0000)]
Add new file for checkpointer.c
Bruce Momjian [Tue, 1 Nov 2011 17:49:03 +0000 (13:49 -0400)]
Allow pg_upgrade to upgrade an old cluster that doesn't have a
'postgres' database.
Simon Riggs [Tue, 1 Nov 2011 17:14:47 +0000 (17:14 +0000)]
Split work of bgwriter between 2 processes: bgwriter and checkpointer.
bgwriter is now a much less important process, responsible for page
cleaning duties only. checkpointer is now responsible for checkpoints
and so has a key role in shutdown. Later patches will correct doc
references to the now old idea that bgwriter performs checkpoints.
Has a beneficial effect on performance at high write rates, but is mainly
refactoring to more easily allow changes for power reduction, by
simplifying the previously tortuous code required to allow page
cleaning and checkpointing to time-slice in the same process.
Patch by me, Review by Dickson Guedes
Magnus Hagander [Tue, 1 Nov 2011 14:44:26 +0000 (15:44 +0100)]
Document that multiple LDAP servers can be specified
Tom Lane [Mon, 31 Oct 2011 20:40:04 +0000 (16:40 -0400)]
Stop btree indexscans upon reaching nulls in either direction.
The existing scan-direction-sensitive tests were overly complex, and
failed to stop the scan in cases where it's perfectly legitimate to do so.
Per bug #6278 from Maksym Boguk.
Back-patch to 8.3, which is as far back as the patch applies easily.
Doesn't seem worth sweating over a relatively minor performance issue in
8.2 at this late date. (But note that this was a performance regression
from 8.1 and before, so 8.2 is being left as an outlier.)
Tom Lane [Sun, 30 Oct 2011 19:02:58 +0000 (15:02 -0400)]
Support more locale-specific formatting options in cash_out().
The POSIX spec defines locale fields for controlling the ordering of the
value, sign, and currency symbol in monetary output, but cash_out only
supported a small subset of these options. Fully implement p/n_sign_posn,
p/n_cs_precedes, and p/n_sep_by_space per spec. Fix up cash_in so that
it will accept all these format variants.
Also, make sure that thousands_sep is only inserted to the left of the
decimal point, as required by spec.
Per bug #6144 from Eduard Kracmar and discussion of bug #6277. This patch
includes some ideas from Alexander Lakhin's proposed patch, though it is
very different in detail.
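The thousands_sep rule can be illustrated with a small formatter that groups only the integer digits and accepts a multibyte separator. This is a hypothetical helper, not cash_out(); it assumes two fractional digits and separator/decimal-point strings of at most 8 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: format a non-negative amount in cents, inserting sep every
 * three digits to the LEFT of the decimal point only. */
static void format_cents(unsigned long long cents, const char *sep,
                         const char *point, char *out, size_t outlen)
{
    char digits[32];
    char buf[128];
    size_t n, pos = 0, seplen = strlen(sep), pointlen = strlen(point);
    unsigned frac = (unsigned) (cents % 100);

    if (seplen > 8 || pointlen > 8)   /* keeps buf large enough */
    {
        snprintf(out, outlen, "%s", "");
        return;
    }
    n = (size_t) snprintf(digits, sizeof(digits), "%llu", cents / 100);
    for (size_t i = 0; i < n; i++)
    {
        if (i > 0 && (n - i) % 3 == 0)   /* a group of three digits follows */
        {
            memcpy(buf + pos, sep, seplen);
            pos += seplen;
        }
        buf[pos++] = digits[i];
    }
    memcpy(buf + pos, point, pointlen);  /* no grouping after the point */
    pos += pointlen;
    buf[pos++] = (char) ('0' + frac / 10);
    buf[pos++] = (char) ('0' + frac % 10);
    buf[pos] = '\0';
    snprintf(out, outlen, "%s", buf);
}
```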
Tom Lane [Sun, 30 Oct 2011 16:21:28 +0000 (12:21 -0400)]
Further improvement of make_greater_string.
Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position. To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code. Remove unnecessary logic to
restore the old character value on failure. Additional comment and
formatting cleanup.
Robert Haas [Sat, 29 Oct 2011 18:45:39 +0000 (14:45 -0400)]
Update visibilitymap.c header comments.
Recent work on index-only scans left this somewhat out of date.
Tom Lane [Sat, 29 Oct 2011 18:30:55 +0000 (14:30 -0400)]
Fix assorted bogosities in cash_in() and cash_out().
cash_out failed to handle multiple-byte thousands separators, as per bug
#6277 from Alexander Law. In addition, cash_in didn't handle that either,
nor could it handle multiple-byte positive_sign. Both routines failed to
support multiple-byte mon_decimal_point, which I did not think was worth
changing, but at least now they check for the possibility and fall back to
using '.' rather than emitting invalid output. Also, make cash_in handle
trailing negative signs, which formerly it would reject. Since cash_out
generates trailing negative signs whenever the locale tells it to, this
last omission represents a fail-to-reload-dumped-data bug. IMO that
justifies patching this all the way back.
Robert Haas [Sat, 29 Oct 2011 18:22:20 +0000 (14:22 -0400)]
Improve make_greater_string() with encoding-specific incrementers.
This infrastructure doesn't in any way guarantee that the character
we produce will sort before the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.
Kyotaro Horiguchi, with various adjustments by me.
Bruce Momjian [Sat, 29 Oct 2011 01:18:36 +0000 (21:18 -0400)]
Remove pg_upgrade dependency on the 'postgres' database existing in the
new cluster. vacuumdb, used by pg_upgrade, still has this dependency.
Robert Haas [Fri, 28 Oct 2011 21:08:09 +0000 (17:08 -0400)]
Allow hint bits to be set sooner for temporary and unlogged tables.
We need not wait until the commit record is durably on disk, because
in the event of a crash the page we're updating with hint bits will
be gone anyway. Per off-list report from Heikki Linnakangas, this
can significantly degrade the performance of unlogged tables; I was
able to show a 2x speedup from this patch on a pgbench run with scale
factor 15. In practice, this will mostly help small, heavily updated
tables, because on larger tables you're unlikely to run into the same
row again before the commit record makes it out to disk.
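The relaxed rule can be written as a predicate. This is a hypothetical sketch that treats LSNs as plain integers and does not reproduce the actual SetHintBits() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a "committed" hint bit may be set once the commit record
 * is durably flushed, unless the relation is not WAL-logged (temporary or
 * unlogged); then a crash discards the page anyway, so there is no wait. */
static bool can_set_commit_hint(bool relation_is_wal_logged,
                                uint64_t commit_record_lsn,
                                uint64_t flushed_lsn)
{
    if (!relation_is_wal_logged)
        return true;
    return commit_record_lsn <= flushed_lsn;
}
```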
Robert Haas [Fri, 28 Oct 2011 21:04:22 +0000 (17:04 -0400)]
Demote some sanity checks in BufferIsValid() to assertions.
Testing reveals that this macro is a hot-spot for index-only-scans.
Per discussion with Tom Lane.
Robert Haas [Fri, 28 Oct 2011 19:45:28 +0000 (15:45 -0400)]
Remove hard-coded "\connect postgres" from pg_dumpall.
This doesn't appear to accomplish anything useful, and does make the
restore fail if the postgres database happens to have been dropped.
Tom Lane [Fri, 28 Oct 2011 19:16:40 +0000 (15:16 -0400)]
De-parallelize ecpg build some more.
Make sure ecpg/include/ is rebuilt before the other subdirectories,
so that ecpg_config.h is up to date. This is not likely to matter
during production builds, only development, so no back-patch.
Robert Haas [Fri, 28 Oct 2011 16:02:04 +0000 (12:02 -0400)]
Clarify that ORDER BY/FOR UPDATE can't malfunction at higher iso levels.
Kevin Grittner