| From: | Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com> | 
|---|---|
| To: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: WIP: Collecting statistics on CSV file data | 
| Date: | 2012-04-06 02:41:29 | 
| Message-ID: | [email protected] | 
| Lists: | pgsql-hackers | 
(2012/04/05 21:10), Shigeru HANADA wrote:
> file_fdw
> ========
> This patch contains a use case of new handler function in
> contrib/file_fdw.  Since file_fdw reads data from a flat file,
> fileAnalyzeForeignTable uses an algorithm similar to that for ordinary
> tables: it samples the first N rows, then randomly replaces them with subsequent
> rows.  Also file_fdw updates pg_class.relpages by calculating number of
> pages from size of the data file.
> 
> To allow FDWs to implement a sampling algorithm like this, several
> functions are exported from analyze.c, e.g. random_fract,
> init_selection_state, and get_next_S.
Just after my post, Fujita-san posted another v7 patch[1], so I merged
the v7 patches into a v8 patch.
[1] http://archives.postgresql.org/pgsql-hackers/2012-04/msg00212.php
Changes taken from Fujita-san's patch
=====================================
* Remove reporting validrows and deadrows at the end of
acquire_sample_rows of file_fdw.  Thus, it doesn't validate NOT NULL
constraints any more.
* Improve get_rel_size of file_fdw, which is used in GetForeignRelSize,
to estimate the current # of tuples in the foreign table from these values:
  - # of pages/tuples which are updated by last ANALYZE
  - current file size
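As a rough illustration of how such an estimate could be derived from those two inputs, the sketch below scales the tuple density seen at the last ANALYZE by the current page count of the file. This is an assumed formula, not the patch's code: BLCKSZ is set to 8192 (PostgreSQL's default block size), and file_pages and estimate_tuples are hypothetical names. What to do when no ANALYZE statistics exist is not covered in this message, so the sketch just returns None in that case.

```python
BLCKSZ = 8192  # assumed: PostgreSQL's default block size


def file_pages(file_size):
    """Number of BLCKSZ-sized pages needed for file_size bytes (at least 1)."""
    pages = (file_size + BLCKSZ - 1) // BLCKSZ
    return max(pages, 1)


def estimate_tuples(file_size, relpages, reltuples):
    """Estimate the current row count from the last ANALYZE's statistics.

    relpages / reltuples stand for the values stored by the last ANALYZE.
    Returns None when there are no usable statistics (fallback not shown).
    """
    if relpages <= 0:
        return None
    density = reltuples / relpages  # tuples per page at last ANALYZE
    return round(density * file_pages(file_size))
```

For example, if the last ANALYZE recorded 100 tuples in 1 page and the file has since grown to 16384 bytes, the estimate becomes 200 tuples.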
Additional Changes
==================
* Fix a memory leak in acquire_sample_rows, which was caused by calling
NextCopyFrom repeatedly in one long-lived memory context.  I added a
per-record temporary context, which is used while processing each record.
The main context is used to create heap tuples from sampled records, because
the sample tuples must remain valid after the function ends.
* Some cosmetic changes to the documentation, e.g. removing line breaks
inside tagged elements.
* Some cosmetic changes to make the patch more readable by minimizing
differences from the master branch.
Changes *not* merged
====================
* Fujita-san moved the documentation of AnalyzeForeignTable to the section
"Foreign Data Wrapper Helper Functions" from "Foreign Data Wrapper
Callback Routines".  But I think the analyze handler is one of the callback
functions, even though it's optional, so I left it where it was.
Please find attached a patch.
Regards,
-- 
Shigeru HANADA
| Attachment | Content-Type | Size | 
|---|---|---|
| postgresql-analyze-v8.patch | text/plain | 47.2 KB | 