[Proposal] Global temporary tables


Re: [Proposal] Global temporary tables

konstantin knizhnik


On 08.11.2019 18:06, 曾文旌(义从) wrote:
My comments for global_private_temp-4.patch

Thank you very much for inspecting my patch.

good side:
1. Many index types on GTT. I think we need to support all kinds of indexes.
2. Serial columns on GTT.
3. INHERITS with GTT.
4. Partitioned GTT.

I didn't choose to support these in the first release, but you did.

Other side:
1. Case: create global temp table gtt2(a int primary key, b text) on commit delete rows;
I think you've lost the meaning of the ON COMMIT DELETE ROWS clause.
After the GTT is created, other sessions see it as an ON COMMIT PRESERVE ROWS GTT.
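
A minimal psql sketch of the semantics being tested (session labels are illustrative; the expected behavior follows the standard ON COMMIT DELETE ROWS rules):

```sql
-- Session 1 creates the GTT; its rows should vanish at end of transaction.
create global temp table gtt2(a int primary key, b text) on commit delete rows;
insert into gtt2 values (1, 'xx');
-- Outside an explicit transaction block, the insert's commit clears the row:
select count(*) from gtt2;   -- expected: 0

-- Session 2 shares the definition and must also get DELETE ROWS behavior,
-- not PRESERVE ROWS (which is the bug being reported):
begin;
insert into gtt2 values (2, 'yy');
select count(*) from gtt2;   -- expected: 1 (visible inside the transaction)
commit;
select count(*) from gtt2;   -- expected: 0 (cleared at commit)
```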


Yes, there was a bug in my implementation of ON COMMIT DELETE ROWS for GTT.
It is fixed in global_private_temp-6.patch.

2. Truncating a GTT: maybe this is a bug in DropRelFileNodeBuffers.
The GTT's local buffers are not released.
Case:
postgres=# insert into gtt2 values(1,'xx');
INSERT 0 1
postgres=# truncate gtt2;
TRUNCATE TABLE
postgres=# insert into gtt2 values(1,'xx');
ERROR:  unexpected data beyond EOF in block 0 of relation base/13579/t3_16384
HINT:  This has been seen to occur with buggy kernels; consider updating your system.


Yes, another bug, also fixed in the new version of the patch.

3. Lock level of TRUNCATE on a GTT.
I don't think it's a good idea to hold a heavyweight lock when truncating a GTT, because it only needs to process the session's private data.

Sorry, I do not understand which lock you are talking about.
I have not introduced any special locks for GTT.

4. GTT DDL: DDL that needs to rewrite data files may need attention.
We discussed this in a previous email. This is why I used a shared hash to track the GTT files.


You are right.
But instead of prohibiting ALTER TABLE for GTT entirely, we can check
that no other backends are using it.
I do not think we should maintain a hash in shared memory to check this.
Since ALTER TABLE is a rare and slow operation in any case, we can simply check for the presence of GTT files
created by other backends.
I have implemented this check in global_private_temp-6.patch.
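
A sketch of the intended behavior (the error text is illustrative, not taken from the patch):

```sql
-- Session 1 has used the GTT, so a backend-private storage file exists:
insert into gtt2 values (1, 'xx');

-- Session 2 attempts a rewriting DDL; the check described above would
-- detect session 1's file and refuse the operation:
alter table gtt2 add column c int;
-- ERROR:  global temporary table "gtt2" is in use by other backends  (illustrative)
```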



5. There will be problems with DDL that changes the relfilenode, such as CLUSTER or VACUUM FULL on a GTT.
If a session completes VACUUM FULL on a GTT, other sessions will immediately start reading and writing new storage files, and their existing data is lost.
I disable these in my current version.

Thank you for noticing it.
VACUUM FULL should really be prohibited for GTT.


6. Dropping a GTT.
I think DROP of a GTT should clean up all storage files and the definition. What do you think?

Storage files will be cleaned up in any case on backend termination.
Certainly, if a backend creates and deletes a huge number of GTTs in a loop, it can cause space exhaustion.
But that seems to be a very strange pattern of GTT usage.



7. MVCC visibility and clog cleanup.
GTT data follows the same visibility rules as regular tables, so GTT also needs the clog.
We need to prevent clog segments that a GTT still needs from being truncated.
At the same time, GTT is not autovacuumed, and retaining "too old" data will cause wraparound data loss.
I have given a solution in my design.

But why do we need any special handling of visibility rules for GTT compared with normal (local) temp tables?
They are also not processed by autovacuum, are they?

In principle, I have also implemented special visibility rules for GTT, but only for the case when they
are accessed on a replica. And that is not included in this patch, because everybody thinks that access to GTT
on a replica should be considered in a separate patch.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [Proposal] Global temporary tables

Tomas Vondra-4
Hi,

I think we need to do something with having two patches aiming to add
global temporary tables:

[1] https://commitfest.postgresql.org/26/2349/

[2] https://commitfest.postgresql.org/26/2233/

As a reviewer I have no idea which of the threads to look at - certainly
not without reading both threads, which I doubt anyone will really do.
The reviews and discussions are somewhat intermixed between those two
threads, which makes it even more confusing.

I think we should agree on a minimal patch combining the necessary/good
bits from the various patches, and terminate one of the threads (i.e.
mark it as rejected or RWF). And we need to do that now, otherwise
there's about 0% chance of getting this into v13.

In general, I agree with the sentiment Robert expressed in [1] - the
patch needs to be as small as possible, not adding "nice to have"
features (like support for parallel queries - I very much doubt just
using shared instead of local buffers is enough to make it work.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Global temporary tables

曾文旌(义从)
In our previous communication:

1 We agreed on the general direction:
1.1 GTT uses local (private) buffers
1.2 No replica access in the first version

2 We feel that GTT needs to maintain statistics, but there is no agreement on how it will be done.

3 Still no one has commented on GTT's transaction-information processing, which includes:
3.1 Does a GTT's frozenxid need special care?
3.2 How to clean up a GTT's clog
3.3 How to deal with "too old" GTT data

I suggest we discuss further, reach an agreement, and merge the two patches into one.


Wenjing


> On Jan 6, 2020, at 4:06 AM, Tomas Vondra <[hidden email]> wrote:
>
> Hi,
>
> I think we need to do something with having two patches aiming to add
> global temporary tables:
>
> [1] https://commitfest.postgresql.org/26/2349/
>
> [2] https://commitfest.postgresql.org/26/2233/
>
> As a reviewer I have no idea which of the threads to look at - certainly
> not without reading both threads, which I doubt anyone will really do.
> The reviews and discussions are somewhat intermixed between those two
> threads, which makes it even more confusing.
>
> I think we should agree on a minimal patch combining the necessary/good
> bits from the various patches, and terminate one of the threads (i.e.
> mark it as rejected or RWF). And we need to do that now, otherwise
> there's about 0% chance of getting this into v13.
>
> In general, I agree with the sentiment Rober expressed in [1] - the
> patch needs to be as small as possible, not adding "nice to have"
> features (like support for parallel queries - I very much doubt just
> using shared instead of local buffers is enough to make it work.)
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Global temporary tables

Tomas Vondra-4
On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>In the previous communication
>
>1 we agreed on the general direction
>1.1 gtt use local (private) buffer
>1.2 no replica access in first version
>

OK, good.

>2 We feel that gtt needs to maintain statistics, but there is no
>agreement on what it will be done.
>

I certainly agree GTT needs to maintain statistics, otherwise it'll lead
to poor query plans. AFAIK the current patch stores the info in a hash
table in backend-private memory, and I don't see how else to do that
(e.g. storing it in a catalog would cause catalog bloat).

FWIW this is one of the reasons why I think just using shared buffers (instead of
local ones) is not sufficient to support parallel queries as proposed
by Alexander. The workers would not know the stats, breaking planning of
queries in PARALLEL SAFE plpgsql functions etc.

>3 Still no one commented on GTT's transaction information processing, they include
>3.1 Should gtt's frozenxid need to be care?
>3.2 gtt’s clog clean
>3.3 How to deal with "too old" gtt data
>

No idea what to do about this.

>I suggest we discuss further, reach an agreement, and merge the two patches to one.
>

OK, cool. Thanks for the clarification.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Global temporary tables

Dean Rasheed-3
On Mon, 6 Jan 2020 at 11:01, Tomas Vondra <[hidden email]> wrote:
>
> On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>
> >2 We feel that gtt needs to maintain statistics, but there is no
> >agreement on what it will be done.
> >
>
> I certainly agree GTT needs to maintain statistics, otherwise it'll lead
> to poor query plans.

+1

> AFAIK the current patch stores the info in a hash
> table in a backend private memory, and I don't see how else to do that
> (e.g. storing it in a catalog would cause catalog bloat).
>

It sounds like it needs a pair of system GTTs to hold the table and
column statistics for other GTTs. One would probably have the same
columns as pg_statistic, and the other just the relevant columns from
pg_class. I can see it being useful for the user to be able to see
these stats, so perhaps they could be UNIONed into the existing stats
view.
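
A sketch of the kind of view being described (the catalog name `pg_gtt_statistic` and the UNIONed view are hypothetical; only a few pg_statistic columns are shown):

```sql
-- Hypothetical system GTT holding per-session statistics for other GTTs,
-- with the same shape as pg_statistic:
-- create global temp table pg_gtt_statistic (like pg_statistic);

-- A user-facing stats view could then UNION the two sources, so tools
-- need only look in one place:
create view all_statistic as
select starelid, staattnum, stanullfrac, stawidth, stadistinct
from pg_statistic
union all
select starelid, staattnum, stanullfrac, stawidth, stadistinct
from pg_gtt_statistic;
```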

Regards,
Dean



Re: [Proposal] Global temporary tables

Tomas Vondra-4
On Mon, Jan 06, 2020 at 12:17:43PM +0000, Dean Rasheed wrote:

>On Mon, 6 Jan 2020 at 11:01, Tomas Vondra <[hidden email]> wrote:
>>
>> On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>>
>> >2 We feel that gtt needs to maintain statistics, but there is no
>> >agreement on what it will be done.
>> >
>>
>> I certainly agree GTT needs to maintain statistics, otherwise it'll lead
>> to poor query plans.
>
>+1
>
>> AFAIK the current patch stores the info in a hash
>> table in a backend private memory, and I don't see how else to do that
>> (e.g. storing it in a catalog would cause catalog bloat).
>>
>
>It sounds like it needs a pair of system GTTs to hold the table and
>column statistics for other GTTs. One would probably have the same
>columns as pg_statistic, and the other just the relevant columns from
>pg_class. I can see it being useful for the user to be able to see
>these stats, so perhaps they could be UNIONed into the existing stats
>view.
>

Hmmm, yeah. A "temporary catalog" (not sure if it can work exactly the
same as GTT) storing pg_statistic data for GTTs might work, I think. It
would not have the catalog bloat issue, which is good.

I still think we'd need to integrate this with the regular pg_statistic
catalogs somehow, so that people don't have to care about two things. I
mean, extensions like hypopg do use pg_statistic data to propose indexes
etc. and it would be nice if we don't make them more complicated.

Not sure why we'd need a temporary version of pg_class, though?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Global temporary tables

Pavel Stehule
In reply to this post by Dean Rasheed-3


On Mon, Jan 6, 2020 at 13:17, Dean Rasheed <[hidden email]> wrote:
On Mon, 6 Jan 2020 at 11:01, Tomas Vondra <[hidden email]> wrote:
>
> On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>
> >2 We feel that gtt needs to maintain statistics, but there is no
> >agreement on what it will be done.
> >
>
> I certainly agree GTT needs to maintain statistics, otherwise it'll lead
> to poor query plans.

+1

> AFAIK the current patch stores the info in a hash
> table in a backend private memory, and I don't see how else to do that
> (e.g. storing it in a catalog would cause catalog bloat).
>

It sounds like it needs a pair of system GTTs to hold the table and
column statistics for other GTTs. One would probably have the same
columns as pg_statistic, and the other just the relevant columns from
pg_class. I can see it being useful for the user to be able to see
these stats, so perhaps they could be UNIONed into the existing stats
view.

+1

Pavel


Regards,
Dean

Re: [Proposal] Global temporary tables

曾文旌(义从)
In reply to this post by Dean Rasheed-3


On Jan 6, 2020, at 8:17 PM, Dean Rasheed <[hidden email]> wrote:

On Mon, 6 Jan 2020 at 11:01, Tomas Vondra <[hidden email]> wrote:

On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:

2 We feel that gtt needs to maintain statistics, but there is no
agreement on what it will be done.


I certainly agree GTT needs to maintain statistics, otherwise it'll lead
to poor query plans.

+1

AFAIK the current patch stores the info in a hash
table in a backend private memory, and I don't see how else to do that
(e.g. storing it in a catalog would cause catalog bloat).


It sounds like it needs a pair of system GTTs to hold the table and
column statistics for other GTTs. One would probably have the same
columns as pg_statistic, and the other just the relevant columns from
pg_class. I can see it being useful for the user to be able to see
these stats, so perhaps they could be UNIONed into the existing stats
view.
The current patch provides several functions in an extension (pg_gtt) for reading GTT statistics.
Next I can move them into core and let the pg_stats view show GTT statistics.


Regards,
Dean


Re: [Proposal] Global temporary tables

konstantin knizhnik
In reply to this post by 曾文旌(义从)


On 06.01.2020 8:04, 曾文旌(义从) wrote:

> In the previous communication
>
> 1 we agreed on the general direction
> 1.1 gtt use local (private) buffer
> 1.2 no replica access in first version
>
> 2 We feel that gtt needs to maintain statistics, but there is no agreement on what it will be done.
>
> 3 Still no one commented on GTT's transaction information processing, they include
> 3.1 Should gtt's frozenxid need to be care?
> 3.2 gtt’s clog clean
> 3.3 How to deal with "too old" gtt data
>
> I suggest we discuss further, reach an agreement, and merge the two patches to one.
>

I also hope that we can come to a common solution for GTT.
If we do not try to address parallel execution issues and access to temp
tables on replicas (and I agree
that these should be avoided in the first version of the patch), then the GTT patch
becomes quite small.

The most complex and challenging task is to support GTT for all kinds of
indexes. Unfortunately I cannot propose a good universal solution
for it.
Patching all existing index implementations seems to be the only
choice.

Statistics is another important case.
But once again, I do not completely understand why we want to address all
these issues with statistics in the first version of the patch. It contradicts
the idea of making this patch as small as possible.
Also, it seems to me that everybody agreed that users very rarely create
indexes on temp tables or explicitly analyze them.
So I think GTT will be useful even with limited support for statistics. In
my version, statistics for GTT are provided by pushing the corresponding
information into the backend's cache for the pg_statistic table.
I also provided a pg_temp_statistic view so users can inspect it. The
idea of making pg_statistic a view which combines statistics of normal and
temporary tables is overkill from my point of view.

I do not understand why we need to maintain a hash with some extra
information for GTT in backend memory (as was done in Wenjing's patch).
Also, the idea of using CREATE EXTENSION to access this information seems
dubious.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Proposal] Global temporary tables

konstantin knizhnik
In reply to this post by Tomas Vondra-4


On 06.01.2020 14:01, Tomas Vondra wrote:

> On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>> In the previous communication
>>
>> 1 we agreed on the general direction
>> 1.1 gtt use local (private) buffer
>> 1.2 no replica access in first version
>>
>
> OK, good.
>
>> 2 We feel that gtt needs to maintain statistics, but there is no
>> agreement on what it will be done.
>>
>
> I certainly agree GTT needs to maintain statistics, otherwise it'll lead
> to poor query plans. AFAIK the current patch stores the info in a hash
> table in a backend private memory, and I don't see how else to do that
> (e.g. storing it in a catalog would cause catalog bloat).
>
> FWIW this is a reasons why I think just using shared buffers (instead of
> local ones) is not sufficient to support parallel queriesl as proposed
> by Alexander. The workers would not know the stats, breaking planning of
> queries in PARALLEL SAFE plpgsql functions etc.


I do not think that an "all or nothing" approach is as good for software
development as it is for database transactions.
Yes, if we have a PL/pgSQL function which performs queries on temporary
tables, then
parallel workers may build inefficient plans for these queries due to the lack
of statistics.
From my point of view this is not a pitfall of GTT but the result of the lack
of a global plan cache in Postgres. And it should be fixed, but not at the GTT level.

Also, I have never seen real use cases with such functions, even in systems
which use temporary tables and stored procedures heavily.
But there are many other real problems with temp tables (besides those already
mentioned in this thread).
In PgPro/EE we have fixes for some of them, for example:

1. Do not reserve space in the file for temp relations. Right now, appending
to a relation causes a zero page to be written to disk by mdextend.
This causes useless disk I/O for temp tables, which in most cases fit in
memory and should not be written to disk at all.

2. Implicitly analyze a temp table immediately after storing
data in it. Usually tables are analyzed by autovacuum in the background.
But that doesn't work for temp tables, which are not processed by
autovacuum and are accessed immediately after being filled with data; the
lack of statistics may cause
a very inefficient plan to be built. We have an online_analyze extension which
forces an analyze of the table after appending some bulk of data to it.
It can be used for normal tables, but it is most useful for temp
relations.

Unlike the hypothetical example with a parallel-safe function working with
temp tables,
these are real problems observed by some of our customers.
They apply both to local and global temp tables, and this is why
I do not want to discuss them in the context of GTT.
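
The second fix can be approximated manually today; a minimal sketch (the table name is illustrative):

```sql
-- Without autovacuum, a freshly filled temp table has no statistics,
-- so the planner falls back on defaults. An explicit ANALYZE right
-- after the bulk load fixes subsequent plans:
create temp table t_work(id int, payload text);
insert into t_work select g, repeat('x', 10) from generate_series(1, 100000) g;
analyze t_work;   -- what online_analyze would do implicitly
-- Queries now plan with real row counts and distributions:
explain select * from t_work where id = 42;
```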


>
>> 3 Still no one commented on GTT's transaction information processing,
>> they include
>> 3.1 Should gtt's frozenxid need to be care?
>> 3.2 gtt’s clog clean
>> 3.3 How to deal with "too old" gtt data
>>
>
> No idea what to do about this.
>

I wonder what is specific to GTT here?
The same problem exists for normal (local) temp tables, doesn't it?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Proposal] Global temporary tables

Tomas Vondra-4
On Thu, Jan 09, 2020 at 06:07:46PM +0300, Konstantin Knizhnik wrote:

>
>
>On 06.01.2020 14:01, Tomas Vondra wrote:
>>On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>>>In the previous communication
>>>
>>>1 we agreed on the general direction 1.1 gtt use local (private)
>>>buffer 1.2 no replica access in first version
>>>
>>
>>OK, good.
>>
>>>2 We feel that gtt needs to maintain statistics, but there is no
>>>agreement on what it will be done.
>>>
>>
>>I certainly agree GTT needs to maintain statistics, otherwise it'll
>>lead to poor query plans. AFAIK the current patch stores the info in a
>>hash table in a backend private memory, and I don't see how else to do
>>that (e.g. storing it in a catalog would cause catalog bloat).
>>
>>FWIW this is a reasons why I think just using shared buffers (instead
>>of local ones) is not sufficient to support parallel queriesl as
>>proposed by Alexander. The workers would not know the stats, breaking
>>planning of queries in PARALLEL SAFE plpgsql functions etc.
>
>
>I do not think that "all or nothing" approach is so good for software
>development as for database transactions.

Well, sure. I'm not saying we need to have a perfect solution in v1. I'm
saying if we have two choices:

(1) Use shared buffers even if it means the parallel query plan may be
     arbitrarily bad.

(2) Use private buffers, even if it means no parallel queries with temp
     tables.

Then I'm voting for (2) because it's less likely to break down. I can
imagine allowing parallel queries with GTT when there's no risk of
having to plan in the worker, but that's not there yet.

If we can come up with a reasonable solution for the parallel case, we
can enable it later.

>Yes, if we have function in PL/pgSQL which performs queries om
>temporary tables, then
>parallel workers may build inefficient plan for this queries due to
>lack of statistics.

IMHO that's a pretty awful deficiency, because it essentially means
users may need to disable parallelism for such queries. Which means
we'll get complaints from users, and we'll have to come up with some
sort of solution. I'd rather not be in that position.

>From my point of view this is not a pitfall of GTT but result of lack
>of global plan cache in Postgres. And it should be fixed not at GTT
>level.
>

That doesn't give us a free pass to just ignore the issue. Even if it
really was due to a lack of a global plan cache, the fact is we don't have
that feature, so we have a problem. I mean, if you need infrastructure
that is not available, you either have to implement that infrastructure
or make it work properly without it.

>Also I never see real use cases with such functions, even in the
>systems which using hard temporary tables and stored procedures.
>But there are many other real problems with temp tables  (except
>already mentioned in this thread).

Oh, I'm sure there are pretty large plpgsql applications, and I'd be
surprised if at least some of those were not affected. And I'm sure
there are apps using UDF to do all sorts of stuff (e.g. I wonder if
PostGIS would have this issue - IIRC it's using SPI etc.).

The question is whether we should consider existing apps affected,
because they are using the regular temporary tables and not GTT. So
unless they switch to GTT there is no regression ...

But even in that case I don't think it's a good idea to accept this as
an acceptable limitation. I admit one of the reasons why I think that
may be that statistics and planning are my areas of interest, so I'm not
quite willing to accept incomplete stuff as OK.

>In PgPro/EE we have fixes for some of them, for example:
>
>1. Do not reserve space in the file for temp relations. Right now
>append of relation cause writing zero page to the disk by mdextend.
>It cause useless disk IO for temp tables which in most cases fit in
>memory and should not be written at disk.
>
>2. Implicitly perform analyze of temp table intermediately after
>storing data in it. Usually tables are analyzed by autovacuum in
>background.
>But it doesn't work for temp tables which are not processes by
>autovacuum and are accessed immediately after filling them with data
>and lack of statistic  may cause
>building very inefficient plan. We have online_analyze extension which
>force analyze of the table after appending some bulk of data to it.
>It can be used for normal table but most of all it is useful for temp
>relations.
>
>Unlike hypothetical example with parallel safe function working with
>temp tables,
>this are real problems observed by some of our customers.
>Them are applicable both to local and global temp tables and this is
>why I do not want to discuss them in context of GTT.
>

I think those are both interesting issues worth fixing, but I don't
think it makes the issue discussed here less important.

>
>>
>>>3 Still no one commented on GTT's transaction information
>>>processing, they include
>>>3.1 Should gtt's frozenxid need to be care?
>>>3.2 gtt’s clog clean
>>>3.3 How to deal with "too old" gtt data
>>>
>>
>>No idea what to do about this.
>>
>
>I wonder what is the specific of GTT here?
>The same problem takes place for normal (local) temp tables, doesn't it?
>

Not sure. TBH I'm not sure I understand what the issue actually is.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Global temporary tables

Tomas Vondra-4
In reply to this post by konstantin knizhnik
On Thu, Jan 09, 2020 at 02:17:08PM +0300, Konstantin Knizhnik wrote:

>
>
>On 06.01.2020 8:04, 曾文旌(义从) wrote:
>>In the previous communication
>>
>>1 we agreed on the general direction
>>1.1 gtt use local (private) buffer
>>1.2 no replica access in first version
>>
>>2 We feel that gtt needs to maintain statistics, but there is no agreement on what it will be done.
>>
>>3 Still no one commented on GTT's transaction information processing, they include
>>3.1 Should gtt's frozenxid need to be care?
>>3.2 gtt’s clog clean
>>3.3 How to deal with "too old" gtt data
>>
>>I suggest we discuss further, reach an agreement, and merge the two patches to one.
>>
>
>I also hope that we should come to the common solution for GTT.
>If we do not try to address parallel execution issues and access to
>temp tables at replicas (and I agreed
>that it should be avoided in first version of the patch), then GTT
>patch becomes quite small.
>

Well, that was kinda my goal - making the patch as small as possible by
eliminating bits that are contentious or where we don't know the
solution (like planning for parallel queries).

>The most complex and challenged task is to support GTT for all kind of
>indexes. Unfortunately I can not proposed some good universal solution
>for it.
>Just patching all existed indexes implementation seems to be the only
>choice.
>

I haven't looked at the indexing issue closely, but IMO we need to
ensure that every session sees/uses only indexes on GTT that were
defined before the session started using the table.

Can't we track which indexes a particular session sees, somehow?

>Statistic is another important case.
>But once again I do not completely understand why we want to address
>all this issues with statistic in first version of the patch?

I think the question is which "issues with statistic" you mean. I'm sure
we can ignore some of them, e.g. the one with parallel workers not
having any stats (assuming we consider functions using GTT to be
parallel restricted).

>It contradicts to the idea to make this patch as small as possible.

Well, there's a "make the patch as small as possible" vs. "patch behaving
correctly" trade-off ;-)

>Also it seems to me that everybody agreed that users very rarely
>create indexes for temp tables and explicitly analyze them.

I certainly *disagree* with this.

We often see temporary tables used as a fix for misestimates in complex
queries, and/or as a replacement for CTEs, with statistics/indexes. In
fact it's a pretty valuable tool when helping customers with complex
queries affected by poor estimates.
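
A sketch of this pattern (the query and table names are illustrative): materialize an intermediate result into a temp table, then give the planner statistics and an index the CTE could not have:

```sql
-- Instead of:  with heavy as (select ...) select ... from heavy join ...
create temp table heavy as
select customer_id, sum(amount) as total
from orders
group by customer_id;

analyze heavy;                         -- accurate row count and ndistinct
create index on heavy(customer_id);    -- something a CTE cannot offer

select c.name, h.total
from customers c
join heavy h using (customer_id);
```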

>So I think GTT will be useful even with limited support of statistic.
>In my version statistics for GTT is provided by pushing correspondent
>information to backend's cache for pg_statistic table.

I think someone pointed out pushing stuff directly into the cache is
rather problematic, but I don't recall the details.

>Also I provided pg_temp_statistic view for inspecting it by users. The
>idea to make pg_statistic a view which combines statistic of normal
>and temporary tables is overkill from my point of view.
>
>I do not understand why do we need to maintain hash with some extra
>information for GTT in backends memory (as it was done in Wenjing
>patch).
>Also idea to use create extension for accessing this information seems
>to be dubious.
>

I think the extension was more a PoC rather than a final solution.


regards
--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Global temporary tables

konstantin knizhnik


On 09.01.2020 19:48, Tomas Vondra wrote:

>
>> The most complex and challenged task is to support GTT for all kind
>> of indexes. Unfortunately I can not proposed some good universal
>> solution for it.
>> Just patching all existed indexes implementation seems to be the only
>> choice.
>>
>
> I haven't looked at the indexing issue closely, but IMO we need to
> ensure that every session sees/uses only indexes on GTT that were
> defined before the seesion started using the table.

Why? That contradicts the behavior of normal tables.
Assume that you have active clients, and at some point the DBA
notices that they are spending too much time scanning some GTT.
The DBA can create an index for this GTT, but if existing clients are not able
to use this index, do we then need to somehow make these clients restart
their sessions?
In my patch I have implemented building indexes for GTT on demand: if
an accessed index on a GTT is not yet initialized, it is filled with the
session's local data.
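
A sketch of the on-demand behavior described above (session labels and the index name are illustrative):

```sql
-- Session 1 has been using the GTT for a while:
insert into gtt2 select g, 'payload' from generate_series(1, 100000) g;

-- Session 2 (DBA) creates an index on the GTT:
create index gtt2_b_idx on gtt2(b);

-- Back in session 1, the first access to gtt2_b_idx finds it
-- uninitialized for this session, builds it from the session's
-- local data, and only then uses it:
explain select * from gtt2 where b = 'payload';
```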

>
> Can't we track which indexes a particular session sees, somehow?
>
>> Statistic is another important case.
>> But once again I do not completely understand why we want to address
>> all this issues with statistic in first version of the patch?
>
> I think the question is which "issues with statistic" you mean. I'm sure
> we can ignore some of them, e.g. the one with parallel workers not
> having any stats (assuming we consider functions using GTT to be
> parallel restricted).

If we do not use shared buffers for GTT, then parallel processing of GTT
is not possible at all, so there is no problem with statistics for
parallel workers.

>
> I think someone pointed out pushing stuff directly into the cache is
> rather problematic, but I don't recall the details.
>
I have not encountered any problems, so if you can point me to what is
wrong with this approach, I will think about an alternative solution.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Proposal] Global temporary tables

konstantin knizhnik
In reply to this post by Tomas Vondra-4


On 09.01.2020 19:30, Tomas Vondra wrote:





3 Still no one commented on GTT's transaction information processing, they include
3.1 Should gtt's frozenxid need to be care?
3.2 gtt’s clog clean
3.3 How to deal with "too old" gtt data


No idea what to do about this.


I wonder what is the specific of GTT here?
The same problem takes place for normal (local) temp tables, doesn't it?


Not sure. TBH I'm not sure I understand what the issue actually is.

Just open a session, create a temporary table, and insert some data into it.
Then in another session run 2^31 transactions (on my desktop it takes about 2 hours).
Since temp tables are not processed by vacuum, the database stalls:

 ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"

This seems to be quite dubious behavior, and it is strange to me that nobody complains about it.
We discuss many issues related to temp tables (statistics, parallel queries, ...) which seem to be less critical.

But this problem is not specific to GTT: it can be reproduced with normal (local) temp tables.
This is why I wonder why we need to solve it in the GTT patch.
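
The underlying mechanics can be observed directly; a minimal sketch (the table name is illustrative, and the xid values will differ per installation):

```sql
-- A session-local temp table gets a relfrozenxid like any heap table:
create temp table t_tmp(a int);
insert into t_tmp values (1);

select relname, relfrozenxid, age(relfrozenxid)
from pg_class
where relname = 't_tmp';

-- Autovacuum skips temp tables, so only the owning session can advance
-- this xid. If it never does while other sessions consume xids,
-- age(relfrozenxid) grows toward the wraparound limit and eventually
-- triggers the error above.
vacuum t_tmp;   -- advances relfrozenxid; must be run by the owning session
```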

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [Proposal] Global temporary tables

曾文旌(义从)
In reply to this post by Tomas Vondra-4
Hi all

This is the latest patch

The updates are as follows:
1. Support inherited global temp tables and global temp partitioned tables
2. Support serial columns in GTT
3. Provide views pg_gtt_relstats and pg_gtt_stats for GTT statistics
4. Provide view pg_gtt_attached_pids to manage GTT
5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
6. ALTER or RENAME of a GTT is allowed under some conditions


Please give me feedback.

Wenjing






On Jan 6, 2020, at 4:06 AM, Tomas Vondra <[hidden email]> wrote:

Hi,

I think we need to do something with having two patches aiming to add
global temporary tables:

[1] https://commitfest.postgresql.org/26/2349/

[2] https://commitfest.postgresql.org/26/2233/

As a reviewer I have no idea which of the threads to look at - certainly
not without reading both threads, which I doubt anyone will really do.
The reviews and discussions are somewhat intermixed between those two
threads, which makes it even more confusing.

I think we should agree on a minimal patch combining the necessary/good
bits from the various patches, and terminate one of the threads (i.e.
mark it as rejected or RWF). And we need to do that now, otherwise
there's about 0% chance of getting this into v13.

In general, I agree with the sentiment Robert expressed in [1] - the
patch needs to be as small as possible, not adding "nice to have"
features (like support for parallel queries - I very much doubt just
using shared instead of local buffers is enough to make it work.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


global_temporary_table_v3-pg13.patch (155K)

Re: [Proposal] Global temporary tables

Pavel Stehule
Hi

On Sat, Jan 11, 2020 at 15:00, 曾文旌(义从) <[hidden email]> wrote:
Hi all

This is the latest patch

The updates are as follows:
1. Support inherited global temp tables and global temp partitioned tables
2. Support serial columns in GTT
3. Provide views pg_gtt_relstats and pg_gtt_stats for GTT statistics
4. Provide view pg_gtt_attached_pids to manage GTT
5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
6. ALTER or RENAME of a GTT is allowed under some conditions


Please give me feedback.

I tested the functionality

1. I think "ON COMMIT PRESERVE ROWS" should be the default mode (as for local temp tables).

I tested some simple scripts

test01.sql

CREATE TEMP TABLE foo(a int, b int);
INSERT INTO foo SELECT random()*100, random()*1000 FROM generate_series(1,1000);
ANALYZE foo;
SELECT sum(a), sum(b) FROM foo;
DROP TABLE foo; -- simulate disconnect


after 100 sec, the table pg_attribute has grown to 3.2MB,
and the run achieved 64 tps (6446 transactions)

test02.sql

INSERT INTO foo SELECT random()*100, random()*1000 FROM generate_series(1,1000);
ANALYZE foo;
SELECT sum(a), sum(b) FROM foo;
DELETE FROM foo; -- simulate disconnect


after 100 sec, 1688 tps, 168830 transactions

So the performance is dramatically different, as we expected.
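A quick check of the speedup implied by the two runs above (plain arithmetic on the reported numbers):

```python
# test01 drops and recreates the temp table every transaction;
# test02 reuses it and only deletes the rows.
recreate_tps = 6446 / 100      # 6446 transactions in 100 seconds
reuse_tps = 168830 / 100       # 168830 transactions in 100 seconds

print(f"recreate: {recreate_tps:.0f} tps, reuse: {reuse_tps:.0f} tps")
print(f"speedup: {reuse_tps / recreate_tps:.1f}x")   # ~26.2x
```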

From my perspective, this functionality is great.

Todo:

pg_table_size function doesn't work

Regards

Pavel


Wenjing





On Jan 6, 2020, at 4:06 AM, Tomas Vondra <[hidden email]> wrote:

Hi,

I think we need to do something with having two patches aiming to add
global temporary tables:

[1] https://commitfest.postgresql.org/26/2349/

[2] https://commitfest.postgresql.org/26/2233/

As a reviewer I have no idea which of the threads to look at - certainly
not without reading both threads, which I doubt anyone will really do.
The reviews and discussions are somewhat intermixed between those two
threads, which makes it even more confusing.

I think we should agree on a minimal patch combining the necessary/good
bits from the various patches, and terminate one of the threads (i.e.
mark it as rejected or RWF). And we need to do that now, otherwise
there's about 0% chance of getting this into v13.

In general, I agree with the sentiment Robert expressed in [1] - the
patch needs to be as small as possible, not adding "nice to have"
features (like support for parallel queries - I very much doubt just
using shared instead of local buffers is enough to make it work.)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [Proposal] Global temporary tables

Tomas Vondra-4
In reply to this post by konstantin knizhnik
On Fri, Jan 10, 2020 at 03:24:34PM +0300, Konstantin Knizhnik wrote:

>
>
>On 09.01.2020 19:30, Tomas Vondra wrote:
>
>
>>
>>>
>>>>
>>>>>3 Still no one commented on GTT's transaction information
>>>>>processing, they include
>>>>>3.1 Should gtt's frozenxid need to be care?
>>>>>3.2 gtt’s clog clean
>>>>>3.3 How to deal with "too old" gtt data
>>>>>
>>>>
>>>>No idea what to do about this.
>>>>
>>>
>>>I wonder what is the specific of GTT here?
>>>The same problem takes place for normal (local) temp tables, doesn't it?
>>>
>>
>>Not sure. TBH I'm not sure I understand what the issue actually is.
>
>Just open session, create temporary table and insert some data in it.
>Then in other session run 2^31 transactions (at my desktop it takes
>about 2 hours).
>As far as temp tables are not proceeded by vacuum, database is stalled:
>
> ERROR:  database is not accepting commands to avoid wraparound data
>loss in database "postgres"
>
>It seems to be quite dubious behavior and it is strange to me that
>nobody complains about it.
>We discuss  many issues related with temp tables (statistic, parallel
>queries,...) which seems to be less critical.
>
>But this problem is not specific to GTT - it can be reproduced with
>normal (local) temp tables.
>This is why I wonder why do we need to solve it in GTT patch.
>

Yeah, I think that's out of scope for GTT patch. Once we solve it for
plain temporary tables, we'll solve it for GTT too.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Global temporary tables

Tomas Vondra-4
In reply to this post by konstantin knizhnik
On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:

>
>
>On 09.01.2020 19:48, Tomas Vondra wrote:
>>
>>>The most complex and challenged task is to support GTT for all
>>>kind of indexes. Unfortunately I can not proposed some good
>>>universal solution for it.
>>>Just patching all existed indexes implementation seems to be the
>>>only choice.
>>>
>>
>>I haven't looked at the indexing issue closely, but IMO we need to
>>ensure that every session sees/uses only indexes on GTT that were
>>defined before the seesion started using the table.
>
>Why? It contradicts with behavior of normal tables.
>Assume that you have active clients and at some point of time DBA
>recognizes that them are spending to much time in scanning some GTT.
>It cab create index for this GTT but if existed client will not be
>able to use this index, then we need somehow make this clients to
>restart their sessions?
>In my patch I have implemented building indexes for GTT on demand: if
>accessed index on GTT is not yet initialized, then it is filled with
>local data.

Yes, I know the behavior would be different from behavior for regular
tables. And yes, it would not allow fixing slow queries in sessions
without interrupting those sessions.

I proposed just ignoring those new indexes because it seems much simpler
than alternative solutions that I can think of, and it's not like those
other solutions don't have other issues.

For example, I've looked at the "on demand" building as implemented in
global_private_temp-8.patch, and adding a bunch of index build calls
into various places in the index code seems somewhat suspicious.

* brinbuild is added to brinRevmapInitialize, which is meant to
   initialize state for scanning. It seems wrong to build the index we're
   scanning from this function (layering and all that).

* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?

... and so on for other index types. Also, what about custom indexes
implemented in extensions? It seems a bit strange that each of them has
to support this separately.

IMHO if this really is the right solution, we need to make it work for
existing indexes without having to tweak them individually. Why don't we
track a flag whether an index on GTT was initialized in a given session,
and if it was not then call the build function before calling any other
function from the index AM?
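The flag-based idea could be sketched as follows (illustrative Python with hypothetical names; PostgreSQL would of course do this in C, at the point where index AM functions are dispatched):

```python
# Illustrative sketch (not PostgreSQL code, all names hypothetical):
# a per-session wrapper that builds the session-local index on first use,
# before any other index AM function runs.

class SessionLocalIndex:
    """Wraps an index AM; builds the session-local index lazily."""

    def __init__(self, build_fn):
        self._build_fn = build_fn      # stand-in for the AM's build (e.g. btbuild)
        self._built = False            # the per-session "initialized" flag
        self._entries = []

    def _ensure_built(self, heap_rows):
        # Called before ANY other AM operation, as proposed above.
        if not self._built:
            self._entries = self._build_fn(heap_rows)
            self._built = True

    def search(self, heap_rows, key):
        self._ensure_built(heap_rows)
        return [row for k, row in self._entries if k == key]

def simple_build(heap_rows):
    # Stand-in for an AM build function: index rows by their first column.
    return sorted((row[0], row) for row in heap_rows)

rows = [(2, 'b'), (1, 'a'), (3, 'c')]
idx = SessionLocalIndex(simple_build)
print(idx.search(rows, 2))   # the index is built lazily on this first call
```

The point of the sketch is that only the dispatch layer needs the flag; the individual AM implementations stay untouched.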

But let's talk about other issues caused by "on demand" build. Imagine
you have 50 sessions, each using the same GTT with a GB of per-session
data. Now you create a new index on the GTT, which forces each session
to build its own "local" index. Those builds will use maintenance_work_mem
each, so 50 * m_w_m. I doubt that's expected/sensible.
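Taking the default maintenance_work_mem of 64MB as a concrete example of the 50 * m_w_m worst case described above:

```python
# Worst-case transient memory if 50 sessions rebuild their local index at once.
sessions = 50
maintenance_work_mem_mb = 64          # PostgreSQL's default setting

total_mb = sessions * maintenance_work_mem_mb
print(f"worst case: {total_mb} MB ({total_mb / 1024:.1f} GB)")   # 3200 MB (3.1 GB)
```

And that is per index creation; a DBA who has raised maintenance_work_mem for faster index builds would multiply that figure accordingly.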

So I suggest we start by just ignoring the *new* indexes, and improve
this in the future (by building the indexes on demand or whatever).

>>
>>Can't we track which indexes a particular session sees, somehow?
>>
>>>Statistic is another important case.
>>>But once again I do not completely understand why we want to
>>>address all this issues with statistic in first version of the
>>>patch?
>>
>>I think the question is which "issues with statistic" you mean. I'm sure
>>we can ignore some of them, e.g. the one with parallel workers not
>>having any stats (assuming we consider functions using GTT to be
>>parallel restricted).
>
>If we do not use shared buffers for GTT then parallel processing of
>GTT is not possible at all, so there is no problem with statistic for
>parallel workers.
>

Right.

>>
>>I think someone pointed out pushing stuff directly into the cache is
>>rather problematic, but I don't recall the details.
>>
>I have not encountered any problems, so if you can point me on what is
>wrong with this approach, I will think about alternative solution.
>

I meant this comment by Robert:

https://www.postgresql.org/message-id/CA%2BTgmoZFWaND4PpT_CJbeu6VZGZKi2rrTuSTL-Ykd97fexTN-w%40mail.gmail.com


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Global temporary tables

konstantin knizhnik


On 12.01.2020 4:51, Tomas Vondra wrote:

> On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:
>>
>>
>> On 09.01.2020 19:48, Tomas Vondra wrote:
>>>
>>>> The most complex and challenged task is to support GTT for all kind
>>>> of indexes. Unfortunately I can not proposed some good universal
>>>> solution for it.
>>>> Just patching all existed indexes implementation seems to be the
>>>> only choice.
>>>>
>>>
>>> I haven't looked at the indexing issue closely, but IMO we need to
>>> ensure that every session sees/uses only indexes on GTT that were
>>> defined before the seesion started using the table.
>>
>> Why? It contradicts with behavior of normal tables.
>> Assume that you have active clients and at some point of time DBA
>> recognizes that them are spending to much time in scanning some GTT.
>> It cab create index for this GTT but if existed client will not be
>> able to use this index, then we need somehow make this clients to
>> restart their sessions?
>> In my patch I have implemented building indexes for GTT on demand: if
>> accessed index on GTT is not yet initialized, then it is filled with
>> local data.
>
> Yes, I know the behavior would be different from behavior for regular
> tables. And yes, it would not allow fixing slow queries in sessions
> without interrupting those sessions.
>
> I proposed just ignoring those new indexes because it seems much simpler
> than alternative solutions that I can think of, and it's not like those
> other solutions don't have other issues.

Quite the opposite: prohibiting sessions from using indexes created
after the session started using the GTT requires more effort. We would
need to somehow maintain and check the GTT's first-access time.
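For comparison, the first-access-time tracking being discussed could look roughly like this (illustrative sketch, all names hypothetical):

```python
# Illustrative sketch (not PostgreSQL code): a session considers an index
# on a GTT usable only if the index existed before the session's first
# access to that table.

class GTTIndexVisibility:
    def __init__(self):
        self.first_access = {}          # table -> time of this session's first use

    def touch(self, table, now):
        # Record the first access only; later touches don't move it.
        self.first_access.setdefault(table, now)

    def visible(self, table, index_created_at):
        # Usable only if the index predates the session's first access.
        return index_created_at < self.first_access.get(table, float('inf'))

v = GTTIndexVisibility()
v.touch('gtt1', now=100)
print(v.visible('gtt1', index_created_at=50))    # True: pre-existing index
print(v.visible('gtt1', index_created_at=150))   # False: created after first use
```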

>
> For example, I've looked at the "on demand" building as implemented in
> global_private_temp-8.patch, I kinda doubt adding a bunch of index build
> calls into various places in index code seems somewht suspicious.

We in any case have to initialize GTT indexes on demand, even if we
prohibit use of indexes created after a session's first access to the GTT.
So the difference comes down to one thing: should we just initialize an
empty index, or populate it with local data (if the rules for index
usability are the same for GTT as for normal tables)?
 From an implementation point of view there is no big difference. Actually,
building the index in the standard way is even simpler than constructing
an empty index. Originally I implemented
the first approach (I just forgot to consider the case when the GTT was
already in use by a session). Then I rewrote it using the second approach,
and the patch even became simpler.

>
> * brinbuild is added to brinRevmapInitialize, which is meant to
>   initialize state for scanning. It seems wrong to build the index we're
>   scanning from this function (layering and all that).
>
> * btbuild is called from _bt_getbuf. That seems a bit ... suspicious?


As I already mentioned, support of indexes for GTT is one of the most
challenging things in my patch.
I didn't find a good and universal solution, so I agree that the call of
btbuild from _bt_getbuf may be considered suspicious.
I would be pleased if you or somebody else could propose a better
alternative, and not only for B-Tree, but for all other indexes.

But as I already wrote above, prohibiting a session from using indexes
created after its first access to the GTT doesn't solve the problem.
For normal tables (and for local temp tables), indexes are initialized at
the time of their creation.
With GTT this doesn't work, because each session has its own local data
for the GTT.
We should either initialize/build the index on demand (when it is first
accessed), or initialize the indexes for all existing GTTs at the moment
of session start.
The last option seems much worse to me: there may be a huge number of
GTTs, and a session may not need to access GTTs at all.
>
> ... and so on for other index types. Also, what about custom indexes
> implemented in extensions? It seems a bit strange each of them has to
> support this separately.

I have already complained about this: my patch supports GTT for all
built-in indexes, but custom indexes have to handle it themselves.
It looks like, to provide a generic solution, we need to extend the index
API, providing two different operations: creation and initialization.
But extending the index API is a very intrusive change... And it also doesn't
solve the problem for all existing extensions: they would in any case have
to be rewritten to implement the new API version in order to support GTT.

>
> IMHO if this really is the right solution, we need to make it work for
> existing indexes without having to tweak them individually. Why don't we
> track a flag whether an index on GTT was initialized in a given session,
> and if it was not then call the build function before calling any other
> function from the index AM?
> But let's talk about other issues caused by "on demand" build. Imagine
> you have 50 sessions, each using the same GTT with a GB of per-session
> data. Now you create a new index on the GTT, which forces the sessions
> to build it's "local" index. Those builds will use maintenance_work_mem
> each, so 50 * m_w_m. I doubt that's expected/sensible.

I do not see a principal difference from the scenario where 50 sessions
each create a (local) temp table,
populate it with a GB of data, and create an index on it.

>
> So I suggest we start by just ignoring the *new* indexes, and improve
> this in the future (by building the indexes on demand or whatever).

Sorry, but I still do not agree with this suggestion:
- it doesn't simplify things
- it makes the behavior of GTT incompatible with normal tables
- it doesn't prevent any bad or unexpected behavior that can't already be
reproduced with normal (local) temp tables.

>
>>>
>>> I think someone pointed out pushing stuff directly into the cache is
>>> rather problematic, but I don't recall the details.
>>>
>> I have not encountered any problems, so if you can point me on what
>> is wrong with this approach, I will think about alternative solution.
>>
>
> I meant this comment by Robert:
>
> https://www.postgresql.org/message-id/CA%2BTgmoZFWaND4PpT_CJbeu6VZGZKi2rrTuSTL-Ykd97fexTN-w%40mail.gmail.com 
>
>
"if any code tried to access the statistics directly from the table,
rather than via the caches".

Currently the optimizer accesses statistics through the caches, so this
approach works. If somebody rewrites the optimizer, or provides a custom
optimizer in an extension that accesses statistics directly,
then it would really be a problem. But I wonder why bypassing the catalog
cache would be needed.

Moreover, if we implement an alternative solution (for example, make
pg_statistic a view which combines results for normal tables and GTT),
then the existing optimizer has to be rewritten,
because it could not access statistics the way it does now. And
there would be the same problem with all existing extensions that
access statistics in the most natural way: through the system cache.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Proposal] Global temporary tables

Tomas Vondra-4
On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote:

>
>
>On 12.01.2020 4:51, Tomas Vondra wrote:
>>On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:
>>>
>>>
>>>On 09.01.2020 19:48, Tomas Vondra wrote:
>>>>
>>>>>The most complex and challenged task is to support GTT for all
>>>>>kind of indexes. Unfortunately I can not proposed some good
>>>>>universal solution for it.
>>>>>Just patching all existed indexes implementation seems to be
>>>>>the only choice.
>>>>>
>>>>
>>>>I haven't looked at the indexing issue closely, but IMO we need to
>>>>ensure that every session sees/uses only indexes on GTT that were
>>>>defined before the seesion started using the table.
>>>
>>>Why? It contradicts with behavior of normal tables.
>>>Assume that you have active clients and at some point of time DBA
>>>recognizes that them are spending to much time in scanning some
>>>GTT.
>>>It cab create index for this GTT but if existed client will not be
>>>able to use this index, then we need somehow make this clients to
>>>restart their sessions?
>>>In my patch I have implemented building indexes for GTT on demand:
>>>if accessed index on GTT is not yet initialized, then it is filled
>>>with local data.
>>
>>Yes, I know the behavior would be different from behavior for regular
>>tables. And yes, it would not allow fixing slow queries in sessions
>>without interrupting those sessions.
>>
>>I proposed just ignoring those new indexes because it seems much simpler
>>than alternative solutions that I can think of, and it's not like those
>>other solutions don't have other issues.
>
>Quit opposite: prohibiting sessions to see indexes created before
>session start to use GTT requires more efforts. We need to somehow
>maintain and check GTT first access time.
>

Hmmm, OK. I'd expect such a check to be much simpler than the on-demand
index building, but I admit I haven't tried implementing either of those
options.

>>
>>For example, I've looked at the "on demand" building as implemented in
>>global_private_temp-8.patch, I kinda doubt adding a bunch of index build
>>calls into various places in index code seems somewht suspicious.
>
>We in any case has to initialize GTT indexes on demand even if we
>prohibit usages of indexes created after first access by session to
>GTT.
>So the difference is only in one thing: should we just initialize
>empty index or populate it with local data (if rules for index
>usability are the same for GTT as for normal tables).
>From implementation point of view there is no big difference. Actually
>building index in standard way is even simpler than constructing empty
>index. Originally I have implemented
>first approach (I just forgot to consider case when GTT was already
>user by a session). Then I rewrited it using second approach and patch
>even became simpler.
>
>>
>>* brinbuild is added to brinRevmapInitialize, which is meant to
>>  initialize state for scanning. It seems wrong to build the index we're
>>  scanning from this function (layering and all that).
>>
>>* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?
>
>
>As I already mentioned - support of indexes for GTT is one of the most
>challenged things in my patch.
>I didn't find good and universal solution. So I agreed that call of
>btbuild from _bt_getbuf may be considered as suspicious.
>I will be pleased if you or sombody else can propose better
>elternative and not only for B-Tree, but for all other indexes.
>
>But as I already wrote above, prohibiting session to used indexes
>created after first access to GTT doesn't solve the problem.
>For normal tables (and for local temp tables) indexes are initialized
>at the time of their creation.
>With GTT it doesn't work, because each session has its own local data
>of GTT.
>We should either initialize/build index on demand (when it is first
>accessed), either at the moment of session start initialize indexes
>for all existed GTTs.
>Last options seem to be much worser from my point of view: there may
>me huge number of GTT and session may not need to access GTT at all.
>>
>>... and so on for other index types. Also, what about custom indexes
>>implemented in extensions? It seems a bit strange each of them has to
>>support this separately.
>
>I have already complained about it: my patch supports GTT for all
>built-in indexes, but custom indexes has to handle it themselves.
>Looks like to provide some generic solution we need to extend index
>API, providing two diffrent operations: creation and initialization.
>But extending index API is very critical change... And also it doesn't
>solve the problem with all existed extensions: them in any case have
>to be rewritten to implement new API version in order to support GTT.
>

Why not allow creating (on a GTT) only indexes that implement this new
API method?

>>
>>IMHO if this really is the right solution, we need to make it work for
>>existing indexes without having to tweak them individually. Why don't we
>>track a flag whether an index on GTT was initialized in a given session,
>>and if it was not then call the build function before calling any other
>>function from the index AM?
>>But let's talk about other issues caused by "on demand" build. Imagine
>>you have 50 sessions, each using the same GTT with a GB of per-session
>>data. Now you create a new index on the GTT, which forces the sessions
>>to build it's "local" index. Those builds will use maintenance_work_mem
>>each, so 50 * m_w_m. I doubt that's expected/sensible.
>
>I do not see principle difference here with scenario when 50 sessions
>create (local) temp table,
>populate it with GB of data and create index for it.
>

I'd say the high memory consumption is pretty significant.

>>
>>So I suggest we start by just ignoring the *new* indexes, and improve
>>this in the future (by building the indexes on demand or whatever).
>
>Sorry, but still do not agree with this suggestions:
>- it doesn't simplify things
>- it makes behavior of GTT incompatible with normal tables.
>- it doesn't prevent some bad or unexpected behavior which can't be
>currently reproduced with normal (local) temp tables.
>
>>
>>>>
>>>>I think someone pointed out pushing stuff directly into the cache is
>>>>rather problematic, but I don't recall the details.
>>>>
>>>I have not encountered any problems, so if you can point me on
>>>what is wrong with this approach, I will think about alternative
>>>solution.
>>>
>>
>>I meant this comment by Robert:
>>
>>https://www.postgresql.org/message-id/CA%2BTgmoZFWaND4PpT_CJbeu6VZGZKi2rrTuSTL-Ykd97fexTN-w%40mail.gmail.com
>>
>>
>"if any code tried to access the statistics directly from the table,
>rather than via the caches".
>
>Currently optimizer is accessing statistic though caches. So this
>approach works. If somebody will rewrite optimizer or provide own
>custom optimizer in extension which access statistic directly
>then it we really be a problem. But I wonder why bypassing catalog
>cache may be needed.
>

I don't know, but it seems extensions like hypopg do it.

>Moreover, if we implement alternative solution - for example make
>pg_statistic a view which combines results for normal tables and GTT,
>then existed optimizer has to be rewritten
>because it can not access statistic in the way it is doing now. And
>there will be all problem with all existed extensions which are
>accessing statistic in most natural way - through system cache.
>

Perhaps. I don't know enough about this part of the code to have a
strong opinion.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

