Global temporary tables


Re: Global temporary tables

Konstantin Knizhnik
I have added support for all index types (brin, btree, gin, gist, hash,
spgist) on global temp tables (before, only the B-Tree index was supported).
It would be nice to have some generic mechanism for this, but I do not
see what it could look like.
The problem is that normal relations are initialized at the moment of
their creation, but for global temp relations the metadata already exists
while the data is absent. We need to somehow catch access to an
uninitialized page (not all pages, just the first page of the relation)
and perform the initialization on demand.
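
As a usage sketch (the CREATE GLOBAL TEMPORARY TABLE spelling and the table below are illustrative; the patch's exact syntax may differ), each of the supported index types can now be built on a global temp table:

-- Metadata is created once and shared; data pages are per-backend and
-- initialized lazily on first access.
CREATE GLOBAL TEMPORARY TABLE gtt_demo (
    id   serial PRIMARY KEY,          -- btree
    pos  point,
    doc  tsvector
);
CREATE INDEX ON gtt_demo USING brin (id);
CREATE INDEX ON gtt_demo USING gin (doc);
CREATE INDEX ON gtt_demo USING gist (pos);
CREATE INDEX ON gtt_demo USING hash (id);
CREATE INDEX ON gtt_demo USING spgist (pos);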

A new patch for global temp tables with shared buffers is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: global_shared_temp_replica-2.patch (117K)

Re: Global temporary tables

Pavel Stehule
In reply to this post by Konstantin Knizhnik


On Wed, 18 Sep 2019 at 12:04, Konstantin Knizhnik <[hidden email]> wrote:


On 21.08.2019 11:54, Konstantin Knizhnik wrote:


On 20.08.2019 20:01, Pavel Stehule wrote:
Another solution is to wait for ZHeap storage, so a replica can have its own UNDO log.

I thought about implementing a special table access method for temporary tables.

+1
 

Unfortunately, implementing a special table access method for temporary tables doesn't solve all the problems.
XID generation is not part of table access methods.
So we still need to assign some XID to a write transaction at the replica which will not conflict with the XIDs received from the master.
Actually, only global temp tables can be updated at a replica, so the assigned XIDs can be stored only in tuples of such relations.
But I am still not sure that we can use an arbitrary XID for such transactions at the replica.

Also, I am upset by the amount of functionality which has to be reimplemented for global temp tables if we really want to provide an access method for them:

1. CLOG
2. vacuum
3. MVCC visibility

And still it is not possible to encapsulate all the changes needed to support writes to temp tables at a replica inside a table access method.
XID assignment, transaction commit and abort, subtransactions - all these places need to be patched.
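
For illustration (gtt_demo is a hypothetical table name): on a hot standby every transaction is read-only today, so even a write that touches only session-private GTT data is rejected, which is why XID assignment and visibility have to be reworked:

-- On a streaming-replication standby, without such a patch:
INSERT INTO gtt_demo (id) VALUES (1);
-- ERROR:  cannot execute INSERT in a read-only transaction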


I was able to fully support working with global temp tables at a replica (including subtransactions).
The patch is attached. You can also find this version at https://github.com/postgrespro/postgresql.builtin_pool/tree/global_temp_hot

Right now, transactions at the replica which update global temp tables are assigned a special kind of GID which is not related to the XIDs received from the master.
So special visibility rules are used for such tables at the replica. I also had to patch the TransactionIdIsInProgress, TransactionIdDidCommit and TransactionIdGetCurrent
functions to correctly handle such XIDs. In principle it is possible to implement global temp tables as a special heap access method, but it would require copying a lot of code (heapam.c),
so I prefer to add a few checks to the existing functions.

There are still some limitations:
- The number of transactions at the replica which update temp tables is limited to 2^32 (the wraparound problem is not addressed).
- I have to maintain an in-memory analog of CLOG for such transactions, and it is never truncated. It means that for 2^32 transactions, at one bit per transaction, the bitmap can grow up to 0.5GB.

I am trying to understand what the next steps in global temp table support should be.
This is why I want to run a short survey - what are people expecting from global temp tables:

1. I do not need them at all.
2. Eliminate catalog bloating.
3. Mostly needed for compatibility with Oracle (simplify porting,...).
4. Parallel query execution.
5. Can be used at replica.
6. More efficient use of resources (first of all memory).

There can be another point important for the cloud. Inside some clouds there are usually two types of discs - persistent (slow) and ephemeral (fast). We used temp tables effectively there because we moved the temp tablespace to the ephemeral discs.
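
A minimal sketch of that setup (the tablespace name and mount path are illustrative; the directory must already exist and be owned by the postgres OS user):

-- Put temporary tables and spill files on the fast ephemeral disc.
CREATE TABLESPACE ephemeral LOCATION '/mnt/ephemeral/pgtemp';
ALTER SYSTEM SET temp_tablespaces = 'ephemeral';
SELECT pg_reload_conf();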

I am missing one point in your list - developer comfort - using temp tables is just much more comfortable: you don't need to create them again and again. With this behaviour it is possible to reduce @2, and @3 can be a nice side effect. If you reduce @2 to zero, then @5 should be possible without anything else.

Pavel


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

Konstantin Knizhnik


On 20.09.2019 19:43, Pavel Stehule wrote:

1. I do not need them at all.
2. Eliminate catalog bloating.
3. Mostly needed for compatibility with Oracle (simplify porting,...).
4. Parallel query execution.
5. Can be used at replica.
6. More efficient use of resources (first of all memory).

There can be another point important for the cloud. Inside some clouds there are usually two types of discs - persistent (slow) and ephemeral (fast). We used temp tables effectively there because we moved the temp tablespace to the ephemeral discs.

Yes, I have already heard this argument and I agree with it.
I just want to note two things:
1. My assumption is that in most cases the data of a temporary table can fit in memory (certainly if we are not limiting it to temp_buffers = 8MB, but store it in shared buffers), so there is no need to write it to persistent media at all.
2. Global temp tables do not replace local temp tables, which are accessed through local buffers. So if you want to use temporary storage, you will always have a way to do it.
The question is whether we need to support two kinds of global temp tables (with shared or private buffers) or just implement one of them.


I am missing one point in your list - developer comfort - using temp tables is just much more comfortable: you don't need to create them again and again. With this behaviour it is possible to reduce @2, and @3 can be a nice side effect. If you reduce @2 to zero, then @5 should be possible without anything else.

Sorry, I do not completely understand your point here.
You can use a normal (permanent) table and you will not have to create it again and again. It is also possible to use it for storing temporary data - you just need to truncate the table when the data is not needed any more.
Certainly you cannot use the same table in more than one backend. Here is the main advantage of temp tables - you get storage for per-session data and do not have to worry about possible name conflicts.

On the other hand, there are many cases where the format of the temporary data is not statically known: it is determined dynamically during program execution.
In this case a local temp table provides the most convenient mechanism for working with such data.
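
A small illustration (source_data and its columns are hypothetical): the shape of the result is only known at run time, so a local temp table created on the fly is a natural fit:

-- The column set of "scratch" is simply whatever the query returns.
CREATE TEMP TABLE scratch AS
    SELECT * FROM source_data WHERE batch_id = 42;
ANALYZE scratch;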

This is why I think that we need to have both local and global temp tables.

Also, I do not agree with your statement "If you reduce @2 to zero, then @5 should be possible without any other".
In the solution implemented by Aleksander Alekseev, the metadata of temporary tables is kept in memory and does not affect the catalog at all.
But they still cannot be used at a replica.
There are still some serious problems which need to be fixed to enable it:
allowing insert/update/delete statements in read-only transactions, somehow assigning XIDs to them, and implementing savepoints and rollback of such transactions.
All this was done in the last version of my patch.
Yes, it doesn't depend on whether we are using shared or private buffers for temporary tables. The same approach can be implemented for both of them.
The question is whether we really need temp tables at a replica and, if so, whether we need full transaction support for them, including rollbacks and subtransactions.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

Pavel Stehule


On Mon, 23 Sep 2019 at 9:57, Konstantin Knizhnik <[hidden email]> wrote:


On 20.09.2019 19:43, Pavel Stehule wrote:

1. I do not need them at all.
2. Eliminate catalog bloating.
3. Mostly needed for compatibility with Oracle (simplify porting,...).
4. Parallel query execution.
5. Can be used at replica.
6. More efficient use of resources (first of all memory).

There can be another point important for the cloud. Inside some clouds there are usually two types of discs - persistent (slow) and ephemeral (fast). We used temp tables effectively there because we moved the temp tablespace to the ephemeral discs.

Yes, I have already heard this argument and I agree with it.
I just want to note two things:
1. My assumption is that in most cases the data of a temporary table can fit in memory (certainly if we are not limiting it to temp_buffers = 8MB, but store it in shared buffers), so there is no need to write it to persistent media at all.
2. Global temp tables do not replace local temp tables, which are accessed through local buffers. So if you want to use temporary storage, you will always have a way to do it.
The question is whether we need to support two kinds of global temp tables (with shared or private buffers) or just implement one of them.

It's valid only for OLTP. The OLAP world is totally different. Moreover, if all users use temporary tables - and you should plan for that, which is one reason for global temp tables - then you need to multiply the size by max_connections.

It is hard to say what is best from the implementation perspective, but it could be unfortunate if global temporary tables had different performance characteristics and configuration than local temporary tables.


I am missing one point in your list - developer comfort - using temp tables is just much more comfortable: you don't need to create them again and again. With this behaviour it is possible to reduce @2, and @3 can be a nice side effect. If you reduce @2 to zero, then @5 should be possible without anything else.

Sorry, I do not completely understand your point here.
You can use a normal (permanent) table and you will not have to create it again and again. It is also possible to use it for storing temporary data - you just need to truncate the table when the data is not needed any more.
Certainly you cannot use the same table in more than one backend. Here is the main advantage of temp tables - you get storage for per-session data and do not have to worry about possible name conflicts.

You use temporary tables because you know you will never share the data between sessions. I don't remember any situation where I designed temp tables with a different schema for different sessions.

Using a global (permanent) table as a temp table is not effective - you work with large tables, you need to use DELETE, etc., so you cannot use a classic table like temp tables effectively.


On the other hand, there are many cases where the format of the temporary data is not statically known: it is determined dynamically during program execution.
In this case a local temp table provides the most convenient mechanism for working with such data.

This is why I think that we need to have both local and global temp tables.

Also, I do not agree with your statement "If you reduce @2 to zero, then @5 should be possible without any other".
In the solution implemented by Aleksander Alekseev, the metadata of temporary tables is kept in memory and does not affect the catalog at all.
But they still cannot be used at a replica.
There are still some serious problems which need to be fixed to enable it:
allowing insert/update/delete statements in read-only transactions, somehow assigning XIDs to them, and implementing savepoints and rollback of such transactions.
All this was done in the last version of my patch.
Yes, it doesn't depend on whether we are using shared or private buffers for temporary tables. The same approach can be implemented for both of them.
The question is whether we really need temp tables at a replica and, if so, whether we need full transaction support for them, including rollbacks and subtransactions.

Temporary tables (of any type) on a replica are an interesting feature that opens up some possibilities. Some queries cannot be optimized as a whole and should be divided: intermediate results are stored in temporary tables, analysed (to get correct statistics), maybe indexed, and after that the calculation can continue. Right now you can do this only on the master. Moreover, on a hot standby the data are read-only and there is no direct impact on the master (production), so you can do some heavier calculations there. And temporary tables are a well-known technique for fixing estimation errors.
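
A minimal sketch of that pattern (table and column names are illustrative); today it only works on the primary, which is exactly the limitation being discussed:

-- Materialize an intermediate result, fix its statistics, then continue.
CREATE TEMP TABLE step1 AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;
ANALYZE step1;                        -- give the planner real statistics
CREATE INDEX ON step1 (customer_id);

SELECT c.name, s.total
FROM step1 s
JOIN customers c USING (customer_id)
WHERE s.total > 10000;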

I don't think subtransactions, transactions, and rollbacks are necessary for these tables. On the other hand, without them it is a half-cooked feature and can look pretty strange in the Postgres environment.

I am very happy with how much work you are doing in this area; I did not have the courage to start this job. But I don't think this work can be reduced to just some supported scenarios - and I hope a correct implementation is possible - although it is not simple work.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

Alvaro Herrera
In reply to this post by Konstantin Knizhnik
This broke recently.  Can you please rebase?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Global temporary tables

Konstantin Knizhnik


On 25.09.2019 23:28, Alvaro Herrera wrote:
> This broke recently.  Can you please rebase?
>
A rebased version of the patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: global_shared_temp_replica-3.patch (117K)

Re: Global temporary tables

Konstantin Knizhnik
Since both Robert and Pavel think that the aspects of using GTT in
parallel queries and at a replica should be considered separately,
I have prepared the simplest version of the GTT patch, which introduces
minimal differences from the current (local) temporary tables.
So GTTs are stored in private buffers and cannot be accessed at a replica,
in prepared transactions, or in parallel queries.
But the patch supports all existing built-in indexes (hash, nbtree, brin,
gin, gist, spgist) and per-backend statistics.
There are no DDL limitations for GTT.

Also, I have not yet introduced the pg_statistic view for GTT (as proposed by Pavel).
I am afraid that it may break compatibility with some existing extensions
and applications.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: global_private_temp-5.patch (78K)

Re: Global temporary tables

Konstantin Knizhnik
Yet another version of my GTT patch, addressing the issues reported by
曾文旌(义从) <[hidden email]>:
* A bug in TRUNCATE is fixed
* The ON COMMIT DELETE ROWS option is supported (see the sketch below)
* ALTER TABLE is correctly handled
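
As a usage sketch of that option (assuming the CREATE GLOBAL TEMPORARY TABLE spelling; table and column names are illustrative):

CREATE GLOBAL TEMPORARY TABLE session_cart (
    item_id int,
    qty     int
) ON COMMIT DELETE ROWS;
-- Each backend sees only its own rows, and they are cleared at commit.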

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: global_private_temp-6.patch (85K)

Re: Global temporary tables

Konstantin Knizhnik
A pg_gtt_statistic view is now provided for global temp tables.
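
For example, a session could inspect the per-backend statistics of its global temp tables with something like the following (the exact column set of the view is my assumption, presumably mirroring pg_statistic):

SELECT * FROM pg_gtt_statistic;   -- per-backend statistics for GTTs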

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: global_private_temp-7.patch (93K)

Re: Global temporary tables

Michael Paquier
On Wed, Nov 20, 2019 at 07:32:14PM +0300, Konstantin Knizhnik wrote:
> Now pg_gtt_statistic view is provided for global temp tables.

The latest patch fails to apply, per Mr Robot's report.  Could you please
rebase and send an updated version?  For now I have moved the patch to the
next CF, waiting on author.
--
Michael


Re: Global temporary tables

Konstantin Knizhnik


On 01.12.2019 4:56, Michael Paquier wrote:
> On Wed, Nov 20, 2019 at 07:32:14PM +0300, Konstantin Knizhnik wrote:
>> Now pg_gtt_statistic view is provided for global temp tables.
> Latest patch fails to apply, per Mr Robot's report.  Could you please
> rebase and send an updated version?  For now I have moved the patch to
> next CF, waiting on author.
> --
> Michael
A rebased version of the patch is attached.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: global_private_temp-8.patch (94K)