Global temporary tables


Global temporary tables

konstantin knizhnik
The current Postgres implementation of temporary tables causes a number of
problems:

1. Catalog bloat: if a client creates and deletes many temporary
tables, autovacuum gets stuck on the catalog.
2. Parallel queries: right now, using a temporary table in a query
disables parallel plans.
3. Temporary tables cannot be used on a replica. A hot standby
configuration is frequently used to run OLAP queries on a replica,
and the results of such queries are typically saved in temporary tables.
Right now this is not possible (apart from the hackish solution of
storing results via file_fdw).
4. Temporary tables cannot be used in prepared transactions.
5. Inefficient memory usage and possible memory overflow: each backend
maintains its own local buffers for working with temporary tables.
The default size of the temporary buffers is 8MB, which seems too small
for modern servers with hundreds of gigabytes of RAM and causes extra
copying of data between the OS cache and local buffers. But if there
are thousands of backends, each executing queries with temporary tables,
the total amount of memory used for temporary buffers can exceed
several tens of gigabytes.
6. A connection pooler cannot reschedule a session that has created
temporary tables to some other backend, because their data is stored
in local buffers.

There have been several attempts to address these problems.
For example, Alexandr Alekseev implemented a patch which allows
creating fast temporary tables without accessing the system catalog:
https://www.postgresql.org/message-id/flat/20160301182500.2c81c3dc%40fujitsu
Unfortunately this patch was too invasive and was rejected by the community.

There was also an attempt to allow, under some conditions, the use of
temporary tables in 2PC transactions:
https://www.postgresql.org/message-id/flat/m2d0pllvqy.fsf%40dimitris-macbook-pro.home
https://www.postgresql.org/message-id/flat/3a4b3c88-4fa5-1edb-a878-1ed76fa1c82b%40postgrespro.ru#d8a8342d07317d12e3405b903d3b15e4
These were also rejected.

Here I am making yet another attempt to address these problems, first of all
1), 2), 5) and 6).
To solve them I propose the notion of "global temporary" tables,
similar to the ones in Oracle.
The definition of such a table (its metadata) is shared by all backends, but
the data is private to each backend. After session termination the data is
obviously lost.

Suggested syntax for creating global temporary tables:

     create global temp table
or
     create session table
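
For example, a session could then work with such a table like this (a
sketch only; the table and column names are hypothetical):

     create global temp table report_cache (id integer, total numeric);
     -- every backend sees the table definition, but only its own rows
     insert into report_cache values (1, 42.0);
     select * from report_cache;   -- returns only this session's data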

Once created, such a table can be used by all backends.
Global temporary tables are accessed through shared buffers (to solve
problem 2).
Cleanup of temporary table data (releasing shared buffers and deleting
relation files) is performed on backend termination.
In case of abnormal server termination, the files of global temporary tables
are cleaned up in the same way as those of local temporary tables.

Certainly there are cases where global temporary tables cannot be used,
e.g. when the application constructs the name and columns of a
temporary table dynamically.
Also, access to local buffers is more efficient than access to shared
buffers because it doesn't require any synchronization.
But please note that it is still possible to create old-style (local)
temporary tables, which preserves the current behavior.

The problem with replicas is still not solved, but shared metadata is a
step in this direction.
I am also thinking about reimplementing temporary tables using the new table
access method API.
The drawback of such an approach is that a large part of the heapam code
would have to be reimplemented.
But this approach would allow eliminating visibility checks for temporary
table tuples and decreasing the size of the tuple header.
I am still not sure whether implementing a special table access method for
temporary tables is a good idea.

A patch for global temporary tables is attached to this mail.
A known limitation is that it currently supports only B-Tree indexes.
Any feedback is welcome.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: session_tables.patch (42K)

Re: Global temporary tables

Craig Ringer-3


On Wed, 31 Jul 2019 at 23:05, Konstantin Knizhnik <[hidden email]> wrote:
The current Postgres implementation of temporary tables causes a number of
problems:

1. Catalog bloat: if a client creates and deletes many temporary
tables, autovacuum gets stuck on the catalog.

This also upsets logical decoding a little - AFAICS it still has to treat transactions that use temporary tables as catalog-modifying transactions, tracking them in its historic catalog snapshots and doing extra cache flushes etc when decoding them.

This will become even more important as we work to support eager/optimistic output plugin processing of in-progress transactions. We'd have to switch snapshots more, and that can get quite expensive so using temp tables could really hurt performance. Or we'd have to serialize on catalog-changing transactions, in which case using temp tables would negate the benefits of optimistic streaming of in-progress transactions.
 
3. Temporary tables cannot be used on a replica.

For physical replicas, yes.
 
A hot standby
configuration is frequently used to run OLAP queries on a replica,
and the results of such queries are typically saved in temporary tables.
Right now this is not possible (apart from the hackish solution of
storing results via file_fdw).

Right. Because we cannot modify pg_class, pg_attribute etc, even though we could reasonably enough write to local-only relfilenodes on a replica if we didn't have to change WAL-logged catalog tables.

I've seen some hacks suggested around this where we have an unlogged fork of each of the needed catalog tables, allowing replicas to write temp table info to them. We'd scan both the logged and unlogged forks when doing relcache management etc. But there are plenty of ugly issues with this. We'd have to reserve oid ranges for them which is ugly; to make it BC friendly those reservations would probably have to take the form of some kind of placeholder entry in the real pg_class. And it gets ickier from there. It hardly seems worth it when we should probably just implement global temp tables instead.
  
5. Inefficient memory usage and possible memory overflow: each backend
maintains its own local buffers for working with temporary tables.

Is there any reason that would change with global temp tables? We'd still be creating a backend-local relfilenode for each backend that actually writes to the temp table, and I don't see how it'd be useful or practical to keep those in shared_buffers.

Using local buffers has big advantages too. It saves shared_buffers space for data where there's actually some possibility of getting cache hits, or for where we can benefit from lazy/async writeback and write combining. I wouldn't want to keep temp data there if I had the option.

If you're concerned about the memory use of backend local temp buffers, or about how we account for and limit those, that's worth looking into. But I don't think it'd be something that should be affected by global-temp vs backend-local-temp tables.
 
The default size of the temporary buffers is 8MB. It seems too small for
modern servers with hundreds of gigabytes of RAM, causing extra
copying of data between the OS cache and local buffers. But if there are
thousands of backends, each executing queries with temporary tables,
the total amount of memory used for temporary buffers can exceed
several tens of gigabytes.

Right. But what solution do you propose for this? Putting that in shared_buffers will do nothing except deprive shared_buffers of space that can be used for other more useful things. A server-wide temp buffer would add IPC and locking overheads and AFAICS little benefit. One of the big appeals of temp tables is that we don't need any of that.

If you want to improve server-wide temp buffer memory accounting and management that makes sense. I can see it being useful to have things like a server-wide DSM/DSA pool of temp buffers that backends borrow from and return to based on memory pressure on a LRU-ish basis, maybe. But I can also see how that'd be complex and hard to get right. It'd also be prone to priority inversion problems where an idle/inactive backend must be woken up to release memory or release locks, depriving an actively executing backend of runtime. And it'd be as likely to create inefficiencies with copying and eviction as solve them since backends could easily land up taking turns kicking each other out of memory and re-reading their own data.

I don't think this is something that should be tackled as part of work on global temp tables personally.

 
6. A connection pooler cannot reschedule a session that has created temporary tables to some other backend, because their data is stored in local buffers.

Yeah, if you're using transaction-associative pooling. That's just part of a more general problem though, there are piles of related issues with temp tables, session GUCs, session advisory locks and more.

I don't see how global temp tables will do you the slightest bit of good here as the data in them will still be backend-local. If it isn't then you should just be using unlogged tables.
 
The definition of such a table (its metadata) is shared by all backends, but the data
is private to each backend. After session termination the data is obviously lost.

+1 that's what a global temp table should be, and it's IIRC pretty much how the SQL standard specifies temp tables.

I suspect I'm overlooking some complexities here, because to me it seems like we could implement these fairly simply. A new relkind would identify it as a global temp table and the relfilenode would be 0. Same for indexes on temp tables. We'd extend the relfilenode mapper to support a backend-local non-persistent relfilenode map that's used to track temp table and index relfilenodes. If no relfilenode is defined for the table, the mapper would allocate one. We already happily create missing relfilenodes on write so we don't even have to pre-create the actual file. We'd register the relfilenode as a tempfile and use existing tempfile cleanup mechanisms, and we'd use the temp tablespace to store it.

I must be missing something important because it doesn't seem hard.

Global temporary tables are accessed through shared buffers (to solve
problem 2).

I'm far from convinced of the wisdom or necessity of that, but I haven't spent as much time digging into this problem as you have.
 
The drawback of such an approach is that a large part of the heapam code
would have to be reimplemented.
But this approach would allow eliminating visibility checks for temporary
table tuples and decreasing the size of the tuple header.

That sounds potentially cool, but perhaps a "next step" thing? Allow the creation of global temp tables to specify reloptions, and you can add it as a reloption later. You can't actually eliminate visibility checks anyway because they're still MVCC heaps. Savepoints can create invisible tuples even if you're using temp tables that are cleared on commit, and of course so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in practice.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Global temporary tables

konstantin knizhnik


On 01.08.2019 6:10, Craig Ringer wrote:
 
3. Temporary tables cannot be used on a replica.

For physical replicas, yes.

Yes, logical replicas (for example our PgPro-EE multimaster, which is based on logical replication) definitely do not suffer from this problem.
But in the multimaster case we have another problem related to temporary tables: we have to use 2PC for each transaction, and using temporary tables in a prepared transaction is currently prohibited.
This was the motivation for the patch proposed by Stas Kelvich which allows using temporary tables in prepared transactions under some conditions.
5. Inefficient memory usage and possible memory overflow: each backend
maintains its own local buffers for working with temporary tables.

Is there any reason that would change with global temp tables? We'd still be creating a backend-local relfilenode for each backend that actually writes to the temp table, and I don't see how it'd be useful or practical to keep those in shared_buffers.

Yes, my implementation of global temp tables uses shared buffers.
Strictly speaking this is not required, since the data is local: it is possible to have shared metadata and private data accessed through local buffers.
But I have done it for three reasons:
1. Make it possible to use parallel plans for temp tables.
2. Eliminate the memory overflow problem.
3. Make it possible to reschedule a session to another backend (connection pooler).

Using local buffers has big advantages too. It saves shared_buffers space for data where there's actually some possibility of getting cache hits, or for where we can benefit from lazy/async writeback and write combining. I wouldn't want to keep temp data there if I had the option.

Local buffers definitely have some advantages:
- they do not require synchronization
- they avoid flushing data from shared buffers

But global temp tables do not exclude the use of the original (local) temp tables.
So you will have a choice: either use local temp tables, which can be easily created on demand and are accessed through local buffers,
or create global temp tables, which eliminate catalog bloat, allow parallel queries, and whose data is controlled by the same cache replacement policy as normal tables...



 
The default size of the temporary buffers is 8MB. It seems too small for
modern servers with hundreds of gigabytes of RAM, causing extra
copying of data between the OS cache and local buffers. But if there are
thousands of backends, each executing queries with temporary tables,
the total amount of memory used for temporary buffers can exceed
several tens of gigabytes.

Right. But what solution do you propose for this? Putting that in shared_buffers will do nothing except deprive shared_buffers of space that can be used for other more useful things. A server-wide temp buffer would add IPC and locking overheads and AFAICS little benefit. One of the big appeals of temp tables is that we don't need any of that.

I do not think that parallel execution and efficient connection pooling are of "little benefit".

If you want to improve server-wide temp buffer memory accounting and management that makes sense. I can see it being useful to have things like a server-wide DSM/DSA pool of temp buffers that backends borrow from and return to based on memory pressure on a LRU-ish basis, maybe. But I can also see how that'd be complex and hard to get right. It'd also be prone to priority inversion problems where an idle/inactive backend must be woken up to release memory or release locks, depriving an actively executing backend of runtime. And it'd be as likely to create inefficiencies with copying and eviction as solve them since backends could easily land up taking turns kicking each other out of memory and re-reading their own data.

I don't think this is something that should be tackled as part of work on global temp tables personally.

My assumptions are the following: temporary tables are mostly used in OLAP queries, and an OLAP workload means a few concurrent queries working with large datasets.
So the temporary tables produced can be quite big. For OLAP it seems very important to be able to use parallel query execution and the same cache eviction rule for both persistent and temp tables
(otherwise you cause either swapping or extra copying of data between the OS and Postgres caches).


 
6. A connection pooler cannot reschedule a session that has created temporary tables to some other backend, because their data is stored in local buffers.

Yeah, if you're using transaction-associative pooling. That's just part of a more general problem though, there are piles of related issues with temp tables, session GUCs, session advisory locks and more.

I don't see how global temp tables will do you the slightest bit of good here as the data in them will still be backend-local. If it isn't then you should just be using unlogged tables.

You cannot use the same unlogged table to save intermediate query results in two parallel sessions.

 
The definition of such a table (its metadata) is shared by all backends, but the data
is private to each backend. After session termination the data is obviously lost.

+1 that's what a global temp table should be, and it's IIRC pretty much how the SQL standard specifies temp tables.

I suspect I'm overlooking some complexities here, because to me it seems like we could implement these fairly simply. A new relkind would identify it as a global temp table and the relfilenode would be 0. Same for indexes on temp tables. We'd extend the relfilenode mapper to support a backend-local non-persistent relfilenode map that's used to track temp table and index relfilenodes. If no relfilenode is defined for the table, the mapper would allocate one. We already happily create missing relfilenodes on write so we don't even have to pre-create the actual file. We'd register the relfilenode as a tempfile and use existing tempfile cleanup mechanisms, and we'd use the temp tablespace to store it.

I must be missing something important because it doesn't seem hard.

As I already wrote, I tried to kill two birds with one stone: eliminate catalog bloat and allow access to temp tables from multiple backends (to enable parallel queries and connection pooling).
This is why I had to use shared buffers for global temp tables.
Maybe it was not such a good idea, but one of my primary intentions in publishing this patch was to learn the opinions of other people.
At PG-Pro some of my colleagues think that the most critical problem is the inability to use temporary tables on a replica;
others think that it is not a problem at all if you are using logical replication.
From my point of view the most critical problem is the inability to use
parallel plans for temporary tables,
but it looks like you don't think so.

I see three different activities related to temporary tables:
1. Shared metadata
2. Shared buffers
3. Alternative concurrency control and reduced tuple header size (a specialized table access method for temporary tables)

In my proposal I combined 1 and 2, leaving 3 for a next step.
I will be interested to hear other suggestions.

One more thing: 1 and 2 are really independent, i.e. you can share metadata without sharing buffers.
But introducing yet another kind of temporary table seems to be overkill:
- local temp tables (private namespace and local buffers)
- tables with shared metadata but local buffers
- tables with shared metadata and shared buffers


 
The drawback of such an approach is that a large part of the heapam code
would have to be reimplemented.
But this approach would allow eliminating visibility checks for temporary
table tuples and decreasing the size of the tuple header.

That sounds potentially cool, but perhaps a "next step" thing? Allow the creation of global temp tables to specify reloptions, and you can add it as a reloption later. You can't actually eliminate visibility checks anyway because they're still MVCC heaps.

Sorry?
I mean elimination of MVCC overhead (visibility checks) for temp tables only.
I am not sure that we can fully eliminate it if we support the use of temp tables in prepared transactions and autonomous transactions (yet another awful feature we have in PgPro-EE).
Also, it looks like we need some analogue of the CID to correctly execute queries like "insert into T (select from T ...)" where T is a global temp table.
I haven't thought much about it yet, but I am really considering a new table access method API for reducing per-tuple storage overhead for temporary and append-only tables.
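
To illustrate the CID point (the table name is hypothetical), this is the kind of statement whose scan must not see the rows being inserted by the same command:

    create global temp table t (x int);
    insert into t values (1), (2);
    insert into t select x + 10 from t;   -- must add only 11 and 12, not loop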

Savepoints can create invisible tuples even if you're using temp tables that are cleared on commit, and of course so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in practice.

Yes, subtransactions can also be a problem for eliminating xmin/xmax in temp tables. Thanks for noticing it.


I noticed that I had not patched some extensions; a fixed and rebased version of the patch is attached.
You can also find this version in our GitHub repository: https://github.com/postgrespro/postgresql.builtin_pool.git,
branch global_temp.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Attachment: session_tables-1.patch (43K)

Re: Global temporary tables

konstantin knizhnik
A new version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: session_tables-2.patch (63K)

Re: Global temporary tables

Craig Ringer-3
On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <[hidden email]> wrote:
New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using shared_buffers for temp tables having connection pooling benefits. Are you assuming the presence of some other extension in your extended version of PostgreSQL? In community PostgreSQL a temp table's contents in one backend will not be visible in another backend. So if your connection pooler in transaction pooling mode runs txn 1 on backend 42 and it populates temp table X, then the pooler runs the same app session's txn 2 on backend 45, the contents of temp table X are not visible anymore.

Can you explain? Because AFAICS so long as temp table contents are backend-private there's absolutely no point ever using shared buffers for their contents.

Perhaps you mean that in a connection pooling case, each backend may land up filling up temp buffers with contents from *multiple different temp tables*? If so, sure, I get that, but the answer there seems to be to improve eviction and memory accounting, not make backends waste precious shared_buffers space on non-shareable data.

Anyhow, I strongly suggest you simplify the feature to add the basic global temp table feature so the need to change pg_class, pg_attribute etc to use temp tables is removed, but separate changes to temp table memory handling etc into a follow-up patch. That'll make it smaller and easier to review and merge too. The two changes are IMO logically quite separate anyway.

Come to think of it, I think connection poolers might benefit from an extension to the DISCARD command, say "DISCARD TEMP_BUFFERS", which evicts temp table buffers from memory *without* dropping the temp tables. If they're currently in-memory tuplestores they'd be written out and evicted. That way a connection pooler could "clean" the backend, at the cost of some probably pretty cheap buffered writes to the system buffer cache. The kernel might not even bother to write out the buffercache and it won't be forced to do so by fsync, checkpoints, etc, nor will the writes go via WAL so such evictions could be pretty cheap - and if not under lots of memory pressure the backend could often read the temp table back in from system buffer cache without disk I/O.

That's my suggestion for how to solve your pooler problem, assuming I've understood it correctly.

Along these lines I suggest adding the following to DISCARD at some point, obviously not as part of your patch:

* DISCARD TEMP_BUFFERS
* DISCARD SHARED_BUFFERS
* DISCARD TEMP_FILES
* DISCARD CATALOG_CACHE
* DISCARD HOLD_CURSORS
* DISCARD ADVISORY_LOCKS

where obviously DISCARD SHARED_BUFFERS would be superuser-only and evict only clean buffers.

(Also, if we extend DISCARD, let's allow it to be written as DISCARD (LIST, OF, THINGS, TO, DISCARD) so that we can make the syntax extensible for plugins in future).
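
For example, a pooler-side cleanup between session reassignments might then look like this (both forms are only proposals from the list above, not existing syntax):

    -- evict this backend's temp table buffers without dropping the tables
    DISCARD TEMP_BUFFERS;
    -- possible extensible form
    DISCARD (TEMP_BUFFERS, ADVISORY_LOCKS);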

Thoughts?

Would DISCARD TEMP_BUFFERS meet your needs?


Re: Global temporary tables

konstantin knizhnik


On 08.08.2019 5:40, Craig Ringer wrote:
On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <[hidden email]> wrote:
New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using shared_buffers for temp tables having connection pooling benefits. Are you assuming the presence of some other extension in your extended  version of PostgreSQL ? In community PostgreSQL a temp table's contents in one backend will not be visible in another backend. So if your connection pooler in transaction pooling mode runs txn 1 on backend 42 and it populates temp table X, then the pooler runs the same app session's txn 2 on backend 45, the contents of temp table X are not visible anymore.

Certainly, here I mean the built-in connection pooler, which is not currently present in Postgres;
it is part of PgPRO-EE, and there is my patch for vanilla at the commitfest:
https://commitfest.postgresql.org/24/2067


Can you explain? Because AFAICS so long as temp table contents are backend-private there's absolutely no point ever using shared buffers for their contents.

Sure, there is no such problem with temporary tables now.
There is another problem: you cannot use temporary tables with any existing connection pooler (pgbouncer, ...) at a pooling level other than session, unless the temporary table is used inside a single transaction.
One of the advantages of a built-in connection pooler is that it can provide session semantics (GUCs, prepared statements, temporary tables, ...) with a limited number of backends (smaller than the number of sessions).

In PgPRO-EE this problem was solved by binding a session to a backend, i.e. one backend can manage multiple sessions,
but a session cannot migrate to another backend. The drawback of such a solution is obvious: one long-lived transaction can block the transactions of all other sessions scheduled to this backend.
The ability to migrate a session to another backend is one of the obvious solutions to this problem, but the main showstopper for it is temporary tables.
This is why I consider moving temporary tables to shared buffers a very important step.

In the vanilla version of the built-in connection pooler the situation is slightly different.
Right now, if a client uses temporary tables without the "ON COMMIT DROP" clause, the backend is marked as "tainted" and is pinned to this session.
So it is effectively excluded from the connection pool and serves only this session. Once again: if I were able to access temporary table data from another backend, there would be no need to mark the backend as tainted in this case.
Certainly this also requires shared metadata. And here we come to the concept of global temp tables (if we forget for a moment that global temp tables were "invented" a long time ago by Oracle and many other DBMSes :)

Perhaps you mean that in a connection pooling case, each backend may land up filling up temp buffers with contents from *multiple different temp tables*? If so, sure, I get that, but the answer there seems to be to improve eviction and memory accounting, not make backends waste precious shared_buffers space on non-shareable data.

Anyhow, I strongly suggest you simplify the feature to add the basic global temp table feature so the need to change pg_class, pg_attribute etc to use temp tables is removed, but separate changes to temp table memory handling etc into a follow-up patch. That'll make it smaller and easier to review and merge too. The two changes are IMO logically quite separate anyway.

I agree that they are separate.
But even if we forget about the built-in connection pooler, don't you think that the possibility of using parallel query plans for temporary tables is by itself strong enough motivation to access global temp tables through shared buffers
(while still supporting a private page pool for local temp tables)? Both approaches (shared vs. private buffers) have their pros and cons. This is why it seems reasonable to support both of them and let the user make the choice most suitable for a concrete application. Certainly it would be possible to provide both "global shared temp tables" and "global private temp tables", but IMHO that is overkill.

Come to think of it, I think connection poolers might benefit from an extension to the DISCARD command, say "DISCARD TEMP_BUFFERS", which evicts temp table buffers from memory *without* dropping the temp tables. If they're currently in-memory tuplestores they'd be written out and evicted. That way a connection pooler could "clean" the backend, at the cost of some probably pretty cheap buffered writes to the system buffer cache. The kernel might not even bother to write out the buffercache and it won't be forced to do so by fsync, checkpoints, etc, nor will the writes go via WAL so such evictions could be pretty cheap - and if not under lots of memory pressure the backend could often read the temp table back in from system buffer cache without disk I/O.

Yes, this is one of the possible solutions for session migration. But frankly speaking, flushing local buffers on each session reschedule does not seem like a good solution, even if the OS file cache is large enough and the flushed buffers are still present in memory (they will be written to disk in this case even though the temp table data is not intended to be persisted).

That's my suggestion for how to solve your pooler problem, assuming I've understood it correctly.

Along these lines I suggest adding the following to DISCARD at some point, obviously not as part of your patch:

* DISCARD TEMP_BUFFERS
* DISCARD SHARED_BUFFERS
* DISCARD TEMP_FILES
* DISCARD CATALOG_CACHE
* DISCARD HOLD_CURSORS
* DISCARD ADVISORY_LOCKS

where obviously DISCARD SHARED_BUFFERS would be superuser-only and evict only clean buffers.

(Also, if we extend DISCARD lets also it to be written as DISCARD (LIST, OF, THINGS, TO, DISCARD) so that we can make the syntax extensible for plugins in future).

Thoughts?

Would DISCARD TEMP_BUFFERS meet your needs?

Actually I have already implemented a DropLocalBuffers function (three lines of code :)

void
DropLocalBuffers(void)
{
     RelFileNode rnode;
     rnode.relNode = InvalidOid; /* drop all local buffers */
     DropRelFileNodeAllLocalBuffers(rnode);
}


for yet another Postgres extension which is not yet included even in PgPRO-EE - SnapFS: support of database snapshots.
I do not think that we need such command at user level (i.e. have correspondent SQL command).
But, as I already wrote above, I do not consider flushing all buffers on session reschedule as acceptable solution.
And moreover, just flushing buffers is not enough. There is still some smgr stuff associated with this relation which is local to the backend.
We in any case has to make some changes to be able to access temporary data from other backend even if data is flushed to the file system.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

Craig Ringer-3
On Thu, 8 Aug 2019 at 15:03, Konstantin Knizhnik <[hidden email]> wrote:


On 08.08.2019 5:40, Craig Ringer wrote:
On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <[hidden email]> wrote:
New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using shared_buffers for temp tables having connection pooling benefits. Are you assuming the presence of some other extension in your extended  version of PostgreSQL ? In community PostgreSQL a temp table's contents in one backend will not be visible in another backend. So if your connection pooler in transaction pooling mode runs txn 1 on backend 42 and it populates temp table X, then the pooler runs the same app session's txn 2 on backend 45, the contents of temp table X are not visible anymore.

Certainly here I mean built-in connection pooler which is not currently present in Postgres,
but it is part of PgPRO-EE and there is my patch for vanilla at commitfest:
https://commitfest.postgresql.org/24/2067

OK, that's what I assumed.

You're trying to treat this change as if it's a given that the other functionality you want/propose is present in core or will be present in core. That's far from given. My suggestion is to split it up so that the parts can be reviewed and committed separately.
 
In PgPRO-EE this problem was solved by binding session to backend. I.e. one backend can manage multiple sessions, 
but session can not migrate to another backend. The drawback of such solution is obvious: one long living transaction can block transactions of all other sessions scheduled to this backend.
Possibility to migrate session to another backend is one of the obvious solutions of the problem. But the main show stopper for it is temporary tables.
This is why  I consider moving temporary tables to shared buffers as very important step.

I can see why it's important for your use case.

I am not disagreeing.

I am however strongly suggesting that your patch has two fairly distinct functional changes in it, and you should separate them out.

* Introduce global temp tables, a new relkind that works like a temp table but doesn't require catalog changes. Uses per-backend relfilenode and cleanup like existing temp tables. You could extend the relmapper to handle the mapping of relation oid to per-backend relfilenode.

* Associate global temp tables with session state and manage them in shared_buffers so they can work with the in-core connection pooler (if committed)

Historically we've had a few efforts to get in-core connection pooling that haven't gone anywhere. Without your pooler patch the changes you make to use shared_buffers etc are going to be unhelpful at best, if not actively harmful to performance, and will add unnecessary complexity. So I think there's a logical series of patches here:

* global temp table relkind and support for it
* session state separation
* connection pooling
* pooler-friendly temp tables in shared_buffers

Make sense?
  
But even if we forget about built-in connection pooler, don't you think that possibility to use parallel query plans for temporary tables is itself strong enough motivation to access global temp table through shared buffers?

I can see a way to share temp tables across parallel query backends being very useful for DW/OLAP workloads, yes. But I don't know if putting them in shared_buffers is the right answer for that. We have DSM/DSA, we have shm_mq, various options for making temp buffers share-able with parallel worker backends.

I'm suggesting that you not tie the whole (very useful) global temp tables feature to this, but instead split it up into logical units that can be understood, reviewed and committed separately.

I would gladly participate in review.

Would DISCARD TEMP_BUFFERS meet your needs?

Actually I have already implemented DropLocalBuffers function (three line of code:)

[...] 
I do not think that we need such command at user level (i.e. have correspondent SQL command).

I'd be very happy to have it personally, but don't think it needs to be tied in with your patch set here. Maybe I can cook up a patch soon.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Global temporary tables

konstantin knizhnik


On 09.08.2019 8:34, Craig Ringer wrote:
On Thu, 8 Aug 2019 at 15:03, Konstantin Knizhnik <[hidden email]> wrote:


On 08.08.2019 5:40, Craig Ringer wrote:
On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <[hidden email]> wrote:
New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.

FWIW I still don't understand your argument with regards to using shared_buffers for temp tables having connection pooling benefits. Are you assuming the presence of some other extension in your extended  version of PostgreSQL ? In community PostgreSQL a temp table's contents in one backend will not be visible in another backend. So if your connection pooler in transaction pooling mode runs txn 1 on backend 42 and it populates temp table X, then the pooler runs the same app session's txn 2 on backend 45, the contents of temp table X are not visible anymore.

Certainly here I mean built-in connection pooler which is not currently present in Postgres,
but it is part of PgPRO-EE and there is my patch for vanilla at commitfest:
https://commitfest.postgresql.org/24/2067

OK, that's what I assumed.

You're trying to treat this change as if it's a given that the other functionality you want/propose is present in core or will be present in core. That's far from given. My suggestion is to split it up so that the parts can be reviewed and committed separately.
 
In PgPRO-EE this problem was solved by binding session to backend. I.e. one backend can manage multiple sessions, 
but session can not migrate to another backend. The drawback of such solution is obvious: one long living transaction can block transactions of all other sessions scheduled to this backend.
Possibility to migrate session to another backend is one of the obvious solutions of the problem. But the main show stopper for it is temporary tables.
This is why  I consider moving temporary tables to shared buffers as very important step.

I can see why it's important for your use case.

I am not disagreeing.

I am however strongly suggesting that your patch has two fairly distinct functional changes in it, and you should separate them out.

* Introduce global temp tables, a new relkind that works like a temp table but doesn't require catalog changes. Uses per-backend relfilenode and cleanup like existing temp tables. You could extend the relmapper to handle the mapping of relation oid to per-backend relfilenode.

* Associate global temp tables with session state and manage them in shared_buffers so they can work with the in-core connection pooler (if committed)

Historically we've had a few efforts to get in-core connection pooling that haven't gone anywhere. Without your pooler patch the changes you make to use shared_buffers etc are going to be unhelpful at best, if not actively harmful to performance, and will add unnecessary complexity. So I think there's a logical series of patches here:

* global temp table relkind and support for it
* session state separation
* connection pooling
* pooler-friendly temp tables in shared_buffers

Make sense?
  
But even if we forget about built-in connection pooler, don't you think that possibility to use parallel query plans for temporary tables is itself strong enough motivation to access global temp table through shared buffers?

I can see a way to share temp tables across parallel query backends being very useful for DW/OLAP workloads, yes. But I don't know if putting them in shared_buffers is the right answer for that. We have DSM/DSA, we have shm_mq, various options for making temp buffers share-able with parallel worker backends.

I'm suggesting that you not tie the whole (very useful) global temp tables feature to this, but instead split it up into logical units that can be understood, reviewed and committed separately.

I would gladly participate in review.

Ok, here it is: global_private_temp-1.patch
I have also attached an updated version of global temp tables with shared buffers: global_shared_temp-1.patch
It is certainly larger (~2k lines vs. 1.5k lines) because it changes BufferTag and related functions,
but I do not think that this difference is so critical.
I still have a wish to kill two birds with one stone :)








--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Attachment: global_shared_temp-1.patch (74K)
Attachment: global_private_temp-1.patch (52K)

Re: Global temporary tables

Craig Ringer-3
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <[hidden email]> wrote:


Ok, here it is: global_private_temp-1.patch

Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on physical read replicas, though I know there are plenty of other barriers there (not least of which being temp tables using persistent txns not vtxids)

Does it have a CF entry?
 
Also I have attached updated version of the global temp tables with shared buffers - global_shared_temp-1.patch

Nice to see that split out. In addition to giving the first patch more hope of being committed this time around, it'll help with readability and testability too.

To be clear, I have long wanted to see PostgreSQL have the "session" state abstraction you have implemented. I think it's really important for high client count OLTP workloads, working with the endless collection of ORMs out there, etc. So I'm all in favour of it in principle so long as it can be made to work reliably with limited performance impact on existing workloads and without making life lots harder when adding new core functionality, for extension authors etc. The same goes for built-in pooling. I think PostgreSQL has needed some sort of separation of "connection", "backend", "session" and "executor" for a long time and I'm glad to see you working on it.

With that said: How do you intend to address the likelihood that this will cause performance regressions for existing workloads that use temp tables *without* relying on your session state and connection pooler? Consider workloads that use temp tables for mid-long txns where txn pooling is unimportant, where they also do plenty of read and write activity on persistent tables. Classic OLAP/DW stuff. e.g.:

* four clients, four backends, four connections, session-level connections that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large persistent table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate data for its medium-length transactions
* transactions also write to some persistent relations, say to record their summarised results (a rough sketch of such a transaction follows)
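
A rough sketch of the kind of transaction described above (all table and column names are hypothetical):

    begin;
    -- intermediate result set kept in a temp table
    create temp table txn_summary as
        select f.customer_id,
               sum(f.amount) as total,
               rank() over (order by sum(f.amount) desc) as rnk
        from sales_fact f
        join customer_dim c using (customer_id)
        where f.sale_date >= date '2019-01-01'
        group by f.customer_id;
    analyze txn_summary;
    -- record summarised results in a persistent table
    insert into monthly_report
        select customer_id, total from txn_summary where rnk <= 100;
    commit;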

How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of temp tables
* the temp table data in shared_buffers may put pressure on shared_buffers space, i.e. on the cached pages for persistent tables that all sessions share;
* the temp table data in shared_buffers may put pressure on shared_buffers space for dirty buffers, forcing writes of persistent tables out earlier and therefore reducing write-combining opportunities;


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Global temporary tables

konstantin knizhnik


On 10.08.2019 5:12, Craig Ringer wrote:
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <[hidden email]> wrote:


Ok, here it is: global_private_temp-1.patch

Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on physical read replicas, though I know there are plenty of other barriers there (not least of which being temp tables using persistent txns not vtxids)

Does it have a CF entry?
 

https://commitfest.postgresql.org/24/2233/

Also I have attached updated version of the global temp tables with shared buffers - global_shared_temp-1.patch

Nice to see that split out. In addition to giving the first patch more hope of being committed this time around, it'll help with readability and testability too.

To be clear, I have long wanted to see PostgreSQL have the "session" state abstraction you have implemented. I think it's really important for high client count OLTP workloads, working with the endless collection of ORMs out there, etc. So I'm all in favour of it in principle so long as it can be made to work reliably with limited performance impact on existing workloads and without making life lots harder when adding new core functionality, for extension authors etc. The same goes for built-in pooling. I think PostgreSQL has needed some sort of separation of "connection", "backend", "session" and "executor" for a long time and I'm glad to see you working on it.

With that said: How do you intend to address the likelihood that this will cause performance regressions for existing workloads that use temp tables *without* relying on your session state and connection pooler? Consider workloads that use temp tables for mid-long txns where txn pooling is unimportant, where they also do plenty of read and write activity on persistent tables. Classic OLAP/DW stuff. e.g.:

* four clients, four backends, four connections, session-level connections that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large persistent table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate data for its medium-length transactions
* transactions also write to some persistent relations, say to record their summarised results  

How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of temp tables
* the temp table data in shared_buffers may put pressure on shared_buffers space, cached pages for persistent tables all sessions are sharing;
* the temp table data in shared_buffers may put pressure on shared_buffers space for dirty buffers, forcing writes of persistent tables out earlier therefore reducing write-combining opportunities;

I agree that access to local buffers is cheaper than access to shared buffers because there is no locking overhead.
And the fact that access to local tables cannot affect cached data of persistent tables is also important.
But most Postgres tables are still normal (persistent) tables accessed through shared buffers,
and a huge amount of effort has been put into making this access as efficient as possible (a clock algorithm which doesn't require a global lock,
atomic operations, ...). Also, using the same replacement policy for all tables may be preferable for some workloads.
So it is not obvious to me that in the described scenario a local buffer cache for temporary tables will really provide significant advantages.
It will be interesting to do some benchmarking; I am going to do it.

What I have observed so far is that in the typical scenario of dumping the results of a huge query to a temporary table with a subsequent traversal of this table,
the old (local) temporary tables provide better performance (maybe because of the small size of the local buffer cache and the different eviction policy).
But subsequent accesses to the global shared table are faster (because it fits completely in the large shared buffer cache).

There is one more problem with global temporary tables for which I do not know a good solution yet: collecting statistics.
Since each backend has its own data, in general backends may need different query plans.
Right now, if you run "ANALYZE table" in one backend, it will affect the plans in all backends.
This can be considered a feature rather than a bug if we assume that the distribution of data is similar in all backends,
but if this assumption is not true, it can be a problem.
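
A small illustration of the issue with two concurrent sessions (table names are hypothetical):

    -- session A: a handful of rows in its private copy
    create global temp table gtt (id int);
    insert into gtt select generate_series(1, 10);
    analyze gtt;   -- updates statistics shared by all backends

    -- session B: millions of rows in its private copy, but the planner
    -- still uses the statistics collected in session A
    insert into gtt select generate_series(1, 1000000);
    select count(*) from gtt join orders using (id);   -- may get a poor plan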
 




Re: Global temporary tables

Pavel Stehule

Hi


There is one more problem with global temporary tables for which I do not know good solution now: collecting statistic.
As far as each backend has its own data, generally them may need different query plans.
Right now if you perform "analyze table" in one backend, then it will affect plans in all backends.
It can be considered not as bug, but as feature if we assume that distribution if data in all backens is similar.
But if this assumption is not true, then it can be a problem.

The last point is probably the most difficult issue, and I have been thinking about it for years.

My experience with my customers is that 99% of temp table usage is without statistics, just with good information about the number of rows. Only a few customers know that a manual ANALYZE is necessary for temp tables (when it really is necessary).

Sharing metadata about global temporary tables can be a real problem; probably not the statistics, but surely the number of pages and number of rows.

There are two requirements:

a) we need some special metadata for every instance (per session) of a global temporary table (rows, pages, statistics, maybe multicolumn statistics, ...)

b) we do not want to use the persistent global catalogue (to avoid catalogue bloat)

I see two possible solutions:

1. hold this data only in memory, in special buffers

2. hold this data in global temporary tables - we can use global temp tables for this metadata in the same way that classic persistent tables are used for the metadata of classic persistent tables. The syscache could then be enhanced to work with the union of two system tables.

I prefer @2 because the changes can be implemented at a deeper level.
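
A very rough sketch of what @2 could look like (names and layout are purely hypothetical, not part of any patch):

    -- per-session relation metadata, itself stored in a global temp table
    create global temp table pg_gtt_relstats (
        relid     oid primary key,
        relpages  integer,
        reltuples real
    );
    -- each backend's ANALYZE would maintain its own private rows here,
    -- and the planner would consult this table in addition to pg_class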

Sharing metadata for global temp tables (the current state, if I understand it well) is good enough for the development stage, but it is hard to expect that it can work generally in a production environment.

Regards

P.S. I am very happy that you are working on this topic. It is an interesting and important problem.

Pavel









 




Re: Global temporary tables

konstantin knizhnik
Hi,

On 11.08.2019 10:14, Pavel Stehule wrote:

Hi


There is one more problem with global temporary tables for which I do not know good solution now: collecting statistic.
As far as each backend has its own data, generally them may need different query plans.
Right now if you perform "analyze table" in one backend, then it will affect plans in all backends.
It can be considered not as bug, but as feature if we assume that distribution if data in all backens is similar.
But if this assumption is not true, then it can be a problem.

Last point is probably the most difficult issue and I think about it years.

I have a experience with my customers so 99% of usage temp tables is without statistics - just with good information only about rows. Only few customers know so manual ANALYZE is necessary for temp tables (when it is really necessary).

Sharing meta data about global temporary tables can real problem - probably not about statistics, but surely about number of pages and number of rows.

But Postgres does not currently store this information anywhere else except the statistics, does it?
There was a proposal to cache relation sizes, but it is not implemented yet. If such a cache existed, we could use it to store local information about global temporary tables.
So if 99% of users do not run ANALYZE on temporary tables, they will not be affected by this problem, right?



There are two requirements:

a) we need some special meta data for any instance (per session) of global temporary table (row, pages, statistics, maybe multicolumn statistics, ...)

b) we would not to use persistent global catalogue (against catalogue bloating)

I see two possible solution:

1. hold these data only in memory in special buffers

2. hold these data in global temporary tables - it is similar to normal tables - we can use global temp tables for metadata like classic persistent tables are used for metadata of classic persistent tables. Next syscache can be enhanced to work with union of two system tables.

I prefer @2 because changes can be implemented on deeper level.

Sharing metadata for global temp tables (current state if I understand well) is good enough for develop stage, but It is hard to expect so it can work generally in production environment.


I think it is not possible to assume that temporary data will always fit in memory.
So 1) does not seem to be an acceptable solution.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

Pavel Stehule


On Mon, 12 Aug 2019 at 18:19, Konstantin Knizhnik <[hidden email]> wrote:
Hi,

On 11.08.2019 10:14, Pavel Stehule wrote:

Hi


There is one more problem with global temporary tables for which I do not know good solution now: collecting statistic.
As far as each backend has its own data, generally them may need different query plans.
Right now if you perform "analyze table" in one backend, then it will affect plans in all backends.
It can be considered not as bug, but as feature if we assume that distribution if data in all backens is similar.
But if this assumption is not true, then it can be a problem.

Last point is probably the most difficult issue and I think about it years.

I have a experience with my customers so 99% of usage temp tables is without statistics - just with good information only about rows. Only few customers know so manual ANALYZE is necessary for temp tables (when it is really necessary).

Sharing meta data about global temporary tables can real problem - probably not about statistics, but surely about number of pages and number of rows.

But Postgres is not storing this information now anywhere else except statistic, isn't it?

Not only statistics; the critical numbers are reltuples and relpages from pg_class.

There was proposal to cache relation size,  but it is not implemented yet. If such cache exists, then we can use it to store local information about global temporary tables.
So if 99% of users do not perform analyze for temporary tables, then them will not be faced with this problem, right?

They use default statistics based on relpages. But for 1% of applications statistics are critical, almost always OLAP applications.




There are two requirements:

a) we need some special meta data for any instance (per session) of global temporary table (row, pages, statistics, maybe multicolumn statistics, ...)

b) we would not to use persistent global catalogue (against catalogue bloating)

I see two possible solution:

1. hold these data only in memory in special buffers

2. hold these data in global temporary tables - it is similar to normal tables - we can use global temp tables for metadata like classic persistent tables are used for metadata of classic persistent tables. Next syscache can be enhanced to work with union of two system tables.

I prefer @2 because changes can be implemented on deeper level.

Sharing metadata for global temp tables (current state if I understand well) is good enough for develop stage, but It is hard to expect so it can work generally in production environment.


I think that it not possible to assume that temporary data will aways fir in memory.
So 1) seems to be not acceptable solution.

I spoke only about metadata. Data should be stored in temp buffers (and possibly in temp files)

Pavel

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

Craig Ringer-3
On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <[hidden email]> wrote:
 
But Postgres is not storing this information now anywhere else except statistic, isn't it?

not only - critical numbers are reltuples, relpages from pg_class

That's a very good point. relallvisible too. How's the global temp table impl handling that right now, since you won't be changing the pg_class row? AFAICS relpages doesn't need to be up to date (and reltuples certainly doesn't) so presumably you're just leaving them as zero?

What happens right now if you ANALYZE or VACUUM ANALYZE a global temp table? Is it just disallowed?
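
For reference, the per-relation planner inputs in question can be inspected like this (the table name is hypothetical):

    select relpages, reltuples, relallvisible
    from pg_class
    where relname = 'my_global_temp_table';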

I'll need to check, but I wonder if periodically updating those fields in pg_class impacts logical decoding too. Logical decoding must treat transactions with catalog changes as special cases where it creates custom snapshots and does other expensive additional work. (See ReorderBufferXidSetCatalogChanges in reorderbuffer.c and its callsites). We don't actually need to know relpages and reltuples during logical decoding. It can probably ignore relfrozenxid and relminmxid changes too, maybe even pg_statistic changes though I'd be less confident about that one.

At some point I need to patch in a bunch of extra tracepoints and do some perf tracing to see how often we do potentially unnecessary snapshot related work in logical decoding.


There was a proposal to cache relation size, but it is not implemented yet. If such a cache existed, then we could use it to store local information about global temporary tables.
So if 99% of users do not run ANALYZE on temporary tables, then they will not be faced with this problem, right?

They use default statistics based on relpages. But for 1% of applications statistics are critical - almost always OLAP applications.

Agreed. It's actually quite a common solution to user problem reports / support queries about temp table performance: "Run ANALYZE. Consider creating indexes too."

Which reminds me - if global temp tables do get added, it'll further increase the desirability of exposing a feature to let users disable+invalidate and then later reindex+enable indexes without icky catalog hacking. So they can disable indexes for their local copy, load data, re-enable indexes. That'd be "interesting" to implement for global temp tables given that index state is part of the pg_index row associated with an index rel though. 


1. hold this data only in memory, in special buffers

I don't see that working well for pg_statistic or anything else that holds nontrivial user data though.
2. hold this data in global temporary tables - this is similar to normal tables - we can use global temp tables for metadata, just as classic persistent tables are used for the metadata of classic persistent tables. Then the syscache can be enhanced to work with a union of two system tables.

Very meta. Syscache and relcache are extremely performance critical but could probably skip scans entirely on backends that haven't used any global temp tables.

I don't know the relevant caches well enough to have a useful opinion here.
I think it is not possible to assume that temporary data will always fit in memory.
So 1) does not seem to be an acceptable solution.

It'd only be the metadata, but if it includes things like column histograms and most frequent value data that'd still be undesirable to have pinned in backend memory.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Global temporary tables

konstantin knizhnik


On 13.08.2019 8:34, Craig Ringer wrote:
On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <[hidden email]> wrote:
 
But Postgres is not currently storing this information anywhere other than the statistics, is it?

Not only there - the critical numbers are reltuples and relpages from pg_class.

That's a very good point. relallvisible too. How's the global temp table impl handling that right now, since you won't be changing the pg_class row? AFAICS relpages doesn't need to be up to date (and reltuples certainly doesn't) so presumably you're just leaving them as zero?
As far as I understand, relpages and reltuples are set only when you run ANALYZE on the table.


What happens right now if you ANALYZE or VACUUM ANALYZE a global temp table? Is it just disallowed?

No, it is not disallowed now.
It updates the statistics and also the fields in pg_class, which are shared by all backends.
So all backends will now build plans according to these statistics. Certainly this may lead to less efficient plans if there are large differences in the number of tuples stored in this table in different backends.
But this seems to me critical mostly when indexes are present on the temporary table. And it seems to me that users create indexes on temporary tables even more rarely than they run ANALYZE on them.

I'll need to check, but I wonder if periodically updating those fields in pg_class impacts logical decoding too. Logical decoding must treat transactions with catalog changes as special cases where it creates custom snapshots and does other expensive additional work. (See ReorderBufferXidSetCatalogChanges in reorderbuffer.c and its callsites). We don't actually need to know relpages and reltuples during logical decoding. It can probably ignore relfrozenxid and relminmxid changes too, maybe even pg_statistic changes though I'd be less confident about that one.

At some point I need to patch in a bunch of extra tracepoints and do some perf tracing to see how often we do potentially unnecessary snapshot related work in logical decoding.

Temporary tables (both local and global), as well as unlogged tables, are not subject to logical replication, are they?


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

Craig Ringer-3
In reply to this post by konstantin knizhnik
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <[hidden email]> wrote:


Ok, here it is: global_private_temp-1.patch


Initial pass review follows.

Relation name "SESSION" is odd. I guess you're avoiding "global" because the data is session-scoped, not globally temporary. But I'm not sure "session" fits either - after all regular temp tables are also session temporary tables. I won't focus on naming further beyond asking that it be consistent though, right now there's a mix of "global" in some places and "session" in others.


Why do you need to do all this indirection with changing RelFileNode to RelFileNodeBackend in the bufmgr, changing BufferGetTag etc? Similarly, your changes of RelFileNodeBackendIsTemp to RelFileNodeBackendIsLocalTemp . Did you look into my suggestion of extending the relmapper so that global temp tables would have a relfilenode of 0 like pg_class etc, and use a backend-local map of oid-to-relfilenode mappings? I'm guessing you did it the way you did instead to lay the groundwork for cross-backend sharing, but if so it should IMO be in that patch. Maybe my understanding of the existing temp table mechanics is just insufficient as I see RelFileNodeBackendIsTemp is already used in some aspects of existing temp relation handling.

Similarly, TruncateSessionRelations probably shouldn't need to exist in this patch in its current form; there's no shared_buffers use to clean and the same file cleanup mechanism should handle both session-temp and local-temp relfilenodes.


A number of places make a change like this:
 
rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
+ rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION

and I'd like to see a test macro or inline static for it since it's repeated so much. Mostly to make the intent clear: "is this a relation with temporary backend-scoped data?"


This test:

+ if (blkno == BTREE_METAPAGE && PageIsNew(BufferGetPage(buf)) && IsSessionRelationBackendId(rel->rd_backend))
+ _bt_initmetapage(BufferGetPage(buf), P_NONE, 0);

seems sensible but I'm wondering if there's a way to channel initialization of global-temp objects through a bit more of a common path, so it reads more obviously as a common test applying to all global-temp tables. "Global temp table not initialized in session yet? Initialize it." So we don't have to have every object type do an object type specific test for "am I actually uninitialized?" in all paths it might hit. I guess I expected to see something more like a

if (RelGlobalTempUninitialized(rel))

but maybe I've been doing too much Java ;)

A similar test reappears here for sequences:

+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION && PageIsNew(page))

Why is this written differently?


Sequence initialization ignores sequence startval/firstval settings. Why?


- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)

Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?


In PreCommit_on_commit_actions, in the ONCOMMIT_DELETE_ROWS case, is there any way to still respect the XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag if the oid is for a backend-temp table rather than a global-temp table?
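To sketch what I mean (ShouldTruncateOnCommit is a made-up helper, not part of the patch; get_rel_persistence() and MyXactFlags are existing core APIs, and I'm using the flag name as it appears in this thread):

#include "postgres.h"
#include "access/xact.h"
#include "catalog/pg_class.h"
#include "utils/lsyscache.h"

/*
 * Hypothetical helper: keep the existing fast path (skip ON COMMIT DELETE
 * ROWS truncation when no temp relation was touched in this transaction)
 * for backend-local temp tables, while always truncating global temp
 * tables, for which no flag is set.
 */
static bool
ShouldTruncateOnCommit(Oid relid)
{
    if (get_rel_persistence(relid) == RELPERSISTENCE_TEMP)
        return (MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE) != 0;

    /* global (session) temp table: always truncate */
    return true;
}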


+ bool isLocalBuf = SmgrIsTemp(smgr) && relpersistence == RELPERSISTENCE_TEMP;

Won't session-temp tables have local buffers too? Until your next patch that adds shared_buffers storage for them anyway?


+ * These is no need to separate them at file system level, so just subtract SessionRelFirstBackendId
+ * to avoid too long file names.

I agree that there's no reason to be able to differentiate between local-temp and session-temp relfilenodes at the filesystem level.







 
Also I have attached an updated version of the global temp tables patch with shared buffers - global_shared_temp-1.patch
It is certainly larger (~2k lines vs. 1.5k lines) because it changes BufferTag and related functions.
But I do not think that this difference is so critical.
I still have a wish to kill two birds with one stone :)








--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Global temporary tables

Craig Ringer-3
In reply to this post by konstantin knizhnik


On Tue, 13 Aug 2019 at 16:19, Konstantin Knizhnik <[hidden email]> wrote:


On 13.08.2019 8:34, Craig Ringer wrote:
On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <[hidden email]> wrote:
 
But Postgres is not currently storing this information anywhere other than the statistics, is it?

Not only there - the critical numbers are reltuples and relpages from pg_class.

That's a very good point. relallvisible too. How's the global temp table impl handling that right now, since you won't be changing the pg_class row? AFAICS relpages doesn't need to be up to date (and reltuples certainly doesn't) so presumably you're just leaving them as zero?
As far as I understand, relpages and reltuples are set only when you run ANALYZE on the table.

Also autovacuum's autoanalyze.

What happens right now if you ANALYZE or VACUUM ANALYZE a global temp table? Is it just disallowed?

No, it is not disallowed now.
It updates the statistics and also the fields in pg_class, which are shared by all backends.
So all backends will now build plans according to these statistics. Certainly this may lead to less efficient plans if there are large differences in the number of tuples stored in this table in different backends.
But this seems to me critical mostly when indexes are present on the temporary table. And it seems to me that users create indexes on temporary tables even more rarely than they run ANALYZE on them.

That doesn't seem too bad TBH. Hacky but it doesn't seem dangerously wrong and as likely to be helpful as not if clearly documented.
 
Temporary tables (both local and global), as well as unlogged tables, are not subject to logical replication, are they?


Right. But in the same way that they're still present in the catalogs, I think they still affect catalog snapshots maintained by logical decoding's historic snapshot manager as temp table creation/drop will still AFAIK cause catalog invalidations to be written on commit. I need to double check that.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Global temporary tables

konstantin knizhnik
In reply to this post by Craig Ringer-3


On 13.08.2019 11:21, Craig Ringer wrote:
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <[hidden email]> wrote:


Ok, here it is: global_private_temp-1.patch


Initial pass review follows.

Relation name "SESSION" is odd. I guess you're avoiding "global" because the data is session-scoped, not globally temporary. But I'm not sure "session" fits either - after all regular temp tables are also session temporary tables. I won't focus on naming further beyond asking that it be consistent though, right now there's a mix of "global" in some places and "session" in others.

I have supported both forms: "create session table" and "create global temp table".
Both "session" and "global" are already PostgreSQL keywords, so we do not need to introduce a new one (unlike "public" or "shared").
The "global temp" form is used by Oracle, so it seems natural to provide similar syntax.

I am not insisting on this syntax and will consider all other suggestions.
But IMHO almost any SQL keyword is overloaded and has different meanings.
Temporary tables have the session as their lifetime (or the transaction, if created with the "ON COMMIT DROP" clause).
So calling them "session tables" is actually clearer than just "temporary tables".
But "local temp tables" can also be considered session tables, so maybe it is not such a good idea to use the "session" keyword for them.


Why do you need to do all this indirection with changing RelFileNode to RelFileNodeBackend in the bufmgr, changing BufferGetTag etc? Similarly, your changes of RelFileNodeBackendIsTemp to RelFileNodeBackendIsLocalTemp . Did you look into my suggestion of extending the relmapper so that global temp tables would have a relfilenode of 0 like pg_class etc, and use a backend-local map of oid-to-relfilenode mappings? I'm guessing you did it the way you did instead to lay the groundwork for cross-backend sharing, but if so it should IMO be in that patch. Maybe my understanding of the existing temp table mechanics is just insufficient as I see RelFileNodeBackendIsTemp is already used in some aspects of existing temp relation handling.

Sorry, are you really speaking about global_private_temp-1.patch?
This patch doesn't change the bufmgr file at all.
Maybe you looked at the other patch - global_shared_temp-1.patch -
which accesses global temp tables through shared buffers and therefore has to change the buffer tag to include the backend ID.
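To illustrate the idea (a sketch only; the field layout and naming in the actual patch may differ):

#include "storage/block.h"
#include "storage/backendid.h"
#include "storage/relfilenode.h"
#include "common/relpath.h"

/*
 * Illustrative sketch: a buffer tag extended with the owning backend ID,
 * so per-session copies of a global temp table get distinct buffers in
 * shared_buffers.  For ordinary relations the field would hold
 * InvalidBackendId.
 */
typedef struct SessionBufferTag
{
    RelFileNode rnode;       /* physical relation identifier */
    BackendId   backend;     /* owning backend for session relations */
    ForkNumber  forkNum;     /* fork within the relation */
    BlockNumber blockNum;    /* block number within the fork */
} SessionBufferTag;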


Similarly, TruncateSessionRelations probably shouldn't need to exist in this patch in its current form; there's no shared_buffers use to clean and the same file cleanup mechanism should handle both session-temp and local-temp relfilenodes.

In global_private_temp-1.patch, TruncateSessionRelations does nothing with shared buffers; it just deletes relation files.


A number of places make a change like this:
 
rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
+ rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION

and I'd like to see a test macro or inline static for it since it's repeated so much. Mostly to make the intent clear: "is this a relation with temporary backend-scoped data?"

I considered calling such a macro IsSessionRelation() or something like that, but you do not like the notion of "session".
Would IsBackendScopedRelation() be a better choice?
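Something like this, for example (just a sketch; RELPERSISTENCE_SESSION is the constant added by the patch, not core PostgreSQL, and the helper name is only one of the candidates):

#include "catalog/pg_class.h"
#include "utils/rel.h"

/*
 * Sketch of the helper discussed above: "does this relation hold
 * backend-scoped (temporary) data?"
 */
static inline bool
IsBackendScopedRelation(Relation rel)
{
    return rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
           rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION;
}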


This test:

+ if (blkno == BTREE_METAPAGE && PageIsNew(BufferGetPage(buf)) && IsSessionRelationBackendId(rel->rd_backend))
+ _bt_initmetapage(BufferGetPage(buf), P_NONE, 0);

seems sensible but I'm wondering if there's a way to channel initialization of global-temp objects through a bit more of a common path, so it reads more obviously as a common test applying to all global-temp tables. "Global temp table not initialized in session yet? Initialize it." So we don't have to have every object type do an object type specific test for "am I actually uninitialized?" in all paths it might hit. I guess I expected to see something more like a

if (RelGlobalTempUninitialized(rel))

but maybe I've been doing too much Java ;)

A similar test reappears here for sequences:

+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION && PageIsNew(page))

Why is this written differently?


Just because I wrote them at different moments in time :)
I think that adding RelGlobalTempUninitialized is a really good idea.
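For example, something like this (just a sketch; RELPERSISTENCE_SESSION is from the patch, and each access method would still do its own initialization afterwards, e.g. _bt_initmetapage for btree):

#include "storage/bufpage.h"
#include "utils/rel.h"

/*
 * Sketch of the common test suggested above: "is this a session-scoped
 * (global temp) relation whose storage has not yet been initialized in
 * this backend?"
 */
static inline bool
RelGlobalTempUninitialized(Relation rel, Page page)
{
    return rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION &&
           PageIsNew(page);
}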


Sequence initialization ignores sequence startval/firstval settings. Why?


I am handling only the case of implicitly created sequences for SERIAL/BIGSERIAL columns.
Is it possible to explicitly specify an initial value and step for them?
If so, this place should definitely be rewritten.


- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)

Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?

The RELPERSISTENCE_UNLOGGED case is handled in the previous IF branch.


In PreCommit_on_commit_actions, in the ONCOMMIT_DELETE_ROWS case, is there any way to still respect the XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag if the oid is for a backend-temp table rather than a global-temp table?

If it is a local temp table, then XACT_FLAGS_ACCESSEDTEMPNAMESPACE is set, so there is no need to check this flag.
And since XACT_FLAGS_ACCESSEDTEMPNAMESPACE is now not set for global temp tables, I had to remove this check.
So for local temp tables the original behavior is preserved.

The question is: why do I not set XACT_FLAGS_ACCESSEDTEMPNAMESPACE for global temp tables?
I wanted to avoid the current limitation on using temp tables in prepared transactions.
Global metadata eliminates some of the problems related to using temp tables in 2PC.
But I am not sure that it eliminates ALL problems and that there are no strange effects related to
prepared transactions and global temp tables.



+ bool isLocalBuf = SmgrIsTemp(smgr) && relpersistence == RELPERSISTENCE_TEMP;

Won't session-temp tables have local buffers too? Until your next patch that adds shared_buffers storage for them anyway?

Once again, this is a change from global_shared_temp-1.patch, not from global_private_temp-1.patch.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Global temporary tables

konstantin knizhnik
In reply to this post by Craig Ringer-3


On 13.08.2019 11:27, Craig Ringer wrote:


On Tue, 13 Aug 2019 at 16:19, Konstantin Knizhnik <[hidden email]> wrote:


On 13.08.2019 8:34, Craig Ringer wrote:
On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <[hidden email]> wrote:
 
But Postgres is not currently storing this information anywhere other than the statistics, is it?

Not only there - the critical numbers are reltuples and relpages from pg_class.

That's a very good point. relallvisible too. How's the global temp table impl handling that right now, since you won't be changing the pg_class row? AFAICS relpages doesn't need to be up to date (and reltuples certainly doesn't) so presumably you're just leaving them as zero?
As far as I understand, relpages and reltuples are set only when you run ANALYZE on the table.

Also autovacuum's autoanalyze.

When does that happen?
I created a normal table, populated it with some data, and then waited several hours, but pg_class was not updated for this table.


I attach to this mail slightly refactored versions of these patches, with fixes for the issues reported in your review.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Attachments: global_shared_temp-2.patch (75K), global_private_temp-2.patch (53K)

Re: Global temporary tables

Craig Ringer-3
On Tue, 13 Aug 2019 at 21:50, Konstantin Knizhnik <[hidden email]> wrote:
As far as I understand, relpages and reltuples are set only when you run ANALYZE on the table.

Also autovacuum's autoanalyze.

When does that happen?
I created a normal table, populated it with some data, and then waited several hours, but pg_class was not updated for this table.



heap_vacuum_rel() in src/backend/access/heap/vacuumlazy.c below

     * Update statistics in pg_class.

which I'm pretty sure is common to explicit vacuum and autovacuum. I haven't run up a test to verify 100% but most DBs would never have relpages etc set if autovac didn't do it since most aren't explicitly VACUUMed at all.

I thought it was done when autovac ran an analyze, but it looks like it's all autovac. Try setting very aggressive autovac thresholds and inserting + deleting a bunch of tuples maybe.

I attach to this mail slightly refactored versions of these patches, with fixes for the issues reported in your review.

Thanks.

Did you have a chance to consider my questions too? I see a couple of things where there's no patch change, which is fine, but I'd be interested in your thoughts on the question/issue in those cases.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise