On Fri, Nov 1, 2019 at 5:09 PM, Konstantin Knizhnik <[hidden email]> wrote:
Years ago, when I thought about this, I wrote a patch with a similar design. It works, but it is certainly ugly.

I have another idea: could pg_statistics be a view instead of a table? Something like:

SELECT * FROM pg_catalog.pg_statistics_rel
UNION ALL
SELECT * FROM pg_catalog.pg_statistics_gtt();

Internally, when the statistics cache is filled, pg_statistics_rel and pg_statistics_gtt() can be used directly. As far as I remember, it was not possible to work with queries there, only with plain relations.

Or a crazy idea - today we can implement our own types of heaps. It should be possible to create an engine where the result is a combination of some shared data and local data, so the union would be implemented at the heap level. This implementation could be simple - just scanning pages from shared buffers and from local buffers. For these tables we don't need complex metadata. It's a crazy idea, though, and I think the union with a table function is the better option.

Regards

Pavel
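A minimal sketch of how such a split could look, assuming two hypothetical objects: pg_statistics_rel (the shared, on-disk statistics relation) and pg_statistics_gtt() (a set-returning function exposing the backend-local statistics for global temp tables). Neither exists in PostgreSQL today; this only illustrates the shape of the idea:

-- Conceptual only: what the user-visible statistics relation could expand to.
CREATE VIEW pg_statistic_combined AS
    SELECT * FROM pg_catalog.pg_statistics_rel      -- shared rows for regular tables
    UNION ALL
    SELECT * FROM pg_catalog.pg_statistics_gtt();   -- backend-local rows for GTTs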
On Sat, Nov 2, 2019 at 6:31 AM Pavel Stehule <[hidden email]> wrote:
>> Yes, if you do "select * from pg_statistic" then you will not see
>> statistic for GTT in this case.
>> But I do not think that it is so critical. I do not believe that anybody
>> is trying to manually interpret values in this table.
>> And optimizer is retrieving statistic through sys-cache mechanism and so
>> is able to build correct plan in this case.
>
> I have another idea: could pg_statistics be a view instead of a table? Something like:
>
> SELECT * FROM pg_catalog.pg_statistics_rel
> UNION ALL
> SELECT * FROM pg_catalog.pg_statistics_gtt();
>
> Internally, when the statistics cache is filled, pg_statistics_rel and pg_statistics_gtt() can be used directly. As far as I remember, it was not possible to work with queries there, only with plain relations.

It'd be a loss if you lose the ability to see the statistics, as there are valid use cases where you need to see the stats, e.g. understanding why you don't get the plan you wanted. There's also at least one extension [1] that allows you to backup and use restored statistics, so there are definitely people interested in it.

[1]: https://github.com/ossc-db/pg_dbms_stats
On Sat, Nov 2, 2019 at 8:18 AM Julien Rouhaud <[hidden email]> wrote:

I don't think so - the extensions can use the UNION and the content will be the same as the caches used by the planner.
On Sat, Nov 2, 2019 at 8:23 AM Pavel Stehule <[hidden email]> wrote:
Sure, if someone tries to modify the system tables directly, then it should be fixed.
In reply to this post by Pavel Stehule
On Sat, Nov 2, 2019 at 8:23 AM Pavel Stehule <[hidden email]> wrote:
> I don't think so - the extensions can use the UNION and the content will be the same as the caches used by the planner.

Yes, I agree that changing pg_statistics to be a view as you showed would fix the problem. I was answering Konstantin's point:

>> >> But I do not think that it is so critical. I do not believe that anybody
>> >> is trying to manually interpret values in this table.
>> >> And optimizer is retrieving statistic through sys-cache mechanism and so
>> >> is able to build correct plan in this case.

which is IMHO a wrong assumption.
In reply to this post by Julien Rouhaud
On 02.11.2019 10:19, Julien Rouhaud wrote:
> It'd be a loss if you lose the ability to see the statistics, as there are valid use cases where you need to see the stats, e.g. understanding why you don't get the plan you wanted. There's also at least one extension [1] that allows you to backup and use restored statistics, so there are definitely people interested in it.
>
> [1]: https://github.com/ossc-db/pg_dbms_stats

It seems to make no sense to backup and restore statistics for temporary tables whose lifetime is limited to the lifetime of the backend, doesn't it?
In reply to this post by Pavel Stehule
On 02.11.2019 8:30, Pavel Stehule wrote:
And pg_catalog.pg_statistics_gtt() is a set-returning function? I am afraid that this is not an acceptable solution from a performance point of view: the pg_statistic table is accessed by keys (<relid>, <attpos>, <inh>). If the lookup can not be done using an index scan, then it can cause a significant performance slowdown.
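For illustration, each per-column statistics lookup by the planner amounts to a keyed probe like the following (in core it goes through the syscache, which is backed by a unique index on these three key columns) - an access path a plain set-returning function cannot offer. The table name here is just an example:

-- Rough illustration of the keyed access pattern into pg_statistic.
SELECT stanullfrac, stawidth, stadistinct
FROM pg_catalog.pg_statistic
WHERE starelid   = 'my_table'::regclass
  AND staattnum  = 1        -- attribute number of the column
  AND stainherit = false;   -- statistics gathered without inheritance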
Frankly speaking, implementing a special heap access method for pg_statistic just to handle the case of global temp tables seems to be overkill from my point of view. It requires a lot of coding (or at least copying a lot of code from heapam). Also, as I wrote above, we need an index for efficient lookup of statistics.
In reply to this post by konstantin knizhnik
On Sat, Nov 2, 2019 at 4:09 PM Konstantin Knizhnik <[hidden email]> wrote:
> It seems to make no sense to backup and restore statistics for temporary tables whose lifetime is limited to the lifetime of the backend, doesn't it?

In general yes I agree, but it doesn't if the goal is to understand why, even after an analyze on the temporary table, your query is still behaving poorly. It can be useful to allow reproduction or just give someone else the statistics to see what's going on.
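That use case is essentially being able to run something like the following against a GTT after ANALYZE (pg_stats is the standard readable view over pg_statistic; the table name is hypothetical). If the statistics live only in a backend-private cache, this shows nothing for the GTT:

-- Inspect what ANALYZE collected, e.g. to understand a bad plan.
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'my_gtt';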
In reply to this post by konstantin knizhnik
yes
I don't think it is a problem. Any component that needs fast access can use some special function that checks the index or checks some memory buffers.
Where do you need fast access when you use SQL access? Inside Postgres the optimizer uses caches everywhere, and the statistics cache would know that it has to check the index and some memory buffers. The proposed view would not be used by the optimizer, but it can be used by higher layers.

I think there is agreement that GTT metadata should not be stored in the system catalogue. Whether it is stored in some syscache or somewhere else is not important at this moment. But it would be nice if, for the user, the GTT metadata were not a black hole. I think it is better to change some current tables to views than to use some special function specialized just for GTT (these functions would have to exist in both variants anyway).

When I think about it, this is important not just for the functionality that we expect from GTT. It is important for the consistency of the Postgres catalog - how different should GTT be from other types of tables in the system catalogue, from the user's perspective?
In reply to this post by Robert Haas
Dear Hackers
I attached the patch of the GTT implementation, based on PG12.
The GTT design came from my first email.
Some limitations in the patch will be eliminated in later versions.

Later, I will comment on Konstantin's patch and make some proposals for cooperation.
Looking forward to your feedback.

Thanks.

Zeng Wenjing

> On Oct 29, 2019, at 12:40 AM, Robert Haas <[hidden email]> wrote:
>
> On Mon, Oct 28, 2019 at 9:37 AM Konstantin Knizhnik
> <[hidden email]> wrote:
>> Sorry, but both statements are not true.
>
> Well, I think they are true.
>
>> I am not sure how vital is lack of aborts for transactions working with
>> GTT at replica.
>> Some people said that there is no sense in aborts of read-only
>> transactions at replica (despite to the fact that them are saving
>> intermediate results in GTT).
>> Some people said something similar with your's "dead on arrival".
>> But inconsistency is not possible: if such transaction is really
>> aborted, then backend is terminated and nobody can see this inconsistency.
>
> Aborting the current transaction is a very different thing from
> terminating the backend.
>
> Also, the idea that there is no sense in aborts of read-only
> transactions on a replica seems totally wrong. Suppose that you insert
> a row into the table and then you go to insert a row in each index,
> but one of the index inserts fails - duplicate key, out of memory
> error, I/O error, whatever. Now the table and the index are
> inconsistent. Normally, we're protected against this by MVCC, but if
> you use a solution that breaks MVCC by using the same XID for all
> transactions, then it can happen.
>
>> Concerning second alternative: you can check yourself that it is not
>> *extremely* complicated and invasive.
>> I extracted changes which are related with handling transactions at
>> replica and attached them to this mail.
>> It is just 500 lines (including diff contexts). Certainly there are some
>> limitation of this implementation: number of transactions working with
>> GTT at replica is limited by 2^32
>> and since GTT tuples are not frozen, analog of GTT CLOG kept in memory
>> is never truncated.
>
> I admit that this patch is not lengthy, but there remains the question
> of whether it is correct. It's possible that the problem isn't as
> complicated as I think it is, but I do think there are quite a number
> of reasons why this patch wouldn't be considered acceptable...
>
>> I agree with it and think that implementation of GTT with private
>> buffers and no replica access is good starting point.
>
> ...but given that we seem to agree on this point, perhaps it isn't
> necessary to argue about those things right now.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
On 06.11.2019 16:24, 曾文旌(义从) wrote:
> I attached the patch of the GTT implementation, based on PG12.
> The GTT design came from my first email.
> Some limitations in the patch will be eliminated in later versions.

Thank you for this patch.
My first comments:

1. I have ported your patch to the latest Postgres version (my patch is attached).

2. Your patch supports only B-Tree indexes for GTT. All other index types (hash, gin, gist, brin, ...) are not currently supported.

3. I do not understand the reason for the following limitation:
"We allow to create index on global temp table only this session use it"

First of all, it seems to significantly reduce the usefulness of global temp tables. Why do we need GTT at all? Mostly because we need to access temporary data in more than one backend; otherwise we could just use a normal table. If a temp table is expected to be large enough that we need to create an index on it, then it is hard to believe that it will be needed in only one backend.

May be the assumption is that all indexes have to be created before the GTT starts to be used. But right now this check is not working correctly in any case - if you insert some data into the table, then you can not create an index any more:

postgres=# create global temp table gtt(x integer primary key, y integer);
CREATE TABLE
postgres=# insert into gtt values (generate_series(1,100000), generate_series(1,100000));
INSERT 0 100000
postgres=# create index on gtt(y);
ERROR:  can not create index when have one or more backend attached this global temp table

I wonder why you need such a restriction?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
> On Nov 7, 2019, at 12:08 AM, Konstantin Knizhnik <[hidden email]> wrote:
>
> Thank you for this patch.
> My first comments:
>
> 1. I have ported your patch to the latest Postgres version (my patch is attached).
>
> 2. Your patch supports only B-Tree indexes for GTT. All other index types (hash, gin, gist, brin, ...) are not currently supported.

I noticed that your patch supports more index types, which is where I'd like to work with you.

> 3. I do not understand the reason for the following limitation:
> "We allow to create index on global temp table only this session use it"
>
> May be the assumption is that all indexes have to be created before the GTT starts to be used.

Yes. Currently an index on a GTT can only be created while the table is empty and no other session is using it.
There are two possible improvements:

1. An index can be created on GTT(A) when the GTT(A) in the current session is not empty, requiring the GTT to be empty in the other sessions. Index_build needs to be done in the current session just like for a normal table. This improvement is relatively easy.

2. An index can be created on GTT(A) when more than one session is using this GTT(A). The problem is that when I finish creating an index on the GTT in this session and mark it as a valid index, that is not true for the GTT in the other sessions; indexes on the GTT in other sessions would require a "rebuild_index" before use.
I don't have a better solution right now, maybe you have some suggestions.

> But right now this check is not working correctly in any case - if you insert some data into the table, then you can not create an index any more:
> [...]
> I wonder why you need such a restriction?
>
> <global_temporary_table_v1-pg13.patch>
On Thu, Nov 7, 2019 at 10:30 AM 曾文旌(义从) <[hidden email]> wrote:
I think DDL operations can be implemented in some reduced form - so that DDL is active only for one session and invisible to the other sessions. What matters is the state of the GTT object at session start. For example, ALTER TABLE DROP COLUMN can have a very fatal impact on other sessions. So I think the best pattern for GTT may be: the structure of a GTT is immutable for any session that doesn't itself do DDL operations.
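A sketch of the hazard being described, assuming the patch's CREATE GLOBAL TEMP TABLE syntax (object names are made up); this is exactly the kind of cross-session DDL the immutable-structure rule would forbid:

-- Session 1: creates the GTT and fills its session-private storage.
create global temp table gtt_demo (id int, payload text);
insert into gtt_demo values (1, 'row kept in session-local storage');

-- Session 2: if this were allowed while session 1 is attached, the shared
-- definition would change while session 1 still has tuples laid out with the
-- old column list, so its private data no longer matches the catalog.
alter table gtt_demo drop column payload;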
In reply to this post by 曾文旌(义从)
This part of the improvement has been completed; a new patch is attached. An index can now be created on GTT(A) even when the GTT(A) in the current session is not empty, but it still requires the GTT to be empty in the other sessions.
Zeng Wenjing
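In other words, the sequence that failed upthread should now work when run from the only session attached to the table (a sketch, assuming the patch's syntax and described behaviour):

-- Single session, no other backend attached to the GTT:
create global temp table gtt(x integer primary key, y integer);
insert into gtt values (generate_series(1,100000), generate_series(1,100000));
create index on gtt(y);   -- previously rejected on a non-empty GTT; expected to succeed now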
In reply to this post by Pavel Stehule
Yes, DDL that needs to rewrite data files will have this problem. This is why I disabled ALTER on GTT in the current version. It can be improved; for example, ALTER on a GTT could be allowed when only the current session is using it. Users can also choose to kick off the other sessions that are using the GTT and then do the ALTER.
On Thu, Nov 7, 2019 at 1:17 PM 曾文旌(义从) <[hidden email]> wrote:
I think it is an acceptable solution for the first steps, but I cannot imagine that this behaviour would be good for production usage. It can be good enough for some time, though.

Regards

Pavel
In reply to this post by 曾文旌(义从)
On 07.11.2019 12:30, 曾文旌(义从) wrote:
> 2. An index can be created on GTT(A) when more than one session is using this GTT(A). The problem is that when I finish creating an index on the GTT in this session and mark it as a valid index, that is not true for the GTT in the other sessions; indexes on the GTT in other sessions would require a "rebuild_index" before use.
> I don't have a better solution right now, maybe you have some suggestions.

It is possible to create the index on demand:

Buffer
_bt_getbuf(Relation rel, BlockNumber blkno, int access)
{
    Buffer      buf;

    if (blkno != P_NEW)
    {
        /* Read an existing block of the relation */
        buf = ReadBuffer(rel, blkno);
        /* Session temporary relation may be not yet initialized for this backend. */
        if (blkno == BTREE_METAPAGE && GlobalTempRelationPageIsNotInitialized(rel, BufferGetPage(buf)))
        {
            Relation heap = RelationIdGetRelation(rel->rd_index->indrelid);
            ReleaseBuffer(buf);
            DropRelFileNodeLocalBuffers(rel->rd_node, MAIN_FORKNUM, blkno);
            btbuild(heap, rel, BuildIndexInfo(rel));
            RelationClose(heap);
            buf = ReadBuffer(rel, blkno);
            LockBuffer(buf, access);
        }
        else
        {
            LockBuffer(buf, access);
            _bt_checkpage(rel, buf);
        }
    }
    ...

This code initializes the B-Tree and loads data into it when a GTT index is accessed but not yet initialized. It looks a little bit hackish, but it works.

I also wonder why you are keeping information about GTT in shared memory. It looks like the only information we really need to share is the table's metadata, but that is already shared through the catalog. All other GTT-related information is private to the backend, so I do not see a reason to place it in shared memory.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
> On Nov 8, 2019, at 12:32 AM, Konstantin Knizhnik <[hidden email]> wrote:
>
> It is possible to create the index on demand:
> [...]
> This code initializes the B-Tree and loads data into it when a GTT index is accessed but not yet initialized. It looks a little bit hackish, but it works.

In my opinion, it is not a good idea to trigger a btbuild from a SELECT or DML, because the cost of that depends on the amount of data in the GTT.

> I also wonder why you are keeping information about GTT in shared memory. It looks like the only information we really need to share is the table's metadata, but that is already shared through the catalog. All other GTT-related information is private to the backend, so I do not see a reason to place it in shared memory.

The shared hash structure tracks which backend has initialized the GTT storage, in order to implement DDL on the GTT.
As for a GTT, there is only one definition (including indexes on the GTT), but each backend may have its own data.
For the implementation of DROP on a GTT, I assume that all data and definitions need to be deleted.
On 08.11.2019 10:50, 曾文旌(义从) wrote:
> In my opinion, it is not a good idea to trigger a btbuild from a SELECT or DML, because the cost of that depends on the amount of data in the GTT.

IMHO it is better than returning an error. Also, the index will only be used if the cost of the plan with the index is estimated to be better than the cost of the plan without it. And if you do not have the index, you have to scan the whole table, whose time is comparable with the time of building the index.

Yes, I agree that indexes for a GTT are normally created together with the table itself, before it is used by any application. But if the DBA later recognizes that efficient execution of queries requires some more indexes, it would be strange and dangerous to prevent him from adding such an index until all clients which have accessed this table drop their connections. Also, maintaining information about attached backends in shared memory seems to be overkill.

>> I also wonder why you are keeping information about GTT in shared memory. It looks like the only information we really need to share is the table's metadata, but that is already shared through the catalog. All other GTT-related information is private to the backend, so I do not see a reason to place it in shared memory.
> The shared hash structure tracks which backend has initialized the GTT storage, in order to implement DDL on the GTT.

Sorry, I do not understand this argument. DDL is performed on shared metadata present in the global catalog, and the standard Postgres invalidation mechanism is used to notify all backends about schema changes. Why do we need to maintain some extra information in shared memory? Can you give me an example of DDL which doesn't work without such a shared hash?

> As for a GTT, there is only one definition (including indexes on the GTT), but each backend may have its own data.
> For the implementation of DROP on a GTT, I assume that all data and definitions need to be deleted.

Data of a dropped GTT is removed on normal backend termination, or cleaned up at server restart in case of an abnormal shutdown (as is done for local temp tables). I have not used any shared control structures for GTT in my implementation, and that is why I wonder why you need them and what the expected problems with my implementation are.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
In reply to this post by konstantin knizhnik
My comments on global_private_temp-4.patch:
Good points:

1. Lots of index types on GTT. I think we need support for all kinds of indexes.
2. Serial columns on GTT.
3. INHERITS with GTT.
4. Partitioned GTT.

I didn't choose to support these in the first release, but you did.

Other points:

1. Case:
create global temp table gtt2(a int primary key, b text) on commit delete rows;
I think you've lost the meaning of the ON COMMIT DELETE ROWS clause: after the GTT is created, the other sessions see it as an ON COMMIT PRESERVE ROWS GTT (a cross-session sketch follows after this message).

2. truncate gtt - maybe this is a bug in DropRelFileNodeBuffers: the GTT's local buffers are not released. Case:
postgres=# insert into gtt2 values(1,'xx');
INSERT 0 1
postgres=# truncate gtt2;
TRUNCATE TABLE
postgres=# insert into gtt2 values(1,'xx');
ERROR:  unexpected data beyond EOF in block 0 of relation base/13579/t3_16384
HINT:  This has been seen to occur with buggy kernels; consider updating your system.

3. Lock level of truncate on a GTT. I don't think it's a good idea to hold a big lock for truncate on a GTT, because it only needs to process the session's private data.

4. GTT DDL. DDL that needs to rewrite data files needs attention; we discussed this in the previous emails. This is why I used a shared hash to track the GTT files.

5. There will be problems with DDL that changes the relfilenode, such as CLUSTER or VACUUM FULL on a GTT. One session completes vacuum full gtt(a), and the other sessions immediately start reading and writing the new storage file, so their existing data is lost. I disable them in my current version.

6. DROP GTT. I think dropping a GTT should clean up all storage files and the definition. What do you think?

7. MVCC visibility and clog. GTT data follows the same visibility rules as regular tables, so a GTT also needs the clog, and we must make sure the clog a GTT still needs is not truncated away. At the same time, a GTT is not autovacuumed, so retaining "too old data" will cause wraparound data loss. I have given a solution in my design.

Zeng Wenjing
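The cross-session sketch for point 1, assuming the patch's syntax; the expected behaviour is that the ON COMMIT action applies in every session that uses the table:

-- Session 1 (creates the GTT):
create global temp table gtt2(a int primary key, b text) on commit delete rows;
begin;
insert into gtt2 values(1, 'xx');
commit;
select count(*) from gtt2;   -- expected 0: rows are deleted at commit

-- Session 2 (same GTT definition):
begin;
insert into gtt2 values(1, 'xx');
commit;
select count(*) from gtt2;   -- reported problem: behaves like ON COMMIT PRESERVE ROWS, the row survives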