In my experience it's not immediately obvious (even after reading the
documentation) the implications of how concurrent index builds manage transactions with respect to multiple concurrent index builds in flight at the same time. Specifically, as I understand multiple concurrent index builds running at the same time will all return at the same time as the longest running one. I've attached a small patch to call this caveat out specifically in the documentation. I think the description in the patch is accurate, but please let me know if there's some intricacies around how the various stages might change the results. James Coleman |
On Wed, Sep 18, 2019 at 01:51:00PM -0400, James Coleman wrote:
> In my experience it's not immediately obvious (even after reading the > documentation) the implications of how concurrent index builds manage > transactions with respect to multiple concurrent index builds in > flight at the same time. > > Specifically, as I understand multiple concurrent index builds running > at the same time will all return at the same time as the longest > running one. > > I've attached a small patch to call this caveat out specifically in > the documentation. I think the description in the patch is accurate, > but please let me know if there's some intricacies around how the > various stages might change the results. The CREATE INDEX docs already say: In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then two table scans occur in two more transactions. Before each table scan, the index build must wait for existing transactions that have modified the table to terminate. After the second scan, the index build must wait for any transactions --> that have a snapshot (see <xref linkend="mvcc"/>) predating the second --> scan to terminate. Then finally the index can be marked ready for use, So, having multiple concurrent index scans is just a special case of having to "wait for any transactions that have a snapshot", no? I am not sure adding a doc mention of other index builds really is helpful. --------------------------------------------------------------------------- > commit 9e28e704820eebb81ff94c1c3cbfb7db087b2c45 > Author: James Coleman <[hidden email]> > Date: Wed Sep 18 13:36:22 2019 -0400 > > Document concurrent indexes waiting on each other > > It's not immediately obvious that because concurrent index building > waits on previously running transactions to complete, running multiple > concurrent index builds at the same time will result in each of them > taking as long to return as the longest takes, so, document this caveat. > > diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml > index 629a31ef79..35f15abb0e 100644 > --- a/doc/src/sgml/ref/create_index.sgml > +++ b/doc/src/sgml/ref/create_index.sgml > @@ -616,6 +616,13 @@ Indexes: > cannot. > </para> > > + <para> > + Because the second table scan must wait for any transactions having a > + snapshot preceding the start of that scan to finish before completing the > + scan, concurrent index builds on multiple tables at the same time will > + not return on any one table until all have completed. > + </para> > + > <para> > Concurrent builds for indexes on partitioned tables are currently not > supported. However, you may concurrently build the index on each -- Bruce Momjian <[hidden email]> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + |
On 2019-Sep-28, Bruce Momjian wrote:
> The CREATE INDEX docs already say: > > In a concurrent index build, the index is actually entered into > the system catalogs in one transaction, then two table scans occur in > two more transactions. Before each table scan, the index build must > wait for existing transactions that have modified the table to terminate. > After the second scan, the index build must wait for any transactions > --> that have a snapshot (see <xref linkend="mvcc"/>) predating the second > --> scan to terminate. Then finally the index can be marked ready for use, > > So, having multiple concurrent index scans is just a special case of > having to "wait for any transactions that have a snapshot", no? I am > not sure adding a doc mention of other index builds really is helpful. I always thought that create index concurrently was prevented from running concurrently in a table by the ShareUpdateExclusive lock that's held during the operation. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services |
On Sat, Sep 28, 2019 at 9:22 PM Alvaro Herrera <[hidden email]> wrote:
> > On 2019-Sep-28, Bruce Momjian wrote: > > > The CREATE INDEX docs already say: > > > > In a concurrent index build, the index is actually entered into > > the system catalogs in one transaction, then two table scans occur in > > two more transactions. Before each table scan, the index build must > > wait for existing transactions that have modified the table to terminate. > > After the second scan, the index build must wait for any transactions > > --> that have a snapshot (see <xref linkend="mvcc"/>) predating the second > > --> scan to terminate. Then finally the index can be marked ready for use, > > > > So, having multiple concurrent index scans is just a special case of > > having to "wait for any transactions that have a snapshot", no? I am > > not sure adding a doc mention of other index builds really is helpful. > > I always thought that create index concurrently was prevented from > running concurrently in a table by the ShareUpdateExclusive lock that's > held during the operation. You mean multiple CICs on a single table at the same time? Yes, that (unfortunately) isn't possible, but I'm concerned in the patch with the fact that CIC on table X blocks CIC on table Y. James |
On Sat, Sep 28, 2019 at 09:54:48PM -0400, James Coleman wrote:
> On Sat, Sep 28, 2019 at 9:22 PM Alvaro Herrera <[hidden email]> wrote: > > > > On 2019-Sep-28, Bruce Momjian wrote: > > > > > The CREATE INDEX docs already say: > > > > > > In a concurrent index build, the index is actually entered into > > > the system catalogs in one transaction, then two table scans occur in > > > two more transactions. Before each table scan, the index build must > > > wait for existing transactions that have modified the table to terminate. > > > After the second scan, the index build must wait for any transactions > > > --> that have a snapshot (see <xref linkend="mvcc"/>) predating the second > > > --> scan to terminate. Then finally the index can be marked ready for use, > > > > > > So, having multiple concurrent index scans is just a special case of > > > having to "wait for any transactions that have a snapshot", no? I am > > > not sure adding a doc mention of other index builds really is helpful. > > > > I always thought that create index concurrently was prevented from > > running concurrently in a table by the ShareUpdateExclusive lock that's > > held during the operation. > > You mean multiple CICs on a single table at the same time? Yes, that > (unfortunately) isn't possible, but I'm concerned in the patch with > the fact that CIC on table X blocks CIC on table Y. I think any open transaction will block CIC, which is my point. -- Bruce Momjian <[hidden email]> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + |
On Sat, Sep 28, 2019 at 9:56 PM Bruce Momjian <[hidden email]> wrote:
> > On Sat, Sep 28, 2019 at 09:54:48PM -0400, James Coleman wrote: > > On Sat, Sep 28, 2019 at 9:22 PM Alvaro Herrera <[hidden email]> wrote: > > > > > > On 2019-Sep-28, Bruce Momjian wrote: > > > > > > > The CREATE INDEX docs already say: > > > > > > > > In a concurrent index build, the index is actually entered into > > > > the system catalogs in one transaction, then two table scans occur in > > > > two more transactions. Before each table scan, the index build must > > > > wait for existing transactions that have modified the table to terminate. > > > > After the second scan, the index build must wait for any transactions > > > > --> that have a snapshot (see <xref linkend="mvcc"/>) predating the second > > > > --> scan to terminate. Then finally the index can be marked ready for use, > > > > > > > > So, having multiple concurrent index scans is just a special case of > > > > having to "wait for any transactions that have a snapshot", no? I am > > > > not sure adding a doc mention of other index builds really is helpful. While that may be technically true, as a co-worker of mine likes to point out, being "technically correct" is the worst kind of correct. Here's what I mean: First, I believe the docs should aim to be as useful as possible to even those with more entry-level understanding of PostgreSQL. The fact the paragraph you cite actually links to the entire chapter on concurrency control in Postgres demonstrates that there's some not-so-immediate stuff here to consider. For one: is it obvious to all users that the transaction held by CIC (or even that all transactions) has an open snapshot? Second, this is a difference from a regular CREATE INDEX, and we already call out as caveats differences between CREATE INDEX CONCURRENTLY and regular CREATE INDEX as I point out below re: Alvaro's comment. Third, related to the above point, many DDL commands only block DDL against the table being operated on. The fact that CIC here is different is, in my opinion, a fairly surprising break from that pattern, and as such likely to catch users off guard. I can attest that this surprised at least one entire database team a while back :) including many people who've been operating Postgres at a large scale for a long time. I believe caveats like this are worth calling out rather than expecting users to have to understand the implementation details an work out the implications on their own. > > > I always thought that create index concurrently was prevented from > > > running concurrently in a table by the ShareUpdateExclusive lock that's > > > held during the operation. > > > > You mean multiple CICs on a single table at the same time? Yes, that > > (unfortunately) isn't possible, but I'm concerned in the patch with > > the fact that CIC on table X blocks CIC on table Y. > > I think any open transaction will block CIC, which is my point. I read Alvaro as referring to the fact that the docs already call out the following: > Regular index builds permit other regular index builds on the same table to occur simultaneously, but only one concurrent index build can occur on a table at a time. James |
On 2019-Sep-28, James Coleman wrote:
> I believe caveats like this are worth calling out rather than > expecting users to have to understand the implementation details an > work out the implications on their own. I agree. > I read Alvaro as referring to the fact that the docs already call out > the following: > > > Regular index builds permit other regular index builds on the same > > table to occur simultaneously, but only one concurrent index build > > can occur on a table at a time. Yeah, that's what I was understanding. BTW I think there's an approach that could alleviate part of this problem, at least some of the time: whenever CIC runs for an index that's not on expression and not partial, we could set the PROC_IN_VACUUM flag. That would cause it to get ignored by other processes for snapshot purposes (including CIC itself), as well as by vacuum. I need to take some time to research the safety of this, but intuitively it seems safe. Even further, I think we could also do it for regular CREATE INDEX (under the same conditions) provided that it's not run in a transaction block. But that requires even more research/proof. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services |
In reply to this post by Alvaro Herrera-9
On Sat, Sep 28, 2019 at 10:22:28PM -0300, Alvaro Herrera wrote:
> I always thought that create index concurrently was prevented from > running concurrently in a table by the ShareUpdateExclusive lock that's > held during the operation. REINDEX CONCURRENTLY and CIC can deadlock while waiting for each other to finish after their validation phase, see: https://www.postgresql.org/message-id/20190507030756.GD1499@... https://www.postgresql.org/message-id/20190507032543.GH1499@... -- Michael |
I went ahead and registered this in the current only CF as
https://commitfest.postgresql.org/27/2454/ Alvaro: Would you like to be added as a reviewer? James |
In reply to this post by Michael Paquier-2
On Sun, Sep 29, 2019 at 9:24 PM Michael Paquier <[hidden email]> wrote:
> > On Sat, Sep 28, 2019 at 10:22:28PM -0300, Alvaro Herrera wrote: > > I always thought that create index concurrently was prevented from > > running concurrently in a table by the ShareUpdateExclusive lock that's > > held during the operation. > > REINDEX CONCURRENTLY and CIC can deadlock while waiting for each other > to finish after their validation phase, see: > https://www.postgresql.org/message-id/20190507030756.GD1499@... > https://www.postgresql.org/message-id/20190507032543.GH1499@... Michael, Thanks for the cross-link. Do you think this would be valuable to document at the same time? Or did you just want to ensure we were also aware of this particular downfall? If the latter, I appreciate it, it's helpful info. If the latter, let me know, and I'll try to update the patch. Thanks, James |
In reply to this post by James Coleman
Hi,
On 2019-09-18 13:51:00 -0400, James Coleman wrote: > In my experience it's not immediately obvious (even after reading the > documentation) the implications of how concurrent index builds manage > transactions with respect to multiple concurrent index builds in > flight at the same time. > > Specifically, as I understand multiple concurrent index builds running > at the same time will all return at the same time as the longest > running one. > > I've attached a small patch to call this caveat out specifically in > the documentation. I think the description in the patch is accurate, > but please let me know if there's some intricacies around how the > various stages might change the results. > > James Coleman I'd much rather see effort spent fixing this issue as far as it relates to concurrent CICs. For the snapshot waits we can add a procarray flag (alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is doing. Which WaitForOlderSnapshots() can then use to ignore those CICs, which is safe, because those transactions definitely don't insert into relations targeted by CIC. The change to WaitForOlderSnapshots() would just be to pass the new flag to GetCurrentVirtualXIDs, I think. Greetings, Andres Freund |
On Wed, Mar 25, 2020 at 3:19 PM Andres Freund <[hidden email]> wrote:
> > Hi, > > On 2019-09-18 13:51:00 -0400, James Coleman wrote: > > In my experience it's not immediately obvious (even after reading the > > documentation) the implications of how concurrent index builds manage > > transactions with respect to multiple concurrent index builds in > > flight at the same time. > > > > Specifically, as I understand multiple concurrent index builds running > > at the same time will all return at the same time as the longest > > running one. > > > > I've attached a small patch to call this caveat out specifically in > > the documentation. I think the description in the patch is accurate, > > but please let me know if there's some intricacies around how the > > various stages might change the results. > > > > James Coleman > > I'd much rather see effort spent fixing this issue as far as it relates > to concurrent CICs. For the snapshot waits we can add a procarray flag > (alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is > doing. Which WaitForOlderSnapshots() can then use to ignore those CICs, > which is safe, because those transactions definitely don't insert into > relations targeted by CIC. The change to WaitForOlderSnapshots() would > just be to pass the new flag to GetCurrentVirtualXIDs, I think. Alvaro: I think you had some ideas on this too; any chance you've know of a patch that anyone's got cooking? Andres: If we got this fixed in current PG would you be opposed to documenting the caveat in previous versions? Thanks, James |
On 2020-Mar-25, James Coleman wrote:
> Alvaro: I think you had some ideas on this too; any chance you've know > of a patch that anyone's got cooking? I posted this in November https://postgr.es/m/20191101203310.GA12239@... but I didn't put time to go through the issues there. I don't know if my approach is exactly what Andres has in mind, but I was discouraged by the number of gotchas for which the optimization I propose has to be turned off. Maybe that preliminary patch can serve as a discussion starter, if nothing else. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services |
In reply to this post by James Coleman
Hi,
On 2020-03-25 15:24:44 -0400, James Coleman wrote: > Andres: If we got this fixed in current PG would you be opposed to > documenting the caveat in previous versions? Not really. I'm just not confident it's going to be useful, given the amount of details needed to be provided to really make sense of the issue (the earlier CIC phases don't wait for snapshots, but just relation locks etc). Greetings, Andres Freund |
In reply to this post by Alvaro Herrera-9
Hi,
On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > I posted this in November > https://postgr.es/m/20191101203310.GA12239@... but I didn't > put time to go through the issues there. Oh, missed that. > I don't know if my approach is exactly what Andres has in mind Not quite. I don't think it's generally correct for CIC to set PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - we don't want rows to be pruned away from under us. I also think we'd want to set such a flag during all of the CIC phases? What I was thinking of was a new flag, with a distinct value from PROC_IN_VACUUM. It'd currently just be specified in the GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid needing to wait for other CICs on different relations. Since CIC is not permitted on system tables, and CIC doesn't do DML on normal tables, it seems fairly obviously correct to exclude other CICs. Greetings, Andres Freund |
On 2020-Mar-25, Andres Freund wrote:
> > I don't know if my approach is exactly what Andres has in mind > > Not quite. I don't think it's generally correct for CIC to set > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > we don't want rows to be pruned away from under us. I also think we'd > want to set such a flag during all of the CIC phases? > > What I was thinking of was a new flag, with a distinct value from > PROC_IN_VACUUM. It'd currently just be specified in the > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > needing to wait for other CICs on different relations. Since CIC is not > permitted on system tables, and CIC doesn't do DML on normal tables, it > seems fairly obviously correct to exclude other CICs. Hmm, that sounds more promising. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services |
On Wed, Mar 25, 2020 at 05:12:48PM -0300, Alvaro Herrera wrote:
> Hmm, that sounds more promising. Haven't looked at that myself in details. But as I doubt that this would be backpatched, wouldn't it be better to document the issue for now? -- Michael |
In reply to this post by Andres Freund
On Wed, Mar 25, 2020 at 3:58 PM Andres Freund <[hidden email]> wrote:
> > Hi, > > On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > > I posted this in November > > https://postgr.es/m/20191101203310.GA12239@... but I didn't > > put time to go through the issues there. > > Oh, missed that. > > > > I don't know if my approach is exactly what Andres has in mind > > Not quite. I don't think it's generally correct for CIC to set > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > we don't want rows to be pruned away from under us. I also think we'd > want to set such a flag during all of the CIC phases? > > What I was thinking of was a new flag, with a distinct value from > PROC_IN_VACUUM. It'd currently just be specified in the > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > needing to wait for other CICs on different relations. Since CIC is not > permitted on system tables, and CIC doesn't do DML on normal tables, it > seems fairly obviously correct to exclude other CICs. That would keep CIC from blocking other CICs, but it wouldn't solve the problem of CIC blocking vacuum on unrelated tables, right? Perhaps that's orthogonal though. James |
Hi,
On 2020-04-15 09:31:58 -0400, James Coleman wrote: > On Wed, Mar 25, 2020 at 3:58 PM Andres Freund <[hidden email]> wrote: > > On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > > > I posted this in November > > > https://postgr.es/m/20191101203310.GA12239@... but I didn't > > > put time to go through the issues there. > > > > Oh, missed that. > > > > > > > I don't know if my approach is exactly what Andres has in mind > > > > Not quite. I don't think it's generally correct for CIC to set > > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > > we don't want rows to be pruned away from under us. I also think we'd > > want to set such a flag during all of the CIC phases? > > > > What I was thinking of was a new flag, with a distinct value from > > PROC_IN_VACUUM. It'd currently just be specified in the > > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > > needing to wait for other CICs on different relations. Since CIC is not > > permitted on system tables, and CIC doesn't do DML on normal tables, it > > seems fairly obviously correct to exclude other CICs. > > That would keep CIC from blocking other CICs, but it wouldn't solve > the problem of CIC blocking vacuum on unrelated tables, right? Perhaps > that's orthogonal though. I am not sure what blocking you are referring to here? CIC shouldn't block vacuum on other tables from running? Or do you just mean that vacuum will not be able to remove some rows due to the snapshot from the CIC? That'd be an orthogonal problem, yes. If it's about the xmin horizon for vacuum: I think we could probably avoid that using the same flag. As vacuum cannot be run against a table that has a CIC running (although it'd theoretically be possible to allow that), it should be safe to ignore PROC_IN_CIC backends in vacuum's GetOldestXmin() call. That might not be true for system relations, but we don't allow CIC on those. Greetings, Andres Freund |
On Wed, Apr 15, 2020 at 6:31 PM Andres Freund <[hidden email]> wrote:
> > Hi, > > On 2020-04-15 09:31:58 -0400, James Coleman wrote: > > On Wed, Mar 25, 2020 at 3:58 PM Andres Freund <[hidden email]> wrote: > > > On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > > > > I posted this in November > > > > https://postgr.es/m/20191101203310.GA12239@... but I didn't > > > > put time to go through the issues there. > > > > > > Oh, missed that. > > > > > > > > > > I don't know if my approach is exactly what Andres has in mind > > > > > > Not quite. I don't think it's generally correct for CIC to set > > > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > > > we don't want rows to be pruned away from under us. I also think we'd > > > want to set such a flag during all of the CIC phases? > > > > > > What I was thinking of was a new flag, with a distinct value from > > > PROC_IN_VACUUM. It'd currently just be specified in the > > > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > > > needing to wait for other CICs on different relations. Since CIC is not > > > permitted on system tables, and CIC doesn't do DML on normal tables, it > > > seems fairly obviously correct to exclude other CICs. > > > > That would keep CIC from blocking other CICs, but it wouldn't solve > > the problem of CIC blocking vacuum on unrelated tables, right? Perhaps > > that's orthogonal though. > > I am not sure what blocking you are referring to here? CIC shouldn't > block vacuum on other tables from running? Or do you just mean that > vacuum will not be able to remove some rows due to the snapshot from the > CIC? That'd be an orthogonal problem, yes. > > If it's about the xmin horizon for vacuum: I think we could probably > avoid that using the same flag. As vacuum cannot be run against a table > that has a CIC running (although it'd theoretically be possible to allow > that), it should be safe to ignore PROC_IN_CIC backends in vacuum's > GetOldestXmin() call. That might not be true for system relations, but > we don't allow CIC on those. Yeah, I mean that if I have a CIC running on table X then vacuum can't remove dead tuples (from after the CIC's snapshot) on table Y. That's a pretty significant danger, given the combination of: 1. Index builds on very large tables can take many days, and 2. The well understood problems of high update tables with dead tuples and poor plans. I've previously discussed this with other hackers and the reasoning they'd understood way that we couldn't always safely ignore PROC_IN_CIC backends in the vacuum's oldest xmin call because of function indexes, and the fact that (despite clear recommendations to the contrary), there's nothing actually preventing someone from adding a function index on table X that queries table Y. I'm not sure I buy that we should care about people doing something clearly so dangerous, but...I grant that it'd be nice not to cause new crashes. James |
Free forum by Nabble | Edit this page |