While discussing the topic of foreign key performance off-list with
Robert and Corey (also came up briefly on the list recently [1], [2]), a few ideas were thrown around to simplify our current system of RI checks to enforce foreign keys with the aim of reducing some of its overheads. The two main aspects of how we do these checks that seemingly cause the most overhead are: * Using row-level triggers that are fired during the modification of the referencing and the referenced relations to perform them * Using plain SQL queries issued over SPI There is a discussion nearby titled "More efficient RI checks - take 2" [2] to address this problem from the viewpoint that it is using row-level triggers that causes the most overhead, although there are some posts mentioning that SQL-over-SPI is not without blame here. I decided to focus on the latter aspect and tried reimplementing some checks such that SPI can be skipped altogether. I started with the check that's performed when inserting into or updating the referencing table to confirm that the new row points to a valid row in the referenced relation. The corresponding SQL is this: SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x $1 is the value of the foreign key of the new row. If the query returns a row, all good. Thanks to SPI, or its use of plan caching, the query is re-planned only a handful of times before making a generic plan that is then saved and reused, which looks like this: QUERY PLAN -------------------------------------- LockRows -> Index Scan using pk_pkey on pk x Index Cond: (a = $1) (3 rows) So in most cases, the trigger's function would only execute the plan that's already there, at least in a given session. That's good but what we realized would be even better is if we didn't have to "execute" a full-fledged "plan" for this, that is, to simply find out whether a row containing the key we're looking for exists in the referenced relation and if found lock it. Directly scanning the index and locking it directly with table_tuple_lock() like ExecLockRows() does gives us exactly that behavior, which seems simple enough to be done in a not-so-long local function in ri_trigger.c. I gave that a try and came up with the attached. It also takes care of the case where the referenced relation is partitioned in which case its appropriate leaf partition's index is scanned. The patch results in ~2x improvement in the performance of inserts and updates on referencing tables: create table p (a numeric primary key); insert into p select generate_series(1, 1000000); create table f (a bigint references p); -- unpatched insert into f select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 6340.733 ms (00:06.341) update f set a = a + 1; UPDATE 1000000 Time: 7490.906 ms (00:07.491) -- patched insert into f select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 3340.808 ms (00:03.341) update f set a = a + 1; UPDATE 1000000 Time: 4178.171 ms (00:04.178) The improvement is even more dramatic when the referenced table (that we're no longer querying over SPI) is partitioned. Here are the numbers when the PK relation has 1000 hash partitions. Unpatched: insert into f select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 35898.783 ms (00:35.899) update f set a = a + 1; UPDATE 1000000 Time: 37736.294 ms (00:37.736) Patched: insert into f select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 5633.377 ms (00:05.633) update f set a = a + 1; UPDATE 1000000 Time: 6345.029 ms (00:06.345) That's over ~5x improvement! While the above case seemed straightforward enough for skipping SPI, it seems a bit hard to do the same for other cases where we query the *referencing* relation during an operation on the referenced table (for example, checking if the row being deleted is still referenced), because the plan in those cases is not predictably an index scan. Also, the filters in those queries are more than likely to not match the partition key of a partitioned referencing relation, so all partitions will have to scanned. I have left those cases as future work. The patch seems simple enough to consider for inclusion in v14 unless of course we stumble into some dealbreaker(s). I will add this to March CF. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://www.postgresql.org/message-id/CADkLM%3DcTt_8Fg1Jtij5j%2BQEBOxz9Cuu4DiMDYOwdtktDAKzuLw%40mail.gmail.com [2] https://www.postgresql.org/message-id/1813.1586363881%40antos ![]() ![]() |
Hi, I was looking at this statement: insert into f select generate_series(1, 2000000, 2); Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ? I tried a scaled down version (1000th) of your example: yugabyte=# insert into f select generate_series(1, 2000, 2); ERROR: insert or update on table "f" violates foreign key constraint "f_a_fkey" DETAIL: Key (a)=(1001) is not present in table "p". For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch : + * Collect partition key values from the unique key. At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ? This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop. Cheers On Mon, Jan 18, 2021 at 4:40 AM Amit Langote <[hidden email]> wrote: While discussing the topic of foreign key performance off-list with |
In reply to this post by Amit Langote
po 18. 1. 2021 v 13:40 odesílatel Amit Langote <[hidden email]> napsal: While discussing the topic of foreign key performance off-list with What is performance when the referenced table is small? - a lot of codebooks are small between 1000 to 10K rows. |
On Tue, Jan 19, 2021 at 3:01 AM Pavel Stehule <[hidden email]> wrote:
> po 18. 1. 2021 v 13:40 odesílatel Amit Langote <[hidden email]> napsal: >> I started with the check that's performed when inserting into or >> updating the referencing table to confirm that the new row points to a >> valid row in the referenced relation. The corresponding SQL is this: >> >> SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x >> >> $1 is the value of the foreign key of the new row. If the query >> returns a row, all good. Thanks to SPI, or its use of plan caching, >> the query is re-planned only a handful of times before making a >> generic plan that is then saved and reused, which looks like this: >> >> QUERY PLAN >> -------------------------------------- >> LockRows >> -> Index Scan using pk_pkey on pk x >> Index Cond: (a = $1) >> (3 rows) > > > What is performance when the referenced table is small? - a lot of codebooks are small between 1000 to 10K rows. I see the same ~2x improvement. create table p (a numeric primary key); insert into p select generate_series(1, 1000); create table f (a bigint references p); Unpatched: insert into f select i%1000+1 from generate_series(1, 1000000) i; INSERT 0 1000000 Time: 5461.377 ms (00:05.461) Patched: insert into f select i%1000+1 from generate_series(1, 1000000) i; INSERT 0 1000000 Time: 2357.440 ms (00:02.357) That's expected because the overhead of using SPI to check the PK table, which the patch gets rid of, is the same no matter the size of the index to be scanned. -- Amit Langote EDB: http://www.enterprisedb.com |
In reply to this post by Zhihong Yu
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <[hidden email]> wrote:
> > Hi, > I was looking at this statement: > > insert into f select generate_series(1, 2000000, 2); > > Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ? > I tried a scaled down version (1000th) of your example: > > yugabyte=# insert into f select generate_series(1, 2000, 2); > ERROR: insert or update on table "f" violates foreign key constraint "f_a_fkey" > DETAIL: Key (a)=(1001) is not present in table "p". create table p (a numeric primary key); insert into p select generate_series(1, 2000000); create table f (a bigint references p); -- Unpatched insert into f select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 6527.652 ms (00:06.528) update f set a = a + 1; UPDATE 1000000 Time: 8108.310 ms (00:08.108) -- Patched: insert into f select generate_series(1, 2000000, 2); INSERT 0 1000000 Time: 3312.193 ms (00:03.312) update f set a = a + 1; UPDATE 1000000 Time: 4292.807 ms (00:04.293) > For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch : > > + * Collect partition key values from the unique key. > > At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ? > This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop. I've updated the patch to add the Assert. Thanks for taking a look. -- Amit Langote EDB: http://www.enterprisedb.com ![]() ![]() |
Thanks for the quick response. + if (mapped_partkey_attnums[i] == pk_attnums[j]) + { + partkey_vals[i] = pk_vals[j]; + partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false; + i++; + break; The way counter (i) is incremented is out of my expectation. In the rare case, where some i doesn't have corresponding pk_attnums[j], wouldn't there be a dead loop ? I think the goal of adding the assertion should be not loop infinitely even if the invariant is not satisfied. I guess a counter other than i would be better for this purpose. Cheers On Mon, Jan 18, 2021 at 6:45 PM Amit Langote <[hidden email]> wrote: On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <[hidden email]> wrote: |
In reply to this post by Amit Langote
On Tue, 19 Jan 2021 at 10:45, Amit Langote <[hidden email]> wrote: > On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <[hidden email]> wrote: >> >> Hi, >> I was looking at this statement: >> >> insert into f select generate_series(1, 2000000, 2); >> >> Since certain generated values (the second half) are not in table p, wouldn't insertion for those values fail ? >> I tried a scaled down version (1000th) of your example: >> >> yugabyte=# insert into f select generate_series(1, 2000, 2); >> ERROR: insert or update on table "f" violates foreign key constraint "f_a_fkey" >> DETAIL: Key (a)=(1001) is not present in table "p". > > Sorry, a wrong copy-paste by me. Try this: > > create table p (a numeric primary key); > insert into p select generate_series(1, 2000000); > create table f (a bigint references p); > > -- Unpatched > insert into f select generate_series(1, 2000000, 2); > INSERT 0 1000000 > Time: 6527.652 ms (00:06.528) > > update f set a = a + 1; > UPDATE 1000000 > Time: 8108.310 ms (00:08.108) > > -- Patched: > insert into f select generate_series(1, 2000000, 2); > INSERT 0 1000000 > Time: 3312.193 ms (00:03.312) > > update f set a = a + 1; > UPDATE 1000000 > Time: 4292.807 ms (00:04.293) > >> For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch : >> >> + * Collect partition key values from the unique key. >> >> At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ? >> This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop. > > I've updated the patch to add the Assert. Thanks for taking a look. After apply the v2 patches, here are some warnings: In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0, from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24: /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’: /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through [-Wimplicit-fallthrough=] do { \ ^ /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’ ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__) ^~~~~~~~~~~~~~ /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’ ereport(elevel, errmsg_internal(__VA_ARGS__)) ^~~~~~~ /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’ elog(ERROR, "unexpected table_tuple_lock status: %u", res); ^~~~ /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here default: ^~~~~~~ -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. |
In reply to this post by Amit Langote
út 19. 1. 2021 v 3:08 odesílatel Amit Langote <[hidden email]> napsal: On Tue, Jan 19, 2021 at 3:01 AM Pavel Stehule <[hidden email]> wrote: It looks very well. Regards Pavel
|
In reply to this post by japin
I also get this warning. Adding a "break;" at line 418 resolves the warning. |
On Tue, Jan 19, 2021 at 2:56 PM Corey Huinker <[hidden email]> wrote:
>> In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0, >> from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24: >> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’: >> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through [-Wimplicit-fallthrough=] >> do { \ >> ^ >> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’ >> ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__) >> ^~~~~~~~~~~~~~ >> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’ >> ereport(elevel, errmsg_internal(__VA_ARGS__)) >> ^~~~~~~ >> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’ >> elog(ERROR, "unexpected table_tuple_lock status: %u", res); >> ^~~~ >> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here >> default: >> ^~~~~~~ Thanks, will fix it. -- Amit Langote EDB: http://www.enterprisedb.com |
In reply to this post by Amit Langote
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote <[hidden email]> wrote: On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <[hidden email]> wrote: v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert. The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables. A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there. I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement level triggers. This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different. |
On Tue, Jan 19, 2021 at 3:46 PM Corey Huinker <[hidden email]> wrote:
> v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert. Thanks for the review. I will look into checking the coverage. > The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables. > > A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there. I decided not to deviate from pk_ terminology so that the new code doesn't look too different from the other code in the file. Although, I guess we can at least call the main function ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've changed that. > I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement level triggers. > > [1] https://www.postgresql.org/message-id/CAKkQ508Z6r5e3jdqhfPWSzSajLpHo3OYYOAmfeSAuPTo5VGfgw@... Thanks. I hadn't noticed [1] before today, but after looking it over, it seems that what is being proposed there can still be of use. As long as SPI is used in ri_trigger.c, it makes sense to consider any tweaks addressing its negative impact, especially if they are not very invasive. There's this patch too from the last month: https://commitfest.postgresql.org/32/2930/ > This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different. Yeah, we should continue looking into the ways to make referenced-side RI checks be less bloated. I've attached the updated patch. -- Amit Langote EDB: http://www.enterprisedb.com ![]() ![]() |
In reply to this post by Zhihong Yu
On Tue, Jan 19, 2021 at 12:00 PM Zhihong Yu <[hidden email]> wrote:
> + if (mapped_partkey_attnums[i] == pk_attnums[j]) > + { > + partkey_vals[i] = pk_vals[j]; > + partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false; > + i++; > + break; > > The way counter (i) is incremented is out of my expectation. > In the rare case, where some i doesn't have corresponding pk_attnums[j], wouldn't there be a dead loop ? > > I think the goal of adding the assertion should be not loop infinitely even if the invariant is not satisfied. > > I guess a counter other than i would be better for this purpose. I have done that in v3. Thanks. -- Amit Langote EDB: http://www.enterprisedb.com |
In reply to this post by Amit Langote
I decided not to deviate from pk_ terminology so that the new code I agree with leaving the existing terminology where it is for this patch. Changing the function name is probably enough to alert the reader that the things that are called pks may not be precisely that. |
In reply to this post by Amit Langote
I think that's a nice compromise, it makes the reader aware of the concept.
Missing "break" added. Check. Comment updated. Check. Function renamed. Check. Attribute mapping matching test (and assertion) added. Check. Patch applies to an as-of-today master, passes make check and check world. No additional regression tests required, as no new functionality is introduced. No docs required, as there is nothing user-facing. Questions: 1. There's a palloc for mapped_partkey_attnums, which is never freed, is the prevailing memory context short lived enough that we don't care? 2. Same question for the AtrrMap map, should there be a free_attrmap(). |
On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker <[hidden email]> wrote:
>> I decided not to deviate from pk_ terminology so that the new code >> doesn't look too different from the other code in the file. Although, >> I guess we can at least call the main function >> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've >> changed that. > > I think that's a nice compromise, it makes the reader aware of the concept. >> >> I've attached the updated patch. > > Missing "break" added. Check. > Comment updated. Check. > Function renamed. Check. > Attribute mapping matching test (and assertion) added. Check. > Patch applies to an as-of-today master, passes make check and check world. > No additional regression tests required, as no new functionality is introduced. > No docs required, as there is nothing user-facing. > Questions: > 1. There's a palloc for mapped_partkey_attnums, which is never freed, is the prevailing memory context short lived enough that we don't care? > 2. Same question for the AtrrMap map, should there be a free_attrmap(). I hadn't checked, but yes, the prevailing context is AfterTriggerTupleContext, a short-lived one that is reset for every trigger event tuple. I'm still inclined to explicitly free those objects, so changed like that. While at it, I also changed mapped_partkey_attnums to root_partattrs for readability. Attached v4. -- Amit Langote EDB: http://www.enterprisedb.com ![]() ![]() |
Hi, + for (i = 0; i < riinfo->nkeys; i++) + { + Oid eq_opr = eq_oprs[i]; + Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]); + RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid); + + if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid)) It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment to the three local variables. That way, ri_HashCompareOp wouldn't be called when pk_nulls[i] == 'n'. + case TM_Updated: + if (IsolationUsesXactSnapshot()) ... + case TM_Deleted: + if (IsolationUsesXactSnapshot()) It seems the handling for TM_Updated and TM_Deleted is the same. The cases for these two values can be put next to each other (saving one block of code). Cheers On Fri, Jan 22, 2021 at 11:10 PM Amit Langote <[hidden email]> wrote: On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker <[hidden email]> wrote: |
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <[hidden email]> wrote:
I'll pause on reviewing v4 until you've addressed the suggestions above. |
On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker <[hidden email]> wrote:
> On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <[hidden email]> wrote: >> >> Hi, Thanks for the review. >> + for (i = 0; i < riinfo->nkeys; i++) >> + { >> + Oid eq_opr = eq_oprs[i]; >> + Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]); >> + RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid); >> + >> + if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid)) >> >> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment to the three local variables. That way, ri_HashCompareOp wouldn't be called when pk_nulls[i] == 'n'. Good idea, so done. Although, there can't be nulls right now. >> + case TM_Updated: >> + if (IsolationUsesXactSnapshot()) >> ... >> + case TM_Deleted: >> + if (IsolationUsesXactSnapshot()) >> >> It seems the handling for TM_Updated and TM_Deleted is the same. The cases for these two values can be put next to each other (saving one block of code). Ah, yes. The TM_Updated case used to be handled a bit differently in earlier unposted versions of the patch, though at some point I concluded that the special handling was unnecessary, but didn't realize what you just pointed out. Fixed. > I'll pause on reviewing v4 until you've addressed the suggestions above. Here's v5. -- Amit Langote EDB: http://www.enterprisedb.com ![]() ![]() |
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <[hidden email]> wrote: On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker <[hidden email]> wrote: v5 patches apply to master. Suggested If/then optimization is implemented. Suggested case merging is implemented. Passes make check and make check-world yet again. Just to confirm, we don't free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct? Anybody else want to look this patch over before I mark it Ready For Committer? |
Free forum by Nabble | Edit this page |