The following bug has been logged on the website:
Bug reference: 16801 Logged by: Alexander Lakhin Email address: [hidden email] PostgreSQL version: 13.1 Operating system: Ubuntu 20.04 Description: When executing the following query: WITH RECURSIVE rec(x) AS ( WITH outermost(x) AS ( SELECT ( WITH innermost as (SELECT 1) SELECT * FROM innermost ) ) SELECT * FROM outermost ) SELECT * FROM rec; valgrind detects an invalid read: ==00:00:00:04.145 217144== Invalid read of size 8 ==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker (parse_cte.c:549) ==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph (parse_cte.c:439) ==00:00:00:04.145 217144== by 0x304557: transformWithClause (parse_cte.c:176) ==00:00:00:04.145 217144== by 0x2DD70A: transformSelectStmt (analyze.c:1202) ==00:00:00:04.145 217144== by 0x2DDAB4: transformStmt (analyze.c:301) ==00:00:00:04.145 217144== by 0x2DEDDA: transformOptionalSelectInto (analyze.c:246) ==00:00:00:04.145 217144== by 0x2DEE0F: transformTopLevelStmt (analyze.c:196) ==00:00:00:04.145 217144== by 0x2DEE71: parse_analyze (analyze.c:116) ==00:00:00:04.145 217144== by 0x55E69F: pg_analyze_and_rewrite (postgres.c:691) ==00:00:00:04.145 217144== by 0x55ED66: exec_simple_query (postgres.c:1155) ==00:00:00:04.145 217144== by 0x560D83: PostgresMain (postgres.c:4315) ==00:00:00:04.145 217144== by 0x4CC6B8: BackendRun (postmaster.c:4526) ==00:00:00:04.145 217144== Address 0x50890a8 is 24 bytes inside a block of size 32 client-defined ==00:00:00:04.145 217144== at 0x6B4831: palloc (mcxt.c:974) ==00:00:00:04.145 217144== by 0x42B624: new_list (list.c:134) ==00:00:00:04.145 217144== by 0x42BF1B: lcons (list.c:458) ==00:00:00:04.145 217144== by 0x302C73: makeDependencyGraphWalker (parse_cte.c:542) ==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph (parse_cte.c:439) ==00:00:00:04.145 217144== by 0x304557: transformWithClause (parse_cte.c:176) ==00:00:00:04.145 217144== by 0x2DD70A: transformSelectStmt (analyze.c:1202) ==00:00:00:04.145 217144== by 0x2DDAB4: transformStmt (analyze.c:301) ==00:00:00:04.145 217144== by 0x2DEDDA: transformOptionalSelectInto (analyze.c:246) ==00:00:00:04.145 217144== by 0x2DEE0F: transformTopLevelStmt (analyze.c:196) ==00:00:00:04.145 217144== by 0x2DEE71: parse_analyze (analyze.c:116) ==00:00:00:04.145 217144== by 0x55E69F: pg_analyze_and_rewrite (postgres.c:691) ==00:00:00:04.145 217144== The first bad commit is 1cff1b95. |
On Sat, Jan 02, 2021 at 03:00:00PM +0000, PG Bug reporting form wrote:
> valgrind detects an invalid read: > ==00:00:00:04.145 217144== Invalid read of size 8 > ==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker > (parse_cte.c:549) > ==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph > (parse_cte.c:439) > ==00:00:00:04.145 217144== by 0x304557: transformWithClause > (parse_cte.c:176) > > The first bad commit is 1cff1b95. parse_cte.c, and there are extra ones in split_pathtarget_walker(). I cannot reproduce that here, and I have just tried with different optimization levels on HEAD and REL_13_STABLE. Are you using specific options for valgrind? -- Michael |
Hello Michael,
03.01.2021 08:15, Michael Paquier wrote: > On Sat, Jan 02, 2021 at 03:00:00PM +0000, PG Bug reporting form wrote: >> valgrind detects an invalid read: >> ==00:00:00:04.145 217144== Invalid read of size 8 >> ==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker >> (parse_cte.c:549) >> ==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph >> (parse_cte.c:439) >> ==00:00:00:04.145 217144== by 0x304557: transformWithClause >> (parse_cte.c:176) >> >> The first bad commit is 1cff1b95. > The same kind of list manipulation is done in two places in > parse_cte.c, and there are extra ones in split_pathtarget_walker(). I > cannot reproduce that here, and I have just tried with different > optimization levels on HEAD and REL_13_STABLE. Are you using specific > options for valgrind? (c09f6882) with: CPPFLAGS="-DUSE_VALGRIND -Og" ./configure --enable-debug --enable-cassert && make -j8 Also I'm using a patch [1] to inject valgrind into the `make check` procedure. So it's possible to reproduce the issue with the extended with.sql: echo " WITH RECURSIVE rec(x) AS ( WITH outermost(x) AS ( SELECT ( WITH innermost as (SELECT 1) SELECT * FROM innermost ) ) SELECT * FROM outermost ) SELECT * FROM rec; " >>src/test/regress/sql/with.sql make check 'make check' fails and src/test/regress/log/postmaster.log contains the aforementioned valgrind message. If it's still not reproduced for you, please let me know your OS/compiler/valgrind version and I'll try it in your environment too. [1] https://www.postgresql.org/message-id/10dae4a1-e714-601d-7518-c19414255180%40gmail.com Best regards, Alexander |
On Sun, Jan 03, 2021 at 09:00:00AM +0300, Alexander Lakhin wrote:
> I'm using gcc 10.2.0 and valgrind-3.15.0, and building REL_13_STABLE > (c09f6882) with: > CPPFLAGS="-DUSE_VALGRIND -Og" ./configure --enable-debug > --enable-cassert && make -j8 > Also I'm using a patch [1] to inject valgrind into the `make check` > procedure. Thanks, -DUSE_VALGRIND is the part that mattered here. I can now reproduce it. I'll try to look at that more in details. -- Michael |
In reply to this post by Michael Paquier-2
Hello Michael,
03.01.2021 08:15, Michael Paquier wrote: > On Sat, Jan 02, 2021 at 03:00:00PM +0000, PG Bug reporting form wrote: >> valgrind detects an invalid read: >> ==00:00:00:04.145 217144== Invalid read of size 8 >> ==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker >> (parse_cte.c:549) >> ==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph >> (parse_cte.c:439) >> ==00:00:00:04.145 217144== by 0x304557: transformWithClause >> (parse_cte.c:176) >> >> The first bad commit is 1cff1b95. > The same kind of list manipulation is done in two places in > parse_cte.c, and there are extra ones in split_pathtarget_walker(). I > cannot reproduce that here, and I have just tried with different > optimization levels on HEAD and REL_13_STABLE. Are you using specific > options for valgrind? following usage pattern: List *testList = NIL; ListCell *testCell; testList = lcons(NIL, testList); testCell = list_head(testList); ... testList = lcons(NIL, testList); elog(INFO, "lfirst(testCell): %p", lfirst(testCell)); // prints 0x7f7f7f7f7f7f7f7f when compiled with -DUSE_VALGRIND (Such list manipulation is happening in that makeDependencyGraphWalker call.) Best regards, Alexander |
On Tue, Feb 23, 2021 at 09:00:00AM +0300, Alexander Lakhin wrote:
> I've found out that the list implementation doesn't support the > following usage pattern: > List *testList = NIL; > ListCell *testCell; > > testList = lcons(NIL, testList); > testCell = list_head(testList); > ... > testList = lcons(NIL, testList); > > elog(INFO, "lfirst(testCell): %p", lfirst(testCell)); // prints > 0x7f7f7f7f7f7f7f7f when compiled with -DUSE_VALGRIND > > (Such list manipulation is happening in that makeDependencyGraphWalker > call.) Anyway.. What's happening here is that the second lcons() call does new_head_cell() as testList is not NIL. This itself calls enlarge_list(), followed by wipe_mem() which invalidates the position of the list head previously stored. Oops. Coming back to the original problem, as you say there is some confusion about the list operation that we had better clarify. From what I can see cstate->innerwiths stores a List of Lists, and in each passage the code tries to build a new list appended to cstate->innerwiths. Using a ListCell to be able to track where the new list head is located is awkward on HEAD (the business with cell1), and there should be no need to keep around the reference to the first element innerwiths, as long as you save the result in a new, separate, List. The second case in checkWellFormedRecursionWalker() is equally dangerous. I have been playing with subselects and more recursions and could not trigger a problem with -DUSE_VALGRIND, but let's be safe. So attached is a patch to take care of this, with a regression test based on what has been sent upthread. This solves the issue for me. Thoughts? -- Michael |
Michael Paquier <[hidden email]> writes:
> So attached is a patch to take care of this, with a regression test > based on what has been sent upthread. This solves the issue for me. Surely that breaks things entirely (if it doesn't, then we are badly under-testing this area). A nil list is just a null pointer, so appending to "new_cte_list" later isn't going to affect what was previously put into the innerwiths list. I haven't tested, but I think a more correct fix would be - ListCell *cell1; cstate->innerwiths = lcons(NIL, cstate->innerwiths); - cell1 = list_head(cstate->innerwiths); foreach(lc, stmt->withClause->ctes) { CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc); + ListCell *cell1; (void) makeDependencyGraphWalker(cte->ctequery, cstate); + /* note innerwiths list can change during recursion */ + cell1 = list_head(cstate->innerwiths); lfirst(cell1) = lappend((List *) lfirst(cell1), cte); } ie, recompute the "cell1" pointer each time it's needed instead of assuming that the original value is good throughout the loop. regards, tom lane |
I wrote:
> Surely that breaks things entirely (if it doesn't, then we are badly > under-testing this area). A nil list is just a null pointer, so > appending to "new_cte_list" later isn't going to affect what was > previously put into the innerwiths list. Here's a patch that I think fixes it correctly, including a test case that doesn't work with your patch. regards, tom lane diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c index f4f7041ead..f46d63d451 100644 --- a/src/backend/parser/parse_cte.c +++ b/src/backend/parser/parse_cte.c @@ -730,15 +730,15 @@ makeDependencyGraphWalker(Node *node, CteState *cstate) * In the non-RECURSIVE case, query names are visible to the * WITH items after them and to the main query. */ - ListCell *cell1; - cstate->innerwiths = lcons(NIL, cstate->innerwiths); - cell1 = list_head(cstate->innerwiths); foreach(lc, stmt->withClause->ctes) { CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc); + ListCell *cell1; (void) makeDependencyGraphWalker(cte->ctequery, cstate); + /* note that recursion could mutate innerwiths list */ + cell1 = list_head(cstate->innerwiths); lfirst(cell1) = lappend((List *) lfirst(cell1), cte); } (void) raw_expression_tree_walker(node, @@ -1006,15 +1006,15 @@ checkWellFormedRecursionWalker(Node *node, CteState *cstate) * In the non-RECURSIVE case, query names are visible to the * WITH items after them and to the main query. */ - ListCell *cell1; - cstate->innerwiths = lcons(NIL, cstate->innerwiths); - cell1 = list_head(cstate->innerwiths); foreach(lc, stmt->withClause->ctes) { CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc); + ListCell *cell1; (void) checkWellFormedRecursionWalker(cte->ctequery, cstate); + /* note that recursion could mutate innerwiths list */ + cell1 = list_head(cstate->innerwiths); lfirst(cell1) = lappend((List *) lfirst(cell1), cte); } checkWellFormedSelectStmt(stmt, cstate); diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out index c519a61c4f..be4ba1825f 100644 --- a/src/test/regress/expected/with.out +++ b/src/test/regress/expected/with.out @@ -176,6 +176,29 @@ ERROR: operator does not exist: text + integer LINE 4: SELECT n+1 FROM t WHERE n < 10 ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. +-- Deeply nested WITH caused a list-munging problem in v13 +with recursive w1(c1) as + (with w2(c2) as + (with w3(c3) as + (with w4(c4) as + (with w5(c5) as + (with recursive w6(c6) as + (with w6(c6) as + (with w8(c8) as + (select 1) + select * from w8) + select * from w6) + select * from w6) + select * from w5) + select * from w4) + select * from w3) + select * from w2) +select * from w1; + c1 +---- + 1 +(1 row) + -- -- Some examples with a tree -- diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql index f4ba0d8e39..b2f9ac07e8 100644 --- a/src/test/regress/sql/with.sql +++ b/src/test/regress/sql/with.sql @@ -95,6 +95,25 @@ UNION ALL ) SELECT n, pg_typeof(n) FROM t; +-- Deeply nested WITH caused a list-munging problem in v13 +with recursive w1(c1) as + (with w2(c2) as + (with w3(c3) as + (with w4(c4) as + (with w5(c5) as + (with recursive w6(c6) as + (with w6(c6) as + (with w8(c8) as + (select 1) + select * from w8) + select * from w6) + select * from w6) + select * from w5) + select * from w4) + select * from w3) + select * from w2) +select * from w1; + -- -- Some examples with a tree -- |
On Wed, Feb 24, 2021 at 07:36:26PM -0500, Tom Lane wrote:
> Here's a patch that I think fixes it correctly, including a test > case that doesn't work with your patch. Yes, thanks. While looking at that this morning, I have been able to get a crash with my previous patch once I used more nesting in those CTEs and your test is much simpler. I also got a test case able to break things the same way in checkWellFormedRecursionWalker(): WITH RECURSIVE outermost(x) AS ( SELECT 1 UNION (WITH innermost as (WITH innermost2 AS (SELECT 2) SELECT * FROM innermost2) SELECT * FROM outermost UNION SELECT * FROM innermost) ) SELECT * FROM outermost ORDER BY 1; Your patch does not have a test for that, but it fixes the list handling. With more nested levels and some UNIONs, the patch I sent previously would equally break, though I am not sure we need more than what I am sending here. What do you think about this extra test? -- Michael |
Michael Paquier <[hidden email]> writes:
> Yes, thanks. While looking at that this morning, I have been able to > get a crash with my previous patch once I used more nesting in those > CTEs and your test is much simpler. I also got a test case able to > break things the same way in checkWellFormedRecursionWalker(): > WITH RECURSIVE outermost(x) AS ( > SELECT 1 > UNION (WITH innermost as (WITH innermost2 AS (SELECT 2) SELECT * FROM innermost2) > SELECT * FROM outermost > UNION SELECT * FROM innermost) > ) > SELECT * FROM outermost ORDER BY 1; Hmm, I don't see any failure from that... In my understanding of the bug, you need at least half a dozen levels of WITH nesting to provoke a problem, because nothing will go wrong until the innerwiths list gets to be six entries long, causing list.c to move its elements array to somewhere else. If it is possible to fail without that then there's still something here that I don't get. regards, tom lane |
In reply to this post by Michael Paquier-2
On Thu, Feb 25, 2021 at 10:30:41AM +0900, Michael Paquier wrote:
> Your patch does not have a test for that, but it fixes the list > handling. With more nested levels and some UNIONs, the patch I sent > previously would equally break, though I am not sure we need more than > what I am sending here. What do you think about this extra test? By the way, here is a fancier test case to make the list handling recurse much more in checkWellFormedRecursionWalker(): WITH RECURSIVE outermost(x) AS ( SELECT 1 UNION (WITH innermost1 AS ( SELECT 2 UNION (WITH innermost2 AS ( SELECT 3 UNION (WITH innermost3 AS ( SELECT 4 UNION (WITH innermost4 AS ( SELECT 5 UNION (WITH innermost5 AS (SELECT 6) SELECT * FROM innermost5)) SELECT * FROM innermost4)) SELECT * FROM innermost3)) SELECT * FROM innermost2)) SELECT * FROM outermost UNION SELECT * FROM innermost1) ) SELECT * FROM outermost ORDER BY 1; -- Michael |
In reply to this post by Tom Lane-2
On Wed, Feb 24, 2021 at 09:13:46PM -0500, Tom Lane wrote:
> Michael Paquier <[hidden email]> writes: >> WITH RECURSIVE outermost(x) AS ( >> SELECT 1 >> UNION (WITH innermost as (WITH innermost2 AS (SELECT 2) SELECT * FROM innermost2) >> SELECT * FROM outermost >> UNION SELECT * FROM innermost) >> ) >> SELECT * FROM outermost ORDER BY 1; > > Hmm, I don't see any failure from that... this fails with only two nested levels? I get a failure once I do that. > In my understanding of the bug, you need at least half a dozen levels of > WITH nesting to provoke a problem, because nothing will go wrong until the > innerwiths list gets to be six entries long, causing list.c to move its > elements array to somewhere else. If it is possible to fail without that > then there's still something here that I don't get. Anyway, I also get a failure without -DUSE_VALGRIND once I apply 6 nested levels: #1 0x00007ff47eb5b537 in __GI_abort () at abort.c:79 #2 0x000055b24c7b340e in ExceptionalCondition (conditionName=0x55b24c8e917a "cstate->innerwiths == NIL", errorType=0x55b24c8e8803 "FailedAssertion", fileName=0x55b24c8e8a08 "parse_cte.c", lineNumber=869) at assert.c:69 #3 0x000055b24c2db9a5 in checkWellFormedRecursion (cstate=0x7ffe2fce8a30) at parse_cte.c:869 Here you go with a test case: WITH RECURSIVE outermost(x) AS ( SELECT 1 UNION (WITH innermost1 AS ( SELECT 2 UNION (WITH innermost2 AS ( SELECT 3 UNION (WITH innermost3 AS ( SELECT 4 UNION (WITH innermost4 AS ( SELECT 5 UNION (WITH innermost5 AS ( SELECT 6 UNION (WITH innermost6 AS (SELECT 7) SELECT * FROM innermost6)) SELECT * FROM innermost5)) SELECT * FROM innermost4)) SELECT * FROM innermost3)) SELECT * FROM innermost2)) SELECT * FROM outermost UNION SELECT * FROM innermost1) ) SELECT * FROM outermost ORDER BY 1; My previous patch and HEAD break on that in checkWellFormedRecursionWalker(), not your patch. -- Michael |
Michael Paquier <[hidden email]> writes:
> On Wed, Feb 24, 2021 at 09:13:46PM -0500, Tom Lane wrote: >> Hmm, I don't see any failure from that... > Perhaps because you are not compiling with -DUSE_VALGRIND which is why > this fails with only two nested levels? Ah, right, because that enables DEBUG_LIST_MEMORY_USAGE. I did run it under valgrind but I hadn't bothered to recompile. Anyway, I think we're better off with a test that doesn't require DEBUG_LIST_MEMORY_USAGE, or preferably not even CLOBBER_FREED_MEMORY, to show the problem. The test I showed misbehaves even in non-debug builds, because it actually depends on what's in the innerwiths lists. regards, tom lane |
On Wed, Feb 24, 2021 at 09:48:03PM -0500, Tom Lane wrote:
> Anyway, I think we're better off with a test that doesn't require > DEBUG_LIST_MEMORY_USAGE, or preferably not even CLOBBER_FREED_MEMORY, > to show the problem. The test I showed misbehaves even in non-debug > builds, because it actually depends on what's in the innerwiths lists. Yeah. What I sent previously does not break in non-debug builds, but it does with just --enable-cassert. Hmm. I'd like to think that this would be enough for this thread, and I cannot come up now with a test able to break the lists for non-debug builds. But perhaps you have an idea? -- Michael |
Michael Paquier <[hidden email]> writes:
> On Wed, Feb 24, 2021 at 09:48:03PM -0500, Tom Lane wrote: >> Anyway, I think we're better off with a test that doesn't require >> DEBUG_LIST_MEMORY_USAGE, or preferably not even CLOBBER_FREED_MEMORY, >> to show the problem. The test I showed misbehaves even in non-debug >> builds, because it actually depends on what's in the innerwiths lists. > Yeah. What I sent previously does not break in non-debug builds, but > it does with just --enable-cassert. > Hmm. I'd like to think that this would be enough for this thread, and > I cannot come up now with a test able to break the lists for non-debug > builds. But perhaps you have an idea? For me, the example I gave fails in a non-debug build. With the code as of HEAD, I get "ERROR: stack depth limit exceeded" or a segfault. With your patch applied, it complains about w6 not having the correct form for a recursive CTE. regards, tom lane |
On Thu, Feb 25, 2021 at 10:19:43AM -0500, Tom Lane wrote:
> For me, the example I gave fails in a non-debug build. With the code as > of HEAD, I get "ERROR: stack depth limit exceeded" or a segfault. With > your patch applied, it complains about w6 not having the correct form for > a recursive CTE. Yes, same here for the test stressing makeDependencyGraphWalker(). The second test I posted for checkWellFormedRecursionWalker() passes on HEAD in a non-debug build, and triggers an assertion with debug builds. So the second test requires CLOBBER_FREED_MEMORY but not DEBUG_LIST_MEMORY_USAGE. That's not as good as the first one, but I would vote for having this second test than none to stress more the list handling when looking after invalid self-references in a CTE. This gives me the attached. I would rather have both tests in the version committed, but if you think that this addition is not necessary I won't fight hard either :) -- Michael |
Michael Paquier <[hidden email]> writes:
> I would rather have both tests in the version committed, but if you > think that this addition is not necessary I won't fight hard either :) I'm doubtful that the extra test is really testing an independent issue, but I don't wanna fight about it either. Will push this in a few. regards, tom lane |
On Thu, Feb 25, 2021 at 08:31:26PM -0500, Tom Lane wrote:
> I'm doubtful that the extra test is really testing an independent > issue, but I don't wanna fight about it either. Will push this > in a few. Thanks! -- Michael |
Free forum by Nabble | Edit this page |