a segfault failure of query

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

a segfault failure of query

康桥平

hi all

 

  I encountered segmentation fault when executing the sql statement of the query.

 

  I opened the software debug option and reproduced the problem, use the gbd debugging to capture the stack information as follows: 

  $ gdb $dir/11.3_debug/bin/postgres core.402790

  GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7

  Copyright (C) 2013 Free Software Foundation, Inc.

  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

  This is free software: you are free to change and redistribute it.

  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

  and "show warranty" for details.

  This GDB was configured as "x86_64-redhat-linux-gnu".

  For bug reporting instructions, please see:

  <http://www.gnu.org/software/gdb/bugs/>...

  Reading symbols from /paic/postgres/base/11.3_debug/bin/postgres...done.

  [New LWP 402790]

  [Thread debugging using libthread_db enabled]

  Using host libthread_db library "/lib64/libthread_db.so.1".

  Core was generated by `postgres: dbtest: postgres test1 [local] SELECT                             '.

  Program terminated with signal 11, Segmentation fault.

  #0  pg_detoast_datum_packed (datum=0x0) at fmgr.c:1951

  1951    fmgr.c: No such file or directory.

  Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.0.1.el7.x86_64 libselinux-2.5-11.el7.x86_64 libxml2-2.9.1-6.0.1.el7_2.3.x86_64 openssl-libs-1.0.2k-12.0.1.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64

  (gdb) bt

  #0  pg_detoast_datum_packed (datum=0x0) at fmgr.c:1951

  #1  0x0000000000806fa2 in text_to_cstring (t=0x0) at varlena.c:185

  #2  0x0000000000831845 in FunctionCall1Coll (flinfo=<optimized out>, collation=collation@entry=0, arg1=<optimized out>) at fmgr.c:1123

  #3  0x000000000083293a in OutputFunctionCall (flinfo=<optimized out>, val=<optimized out>) at fmgr.c:1755

  #4  0x0000000000489e89 in printtup (slot=0x2b61018, self=0x2ba7f28be438) at printtup.c:434

  #5  0x00000000005f6380 in ExecutePlan (execute_once=<optimized out>, dest=0x2ba7f28be438, direction=<optimized out>, numberTuples=0, sendTuples=true,

      operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x2b47650, estate=0x2b47410) at execMain.c:1762

  #6  standard_ExecutorRun (queryDesc=0x2b45b50, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364

  #7  0x00002ba68bd11fe5 in pgss_ExecutorRun (queryDesc=0x2b45b50, direction=ForwardScanDirection, count=0, execute_once=<optimized out>)

      at pg_stat_statements.c:892

  #8  0x00002ba68bf1959e in explain_ExecutorRun (queryDesc=0x2b45b50, direction=ForwardScanDirection, count=0, execute_once=<optimized out>)

      at auto_explain.c:268

  #9  0x000000000072e0bb in PortalRunSelect (portal=portal@entry=0x2a8b870, forward=forward@entry=true, count=0, count@entry=9223372036854775807,

      dest=dest@entry=0x2ba7f28be438) at pquery.c:932

  #10 0x000000000072f3d0 in PortalRun (portal=portal@entry=0x2a8b870, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,

      run_once=run_once@entry=true, dest=dest@entry=0x2ba7f28be438, altdest=altdest@entry=0x2ba7f28be438,

      completionTag=completionTag@entry=0x7ffdef80dc90 "") at pquery.c:773

  #11 0x000000000072b477 in exec_simple_query (

      query_string=0x29fecd0 "SELECT\n       CASE\nWHEN field1 = 'YxxxTyyyl' THEN\n       '?'\nWHEN field1 IN ('Y1', 'Y2') THEN\n       '??'\nWHEN field1 IN ('X1', 'X2', 'X3', 'X4') THEN\n       '?'\nWHEN field1 IN (\n       '01',\n"...) at postgres.c:1145

  #12 0x000000000072c712 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2a384e0, dbname=0x2a383c8 "test1", username=<optimized out>)

      at postgres.c:4182

  #13 0x000000000047e3c3 in BackendRun (port=0x2a2aa40) at postmaster.c:4358

  #14 BackendStartup (port=0x2a2aa40) at postmaster.c:4030

  #15 ServerLoop () at postmaster.c:1707

  #16 0x00000000006c36d9 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x29f8f80) at postmaster.c:1380

  #17 0x000000000047ee3b in main (argc=3, argv=0x29f8f80) at main.c:228

 

  I try to rewrite the sql content and test it. It is suspected to be related to the automatic parallel execution of the execution plan.

  This problem can be temporarily solved after executing the command set max_parallel_workers_per_gather=0.

 

  Please help to see if there is a patch to fix this problem.

 

  Regards.

 


********************************************************************************************************************************
The information in this email is confidential and may be legally privileged. If you have received this email in error or are not the intended recipient, please immediately notify the sender and delete this message from your computer. Any use, distribution, or copying of this email other than by the intended recipient is strictly prohibited. All messages sent to and from us may be monitored to ensure compliance with internal policies and to protect our business.
Emails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, lost or destroyed, or contain viruses. Anyone who communicates with us by email is taken to accept these risks.

收发邮件者请注意:
本邮件含涉密信息,请保守秘密,若误收本邮件,请务必通知发送人并直接删去,不得使用、传播或复制本邮件。
进出邮件均受到本公司合规监控。邮件可能发生被截留、被修改、丢失、被破坏或包含计算机病毒等不安全情况。
********************************************************************************************************************************

Reply | Threaded
Open this post in threaded view
|

Re: a segfault failure of query

Tom Lane-2
=?gb2312?B?v7XHxca9?= <[hidden email]> writes:
>   I encountered segmentation fault when executing the sql statement of the query.

Hm...

>   #0  pg_detoast_datum_packed (datum=0x0) at fmgr.c:1951
>   #1  0x0000000000806fa2 in text_to_cstring (t=0x0) at varlena.c:185
>   #2  0x0000000000831845 in FunctionCall1Coll (flinfo=<optimized out>, collation=collation@entry=0, arg1=<optimized out>) at fmgr.c:1123
>   #3  0x000000000083293a in OutputFunctionCall (flinfo=<optimized out>, val=<optimized out>) at fmgr.c:1755

This isn't terribly helpful.  It seems that the query has returned a text
datum that's actually a null (zero) pointer, but where that came from is
impossible to tell at this very late stage of query execution.

>   This problem can be temporarily solved after executing the command set max_parallel_workers_per_gather=0.

That's pretty interesting, but again, not very useful for localizing
the source of the issue.  Parallel query invokes a *lot* of code that
is not in the non-parallel path.

The first thing I'd note is that 11.3 is five minor releases ago
(and it'll be six minor releases out of date by the end of the week).
So really the *first* thing you ought to do is update to 11.8 to see
if this is already fixed.

If it turns out it's not fixed, is there any chance of showing us a
self-contained test case?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

答复: a segfault failure of query

康桥平
The error will occur during execution.
The test results are as follows.

$ psql -p 5432 -d xxxx
Timing is on.
psql (11.3)
Type "help" for help.

#  
# show  max_parallel_workers_per_gather;
 max_parallel_workers_per_gather
---------------------------------
 2
(1 row)

Time: 18.140 ms
 
# explain SELECT
       CASE
WHEN field1 = 'TestAl' THEN
       '?'
WHEN field1 IN ('T1', 'T2') THEN
       '??'
WHEN field1 IN ('S1', 'S2', 'S3', 'S4') THEN
       '?'
WHEN field1 IN (
       '01',
       '02',
       '03',
       '04',
       '05',
       '06',
       '07',
       '08',
       '09',
       '10',
       '11',
       '12'
) THEN
       '?'
WHEN field1 LIKE 'W__' THEN
       '?'
WHEN field1 LIKE '____' THEN
       '?'
ELSE
       ''
END AS "FREQ",
field1 AS "field1"
FROM
       TABLE_D
WHERE
       1 = 1
AND field2 IN ('field28300001')
AND field3 IN ('field38300000')
AND field4 IN ('2000')
AND field5 IN ('field58300007')
AND field6 IN ('field68300000')
AND field7 IN ('Flxjf')
AND field8 IN ('Ikjfjk')
AND
field9
       IN ('field95300000')
AND (
       field1 IN (
              '01',
              '02',
              '03',
              '04',
              '05',
              '06',
              '07',
              '08',
              '09',
              '10',
              '11',
              '12'
       )
)
ORDER BY 1=1
limit 16 offset 77;
                                                                                                                                                                                             
               QUERY PLAN                                                                                                                                                                    
                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------
 Limit  (cost=38327.74..46084.16 rows=16 width=37)
   ->  Gather  (cost=1000.00..52386.25 rows=106 width=37)
         Workers Planned: 2
         ->  Parallel Append  (cost=0.00..51373.82 rows=45 width=37)
               ->  Parallel Bitmap Heap Scan on TABLE_D_2000  (cost=2555.46..51371.79 rows=44 width=3)
                     Recheck Cond: (((account)::text = 'field25300001'::text) AND ((entity)::text = 'field35300000'::text))
                     Filter: (((year)::text = '2000'::text) AND ((bl)::text = 'field58300007'::text) AND ((expandch)::text = 'field65300000'::text) AND ((scenario)::text = 'Flxjf'::text) AND ((vie
w)::text = 'Ikjfjk'::text) AND ((value)::text = 'field95300000'::text) AND ((period)::text = ANY ('{01,02,03,04,05,06,07,08,09,10,11,12}'::text[])))
                     ->  Bitmap Index Scan on TABLE_D_2000_index  (cost=0.00..2555.44 rows=64751 width=0)
                           Index Cond: (((account)::text = 'field25300001'::text) AND ((entity)::text = 'field35300000'::text))
               ->  Parallel Seq Scan on TABLE_D  (cost=0.00..0.00 rows=1 width=118)
                     Filter: (((account)::text = 'field25300001'::text) AND ((entity)::text = 'field35300000'::text) AND ((year)::text = '2000'::text) AND ((bl)::text = 'field58300007'::text) AND ((exp
andch)::text = 'field65300000'::text) AND ((scenario)::text = 'Flxjf'::text) AND ((view)::text = 'Ikjfjk'::text) AND ((value)::text = 'field95300000'::text) AND ((period)::text = ANY ('{01,02,03,
04,05,06,07,08,09,10,11,12}'::text[])))
(11 rows)

Time: 125.984 ms

#
# SELECT
       CASE
WHEN field1 = 'TestAl' THEN
       '?'
WHEN field1 IN ('T1', 'T2') THEN
       '??'
WHEN field1 IN ('S1', 'S2', 'S3', 'S4') THEN
       '?'
WHEN field1 IN (
       '01',
       '02',
       '03',
       '04',
       '05',
       '06',
       '07',
       '08',
       '09',
       '10',
       '11',
       '12'
) THEN
       '?'
WHEN field1 LIKE 'W__' THEN
       '?'
WHEN field1 LIKE '____' THEN
       '?'
ELSE
       ''
END AS "FREQ",
field1 AS "field1"
FROM
       TABLE_D
WHERE
       1 = 1
AND field2 IN ('field25300001')
AND field3 IN ('field35300000')
AND field4 IN ('2000')
AND field5 IN ('field58300007')
AND field6 IN ('field65300000')
AND field7 IN ('Flxjf')
AND field8 IN ('Ikjfjk')
AND
field9
       IN ('field95300000')
AND (
       field1 IN (
              '01',
              '02',
              '03',
              '04',
              '05',
              '06',
              '07',
              '08',
              '09',
              '10',
              '11',
              '12'
       )
)
ORDER BY 1=1
limit 16 offset 77;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 22179.898 ms (00:22.180)
[:@] [08-24.17:52:42]!>
[:@] [08-24.17:52:44]!> \q


After modifying the value of max_parallel_workers_per_gather, SQL execution will not report an error. The test results are as follows

$ psql -p 5432 -d xxxx
Timing is on.
psql (11.3)
Type "help" for help.

[postgres:5432@psrstj] [08-24.17:52:49]=#
[postgres:5432@psrstj] [08-24.17:52:51]=# set  max_parallel_workers_per_gather=0;
SET
Time: 0.431 ms
[postgres:5432@psrstj] [08-24.17:52:53]=# explain SELECT
       CASE
WHEN field1 = 'TestAl' THEN
       '?'
WHEN field1 IN ('T1', 'T2') THEN
       '??'
WHEN field1 IN ('S1', 'S2', 'S3', 'S4') THEN
       '?'
WHEN field1 IN (
       '01',
       '02',
       '03',
       '04',
       '05',
       '06',
       '07',
       '08',
       '09',
       '10',
       '11',
       '12'
) THEN
       '?'
WHEN field1 LIKE 'W__' THEN
       '?'
WHEN field1 LIKE '____' THEN
       '?'
ELSE
       ''
END AS "FREQ",
field1 AS "field1"
FROM
       TABLE_D
WHERE
       1 = 1
AND field2 IN ('field25300001')
AND field3 IN ('field35300000')
AND field4 IN ('2000')
AND field5 IN ('field58300007')
AND field6 IN ('field65300000')
AND field7 IN ('Flxjf')
AND field8 IN ('Ikjfjk')
AND
field9
       IN ('field95300000')
AND (
       field1 IN (
              '01',
              '02',
              '03',
              '04',
              '05',
              '06',
              '07',
              '08',
              '09',
              '10',
              '11',
              '12'
       )
)
ORDER BY 1=1
limit 16 offset 77;
                                                                                                                                                                                             
               QUERY PLAN                                                                                                                                                                    
                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------
 Limit  (cost=38555.41..46566.92 rows=16 width=37)
   ->  Result  (cost=0.00..53076.27 rows=106 width=37)
         ->  Append  (cost=0.00..53072.03 rows=106 width=4)
               ->  Seq Scan on TABLE_D  (cost=0.00..0.00 rows=1 width=118)
                     Filter: (((account)::text = 'field25300001'::text) AND ((entity)::text = 'field35300000'::text) AND ((year)::text = '2000'::text) AND ((bl)::text = 'field58300007'::text) AND ((exp
andch)::text = 'field65300000'::text) AND ((scenario)::text = 'Flxjf'::text) AND ((view)::text = 'Ikjfjk'::text) AND ((value)::text = 'field95300000'::text) AND ((period)::text = ANY ('{01,02,03,
04,05,06,07,08,09,10,11,12}'::text[])))
               ->  Bitmap Heap Scan on TABLE_D_2000  (cost=2555.46..53071.50 rows=105 width=3)
                     Recheck Cond: (((account)::text = 'field25300001'::text) AND ((entity)::text = 'field35300000'::text))
                     Filter: (((year)::text = '2000'::text) AND ((bl)::text = 'field58300007'::text) AND ((expandch)::text = 'field65300000'::text) AND ((scenario)::text = 'Flxjf'::text) AND ((vie
w)::text = 'Ikjfjk'::text) AND ((value)::text = 'field95300000'::text) AND ((period)::text = ANY ('{01,02,03,04,05,06,07,08,09,10,11,12}'::text[])))
                     ->  Bitmap Index Scan on TABLE_D_2000_index  (cost=0.00..2555.44 rows=64751 width=0)
                           Index Cond: (((account)::text = 'field25300001'::text) AND ((entity)::text = 'field35300000'::text))
(10 rows)

Time: 6.668 ms
#
# SELECT
       CASE
WHEN field1 = 'TestAl' THEN
       '?'
WHEN field1 IN ('T1', 'T2') THEN
       '??'
WHEN field1 IN ('S1', 'S2', 'S3', 'S4') THEN
       '?'
WHEN field1 IN (
       '01',
       '02',
       '03',
       '04',
       '05',
       '06',
       '07',
       '08',
       '09',
       '10',
       '11',
       '12'
) THEN
       '?'
WHEN field1 LIKE 'W__' THEN
       '?'
WHEN field1 LIKE '____' THEN
       '?'
ELSE
       ''
END AS "FREQ",
field1 AS "field1"
FROM
       TABLE_D
WHERE
       1 = 1
AND field2 IN ('field25300001')
AND field3 IN ('field35300000')
AND field4 IN ('2000')
AND field5 IN ('field58300007')
AND field6 IN ('field65300000')
AND field7 IN ('Flxjf')
AND field8 IN ('Ikjfjk')
AND
field9
       IN ('field95300000')
AND (
       field1 IN (
              '01',
              '02',
              '03',
              '04',
              '05',
              '06',
              '07',
              '08',
              '09',
              '10',
              '11',
              '12'
       )
)
ORDER BY 1=1
limit 16 offset 77;
 FREQ | field1
------+--------
 ?    | 02
(1 row)

Time: 146.805 ms


康桥平 Qiaoping Kang
Mobile: 86-13570809194   Phone: 0755-88-670124


-----邮件原件-----
发件人: Tom Lane <[hidden email]>
发送时间: 2020年8月10日 21:45
收件人: 康桥平 <[hidden email]>
抄送: [hidden email]; 栾长苗 <[hidden email]>
主题: Re: a segfault failure of query

=?gb2312?B?v7XHxca9?= <[hidden email]> writes:
>   I encountered segmentation fault when executing the sql statement of the query.

Hm...

>   #0  pg_detoast_datum_packed (datum=0x0) at fmgr.c:1951
>   #1  0x0000000000806fa2 in text_to_cstring (t=0x0) at varlena.c:185
>   #2  0x0000000000831845 in FunctionCall1Coll (flinfo=<optimized out>, collation=collation@entry=0, arg1=<optimized out>) at fmgr.c:1123
>   #3  0x000000000083293a in OutputFunctionCall (flinfo=<optimized
> out>, val=<optimized out>) at fmgr.c:1755

This isn't terribly helpful.  It seems that the query has returned a text datum that's actually a null (zero) pointer, but where that came from is impossible to tell at this very late stage of query execution.

>   This problem can be temporarily solved after executing the command set max_parallel_workers_per_gather=0.

That's pretty interesting, but again, not very useful for localizing the source of the issue.  Parallel query invokes a *lot* of code that is not in the non-parallel path.

The first thing I'd note is that 11.3 is five minor releases ago (and it'll be six minor releases out of date by the end of the week).
So really the *first* thing you ought to do is update to 11.8 to see if this is already fixed.

If it turns out it's not fixed, is there any chance of showing us a self-contained test case?

                        regards, tom lane

********************************************************************************************************************************
The information in this email is confidential and may be legally privileged. If you have received this email in error or are not the intended recipient, please immediately notify the sender and delete this message from your computer. Any use, distribution, or copying of this email other than by the intended recipient is strictly prohibited. All messages sent to and from us may be monitored to ensure compliance with internal policies and to protect our business.
Emails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, lost or destroyed, or contain viruses. Anyone who communicates with us by email is taken to accept these risks.

收发邮件者请注意:
本邮件含涉密信息,请保守秘密,若误收本邮件,请务必通知发送人并直接删去,不得使用、传播或复制本邮件。
进出邮件均受到本公司合规监控。邮件可能发生被截留、被修改、丢失、被破坏或包含计算机病毒等不安全情况。
********************************************************************************************************************************