BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15815
Logged by:          Steve I
Email address:      [hidden email]
PostgreSQL version: 9.6.12
Operating system:   Amazon Aurora
Description:        

AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:

LOG:  server process (PID 31294) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running:
simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000)) >= '{string}'
LOG:  terminating any other active server processes
FATAL:  Can't handle storage runtime process crash

This specific SQL causes a segfault on our dataset 100% of the time. If I
change any part of it (e.g. remove lower or substring, change > to <, or
alter any part of the string), it does not crash.  We have a few other
variations, but this example is the most often reported and the most
reproducible.

Guidance on whether this is a known issue, how to provide additional
information to further trace it in an AWS environment, or how to work
around it would be most appreciated.

Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Tom Lane-2
PG Bug reporting form <[hidden email]> writes:
> AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:

> LOG:  server process (PID 31294) was terminated by signal 11: Segmentation
> fault
> DETAIL:  Failed process was running:
> simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000))
> >= '{string}'
> LOG:  terminating any other active server processes

Huh.  Can you get a stack trace from that?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

Also, could we see the definition of the table (psql \d would be
helpful)?

                        regards, tom lane


Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Euler Taveira
On Tue, May 21, 2019 at 11:27, PG Bug reporting form
<[hidden email]> wrote:
>
> AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:
>
Aurora is a Postgres fork so you should report it to Amazon. However, ...

> LOG:  server process (PID 31294) was terminated by signal 11: Segmentation
> fault
> DETAIL:  Failed process was running:
> simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000))
> >= '{string}'
> LOG:  terminating any other active server processes
> FATAL:  Can't handle storage runtime process crash
>
Could you reproduce it with stock Postgres? Could you provide a test case?


--
   Euler Taveira                                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Steve-5
I probably need AWS hands to get a trace from behind the curtain. I started a thread there: https://forums.aws.amazon.com/thread.jspa?threadID=303488&tstart=0

     Column     |  Type   |                      Modifiers
----------------+---------+------------------------------------------------------
             a  | integer | not null default nextval('{table3}_seq'::regclass)
             b  | integer |
             c  | integer |
             d  | text    |
Indexes:
    … PRIMARY KEY, btree (a)
    … UNIQUE CONSTRAINT, btree (b, c)
    … btree (b)
    … btree (c)
    … btree (lower("substring"(d, 1, 1000)) text_pattern_ops, b)
    … btree (lower("substring"(d, 1, 1000)), b)
Foreign-key constraints:
    … FOREIGN KEY (b) REFERENCES {table2}(b)
    … FOREIGN KEY (c) REFERENCES {table1}(c)

The dataset is over 200G, so it will take time to run tests on stock Postgres.


Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Tom Lane-2
Steve <[hidden email]> writes:

>      Column     |  Type   |                      Modifiers
> ----------------+---------+------------------------------------------------------
>              a  | integer | not null default nextval('{table3}_seq'::regclass)
>              b  | integer |
>              c  | integer |
>              d  | text    |
> Indexes:
>     … PRIMARY KEY, btree (a)
>     … UNIQUE CONSTRAINT, btree (b, c)
>     … btree (b)
>     … btree (c)
>     … btree (lower("substring"(d, 1, 1000)) text_pattern_ops, b)
>     … btree (lower("substring"(d, 1, 1000)), b)
> Foreign-key constraints:
>     … FOREIGN KEY (b) REFERENCES {table2}(b)
>     … FOREIGN KEY (c) REFERENCES {table1}(c)

Hm, so this query is probably using the last of those indexes ---
could we see EXPLAIN output to confirm that?

If so, a plausible explanation is that a portion of that index is corrupt,
although it's certainly not very nice that you're getting a crash rather
than an error report.
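
A minimal sketch of the EXPLAIN check requested above, assuming the redacted table is called my_table and the search string is 'abc' (both are placeholders, not names from the report). Plain EXPLAIN only plans the statement and prints the chosen plan. EXPLAIN ANALYZE would actually execute it, which on the affected instance would presumably reproduce the crash.

```sql
-- my_table and 'abc' are hypothetical stand-ins for the redacted {table} and {string}.
-- Plain EXPLAIN plans the query without executing it.
EXPLAIN
SELECT d
FROM my_table
WHERE lower(substring(d, 1, 1000)) >= 'abc';
```

If the resulting plan contains an index scan node that names the index on (lower("substring"(d, 1, 1000)), b), that would confirm the guess about which index is involved.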

If you're in a hurry to restore functionality, dropping and recreating
that index would likely make the problem go away ... but it would also
destroy the evidence we'd need to find the cause of the crash.  So if
you can hold off till we see the stack trace, that'd be nice.

                        regards, tom lane
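
For reference, the drop-and-recreate workaround mentioned above could be sketched as follows on 9.6. The table and index names are hypothetical, since the \d output elides the real ones. As the message notes, rebuilding discards the on-disk state of the old index, so this only makes sense once no further diagnosis is needed.

```sql
-- Hypothetical names; the real index names are elided in the \d output.

-- Option 1: rebuild in place. This blocks writes to the table, and reads
-- that would use this index, while it runs.
REINDEX INDEX my_table_lower_d_b_idx;

-- Option 2: build a replacement without blocking writes, then swap it in.
-- The CONCURRENTLY forms cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY my_table_lower_d_b_idx_new
    ON my_table (lower(substring(d, 1, 1000)), b);
DROP INDEX CONCURRENTLY my_table_lower_d_b_idx;
ALTER INDEX my_table_lower_d_b_idx_new RENAME TO my_table_lower_d_b_idx;
```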


Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Steve-5
Yeah, I agree, and it is ... our first step was to regenerate all the indexes, but the segfault persists.

We're reproducing the case in a restored-from-snapshot db.  Perhaps I'll reindex again there.  Since we have an AWS snapshot, we can jump back pretty fast to retest.

BINGO, an AWS Development Manager just stepped in…  They've identified the problem and are deploying a patch release: https://forums.aws.amazon.com/thread.jspa?messageID=901775&#901775


Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Tom Lane-2
Steve <[hidden email]> writes:
> BINGO, an AWS Development Manager just stepped in…  They've identified the
> problem and are deploying a patch release
> https://forums.aws.amazon.com/thread.jspa?messageID=901775&#901775

Oh, so it was their bug not ours?  Sure wish there was more detail there.

                        regards, tom lane


Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Michael Paquier-2
On Tue, May 21, 2019 at 03:25:11PM -0400, Tom Lane wrote:
> Steve <[hidden email]> writes:
>> BINGO, an AWS Development Manager just stepped in…  They've identified the
>> problem and are deploying a patch release
>> https://forums.aws.amazon.com/thread.jspa?messageID=901775&#901775
>
> Oh, so it was their bug not ours?  Sure wish there was more detail there.

Aurora uses a different engine than Postgres as far as I understand,
so we are likely not impacted by that.  Let's see if we get any
feedback.
--
Michael

Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Steve-5
Patched Aurora from 1.5.0 to 1.5.1 (which fixes an issue with index prefetch). The issue appears to be fully resolved.

Thanks for the help leading into this.
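
One way to confirm which build an instance is running after a patch like this, as a sketch: version() is standard PostgreSQL, while aurora_version() is an Aurora-specific function and is assumed here to be available on the instance in question.

```sql
-- PostgreSQL-compatible version string (works on stock Postgres too).
SELECT version();

-- Aurora engine version; this function exists only on Aurora PostgreSQL.
SELECT aurora_version();
```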
