BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15943
Logged by:          Alexander Lakhin
Email address:      [hidden email]
PostgreSQL version: 12beta2
Operating system:   Ubuntu 18.04
Description:        

When running under valgrind (sitting on the master branch):
printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql

I get the following error:
==00:00:00:12.885 19648== Syscall param write(buf) points to uninitialised
byte(s)
==00:00:00:12.885 19648==    at 0x5054281: write (write.c:27)
==00:00:00:12.885 19648==    by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:12.885 19648==    by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:12.885 19648==    by 0x2690F1: SlruSelectLRUPage (slru.c:1095)
==00:00:00:12.885 19648==    by 0x269777: SimpleLruZeroPage (slru.c:269)
==00:00:00:12.885 19648==    by 0x31DABA: asyncQueueAddEntries
(async.c:1379)
==00:00:00:12.885 19648==    by 0x31F130: PreCommit_Notify (async.c:845)
==00:00:00:12.885 19648==    by 0x273C19: CommitTransaction (xact.c:2128)
==00:00:00:12.885 19648==    by 0x274A10: CommitTransactionCommand
(xact.c:2921)
==00:00:00:12.885 19648==    by 0x53BD7D: finish_xact_command
(postgres.c:2597)
==00:00:00:12.885 19648==    by 0x53E2F3: exec_simple_query
(postgres.c:1256)
==00:00:00:12.885 19648==    by 0x540545: PostgresMain (postgres.c:4256)
==00:00:00:12.885 19648==  Address 0xf4f95bf is in a rw- anonymous segment
==00:00:00:12.885 19648==  Uninitialised value was created by a stack
allocation
==00:00:00:12.885 19648==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

Increasing the count to 8192 produces another one, slightly different
stack:
==00:00:00:29.999 19648== Syscall param write(buf) points to uninitialised
byte(s)
==00:00:00:29.999 19648==    at 0x5054281: write (write.c:27)
==00:00:00:29.999 19648==    by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:29.999 19648==    by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:29.999 19648==    by 0x26A1BD: SimpleLruTruncate (slru.c:1234)
==00:00:00:29.999 19648==    by 0x31D5A5: asyncQueueAdvanceTail
(async.c:2022)
==00:00:00:29.999 19648==    by 0x31F231: ProcessCompletedNotifies
(async.c:1132)
==00:00:00:29.999 19648==    by 0x5403D1: PostgresMain (postgres.c:4176)
==00:00:00:29.999 19648==    by 0x4B16D9: BackendRun (postmaster.c:4431)
==00:00:00:29.999 19648==    by 0x4B481B: BackendStartup
(postmaster.c:4122)
==00:00:00:29.999 19648==    by 0x4B4B32: ServerLoop (postmaster.c:1704)
==00:00:00:29.999 19648==    by 0x4B5F70: PostmasterMain
(postmaster.c:1377)
==00:00:00:29.999 19648==    by 0x406428: main (main.c:228)
==00:00:00:29.999 19648==  Address 0xf4fb5bf is in a rw- anonymous segment
==00:00:00:29.999 19648==  Uninitialised value was created by a stack
allocation
==00:00:00:29.999 19648==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

(It's also reproduced on REL_12_BETA3, REL9_6_15.)

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Andres Freund
Hi,

On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <[hidden email]> wrote:

>The following bug has been logged on the website:
>
>Bug reference:      15943
>Logged by:          Alexander Lakhin
>Email address:      [hidden email]
>PostgreSQL version: 12beta2
>Operating system:   Ubuntu 18.04
>Description:        
>
>When running under valgrind (sitting on the master branch):
>printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql
>
>I get the following error:
>==00:00:00:12.885 19648== Syscall param write(buf) points to
>uninitialised
>byte(s)
>==00:00:00:12.885 19648==    at 0x5054281: write (write.c:27)
>==00:00:00:12.885 19648==    by 0x2687F0: SlruPhysicalWritePage
>(slru.c:851)
>==00:00:00:12.885 19648==    by 0x268E95: SlruInternalWritePage
>(slru.c:543)
>==00:00:00:12.885 19648==    by 0x2690F1: SlruSelectLRUPage
>(slru.c:1095)
>==00:00:00:12.885 19648==    by 0x269777: SimpleLruZeroPage
>(slru.c:269)
>==00:00:00:12.885 19648==    by 0x31DABA: asyncQueueAddEntries
>(async.c:1379)
>==00:00:00:12.885 19648==    by 0x31F130: PreCommit_Notify
>(async.c:845)
>==00:00:00:12.885 19648==    by 0x273C19: CommitTransaction
>(xact.c:2128)
>==00:00:00:12.885 19648==    by 0x274A10: CommitTransactionCommand
>(xact.c:2921)
>==00:00:00:12.885 19648==    by 0x53BD7D: finish_xact_command
>(postgres.c:2597)
>==00:00:00:12.885 19648==    by 0x53E2F3: exec_simple_query
>(postgres.c:1256)
>==00:00:00:12.885 19648==    by 0x540545: PostgresMain
>(postgres.c:4256)
>==00:00:00:12.885 19648==  Address 0xf4f95bf is in a rw- anonymous
>segment
>==00:00:00:12.885 19648==  Uninitialised value was created by a stack
>allocation
>==00:00:00:12.885 19648==    at 0x31D967: asyncQueueAddEntries
>(async.c:1306)
>
>Increasing the count to 8192 produces another one, slightly different
>stack:
>==00:00:00:29.999 19648== Syscall param write(buf) points to
>uninitialised
>byte(s)
>==00:00:00:29.999 19648==    at 0x5054281: write (write.c:27)
>==00:00:00:29.999 19648==    by 0x2687F0: SlruPhysicalWritePage
>(slru.c:851)
>==00:00:00:29.999 19648==    by 0x268E95: SlruInternalWritePage
>(slru.c:543)
>==00:00:00:29.999 19648==    by 0x26A1BD: SimpleLruTruncate
>(slru.c:1234)
>==00:00:00:29.999 19648==    by 0x31D5A5: asyncQueueAdvanceTail
>(async.c:2022)
>==00:00:00:29.999 19648==    by 0x31F231: ProcessCompletedNotifies
>(async.c:1132)
>==00:00:00:29.999 19648==    by 0x5403D1: PostgresMain
>(postgres.c:4176)
>==00:00:00:29.999 19648==    by 0x4B16D9: BackendRun
>(postmaster.c:4431)
>==00:00:00:29.999 19648==    by 0x4B481B: BackendStartup
>(postmaster.c:4122)
>==00:00:00:29.999 19648==    by 0x4B4B32: ServerLoop
>(postmaster.c:1704)
>==00:00:00:29.999 19648==    by 0x4B5F70: PostmasterMain
>(postmaster.c:1377)
>==00:00:00:29.999 19648==    by 0x406428: main (main.c:228)
>==00:00:00:29.999 19648==  Address 0xf4fb5bf is in a rw- anonymous
>segment
>==00:00:00:29.999 19648==  Uninitialised value was created by a stack
>allocation
>==00:00:00:29.999 19648==    at 0x31D967: asyncQueueAddEntries
>(async.c:1306)
>
>(It's also reproduced on REL_12_BETA3, REL9_6_15.)

That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Alexander Lakhin-2
Hello Andres,

08.08.2019 13:33, Andres Freund wrote:

> Hi,
>
> On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <[hidden email]> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      15943
>> Logged by:          Alexander Lakhin
>> Email address:      [hidden email]
>> PostgreSQL version: 12beta2
>> Operating system:   Ubuntu 18.04
>> Description:        
>>
>> When running under valgrind (sitting on the master branch):
>> printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql
>>
> That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.
>
With "--track-origins=yes --read-var-info=yes" I get:

==00:00:00:30.800 24110== Syscall param write(buf) points to
uninitialised byte(s)
==00:00:00:30.801 24110==    at 0x5054281: write (write.c:27)
==00:00:00:30.801 24110==    by 0x2687F0: SlruPhysicalWritePage (slru.c:851)
==00:00:00:30.801 24110==    by 0x268E95: SlruInternalWritePage (slru.c:543)
==00:00:00:30.801 24110==    by 0x2690F1: SlruSelectLRUPage (slru.c:1095)
==00:00:00:30.801 24110==    by 0x269777: SimpleLruZeroPage (slru.c:269)
==00:00:00:30.801 24110==    by 0x31DABA: asyncQueueAddEntries
(async.c:1381)
==00:00:00:30.801 24110==    by 0x31F130: PreCommit_Notify (async.c:845)
==00:00:00:30.801 24110==    by 0x273C19: CommitTransaction (xact.c:2128)
==00:00:00:30.801 24110==    by 0x274A10: CommitTransactionCommand
(xact.c:2921)
==00:00:00:30.801 24110==    by 0x53BD7D: finish_xact_command
(postgres.c:2597)
==00:00:00:30.801 24110==    by 0x53E2F3: exec_simple_query
(postgres.c:1256)
==00:00:00:30.801 24110==    by 0x540545: PostgresMain (postgres.c:4256)
==00:00:00:30.801 24110==  Address 0xf4f95bf is in a rw- anonymous segment
==00:00:00:30.801 24110==  Uninitialised value was created by a stack
allocation
==00:00:00:30.801 24110==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

and
==00:00:00:48.177 24110== Syscall param write(buf) points to
uninitialised byte(s)
==00:00:00:48.177 24110==    at 0x5054281: write (write.c:27)
==00:00:00:48.177 24110==    by 0x2687F0: SlruPhysicalWritePage (slru.c:851)
==00:00:00:48.177 24110==    by 0x268E95: SlruInternalWritePage (slru.c:543)
==00:00:00:48.177 24110==    by 0x26A1BD: SimpleLruTruncate (slru.c:1234)
==00:00:00:48.177 24110==    by 0x31D5A5: asyncQueueAdvanceTail
(async.c:2024)
==00:00:00:48.177 24110==    by 0x31F231: ProcessCompletedNotifies
(async.c:1132)
==00:00:00:48.177 24110==    by 0x5403D1: PostgresMain (postgres.c:4176)
==00:00:00:48.177 24110==    by 0x4B16D9: BackendRun (postmaster.c:4431)
==00:00:00:48.177 24110==    by 0x4B481B: BackendStartup (postmaster.c:4122)
==00:00:00:48.177 24110==    by 0x4B4B32: ServerLoop (postmaster.c:1704)
==00:00:00:48.178 24110==    by 0x4B5F70: PostmasterMain (postmaster.c:1377)
==00:00:00:48.178 24110==    by 0x406428: main (main.c:228)
==00:00:00:48.178 24110==  Address 0xf4fb5bf is in a rw- anonymous segment
==00:00:00:48.178 24110==  Uninitialised value was created by a stack
allocation
==00:00:00:48.178 24110==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
asyncQueueAddEntries fixes the issue.
More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
place fixes it too (the patch is attached).
So it seems, the padding in AsyncQueueEntry has an alibi.

Best regards,
Alexander


asyncQueueAddEntries-memset.patch (454 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Alvaro Herrera-9
On 2019-Aug-08, Alexander Lakhin wrote:

> Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
> asyncQueueAddEntries fixes the issue.
> More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
> place fixes it too (the patch is attached).

Maybe VALGRIND_MAKE_MEM_DEFINED?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Andres Freund
Hi,

On August 8, 2019 12:43:32 PM EDT, Alvaro Herrera <[hidden email]> wrote:
>On 2019-Aug-08, Alexander Lakhin wrote:
>
>> Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
>> asyncQueueAddEntries fixes the issue.
>> More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the
>same
>> place fixes it too (the patch is attached).
>
>Maybe VALGRIND_MAKE_MEM_DEFINED?

Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable?
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Tom Lane-2
Andres Freund <[hidden email]> writes:
> Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable?

I think all we need here is another suppression in
src/tools/valgrind.supp.  We have not insisted on zeroing
pad bytes that get sent to disk in the places already
enumerated in valgrind.supp, so why would we apply a
different rule to async.c?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Andres Freund
Hi,

On August 8, 2019 2:41:42 PM EDT, Tom Lane <[hidden email]> wrote:
>Andres Freund <[hidden email]> writes:
>> Seems there's no reason to not zero initialize memory here? But
>perhaps just by initializing the stack variable?
>
>I think all we need here is another suppression in
>src/tools/valgrind.supp.  We have not insisted on zeroing
>pad bytes that get sent to disk in the places already
>enumerated in valgrind.supp, so why would we apply a
>different rule to async.c?

Well, with a lot of the other case there's a lot of sources for such uninitialized data. Here there probably is just one? If it weren't such a game of whack a mole, I'd even consider properly initializing the other places. And initializing the stack data here ought to be cheap in this case?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Alexander Lakhin-2
In reply to this post by Alexander Lakhin-2
Hello hackers,

> 08.08.2019 13:33, Andres Freund wrote:
>> On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <[hidden email]> wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      15943
>>> Logged by:          Alexander Lakhin
>>> Email address:      [hidden email]
>>> PostgreSQL version: 12beta2
>>> Operating system:   Ubuntu 18.04
>>> Description:        
>>>
>>> When running under valgrind (sitting on the master branch):
>>> printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql
>>>
>> That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.
> Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
> asyncQueueAddEntries fixes the issue.
> More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
> place fixes it too (the patch is attached).
> So it seems, the padding in AsyncQueueEntry has an alibi.
Andres, you're right. The padding is found guilty (inside the qe.data
itself).
Please look at the patch addressing the cause of the problem (inspired
by btree_bit.c).
But if a valgrind suppression is preferred, look at the alternate patch.

Best regards,
Alexander

fix-aqe-padding.patch (1K) Download Attachment
valgrind_slru_write.patch (377 bytes) Download Attachment