seems like a bug in pgbench -R

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

seems like a bug in pgbench -R

Tomas Vondra-4
Hi,

There seems to be a bug in pgbench when used with '-R' option, resulting
in stuck pgbench process. Reproducing it is pretty easy:

echo 'select 1' > select.sql

while /bin/true; do
  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
  date;
done;

I do get a stuck pgbench within a few rounds (~10-20), but YMMV. It gets
stuck like this (this is on REL_11_STABLE):

0x00007f3b1814fcb3 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x7fff9d668ec0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x0)
    at ../sysdeps/unix/sysv/linux/select.c:41
41      return SYSCALL_CANCEL (select, nfds, readfds, writefds, exceptfds,
(gdb) bt
#0  0x00007f3b1814fcb3 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x7fff9d668ec0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x0)
    at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x000000000040834d in threadRun (arg=arg@entry=0x1c9f370) at
pgbench.c:5810
#2  0x0000000000403e0a in main (argc=<optimized out>, argv=<optimized
out>) at pgbench.c:5583

All the connections are already closed at this point (there is nothing
in pg_stat_activity and log_disconnections=on confirms that), and then
there's this:

(gdb) p remains
$1 = 0
(gdb) p nstate
$1 = 1
(gdb) p state[0]
$2 = {con = 0x0, id = 0, state = CSTATE_FINISHED, cstack = 0xf21b10,
use_file = 0, command = 1, variables = 0xf2b0b0, nvariables = 4,
vars_sorted = false, txn_scheduled = 699536519281, sleep_until = 0,
txn_begin = {tv_sec = 699536, tv_nsec = 518478603}, stmt_begin = {tv_sec
= 0,  tv_nsec = 0}, prepared = {false <repeats 128 times>}, cnt = 132,
ecnt = 0}

So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
one of the other commits touching this part of the code.

cheers

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: seems like a bug in pgbench -R

Fabien COELHO-3

> echo 'select 1' > select.sql
>
> while /bin/true; do
>  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
>  date;
> done;

Indeed. I'll look at it over the weekend.

> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
> one of the other commits touching this part of the code.

Yep, possibly.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: seems like a bug in pgbench -R

Fabien COELHO-3

>> echo 'select 1' > select.sql
>>
>> while /bin/true; do
>>  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
>>  date;
>> done;
>
> Indeed. I'll look at it over the weekend.
>
>> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
>> one of the other commits touching this part of the code.

I could not reproduce this issue on head, but I confirm on 11.2.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: seems like a bug in pgbench -R

Tomas Vondra-4
On 3/15/19 5:16 PM, Fabien COELHO wrote:

>
>>> echo 'select 1' > select.sql
>>>
>>> while /bin/true; do
>>>  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
>>>  date;
>>> done;
>>
>> Indeed. I'll look at it over the weekend.
>>
>>> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
>>> one of the other commits touching this part of the code.
>
> I could not reproduce this issue on head, but I confirm on 11.2.
>

AFAICS on head it's fixed by 3bac77c48f166b9024a5ead984df73347466ae12

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: seems like a bug in pgbench -R

Fabien COELHO-3

Hello Tomas,

>>>> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
>>>> one of the other commits touching this part of the code.
>>
>> I could not reproduce this issue on head, but I confirm on 11.2.
>>
>
> AFAICS on head it's fixed by 3bac77c48f166b9024a5ead984df73347466ae12

Thanks for the information.

I pinpointed the exact issue in one go: no surprise given that the patch
was motivated by cleaning up this kind of external state machine changes
which I thought doubtful and error prone.

Attached is a fix to apply on pg11.

--
Fabien.

pg11-pgbench-rate-fix-1.patch (642 bytes) Download Attachment