pgbench and timestamps

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

pgbench and timestamps

jaime soler
Hi, does anybody know what is wrong with pgbench in this case ?. Here is a simple query to generate a random date in a interval time.sql: 
 (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 00:00:00' -  timestamp '2005-09-01 00:00:00' ));
query executed successfullly with psql
/usr/lib/postgresql/12/bin/psql -p 5432 -h localhost -d picp -U postgres -f time.sql
BEGIN
?column?
--------------------------
2005-11-24 13:22:02.4781
(1 fila)COMMIT
psql (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
but look at what happen with pgbench
pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U postgres -p 5432
pghost: localhost pgport: 5432 nclients: 2 nxacts: 10 dbName: picp
starting vacuum...ERROR: no existe la relación «pgbench_branches»
(ignoring this error and continuing anyway)
ERROR: no existe la relación «pgbench_tellers»
(ignoring this error and continuing anyway)
ERROR: no existe la relación «pgbench_history»
(ignoring this error and continuing anyway)
end.
client 0 executing script "time.sql"
ERROR: la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
LINE 1: ...t timestamp '2005-09-01' + random() * ( timestamp '2006-03-0...
^
client 0 sending P0_0
client 0 receiving
client 0 receiving
client 0 sending P0_1
client 0 receiving
client 0 receiving
client 0 script 0 aborted in command 1 query 0: ERROR: no existe la sentencia preparada «P0_1»
client 1 executing script "time.sql"
ERROR: la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
LINE 1: ...t timestamp '2005-09-01' + random() * ( timestamp '2006-03-0...
^Run was aborted; the above results are incomplete.
pgbench (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00' 

Regards
Reply | Threaded
Open this post in threaded view
|

Re: pgbench and timestamps

David Rowley
On Wed, 24 Jun 2020 at 20:41, Jaime Soler <[hidden email]> wrote:
>
> Hi, does anybody know what is wrong with pgbench in this case ?. Here is a simple query to generate a random date in a interval time.sql:
>
>  (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 00:00:00' -  timestamp '2005-09-01 00:00:00' ));
> pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U postgres -p 5432
> ERROR:  la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
>
> I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

https://www.postgresql.org/docs/12/pgbench.html says:

"There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
digits, and underscores. Variables can be set by the command-line -D
option, explained above, or by the meta commands explained below. In
addition to any variables preset by -D command-line options, there are
a few variables that are preset automatically, listed in Table 257. A
value specified for these variables using -D takes precedence over the
automatic presets. Once set, a variable's value can be inserted into a
SQL command by writing :variablename. When running more than one
client session, each session has its own set of variables. pgbench
supports up to 255 variable uses in one statement."

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

I won't pretend to be familiar enough with pgbench internals to know
if there's any reasonable reasons why we do each of the above, but...

I guess you could work around this problem by just not putting the
midnight time in your timestamp. However, that might not work so well
if you want to specify a time other than midnight.

David


Reply | Threaded
Open this post in threaded view
|

Re: pgbench and timestamps

jaime soler
Hi,

Thanks for your comments, I worked around that problem because I was able to truncate the timestamp and use only the date part , alsoit might works the use of to_timestamp.  But I would like to understand what is happening , I realized that pgbench is identified erroneously  the minutes and seconds parts :00:00 as two variables .

Regards

El mié., 24 jun. 2020 a las 14:50, David Rowley (<[hidden email]>) escribió:
On Wed, 24 Jun 2020 at 20:41, Jaime Soler <[hidden email]> wrote:
>
> Hi, does anybody know what is wrong with pgbench in this case ?. Here is a simple query to generate a random date in a interval time.sql:
>
>  (select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 00:00:00' -  timestamp '2005-09-01 00:00:00' ));
> pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U postgres -p 5432
> ERROR:  la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»
>
> I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

https://www.postgresql.org/docs/12/pgbench.html says:

"There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
digits, and underscores. Variables can be set by the command-line -D
option, explained above, or by the meta commands explained below. In
addition to any variables preset by -D command-line options, there are
a few variables that are preset automatically, listed in Table 257. A
value specified for these variables using -D takes precedence over the
automatic presets. Once set, a variable's value can be inserted into a
SQL command by writing :variablename. When running more than one
client session, each session has its own set of variables. pgbench
supports up to 255 variable uses in one statement."

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

I won't pretend to be familiar enough with pgbench internals to know
if there's any reasonable reasons why we do each of the above, but...

I guess you could work around this problem by just not putting the
midnight time in your timestamp. However, that might not work so well
if you want to specify a time other than midnight.

David
Reply | Threaded
Open this post in threaded view
|

Re: pgbench and timestamps

Tom Lane-2
In reply to this post by David Rowley
David Rowley <[hidden email]> writes:
> On Wed, 24 Jun 2020 at 20:41, Jaime Soler <[hidden email]> wrote:
>> I don't know why pgbench use  timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

> I've not debugged it, but it looks like pgbench thinks that :00 is a
> pgbench variable and is replacing each instance with a query
> parameter.

Yeah.

> I don't often do much with pgbench and variables, but there are a few
> things that surprise me here.

> 1) That pgbench replaces variables within single quotes, and;
> 2) that we still think it's a variable name when it starts with a digit, and;
> 3) We replace variables that are undefined.

Also (4) this only happens when in non-simple query mode --- the
example works fine without "-M prepared".

I can think of use-cases for substituting variables inside quotes,
so maybe (1) isn't a bug; but it sure seems like (3) and (4) are.

In any case, the documentation about this seems pretty inadequate.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgbench and timestamps

Tom Lane-2
I wrote:
> David Rowley <[hidden email]> writes:
>> I don't often do much with pgbench and variables, but there are a few
>> things that surprise me here.
>> 1) That pgbench replaces variables within single quotes, and;
>> 2) that we still think it's a variable name when it starts with a digit, and;
>> 3) We replace variables that are undefined.

> Also (4) this only happens when in non-simple query mode --- the
> example works fine without "-M prepared".

After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode.  So that explains the
behavioral difference.

The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet.  We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.

Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode.  The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.

If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgbench and timestamps

Fabien COELHO-3

I'll look into it. Thanks for the analysis and CC-ing.

--
Fabien.


Reply | Threaded
Open this post in threaded view
|

Re: pgbench and timestamps

jaime soler
In reply to this post by Tom Lane-2
Thanks for your analysis.


Regards

El mié., 24 jun. 2020 a las 17:17, Tom Lane (<[hidden email]>) escribió:
I wrote:
> David Rowley <[hidden email]> writes:
>> I don't often do much with pgbench and variables, but there are a few
>> things that surprise me here.
>> 1) That pgbench replaces variables within single quotes, and;
>> 2) that we still think it's a variable name when it starts with a digit, and;
>> 3) We replace variables that are undefined.

> Also (4) this only happens when in non-simple query mode --- the
> example works fine without "-M prepared".

After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode.  So that explains the
behavioral difference.

The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet.  We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.

Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode.  The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.

If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).

                        regards, tom lane
Reply | Threaded
Open this post in threaded view
|

Re: pgbench and timestamps

Fabien COELHO-3
In reply to this post by Tom Lane-2

Hello Tom,

The attached patch fixes some of the underlying problems reported by
delaying the :var to $1 substitution to the last possible moments, so that
what variables are actually defined is known. PREPARE-ing is also delayed
to after these substitutions are done.

It requires a mutex around the commands, I tried to do some windows
implementation which may or may not work.

The attached patch fixes (2) & (3) for extended & prepared.

I have a doubt about fixing (1) because it would be a significant
behavioral change and it requires changing the replace routine
significantly to check for quoting, comments, and so on. This means that
currently ':var' is still broken under -M extended & prepared, I could
only break it differently by providing a nicer error message and also
break it under simple whereas it currently works there. I'm not thrilled
by spending efforts to do that.

The patches change the name of "parseQuery" to "makeVariablesParameters",
because it was not actually parsing any query. Maybe the new name could be
improved.

In passing, there was a bug in how NULL was passed, which I tried to fix
as well.

>>> I don't often do much with pgbench and variables, but there are a few
>>> things that surprise me here.
>>> 1) That pgbench replaces variables within single quotes, and;
>>> 2) that we still think it's a variable name when it starts with a digit, and;
>>> 3) We replace variables that are undefined.
>
>> Also (4) this only happens when in non-simple query mode --- the
>> example works fine without "-M prepared".
>
> After looking around in the code, it seems like the core of the issue
> is that pgbench.c's parseQuery() doesn't check whether a possible
> variable name is actually defined, unlike assignVariables() which is
> what does the same job in simple query mode.  So that explains the
> behavioral difference.
Yes.

> The reason for doing that probably was that parseQuery() is run when
> the input file is read, so that relevant variables might not be set
> yet.  We could fix that by postponing the work to be done at first
> execution of the query, as is already the case for PQprepare'ing the
> query.

Yep, done at first execution of the Command, so that variables are known.

> Also, after further thought I realize that (1) absolutely is a bug
> in the non-simple query modes, whatever you think about it in simple
> mode.  The non-simple modes are trying to pass the variable values
> as extended-query-protocol parameters, and the backend is not going
> to recognize $n inside a literal as being a parameter.

Yep. See my comments above.

> If we fixed (1) and (3) I think there wouldn't be any great need
> to tighten up (2).

I did (2) but not (1), for now.

--
Fabien.

pgbench-var-fix-1.patch (23K) Download Attachment