Cached plans and statement generalization

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
104 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

Cached plans and statement generalization

konstantin knizhnik
Hi hackers,

There were a lot of discussions about query plan caching in hackers
mailing list, but I failed to find some clear answer for my question and
the current consensus on this question in Postgres community. As far as
I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve
now) is partitioning.
Efficient execution of query to partitioned table requires hardcoded
value for partitioning key.
Only in this case optimizer will be able to construct efficient query
plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman
+ postgres_fdw shows 3 times degrade of performance in case of using
prepared statements.
But without prepared statements substantial amount of time is spent in
query compilation and planning. I was be able to speed up benchmark more
than two time by
sending prepared queries directly to the remote nodes.

So what I am thinking now is implicit query caching. If the same query
with different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters. I am not considering now shared query cache: is seems to be
much harder to implement. But local caching of generalized queries seems
to be not so difficult to implement and requires not so much changes in
Postgres code. And it can be useful not only for sharding, but for many
other cases where prepared statements can not be used.

I wonder if such option was already considered and if it was for some
reasons rejected: can you point me at this reasons?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

Alexander Korotkov
Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <[hidden email]> wrote:
There were a lot of discussions about query plan caching in hackers mailing list, but I failed to find some clear answer for my question and the current consensus on this question in Postgres community. As far as I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve now) is partitioning.
Efficient execution of query to partitioned table requires hardcoded value for partitioning key.
Only in this case optimizer will be able to construct efficient query plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman + postgres_fdw shows 3 times degrade of performance in case of using prepared statements.
But without prepared statements substantial amount of time is spent in query compilation and planning. I was be able to speed up benchmark more than two time by
sending prepared queries directly to the remote nodes.

I don't think it's correct to ask PostgreSQL hackers about problem which arises with pg_pathman while pg_pathman is an extension supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think that community should address this issue in the context of declarative partitioning.
However, it's unlikely we can spot this issue with declarative partitioning because it still uses very inefficient constraint exclusion mechanism.  Thus, issues you are writing about would become visible on declarative partitioning only when constraint exclusion would be replaced with something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

konstantin knizhnik


On 24.04.2017 13:24, Alexander Korotkov wrote:
Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <[hidden email]> wrote:
There were a lot of discussions about query plan caching in hackers mailing list, but I failed to find some clear answer for my question and the current consensus on this question in Postgres community. As far as I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve now) is partitioning.
Efficient execution of query to partitioned table requires hardcoded value for partitioning key.
Only in this case optimizer will be able to construct efficient query plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman + postgres_fdw shows 3 times degrade of performance in case of using prepared statements.
But without prepared statements substantial amount of time is spent in query compilation and planning. I was be able to speed up benchmark more than two time by
sending prepared queries directly to the remote nodes.

I don't think it's correct to ask PostgreSQL hackers about problem which arises with pg_pathman while pg_pathman is an extension supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think that community should address this issue in the context of declarative partitioning.
However, it's unlikely we can spot this issue with declarative partitioning because it still uses very inefficient constraint exclusion mechanism.  Thus, issues you are writing about would become visible on declarative partitioning only when constraint exclusion would be replaced with something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

 
Sorry, I have mentioned pg_pathman just as example. 
The same problems takes place with partitioning based on standard Postgres inheritance mechanism (when I manually create derived tables and specify constraints for them).
I didn't test yet declarative partitioning committed to 10, but I expect the that it will also suffer from this problem (because is based on inheritance).
But as I wrote, I think that the problem with plan caching is wider and is not bounded just to partitioning.
 


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

Andres Freund
In reply to this post by konstantin knizhnik
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> So what I am thinking now is implicit query caching. If the same query with
> different literal values is repeated many times, then we can try to
> generalize this query and replace it with prepared query with
> parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
  How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
  parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
  nice if that happens implicitly.

Greetings,

Andres Freund


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

konstantin knizhnik
On 24.04.2017 21:43, Andres Freund wrote:
Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.
That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
  How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
  parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
  nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with default parameters,  scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

So autoprepare is as efficient as explicit prepare and can increase performance almost two times.

My current implementation is replacing with parameters only string literals in the query, i.e. select * from T where x='123'; -> select * from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to locate '\'' character and then correctly handle pairs  of quotes.
Handling of integer and real literals is really challenged task.
One source of problems is negation: it is not so easy to correctly understand whether minus should be treated as part of literal or as operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1".

Fully correct substitution can be done by first performing parsing the query, then transform parse tree, replacing literal nodes with parameter nodes and finally deparse tree into generalized query. postgres_fdw already contains such deparse code. It can be moved to postgres core and reused for autoprepare (and may be somewhere else).
But in this case overhead will be much higher.
I still think that query parsing time is significantly smaller than time needed for building and optimizing query execution plan.
But it should be measured if community will be interested in such approach.

There is obvious question: how I managed to get this pgbench results if currently only substitution of string literals is supported and queries constructed by pgbench don't contain string literals? I just made small patch in pgbench replaceVariable method wrapping value's representation in quotes. It has almost no impact on performance (3482 TPS  vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.

I attached my patch to this mail. It is just first version of the patch (based on REL9_6_STABLE branch) just to illustrate proposed approach.
I will be glad to receive any comments and if such optimization is considered to be useful, I will continue work on this patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

autoprepare.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

srielau

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <[hidden email]> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue together strings.
Protecting it under a GUC would allow to only do the work if it’s deemed likely to help.
Another rule I find useful is to abort any efforts to substitute literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining literals in on purpose.

A follow up feature would be to formalize different flavors of peeking. 
I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to dos costing?

Cheers
Serge Rielau

PS: FWIW, I like this feature.
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

konstantin knizhnik


On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <[hidden email]> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example?


Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue together strings.

I do not think that this optimization will be useful only for poorly designed application.
I already pointed on two use cases where prepapred statements can not be used:
1. pgbouncer without session-level pooling.
2. partitioning

Protecting it under a GUC would allow to only do the work if it’s deemed likely to help.
Another rule I find useful is to abort any efforts to substitute literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining literals in on purpose.

A follow up feature would be to formalize different flavors of peeking. 
I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to dos costing?
Here situation is the same as for explicitly prepared statements, isn't it?
Sometimes it is preferrable to use specialized plan rather than generic plan.
I am not sure if postgres now is able to do it.



Cheers
Serge Rielau

PS: FWIW, I like this feature.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

David Fetter
In reply to this post by konstantin knizhnik
On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:

> On 24.04.2017 21:43, Andres Freund wrote:
> > Hi,
> >
> > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > So what I am thinking now is implicit query caching. If the same query with
> > > different literal values is repeated many times, then we can try to
> > > generalize this query and replace it with prepared query with
> > > parameters.
> > That's not actuall all that easy:
> > - You pretty much do parse analysis to be able to do an accurate match.
> >    How much overhead is parse analysis vs. planning in your cases?
> > - The invalidation infrastructure for this, if not tied to to fully
> >    parse-analyzed statements, is going to be hell.
> > - Migrating to parameters can actually cause significant slowdowns, not
> >    nice if that happens implicitly.
>
> Well, first of all I want to share results I already get: pgbench with
> default parameters,  scale 10 and one connection:
>
> protocol
> TPS
> simple
> 3492
> extended
> 2927
> prepared
> 6865
> simple + autoprepare
> 6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

srielau
In reply to this post by konstantin knizhnik

On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <[hidden email]> wrote:
On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <[hidden email]> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.
Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example?
SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

Also some OLAP syntax like “rows preceding”

It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.

Cheers
Serge
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

konstantin knizhnik
In reply to this post by David Fetter
On 04/25/2017 07:54 PM, David Fetter wrote:

> On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
>> On 24.04.2017 21:43, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
>>>> So what I am thinking now is implicit query caching. If the same query with
>>>> different literal values is repeated many times, then we can try to
>>>> generalize this query and replace it with prepared query with
>>>> parameters.
>>> That's not actuall all that easy:
>>> - You pretty much do parse analysis to be able to do an accurate match.
>>>     How much overhead is parse analysis vs. planning in your cases?
>>> - The invalidation infrastructure for this, if not tied to to fully
>>>     parse-analyzed statements, is going to be hell.
>>> - Migrating to parameters can actually cause significant slowdowns, not
>>>     nice if that happens implicitly.
>> Well, first of all I want to share results I already get: pgbench with
>> default parameters,  scale 10 and one connection:
>>
>> protocol
>> TPS
>> simple
>> 3492
>> extended
>> 2927
>> prepared
>> 6865
>> simple + autoprepare
>> 6844
> If this is string mashing on the unparsed query, as it appears to be,
> it's going to be a perennial source of security issues.

Sorry, may be I missed something, but I can not understand how security can be violated by extracting string literals from query. I am just copying bytes from one buffer to another. I do not try to somehow interpret this parameters.  What I am doing is
very similar with standard prepared statements.
And moreover query is parsed! Only query which was already parsed and executed (but with different values of parameters) can be autoprepared.


>
> Best,
> David.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

konstantin knizhnik
In reply to this post by srielau
On 04/25/2017 08:09 PM, Serge Rielau wrote:

On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik [hidden email] wrote:
On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <[hidden email]> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.
Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example?
SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?


Also some OLAP syntax like “rows preceding”

It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.

Cheers
Serge


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

srielau

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <[hidden email]> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Cheers
Serge

Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

konstantin knizhnik
On 04/25/2017 11:40 PM, Serge Rielau wrote:

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <[hidden email]> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Well, actually my primary intention was not to make badly designed programs (not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared statements.
If we want to substitute with parameters as much literals as possible, then parse+deparse tree seems to be the only reasonable approach.
I will try to implement it also, just to estimate parsing overhead.




Cheers
Serge



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

Doug Doole
When I did this in DB2, I didn't use the parser - it was too expensive. I just tokenized the statement and used some simple rules to bypass the invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd disallow replacement replacement until I hit the end of the current subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you to cast using function notation like VARCHAR(foo, 10). This meant I would never replace the second parameter of any VARCHAR function. Now it's possible that when the statement was fully compiled we'd find that VARCHAR(foo,10) actually resolved to BOB.VARCHAR() instead of the built-in cast function. Our thinking that these cases were rare enough that we wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite simple (a few hundred line of code) with no significant maintenance issues. (Other developers had no problem adding in new cases where constants had to be preserved.)

The simple tokenizer was also fairly extensible. I'd prototyped using the same code to also normalize statements (uppercase all keywords, collapse whitespace to a single blank, etc.) but that feature was never added to the product.

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <[hidden email]> wrote:
On 04/25/2017 11:40 PM, Serge Rielau wrote:

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <[hidden email]> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Well, actually my primary intention was not to make badly designed programs (not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared statements.
If we want to substitute with parameters as much literals as possible, then parse+deparse tree seems to be the only reasonable approach.
I will try to implement it also, just to estimate parsing overhead.




Cheers
Serge



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

Andres Freund
On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

Doug Doole
Plan invalidation was no different than for any SQL statement. DB2 keeps a list of the objects the statement depends on. If any of the objects changes in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called the "compilation environment". This is a collection of everything that impacts how a statement is compiled (SQL path, optimization level, etc.). Plan lookup is done using both the statement text and the compilation environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your path is ANDRES, MYTEAM, SYSIBM we will have different compilation environments. If we both issue "SELECT * FROM T" we'll end up with different cache entries even if T in both of our statements resolves to MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then execute "SELECT * FROM T" again, I have a new compilation environment so the second invocation of the statement will create a new entry in the cache. The first entry is not kicked out - it will still be there for re-use if I change my SQL path back to my original value (modulo LRU for cache memory management of course).

With literal replacement, the cache entry is on the modified statement text. Given the modified statement text and the compilation environment, you're guaranteed to get the right plan entry.

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund <[hidden email]> wrote:
On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

David Fetter
In reply to this post by konstantin knizhnik
On Tue, Apr 25, 2017 at 11:35:21PM +0300, Konstantin Knizhnik wrote:

> On 04/25/2017 07:54 PM, David Fetter wrote:
> > On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> > > On 24.04.2017 21:43, Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > > > So what I am thinking now is implicit query caching. If the same query with
> > > > > different literal values is repeated many times, then we can try to
> > > > > generalize this query and replace it with prepared query with
> > > > > parameters.
> > > > That's not actuall all that easy:
> > > > - You pretty much do parse analysis to be able to do an accurate match.
> > > >     How much overhead is parse analysis vs. planning in your cases?
> > > > - The invalidation infrastructure for this, if not tied to to fully
> > > >     parse-analyzed statements, is going to be hell.
> > > > - Migrating to parameters can actually cause significant slowdowns, not
> > > >     nice if that happens implicitly.
> > > Well, first of all I want to share results I already get: pgbench with
> > > default parameters,  scale 10 and one connection:
> > >
> > > protocol
> > > TPS
> > > simple
> > > 3492
> > > extended
> > > 2927
> > > prepared
> > > 6865
> > > simple + autoprepare
> > > 6844
> > If this is string mashing on the unparsed query, as it appears to be,
> > it's going to be a perennial source of security issues.
>
> Sorry, may be I missed something, but I can not understand how
> security can be violated by extracting string literals from query. I
> am just copying bytes from one buffer to another. I do not try to
> somehow interpret this parameters.  What I am doing is very similar
> with standard prepared statements.  And moreover query is parsed!
> Only query which was already parsed and executed (but with different
> values of parameters) can be autoprepared.

I don't have an exploit yet.  What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

Andres Freund
In reply to this post by Doug Doole
Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +0000, Doug Doole wrote:

> Plan invalidation was no different than for any SQL statement. DB2 keeps a
> list of the objects the statement depends on. If any of the objects changes
> in an incompatible way the plan is invalidated and kicked out of the cache.
>
> I suspect what is more interesting is plan lookup. DB2 has something called
> the "compilation environment". This is a collection of everything that
> impacts how a statement is compiled (SQL path, optimization level, etc.).
> Plan lookup is done using both the statement text and the compilation
> environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
> path is ANDRES, MYTEAM, SYSIBM we will have different compilation
> environments. If we both issue "SELECT * FROM T" we'll end up with
> different cache entries even if T in both of our statements resolves to
> MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
> execute "SELECT * FROM T" again, I have a new compilation environment so
> the second invocation of the statement will create a new entry in the
> cache. The first entry is not kicked out - it will still be there for
> re-use if I change my SQL path back to my original value (modulo LRU for
> cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_patch = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

David G Johnston
In reply to this post by David Fetter
On Tue, Apr 25, 2017 at 3:24 PM, David Fetter <[hidden email]> wrote:
I don't have an exploit yet.  What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

​If they want to see how it works they can read the source code.  In terms of runtime data it would limited to whatever the session itself created.  In most cases the presence of the cache would be invisible.  I suppose it might appear if one were to explain a query, reset the session, explain another query and then re-explain the original.  If the chosen plan in the second pass differed because of the presence of the leading query it would be noticeable but not revealing.  Albeit I'm a far cry from a security expert...

David J.
Reply | Threaded
Open this post in threaded view
|

Re: Cached plans and statement generalization

Doug Doole
In reply to this post by Andres Freund
(FWIW, on this list we don't do top-quotes)

I know. Forgot and just did "reply all". My bad.

It's not always that simple, at least in postgres, unless you disregard
search_path.  Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_patch = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately I don't know the details of how it worked though.

A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.

- Doug
1234 ... 6