Feature: triggers on materialized views

classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Feature: triggers on materialized views

Mitar
Hi!

Based on discussion about observing changes on an open query in a
reactive manner (to support reactive web applications) [1], I
identified that one critical feature is missing to fully implement
discussed design of having reactive queries be represented as
materialized views, and changes to these materialized views would then
be observed and pushed to the client through LISTEN/NOTIFY.

This is my first time contributing to PostgreSQL, so I hope I am
starting this process well.

I would like to propose that support for AFTER triggers are added to
materialized views. I experimented a bit and it seems this is mostly
just a question of enabling/exposing them. See attached patch. This
enabled me to add trigger to a material view which mostly worked. Here
are my findings.

Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
per statement and per row. There are few improvements which could be
done:

- Currently only insert and remove operations are done on the
materialized view. This is because the current logic just removes
changed rows and inserts new rows.
- In current concurrently refresh logic those insert and remove
operations are made even if there are no changes to be done. Which
triggers a statement trigger unnecessary. A small improvement could be
to skip the statement in that case, but looking at the code this seems
maybe tricky because both each of inserts and deletions are done
inside one query each.
- Current concurrently refresh logic does never do updates on existing
rows. It would be nicer to have that so that triggers are more aligned
with real changes to the data. So current two queries could be changed
to three, each doing one of the insert, update, and delete.

Non-concurrent refresh does not trigger any trigger. But it seems all
data to do so is there (previous table, new table), at least for the
statement-level trigger. Row-level triggers could also be simulated
probably (with TRUNCATE and INSERT triggers).

[1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

matview.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

David Fetter
On Mon, Dec 24, 2018 at 12:59:43PM -0800, Mitar wrote:

> Hi!
>
> Based on discussion about observing changes on an open query in a
> reactive manner (to support reactive web applications) [1], I
> identified that one critical feature is missing to fully implement
> discussed design of having reactive queries be represented as
> materialized views, and changes to these materialized views would then
> be observed and pushed to the client through LISTEN/NOTIFY.
>
> This is my first time contributing to PostgreSQL, so I hope I am
> starting this process well.

You've got the right mailing list, a description of what you want, and
a PoC patch. You also got the patch in during the time between
Commitfests. You're doing great!

> I would like to propose that support for AFTER triggers are added to
> materialized views. I experimented a bit and it seems this is mostly
> just a question of enabling/exposing them. See attached patch.

About that. When there's a change (or possible change) in user-visible
behavior, it should come with regression tests, which it would make
sense to add to src/tests/regress/matview.sql along with the
corresponding changes to src/tests/regress/expected/matview.out

> This enabled me to add trigger to a material view which mostly
> worked. Here are my findings.
>
> Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> per statement and per row.

You'd want at least one test for each of those new features.

> There are few improvements which could be
> done:
>
> - Currently only insert and remove operations are done on the
> materialized view. This is because the current logic just removes
> changed rows and inserts new rows.

What other operations might you want to support?

> - In current concurrently refresh logic those insert and remove
> operations are made even if there are no changes to be done. Which
> triggers a statement trigger unnecessary. A small improvement could be
> to skip the statement in that case, but looking at the code this seems
> maybe tricky because both each of inserts and deletions are done
> inside one query each.

As far as you can tell, is this just an efficiency optimization, or
might it go to correctness of the behavior?

> - Current concurrently refresh logic does never do updates on existing
> rows. It would be nicer to have that so that triggers are more aligned
> with real changes to the data. So current two queries could be changed
> to three, each doing one of the insert, update, and delete.

I'm not sure I understand the problem being described here. Do you see
these as useful to separate for some reason?

> Non-concurrent refresh does not trigger any trigger. But it seems
> all data to do so is there (previous table, new table), at least for
> the statement-level trigger. Row-level triggers could also be
> simulated probably (with TRUNCATE and INSERT triggers).

Would it make more sense to fill in the missing implementations of NEW
and OLD for per-row triggers instead of adding another hack?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

Thanks for reply!

On Mon, Dec 24, 2018 at 2:20 PM David Fetter <[hidden email]> wrote:
> You've got the right mailing list, a description of what you want, and
> a PoC patch. You also got the patch in during the time between
> Commitfests. You're doing great!

Great!

One thing I am unclear about is how it is determined if this is a
viable feature to be eventually included. You gave me some suggestions
to improve in my patch (adding tests and so on). Does this mean that
the patch should be fully done before a decision is made?

Also, the workflow is that I improve things, and resubmit a patch to
the mailing list, for now?

> > - Currently only insert and remove operations are done on the
> > materialized view. This is because the current logic just removes
> > changed rows and inserts new rows.
>
> What other operations might you want to support?

Update. So if a row is changing, instead of doing a remove and insert,
what currently is being done, I would prefer an update. Then UPDATE
trigger operation would happen as well. Maybe the INSERT query could
be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
one does UPDATE trigger operation on conflict), and REMOVE changed to
remove just rows which were really removed, but not only updated.

> As far as you can tell, is this just an efficiency optimization, or
> might it go to correctness of the behavior?

It is just an optimization. Or maybe even just a surprise. Maybe a
documentation addition could help here. In my use case I would loop
over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
be done. But it is just surprising that DELETE trigger is called even
when no rows are being deleted in the materialized view.

> I'm not sure I understand the problem being described here. Do you see
> these as useful to separate for some reason?

So rows which are just updated currently get first DELETE trigger
called and then INSERT. The issue is that if I am observing this
behavior from outside, it makes it unclear when I see DELETE if this
means really that a row has been deleted or it just means that later
on an INSERT would happen. Now I have to wait for an eventual INSERT
to determine that. But how long should I wait? It makes consuming
these notifications tricky.

If I just blindly respond to those notifications, this could introduce
other problems. For example, if I have a reactive web application it
could mean a visible flicker to the user. Instead of updating rendered
row, I would first delete it and then later on re-insert it.

> > Non-concurrent refresh does not trigger any trigger. But it seems
> > all data to do so is there (previous table, new table), at least for
> > the statement-level trigger. Row-level triggers could also be
> > simulated probably (with TRUNCATE and INSERT triggers).
>
> Would it make more sense to fill in the missing implementations of NEW
> and OLD for per-row triggers instead of adding another hack?

You lost me here. But I agree, we should implement this fully, without
hacks. I just do not know how exactly.

Are you saying that we should support only row-level triggers, or that
we should support both statement-level and row-level triggers, but
just make sure we implement this properly? I think that my suggestion
of using TRUNCATE and INSERT triggers is reasonable in the case of
full refresh. This is what happens. If we would want to have
DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
concurrent version has to do, which would defeat the difference
between the two. But yes, all INSERT trigger calls should have NEW
provided.

So per-statement trigger would have TRUNCATE and INSERT called. And
per-row trigger would have TRUNCATE and per-row INSERTs called.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

I made another version of the patch. This one does UPDATEs for changed
row instead of DELETE/INSERT.

All existing regression tests are still passing (make check).


Mitar

On Mon, Dec 24, 2018 at 4:13 PM Mitar <[hidden email]> wrote:

>
> Hi!
>
> Thanks for reply!
>
> On Mon, Dec 24, 2018 at 2:20 PM David Fetter <[hidden email]> wrote:
> > You've got the right mailing list, a description of what you want, and
> > a PoC patch. You also got the patch in during the time between
> > Commitfests. You're doing great!
>
> Great!
>
> One thing I am unclear about is how it is determined if this is a
> viable feature to be eventually included. You gave me some suggestions
> to improve in my patch (adding tests and so on). Does this mean that
> the patch should be fully done before a decision is made?
>
> Also, the workflow is that I improve things, and resubmit a patch to
> the mailing list, for now?
>
> > > - Currently only insert and remove operations are done on the
> > > materialized view. This is because the current logic just removes
> > > changed rows and inserts new rows.
> >
> > What other operations might you want to support?
>
> Update. So if a row is changing, instead of doing a remove and insert,
> what currently is being done, I would prefer an update. Then UPDATE
> trigger operation would happen as well. Maybe the INSERT query could
> be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> one does UPDATE trigger operation on conflict), and REMOVE changed to
> remove just rows which were really removed, but not only updated.
>
> > As far as you can tell, is this just an efficiency optimization, or
> > might it go to correctness of the behavior?
>
> It is just an optimization. Or maybe even just a surprise. Maybe a
> documentation addition could help here. In my use case I would loop
> over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
> be done. But it is just surprising that DELETE trigger is called even
> when no rows are being deleted in the materialized view.
>
> > I'm not sure I understand the problem being described here. Do you see
> > these as useful to separate for some reason?
>
> So rows which are just updated currently get first DELETE trigger
> called and then INSERT. The issue is that if I am observing this
> behavior from outside, it makes it unclear when I see DELETE if this
> means really that a row has been deleted or it just means that later
> on an INSERT would happen. Now I have to wait for an eventual INSERT
> to determine that. But how long should I wait? It makes consuming
> these notifications tricky.
>
> If I just blindly respond to those notifications, this could introduce
> other problems. For example, if I have a reactive web application it
> could mean a visible flicker to the user. Instead of updating rendered
> row, I would first delete it and then later on re-insert it.
>
> > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > all data to do so is there (previous table, new table), at least for
> > > the statement-level trigger. Row-level triggers could also be
> > > simulated probably (with TRUNCATE and INSERT triggers).
> >
> > Would it make more sense to fill in the missing implementations of NEW
> > and OLD for per-row triggers instead of adding another hack?
>
> You lost me here. But I agree, we should implement this fully, without
> hacks. I just do not know how exactly.
>
> Are you saying that we should support only row-level triggers, or that
> we should support both statement-level and row-level triggers, but
> just make sure we implement this properly? I think that my suggestion
> of using TRUNCATE and INSERT triggers is reasonable in the case of
> full refresh. This is what happens. If we would want to have
> DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> concurrent version has to do, which would defeat the difference
> between the two. But yes, all INSERT trigger calls should have NEW
> provided.
>
> So per-statement trigger would have TRUNCATE and INSERT called. And
> per-row trigger would have TRUNCATE and per-row INSERTs called.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m


--
http://mitar.tnode.com/
https://twitter.com/mitar_m

matview.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

So I think this makes it work great for REFRESH MATERIALIZED VIEW
CONCURRENTLY. I think we can leave empty statement triggers as they
are. I have not found a nice way to not do them.

For adding triggers to REFRESH MATERIALIZED VIEW I would need some
help and pointers. I am not sure how to write calling triggers there.
Any reference to an existing code which does something similar would
be great. So I think after swapping heaps we should call TRUNCATE
trigger and then INSERT for all new rows.


Mitar

On Mon, Dec 24, 2018 at 6:17 PM Mitar <[hidden email]> wrote:

>
> Hi!
>
> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
>
> All existing regression tests are still passing (make check).
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 4:13 PM Mitar <[hidden email]> wrote:
> >
> > Hi!
> >
> > Thanks for reply!
> >
> > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <[hidden email]> wrote:
> > > You've got the right mailing list, a description of what you want, and
> > > a PoC patch. You also got the patch in during the time between
> > > Commitfests. You're doing great!
> >
> > Great!
> >
> > One thing I am unclear about is how it is determined if this is a
> > viable feature to be eventually included. You gave me some suggestions
> > to improve in my patch (adding tests and so on). Does this mean that
> > the patch should be fully done before a decision is made?
> >
> > Also, the workflow is that I improve things, and resubmit a patch to
> > the mailing list, for now?
> >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > >
> > > What other operations might you want to support?
> >
> > Update. So if a row is changing, instead of doing a remove and insert,
> > what currently is being done, I would prefer an update. Then UPDATE
> > trigger operation would happen as well. Maybe the INSERT query could
> > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> > one does UPDATE trigger operation on conflict), and REMOVE changed to
> > remove just rows which were really removed, but not only updated.
> >
> > > As far as you can tell, is this just an efficiency optimization, or
> > > might it go to correctness of the behavior?
> >
> > It is just an optimization. Or maybe even just a surprise. Maybe a
> > documentation addition could help here. In my use case I would loop
> > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
> > be done. But it is just surprising that DELETE trigger is called even
> > when no rows are being deleted in the materialized view.
> >
> > > I'm not sure I understand the problem being described here. Do you see
> > > these as useful to separate for some reason?
> >
> > So rows which are just updated currently get first DELETE trigger
> > called and then INSERT. The issue is that if I am observing this
> > behavior from outside, it makes it unclear when I see DELETE if this
> > means really that a row has been deleted or it just means that later
> > on an INSERT would happen. Now I have to wait for an eventual INSERT
> > to determine that. But how long should I wait? It makes consuming
> > these notifications tricky.
> >
> > If I just blindly respond to those notifications, this could introduce
> > other problems. For example, if I have a reactive web application it
> > could mean a visible flicker to the user. Instead of updating rendered
> > row, I would first delete it and then later on re-insert it.
> >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > > all data to do so is there (previous table, new table), at least for
> > > > the statement-level trigger. Row-level triggers could also be
> > > > simulated probably (with TRUNCATE and INSERT triggers).
> > >
> > > Would it make more sense to fill in the missing implementations of NEW
> > > and OLD for per-row triggers instead of adding another hack?
> >
> > You lost me here. But I agree, we should implement this fully, without
> > hacks. I just do not know how exactly.
> >
> > Are you saying that we should support only row-level triggers, or that
> > we should support both statement-level and row-level triggers, but
> > just make sure we implement this properly? I think that my suggestion
> > of using TRUNCATE and INSERT triggers is reasonable in the case of
> > full refresh. This is what happens. If we would want to have
> > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> > concurrent version has to do, which would defeat the difference
> > between the two. But yes, all INSERT trigger calls should have NEW
> > provided.
> >
> > So per-statement trigger would have TRUNCATE and INSERT called. And
> > per-row trigger would have TRUNCATE and per-row INSERTs called.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

David Fetter
In reply to this post by Mitar
On Mon, Dec 24, 2018 at 04:13:44PM -0800, Mitar wrote:

> Hi!
>
> Thanks for reply!
>
> On Mon, Dec 24, 2018 at 2:20 PM David Fetter <[hidden email]> wrote:
> > You've got the right mailing list, a description of what you want, and
> > a PoC patch. You also got the patch in during the time between
> > Commitfests. You're doing great!
>
> Great!
>
> One thing I am unclear about is how it is determined if this is a
> viable feature to be eventually included. You gave me some suggestions
> to improve in my patch (adding tests and so on). Does this mean that
> the patch should be fully done before a decision is made?
>
> Also, the workflow is that I improve things, and resubmit a patch to
> the mailing list, for now?
>
> > > - Currently only insert and remove operations are done on the
> > > materialized view. This is because the current logic just removes
> > > changed rows and inserts new rows.
> >
> > What other operations might you want to support?
>
> Update. So if a row is changing, instead of doing a remove and insert,
> what currently is being done, I would prefer an update. Then UPDATE
> trigger operation would happen as well. Maybe the INSERT query could
> be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> one does UPDATE trigger operation on conflict), and REMOVE changed to
> remove just rows which were really removed, but not only updated.

There might be a reason it's the way it is. Looking at the commits
that introduced this might shed some light.

> > I'm not sure I understand the problem being described here. Do you see
> > these as useful to separate for some reason?
>
> So rows which are just updated currently get first DELETE trigger
> called and then INSERT. The issue is that if I am observing this
> behavior from outside, it makes it unclear when I see DELETE if this
> means really that a row has been deleted or it just means that later
> on an INSERT would happen. Now I have to wait for an eventual INSERT
> to determine that. But how long should I wait? It makes consuming
> these notifications tricky.

If it helps you think about it better, all NOTIFICATIONs are sent on
COMMIT, i.e. you don't need to worry as much about what things should
or shouldn't have arrived. The down side, such as it is, is that they
don't convey premature knowledge about a state that may never arrive.

> If I just blindly respond to those notifications, this could introduce
> other problems. For example, if I have a reactive web application it
> could mean a visible flicker to the user. Instead of updating rendered
> row, I would first delete it and then later on re-insert it.

This is at what I hope is a level quite distinct from database
operations. Separation of concerns via the model-view-controller (or
similar) architecture and all that.

> > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > all data to do so is there (previous table, new table), at least for
> > > the statement-level trigger. Row-level triggers could also be
> > > simulated probably (with TRUNCATE and INSERT triggers).
> >
> > Would it make more sense to fill in the missing implementations of NEW
> > and OLD for per-row triggers instead of adding another hack?
>
> You lost me here. But I agree, we should implement this fully, without
> hacks. I just do not know how exactly.

Sorry I was unclear.  The SQL standard defines both transition tables,
which we have, for per-statement triggers, and transition variables,
which we don't, for per-row triggers. Here's the relevant part of the
syntax:

<trigger definition> ::=
    CREATE TRIGGER <trigger name> <trigger action time> <trigger event>
    ON <table name> [ REFERENCING <transition table or variable list> ]
    <triggered action>

<transition table or variable list> ::=
    <transition table or variable>...
<transition table or
    OLD [ ROW ] [ AS
    | NEW [ ROW ] [ AS
    | OLD TABLE [ AS ]
    | NEW TABLE [ AS ]
variable> ::=
] <old transition variable name>
] <new transition variable name>
<old transition table name>
<new transition table name>

> Are you saying that we should support only row-level triggers, or that
> we should support both statement-level and row-level triggers, but
> just make sure we implement this properly?

The latter, although we might need to defer the row-level triggers
until we support transition variables.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

On Tue, Dec 25, 2018 at 10:03 AM David Fetter <[hidden email]> wrote:
> If it helps you think about it better, all NOTIFICATIONs are sent on
> COMMIT, i.e. you don't need to worry as much about what things should
> or shouldn't have arrived. The down side, such as it is, is that they
> don't convey premature knowledge about a state that may never arrive.

This fact does not really help me. My client code listening to
NOTIFICATIONs does not know when some other client made COMMIT. There
is no NOTIFICATION saying "this is the end of the commit for which I
just sent you notifications".

> This is at what I hope is a level quite distinct from database
> operations. Separation of concerns via the model-view-controller (or
> similar) architecture and all that.

Of course, but garbage in, garbage out. If notifications are
superfluous then another abstraction layer has to fix them. I would
prefer if this would not have to be the case.

But it seems it is relatively easy to fix this logic and have both
INSERTs, DELETEs and UPDATEs. The patch I updated and attached
previously does that.

> Sorry I was unclear.  The SQL standard defines both transition tables,
> which we have, for per-statement triggers, and transition variables,
> which we don't, for per-row triggers.

I thought that PostgreSQL has transition variables per-row triggers,
only that it is not possible to (re)name them (are depend on the
trigger function language)? But there are OLD and NEW variables
available in per-row triggers, or equivalent?

> The latter, although we might need to defer the row-level triggers
> until we support transition variables.

Not sure how transition variables are implemented currently for
regular tables, but we could probably do the same?

Anyway, I do not know how to proceed here to implement or
statement-level or row-level triggers here. It could be just a matter
a calling some function to call them, but I am not familiar with the
codebase enough to know what. So any pointers to existing code which
does something similar would be great. So, what to call once material
views' heaps are swapped to call triggers?


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Alvaro Herrera-9
In reply to this post by Mitar
On 2018-Dec-24, Mitar wrote:


> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
>
> All existing regression tests are still passing (make check).

Okay, but you don't add any for your new feature, which is not good.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera <[hidden email]> wrote:
> > I made another version of the patch. This one does UPDATEs for changed
> > row instead of DELETE/INSERT.
> >
> > All existing regression tests are still passing (make check).
>
> Okay, but you don't add any for your new feature, which is not good.

Yes, I have not yet done that. I want first to also add calling
triggers for non-concurrent refresh, but I would need a bit help there
(what to call, example of maybe code which does something similar
already).


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Alvaro Herrera-9
On 2018-Dec-25, Mitar wrote:

> On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera <[hidden email]> wrote:
> > > I made another version of the patch. This one does UPDATEs for changed
> > > row instead of DELETE/INSERT.
> > >
> > > All existing regression tests are still passing (make check).
> >
> > Okay, but you don't add any for your new feature, which is not good.
>
> Yes, I have not yet done that. I want first to also add calling
> triggers for non-concurrent refresh, but I would need a bit help there
> (what to call, example of maybe code which does something similar
> already).

Well, REFRESH CONCURRENTLY runs completely different than non-concurrent
REFRESH.  The former updates the existing data by observing the
differences with the previous data; the latter simply re-runs the query
and overwrites everything.  So if you simply enabled triggers on
non-concurrent refresh, you'd just see a bunch of inserts into a
throwaway data area (a transient relfilenode, we call it), then a swap
of the MV's relfilenode with the throwaway one.  I doubt it'd be useful.
But then I'm not clear *why* you would like to do a non-concurrent
refresh.  Maybe your situation would be best served by forbidding non-
concurrent refresh when the MV contains any triggers.

Alternatively, maybe reimplement non-concurrent refresh so that it works
identically to concurrent refresh (except with a stronger lock).  Not
sure if this implies any performance penalties.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera <[hidden email]> wrote:
> But then I'm not clear *why* you would like to do a non-concurrent
> refresh.

I mostly wanted to support if for two reasons:

- completeness: maybe we cannot imagine the use case yet, but somebody
might in the future
- getting trigger calls for initial inserts: you can then create
materialized view without data, attach triggers, and then run a
regular refresh; this allows you to have only one code path to process
any (including initial) changes to the view through notifications,
instead of handling initial data differently

> Maybe your situation would be best served by forbidding non-
> concurrent refresh when the MV contains any triggers.

If this would be acceptable by the community, I could do it. I worry
though that one could probably get themselves into a situation where
materialized view losses all data through some WITH NO DATA operation
and concurrent refresh is not possible. Currently concurrent refresh
works only with data. We could make concurrent refresh also work when
materialized view has no data easily (it would just insert data and
not compute diff).

> Alternatively, maybe reimplement non-concurrent refresh so that it works
> identically to concurrent refresh (except with a stronger lock).  Not
> sure if this implies any performance penalties.

Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
That would then generate reasonable trigger calls.

Are there any existing benchmarks for such operations I could use to
see if there are any performance changes if I change implementation
here? Any guidelines how to evaluate this?


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
In reply to this post by Mitar
Hi!

I did a bit of benchmarking. It seems my version with UPDATE takes
even slightly less time (~5%).


Mitar

On Mon, Dec 24, 2018 at 6:17 PM Mitar <[hidden email]> wrote:

>
> Hi!
>
> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
>
> All existing regression tests are still passing (make check).
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 4:13 PM Mitar <[hidden email]> wrote:
> >
> > Hi!
> >
> > Thanks for reply!
> >
> > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <[hidden email]> wrote:
> > > You've got the right mailing list, a description of what you want, and
> > > a PoC patch. You also got the patch in during the time between
> > > Commitfests. You're doing great!
> >
> > Great!
> >
> > One thing I am unclear about is how it is determined if this is a
> > viable feature to be eventually included. You gave me some suggestions
> > to improve in my patch (adding tests and so on). Does this mean that
> > the patch should be fully done before a decision is made?
> >
> > Also, the workflow is that I improve things, and resubmit a patch to
> > the mailing list, for now?
> >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > >
> > > What other operations might you want to support?
> >
> > Update. So if a row is changing, instead of doing a remove and insert,
> > what currently is being done, I would prefer an update. Then UPDATE
> > trigger operation would happen as well. Maybe the INSERT query could
> > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> > one does UPDATE trigger operation on conflict), and REMOVE changed to
> > remove just rows which were really removed, but not only updated.
> >
> > > As far as you can tell, is this just an efficiency optimization, or
> > > might it go to correctness of the behavior?
> >
> > It is just an optimization. Or maybe even just a surprise. Maybe a
> > documentation addition could help here. In my use case I would loop
> > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
> > be done. But it is just surprising that DELETE trigger is called even
> > when no rows are being deleted in the materialized view.
> >
> > > I'm not sure I understand the problem being described here. Do you see
> > > these as useful to separate for some reason?
> >
> > So rows which are just updated currently get first DELETE trigger
> > called and then INSERT. The issue is that if I am observing this
> > behavior from outside, it makes it unclear when I see DELETE if this
> > means really that a row has been deleted or it just means that later
> > on an INSERT would happen. Now I have to wait for an eventual INSERT
> > to determine that. But how long should I wait? It makes consuming
> > these notifications tricky.
> >
> > If I just blindly respond to those notifications, this could introduce
> > other problems. For example, if I have a reactive web application it
> > could mean a visible flicker to the user. Instead of updating rendered
> > row, I would first delete it and then later on re-insert it.
> >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > > all data to do so is there (previous table, new table), at least for
> > > > the statement-level trigger. Row-level triggers could also be
> > > > simulated probably (with TRUNCATE and INSERT triggers).
> > >
> > > Would it make more sense to fill in the missing implementations of NEW
> > > and OLD for per-row triggers instead of adding another hack?
> >
> > You lost me here. But I agree, we should implement this fully, without
> > hacks. I just do not know how exactly.
> >
> > Are you saying that we should support only row-level triggers, or that
> > we should support both statement-level and row-level triggers, but
> > just make sure we implement this properly? I think that my suggestion
> > of using TRUNCATE and INSERT triggers is reasonable in the case of
> > full refresh. This is what happens. If we would want to have
> > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> > concurrent version has to do, which would defeat the difference
> > between the two. But yes, all INSERT trigger calls should have NEW
> > provided.
> >
> > So per-statement trigger would have TRUNCATE and INSERT called. And
> > per-row trigger would have TRUNCATE and per-row INSERTs called.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Alvaro Herrera-9
In reply to this post by Mitar
On 2018-Dec-25, Mitar wrote:

> On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera <[hidden email]> wrote:
> > But then I'm not clear *why* you would like to do a non-concurrent
> > refresh.
>
> I mostly wanted to support if for two reasons:
>
> - completeness: maybe we cannot imagine the use case yet, but somebody
> might in the future

Understood.  We don't like features that fail to work in conjunction
with other features, so this is a good goal to keep in mind.

> - getting trigger calls for initial inserts: you can then create
> materialized view without data, attach triggers, and then run a
> regular refresh; this allows you to have only one code path to process
> any (including initial) changes to the view through notifications,
> instead of handling initial data differently

Sounds like you could do this by fixing concurrent refresh to also work
when the MV is WITH NO DATA.

> > Maybe your situation would be best served by forbidding non-
> > concurrent refresh when the MV contains any triggers.
>
> If this would be acceptable by the community, I could do it.

I think your chances are 50%/50% that this would appear acceptable.

> > Alternatively, maybe reimplement non-concurrent refresh so that it works
> > identically to concurrent refresh (except with a stronger lock).  Not
> > sure if this implies any performance penalties.
>
> Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> That would then generate reasonable trigger calls.

Right.

> Are there any existing benchmarks for such operations I could use to
> see if there are any performance changes if I change implementation
> here? Any guidelines how to evaluate this?

Not that I know of.  Typically the developer of a feature comes up with
appropriate performance tests also, targetting average and worst cases.

If the performance worsens with the different implementation, one idea
is to keep both and only use the slow one when triggers are present.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

On Wed, Dec 26, 2018 at 4:38 AM Alvaro Herrera <[hidden email]> wrote:
> Sounds like you could do this by fixing concurrent refresh to also work
> when the MV is WITH NO DATA.

Yes, I do not think this would be too hard to fix. I could do this nevertheless.

> > Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> > That would then generate reasonable trigger calls.
>
> Right.

I have tested this yesterday and performance is 2x worse that heap
swap on the benchmark I made. So I do not think this is a viable
approach.

I am now looking into simply triggering TRUNCATE and INSERT triggers
after heap swap simulating the above. I made AFTER STATEMENT triggers
and it looks like it is working, only NEW table is not populated for
some reason. Any suggestions? See attached patch.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

heapswaptriggers.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
In reply to this post by Mitar
Hi!

I have made an updated version of the patch, added tests and
documentation changes. This is my view now a complete patch. Please
provide any feedback or comments you might have for me to improve the
patch. I will also add it to commitfest.

A summary of the patch: This patch enables adding AFTER triggers (both
ROW and STATEMENT) on materialized views. They are fired when doing
REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
Triggers are not fired if you call REFRESH without CONCURRENTLY. This
is based on some discussion on the mailing list because implementing
it for without CONCURRENTLY would require us to add logic for firing
triggers where there was none before (and is just an efficient heap
swap).

To be able to create a materialized view without data, specify
triggers, and REFRESH CONCURRENTLY so that those triggers would be
called also for initial data, I have tested and determined that there
is no reason why REFRESH CONCURRENTLY could not be run on
uninitialized materialized view. So I removed that check and things
seem to just work. Including triggers being called for initial data. I
think this makes REFRESH CONCURRENTLY have one less special case which
is in general nicer.

I have run tests and all old tests still succeed. I have added more
tests for the new feature.

I have run benchmark to evaluate the impact of me changing
refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
rows which were just updated. In fact it seems this improves
performance slightly (4% in my benchmark, mean over 10 runs). I guess
that this is because it is cheaper to just change one column's values
(what benchmark is doing when changing rows) instead of removing and
inserting the whole row. Because REFRESH MATERIALIZED VIEW
CONCURRENTLY is meant to be used when not a lot of data has been
changed anyway, I find this a pleasantly surprising additional
improvement in this patch. I am attaching the benchmark script I have
used. I compared the time of the final refresh query in the script. (I
would love if pgbench could take a custom init script to run before
the timed part of the script.)


Mitar

On Mon, Dec 24, 2018 at 12:59 PM Mitar <[hidden email]> wrote:

>
> Hi!
>
> Based on discussion about observing changes on an open query in a
> reactive manner (to support reactive web applications) [1], I
> identified that one critical feature is missing to fully implement
> discussed design of having reactive queries be represented as
> materialized views, and changes to these materialized views would then
> be observed and pushed to the client through LISTEN/NOTIFY.
>
> This is my first time contributing to PostgreSQL, so I hope I am
> starting this process well.
>
> I would like to propose that support for AFTER triggers are added to
> materialized views. I experimented a bit and it seems this is mostly
> just a question of enabling/exposing them. See attached patch. This
> enabled me to add trigger to a material view which mostly worked. Here
> are my findings.
>
> Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> per statement and per row. There are few improvements which could be
> done:
>
> - Currently only insert and remove operations are done on the
> materialized view. This is because the current logic just removes
> changed rows and inserts new rows.
> - In current concurrently refresh logic those insert and remove
> operations are made even if there are no changes to be done. Which
> triggers a statement trigger unnecessary. A small improvement could be
> to skip the statement in that case, but looking at the code this seems
> maybe tricky because both each of inserts and deletions are done
> inside one query each.
> - Current concurrently refresh logic does never do updates on existing
> rows. It would be nicer to have that so that triggers are more aligned
> with real changes to the data. So current two queries could be changed
> to three, each doing one of the insert, update, and delete.
>
> Non-concurrent refresh does not trigger any trigger. But it seems all
> data to do so is there (previous table, new table), at least for the
> statement-level trigger. Row-level triggers could also be simulated
> probably (with TRUNCATE and INSERT triggers).
>
> [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m


--
http://mitar.tnode.com/
https://twitter.com/mitar_m

matviewtriggers-v2.patch (28K) Download Attachment
bench.sql (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

One more version of the patch with slightly more deterministic tests.


Mitar

On Thu, Dec 27, 2018 at 11:43 PM Mitar <[hidden email]> wrote:

>
> Hi!
>
> I have made an updated version of the patch, added tests and
> documentation changes. This is my view now a complete patch. Please
> provide any feedback or comments you might have for me to improve the
> patch. I will also add it to commitfest.
>
> A summary of the patch: This patch enables adding AFTER triggers (both
> ROW and STATEMENT) on materialized views. They are fired when doing
> REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
> Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> is based on some discussion on the mailing list because implementing
> it for without CONCURRENTLY would require us to add logic for firing
> triggers where there was none before (and is just an efficient heap
> swap).
>
> To be able to create a materialized view without data, specify
> triggers, and REFRESH CONCURRENTLY so that those triggers would be
> called also for initial data, I have tested and determined that there
> is no reason why REFRESH CONCURRENTLY could not be run on
> uninitialized materialized view. So I removed that check and things
> seem to just work. Including triggers being called for initial data. I
> think this makes REFRESH CONCURRENTLY have one less special case which
> is in general nicer.
>
> I have run tests and all old tests still succeed. I have added more
> tests for the new feature.
>
> I have run benchmark to evaluate the impact of me changing
> refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
> rows which were just updated. In fact it seems this improves
> performance slightly (4% in my benchmark, mean over 10 runs). I guess
> that this is because it is cheaper to just change one column's values
> (what benchmark is doing when changing rows) instead of removing and
> inserting the whole row. Because REFRESH MATERIALIZED VIEW
> CONCURRENTLY is meant to be used when not a lot of data has been
> changed anyway, I find this a pleasantly surprising additional
> improvement in this patch. I am attaching the benchmark script I have
> used. I compared the time of the final refresh query in the script. (I
> would love if pgbench could take a custom init script to run before
> the timed part of the script.)
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 12:59 PM Mitar <[hidden email]> wrote:
> >
> > Hi!
> >
> > Based on discussion about observing changes on an open query in a
> > reactive manner (to support reactive web applications) [1], I
> > identified that one critical feature is missing to fully implement
> > discussed design of having reactive queries be represented as
> > materialized views, and changes to these materialized views would then
> > be observed and pushed to the client through LISTEN/NOTIFY.
> >
> > This is my first time contributing to PostgreSQL, so I hope I am
> > starting this process well.
> >
> > I would like to propose that support for AFTER triggers are added to
> > materialized views. I experimented a bit and it seems this is mostly
> > just a question of enabling/exposing them. See attached patch. This
> > enabled me to add trigger to a material view which mostly worked. Here
> > are my findings.
> >
> > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> > per statement and per row. There are few improvements which could be
> > done:
> >
> > - Currently only insert and remove operations are done on the
> > materialized view. This is because the current logic just removes
> > changed rows and inserts new rows.
> > - In current concurrently refresh logic those insert and remove
> > operations are made even if there are no changes to be done. Which
> > triggers a statement trigger unnecessary. A small improvement could be
> > to skip the statement in that case, but looking at the code this seems
> > maybe tricky because both each of inserts and deletions are done
> > inside one query each.
> > - Current concurrently refresh logic does never do updates on existing
> > rows. It would be nicer to have that so that triggers are more aligned
> > with real changes to the data. So current two queries could be changed
> > to three, each doing one of the insert, update, and delete.
> >
> > Non-concurrent refresh does not trigger any trigger. But it seems all
> > data to do so is there (previous table, new table), at least for the
> > statement-level trigger. Row-level triggers could also be simulated
> > probably (with TRUNCATE and INSERT triggers).
> >
> > [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m


--
http://mitar.tnode.com/
https://twitter.com/mitar_m

matviewtriggers-v3.patch (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

Hm, why in commitfest it does not display the latest patch?

https://commitfest.postgresql.org/21/1953/

It does display correctly the latest e-mail, but not the link to the patch. :-(


Mitar

On Thu, Dec 27, 2018 at 11:51 PM Mitar <[hidden email]> wrote:

>
> Hi!
>
> One more version of the patch with slightly more deterministic tests.
>
>
> Mitar
>
> On Thu, Dec 27, 2018 at 11:43 PM Mitar <[hidden email]> wrote:
> >
> > Hi!
> >
> > I have made an updated version of the patch, added tests and
> > documentation changes. This is my view now a complete patch. Please
> > provide any feedback or comments you might have for me to improve the
> > patch. I will also add it to commitfest.
> >
> > A summary of the patch: This patch enables adding AFTER triggers (both
> > ROW and STATEMENT) on materialized views. They are fired when doing
> > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
> > Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> > is based on some discussion on the mailing list because implementing
> > it for without CONCURRENTLY would require us to add logic for firing
> > triggers where there was none before (and is just an efficient heap
> > swap).
> >
> > To be able to create a materialized view without data, specify
> > triggers, and REFRESH CONCURRENTLY so that those triggers would be
> > called also for initial data, I have tested and determined that there
> > is no reason why REFRESH CONCURRENTLY could not be run on
> > uninitialized materialized view. So I removed that check and things
> > seem to just work. Including triggers being called for initial data. I
> > think this makes REFRESH CONCURRENTLY have one less special case which
> > is in general nicer.
> >
> > I have run tests and all old tests still succeed. I have added more
> > tests for the new feature.
> >
> > I have run benchmark to evaluate the impact of me changing
> > refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
> > rows which were just updated. In fact it seems this improves
> > performance slightly (4% in my benchmark, mean over 10 runs). I guess
> > that this is because it is cheaper to just change one column's values
> > (what benchmark is doing when changing rows) instead of removing and
> > inserting the whole row. Because REFRESH MATERIALIZED VIEW
> > CONCURRENTLY is meant to be used when not a lot of data has been
> > changed anyway, I find this a pleasantly surprising additional
> > improvement in this patch. I am attaching the benchmark script I have
> > used. I compared the time of the final refresh query in the script. (I
> > would love if pgbench could take a custom init script to run before
> > the timed part of the script.)
> >
> >
> > Mitar
> >
> > On Mon, Dec 24, 2018 at 12:59 PM Mitar <[hidden email]> wrote:
> > >
> > > Hi!
> > >
> > > Based on discussion about observing changes on an open query in a
> > > reactive manner (to support reactive web applications) [1], I
> > > identified that one critical feature is missing to fully implement
> > > discussed design of having reactive queries be represented as
> > > materialized views, and changes to these materialized views would then
> > > be observed and pushed to the client through LISTEN/NOTIFY.
> > >
> > > This is my first time contributing to PostgreSQL, so I hope I am
> > > starting this process well.
> > >
> > > I would like to propose that support for AFTER triggers are added to
> > > materialized views. I experimented a bit and it seems this is mostly
> > > just a question of enabling/exposing them. See attached patch. This
> > > enabled me to add trigger to a material view which mostly worked. Here
> > > are my findings.
> > >
> > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> > > per statement and per row. There are few improvements which could be
> > > done:
> > >
> > > - Currently only insert and remove operations are done on the
> > > materialized view. This is because the current logic just removes
> > > changed rows and inserts new rows.
> > > - In current concurrently refresh logic those insert and remove
> > > operations are made even if there are no changes to be done. Which
> > > triggers a statement trigger unnecessary. A small improvement could be
> > > to skip the statement in that case, but looking at the code this seems
> > > maybe tricky because both each of inserts and deletions are done
> > > inside one query each.
> > > - Current concurrently refresh logic does never do updates on existing
> > > rows. It would be nicer to have that so that triggers are more aligned
> > > with real changes to the data. So current two queries could be changed
> > > to three, each doing one of the insert, update, and delete.
> > >
> > > Non-concurrent refresh does not trigger any trigger. But it seems all
> > > data to do so is there (previous table, new table), at least for the
> > > statement-level trigger. Row-level triggers could also be simulated
> > > probably (with TRUNCATE and INSERT triggers).
> > >
> > > [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
> > >
> > >
> > > Mitar
> > >
> > > --
> > > http://mitar.tnode.com/
> > > https://twitter.com/mitar_m
> >
> >
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

False alarm. It just looks like updating patches takes longer than
updating e-mails.


Mitar

On Fri, Dec 28, 2018 at 12:11 AM Mitar <[hidden email]> wrote:

>
> Hi!
>
> Hm, why in commitfest it does not display the latest patch?
>
> https://commitfest.postgresql.org/21/1953/
>
> It does display correctly the latest e-mail, but not the link to the patch. :-(
>
>
> Mitar
>
> On Thu, Dec 27, 2018 at 11:51 PM Mitar <[hidden email]> wrote:
> >
> > Hi!
> >
> > One more version of the patch with slightly more deterministic tests.
> >
> >
> > Mitar
> >
> > On Thu, Dec 27, 2018 at 11:43 PM Mitar <[hidden email]> wrote:
> > >
> > > Hi!
> > >
> > > I have made an updated version of the patch, added tests and
> > > documentation changes. This is my view now a complete patch. Please
> > > provide any feedback or comments you might have for me to improve the
> > > patch. I will also add it to commitfest.
> > >
> > > A summary of the patch: This patch enables adding AFTER triggers (both
> > > ROW and STATEMENT) on materialized views. They are fired when doing
> > > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
> > > Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> > > is based on some discussion on the mailing list because implementing
> > > it for without CONCURRENTLY would require us to add logic for firing
> > > triggers where there was none before (and is just an efficient heap
> > > swap).
> > >
> > > To be able to create a materialized view without data, specify
> > > triggers, and REFRESH CONCURRENTLY so that those triggers would be
> > > called also for initial data, I have tested and determined that there
> > > is no reason why REFRESH CONCURRENTLY could not be run on
> > > uninitialized materialized view. So I removed that check and things
> > > seem to just work. Including triggers being called for initial data. I
> > > think this makes REFRESH CONCURRENTLY have one less special case which
> > > is in general nicer.
> > >
> > > I have run tests and all old tests still succeed. I have added more
> > > tests for the new feature.
> > >
> > > I have run benchmark to evaluate the impact of me changing
> > > refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
> > > rows which were just updated. In fact it seems this improves
> > > performance slightly (4% in my benchmark, mean over 10 runs). I guess
> > > that this is because it is cheaper to just change one column's values
> > > (what benchmark is doing when changing rows) instead of removing and
> > > inserting the whole row. Because REFRESH MATERIALIZED VIEW
> > > CONCURRENTLY is meant to be used when not a lot of data has been
> > > changed anyway, I find this a pleasantly surprising additional
> > > improvement in this patch. I am attaching the benchmark script I have
> > > used. I compared the time of the final refresh query in the script. (I
> > > would love if pgbench could take a custom init script to run before
> > > the timed part of the script.)
> > >
> > >
> > > Mitar
> > >
> > > On Mon, Dec 24, 2018 at 12:59 PM Mitar <[hidden email]> wrote:
> > > >
> > > > Hi!
> > > >
> > > > Based on discussion about observing changes on an open query in a
> > > > reactive manner (to support reactive web applications) [1], I
> > > > identified that one critical feature is missing to fully implement
> > > > discussed design of having reactive queries be represented as
> > > > materialized views, and changes to these materialized views would then
> > > > be observed and pushed to the client through LISTEN/NOTIFY.
> > > >
> > > > This is my first time contributing to PostgreSQL, so I hope I am
> > > > starting this process well.
> > > >
> > > > I would like to propose that support for AFTER triggers are added to
> > > > materialized views. I experimented a bit and it seems this is mostly
> > > > just a question of enabling/exposing them. See attached patch. This
> > > > enabled me to add trigger to a material view which mostly worked. Here
> > > > are my findings.
> > > >
> > > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> > > > per statement and per row. There are few improvements which could be
> > > > done:
> > > >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > > > - In current concurrently refresh logic those insert and remove
> > > > operations are made even if there are no changes to be done. Which
> > > > triggers a statement trigger unnecessary. A small improvement could be
> > > > to skip the statement in that case, but looking at the code this seems
> > > > maybe tricky because both each of inserts and deletions are done
> > > > inside one query each.
> > > > - Current concurrently refresh logic does never do updates on existing
> > > > rows. It would be nicer to have that so that triggers are more aligned
> > > > with real changes to the data. So current two queries could be changed
> > > > to three, each doing one of the insert, update, and delete.
> > > >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems all
> > > > data to do so is there (previous table, new table), at least for the
> > > > statement-level trigger. Row-level triggers could also be simulated
> > > > probably (with TRUNCATE and INSERT triggers).
> > > >
> > > > [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
> > > >
> > > >
> > > > Mitar
> > > >
> > > > --
> > > > http://mitar.tnode.com/
> > > > https://twitter.com/mitar_m
> > >
> > >
> > >
> > > --
> > > http://mitar.tnode.com/
> > > https://twitter.com/mitar_m
> >
> >
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Peter Eisentraut-6
In reply to this post by Mitar
On 28/12/2018 08:43, Mitar wrote:
> A summary of the patch: This patch enables adding AFTER triggers (both
> ROW and STATEMENT) on materialized views. They are fired when doing
> REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.

What bothers me about this patch is that it subtly changes what a
trigger means.  It currently means, say, INSERT was executed on this
table.  You are expanding that to mean, a row was inserted into this
table -- somehow.

Triggers should generally refer to user-facing commands.  Could you not
make a trigger on REFRESH itself?

> Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> is based on some discussion on the mailing list because implementing
> it for without CONCURRENTLY would require us to add logic for firing
> triggers where there was none before (and is just an efficient heap
> swap).

This is also a problem, because it would allow bypassing the trigger
accidentally.

Moreover, consider that there could be updatable materialized views,
just like there are updatable normal views.  And there could be triggers
on those updatable materialized views.  Those would look similar but
work quite differently from what you are proposing here.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Feature: triggers on materialized views

Mitar
Hi!

I am new to contributing to PostgreSQL and this is my first time
having patches in commit fest, so I am not sure about details of the
process here, but I assume that replying and discuss the patch during
this period is one of the actives, so I am replying to the comment. If
I should wait or something like that, please advise.

On Fri, Jan 4, 2019 at 3:23 AM Peter Eisentraut
<[hidden email]> wrote:
> > A summary of the patch: This patch enables adding AFTER triggers (both
> > ROW and STATEMENT) on materialized views. They are fired when doing
> > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
>
> What bothers me about this patch is that it subtly changes what a
> trigger means.  It currently means, say, INSERT was executed on this
> table.  You are expanding that to mean, a row was inserted into this
> table -- somehow.

Aren't almost all statements these days generated by some sort of
automatic logic? Which generates those INSERTs and then you get
triggers on them? I am not sure where is this big difference in your
view coming from? Triggers are not defined as "user-made INSERT was
executed on this table". I think it has always been defined as "INSERT
modified this table", no matter where this insert came from (from
user, from some other trigger, by backup process). I mean, this is the
beauty of declarative programming. You define it once and you do not
care who triggers it.

Materialized views are anyway just built-in implementation of tables +
triggers to rerun the query. You could reconstruct them manually. And
why would not triggers be called if tables is being modified through
INSERTs? So if PostgreSQL has such a feature, why make it limited and
artificially make it less powerful? It is literally not possible to
have triggers only because there is "if trigger on a materialized
view, throw an exception".

> Triggers should generally refer to user-facing commands

So triggers on table A are not run when some other trigger from table
B decides to insert data into table A? Not true. I think triggers have
never cared who and where an INSERT came from. They just trigger. From
user, from another trigger, or from some built-in PostgreSQL procedure
called REFRESH.

> Could you not make a trigger on REFRESH itself?

If you mean if I could simulate this somehow before or after I call
REFRESH, then not really. I would not have access to previous state of
the table to compute the diff anymore. Moreover, I would have to
recompute the diff again, when REFRESH already did it once.

I could implement materialized views myself using regular tables and
triggers. And then have triggers after change on that table. But this
sounds very sad.

Or, are you saying that we should introduce a whole new type of of
trigger, REFRESH trigger, which would be valid only on materialized
views, and get OLD and NEW relations for previous and old state? I
think this could be an option, but it would require much more work,
and more changes to API. Is this what community would prefer?

> This is also a problem, because it would allow bypassing the trigger
> accidentally.

Sure, this is why it is useful to explain that CONCURRENT REFRESH uses
INSERT/UPDATE/DELETE and this is why you get triggers, and REFRESH
does not (but it is faster). I explained this in documentation.

But yes, this is downside. I checked the idea of calling row-level
triggers after regular REFRESH, but it seems it will introduce a lot
of overhead and special handling. I tried implementing it as TRUNCATE
+ INSERTS instead of heap swap and it is 2x slower.

> Moreover, consider that there could be updatable materialized views,
> just like there are updatable normal views.  And there could be triggers
> on those updatable materialized views.  Those would look similar but
> work quite differently from what you are proposing here.

Hm, not really. I would claim they would behave exactly the same.
AFTER trigger on INSERT on a materialized view would trigger for rows
which have changed through user updating materialized view directly,
or by calling CONCURRENT REFRESH which inserted a row. In both cases
the same trigger would run because materialized view had a row
inserted. Pretty nice.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

12