jsonb_set() strictness considered harmful to data

jsonb_set() strictness considered harmful to data

Ariadne Conill
Hello,

I am one of the primary maintainers of Pleroma, a federated social
networking application written in Elixir, which uses PostgreSQL in
ways that may be considered outside the typical usage scenarios for
PostgreSQL.

Namely, we leverage JSONB heavily as a backing store for JSON-LD
documents[1].  We also use JSONB in combination with Ecto's "embedded
structs" to store things like user preferences.

The fact that we can use JSONB to achieve our design goals is a
testament to the flexibility PostgreSQL has.

However, in the process of doing so, we have discovered a serious flaw
in the way jsonb_set() functions, but upon reading through this
mailing list, we have found that this flaw appears to be an
intentional design.[2]

A few times now, we have written migrations that do things like copy
keys in a JSONB object to a new key, to rename them.  These migrations
look like so:

   update users set info=jsonb_set(info, '{bar}', info->'foo');

Typically, this works nicely, except for cases where evaluating
info->'foo' results in an SQL null being returned.  When that happens,
jsonb_set() returns an SQL null, which then results in data loss.[3]
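
For example, for a row whose info has no 'foo' key at all:

   select jsonb_set('{"a": 1}'::jsonb, '{bar}', '{"a": 1}'::jsonb->'foo');
   -- info->'foo' evaluates to SQL null; jsonb_set() is strict, so the
   -- whole expression returns SQL null and the UPDATE writes NULL over
   -- the entire column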

This is not acceptable.  PostgreSQL is a database that is renowned for
data integrity, but here it is wiping out data when it encounters a
failure case.  The way jsonb_set() should fail in this case is to
simply return the original input: it should NEVER return SQL null.

But hey, we've been burned by this so many times now that we'd like to
donate a useful function to the commons; consider it a mollyguard for
the real jsonb_set() function.

    create or replace function safe_jsonb_set(target jsonb, path text[],
        new_value jsonb, create_missing boolean default true) returns jsonb as $$
    declare
      result jsonb;
    begin
      result := jsonb_set(target, path, coalesce(new_value, 'null'::jsonb),
                          create_missing);
      if result is NULL then
        return target;
      else
        return result;
      end if;
    end;
    $$ language plpgsql;
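
With this wrapper, the migration above can be written as:

    update users set info = safe_jsonb_set(info, '{bar}', info->'foo');

and a row lacking a 'foo' key ends up with {"bar": null} instead of
having its entire info column replaced with SQL null.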

This safe_jsonb_set() wrapper should not be necessary.  PostgreSQL's
own jsonb_set() should have this safety feature built in.  Without it,
using jsonb_set() is like playing Russian roulette with your data,
which is not a reasonable expectation for a database renowned for its
commitment to data integrity.

Please fix this bug so that we do not have to hack around it.
It has probably ruined countless people's days so far.  I don't want
to hear about how the function is strict, I'm aware it is strict, and
that strictness is harmful.  Please fix the function so that it is
actually safe to use.

[1]: JSON-LD stands for JSON Linked Data.  Pleroma has an "internal
representation" that shares similar qualities to JSON-LD, so I use
JSON-LD here as a simplification.

[2]: https://www.postgresql.org/message-id/flat/qfkua9$2q0e$1@...

[3]: https://git.pleroma.social/pleroma/pleroma/issues/1324 is an
example of data loss induced by this issue.

Ariadne



Re: jsonb_set() strictness considered harmful to data

Mark Felder-2


On Fri, Oct 18, 2019, at 12:37, Ariadne Conill wrote:

> [...]
>
> This is not acceptable.  PostgreSQL is a database that is renowned for
> data integrity, but here it is wiping out data when it encounters a
> failure case.  The way jsonb_set() should fail in this case is to
> simply return the original input: it should NEVER return SQL null.
>
> [...]

This should be directed towards the hackers list, too.

What will it take to change the semantics of jsonb_set()? MySQL implements safe behavior here. It's a real shame Postgres does not. I'll offer a $200 bounty to whoever fixes it. I'm sure it's destroyed more than $200 worth of data and people's time by now, but it's something.


Kind regards,



--
  Mark Felder
  ports-secteam & portmgr alumni
  [hidden email]



Re: jsonb_set() strictness considered harmful to data

Christoph Moench-Tegeder
In reply to this post by Ariadne Conill
## Ariadne Conill ([hidden email]):

>    update users set info=jsonb_set(info, '{bar}', info->'foo');
>
> Typically, this works nicely, except for cases where evaluating
> info->'foo' results in an SQL null being returned.  When that happens,
> jsonb_set() returns an SQL null, which then results in data loss.[3]

So why don't you use the facilities of SQL to make sure to only
touch the rows which match the prerequisites?

  UPDATE users SET info = jsonb_set(info, '{bar}', info->'foo')
    WHERE info->'foo' IS NOT NULL;

No special wrappers required.

Regards,
Christoph

--
Spare Space



Re: jsonb_set() strictness considered harmful to data

David G Johnston
On Fri, Oct 18, 2019 at 2:50 PM Christoph Moench-Tegeder <[hidden email]> wrote:
> ## Ariadne Conill ([hidden email]):
>
>>    update users set info=jsonb_set(info, '{bar}', info->'foo');
>>
>> Typically, this works nicely, except for cases where evaluating
>> info->'foo' results in an SQL null being returned.  When that happens,
>> jsonb_set() returns an SQL null, which then results in data loss.[3]
>
> So why don't you use the facilities of SQL to make sure to only
> touch the rows which match the prerequisites?
>
>   UPDATE users SET info = jsonb_set(info, '{bar}', info->'foo')
>     WHERE info->'foo' IS NOT NULL;


There are many ways to add code to queries to make working with this function safer - though using them presupposes one remembers, at the time of writing the query, that there is danger and there are caveats in using this function.  I agree that we should have provided (and should now provide) sane defined behavior when one of the inputs to the function is null, instead of blowing off the issue and defining the function as being strict.  Whether that is "ignore and return the original object" or "add the key with a json null scalar value" is debatable, but either is considerably more useful than returning SQL NULL.

David J.


Re: jsonb_set() strictness considered harmful to data

Ariadne Conill
In reply to this post by Christoph Moench-Tegeder
Hello,

On Fri, Oct 18, 2019 at 4:50 PM Christoph Moench-Tegeder
<[hidden email]> wrote:

>
> ## Ariadne Conill ([hidden email]):
>
> >    update users set info=jsonb_set(info, '{bar}', info->'foo');
> >
> > Typically, this works nicely, except for cases where evaluating
> > info->'foo' results in an SQL null being returned.  When that happens,
> > jsonb_set() returns an SQL null, which then results in data loss.[3]
>
> So why don't you use the facilities of SQL to make sure to only
> touch the rows which match the prerequisites?
>
>   UPDATE users SET info = jsonb_set(info, '{bar}', info->'foo')
>     WHERE info->'foo' IS NOT NULL;

Why don't we fix the database engine to not eat data when the
jsonb_set() operation fails?  Telling people to work around design
flaws in the software is what I would expect of MySQL, not a database
known for its data integrity.

Obviously, it is possible to adjust the UPDATE statement to only match
certain pre-conditions, *if you know those pre-conditions may be a
problem*.  What happens with us, and with other people who have hit
this bug with jsonb_set() is that they hit issues that were not
previously known about, and that's when jsonb_set() eats your data.

I would also like to point out that the MySQL equivalent, json_set()
when presented with a similar failure simply returns the unmodified
input.  It is not unreasonable to do the same in PostgreSQL.
Personally, as a developer, I expect PostgreSQL to be on their game
better than MySQL.

> No special wrappers required.

A special wrapper is needed because jsonb_set() does broken things
when invoked in situations that do not match the preconceptions of
those situations.  We will have to ship this wrapper for several years
because of the current behaviour of the jsonb_set() function.

Ariadne



Re: jsonb_set() strictness considered harmful to data

Ariadne Conill
In reply to this post by David G Johnston
Hello,

On Fri, Oct 18, 2019 at 5:01 PM David G. Johnston
<[hidden email]> wrote:

> [...]
>
> There are many ways to add code to queries to make working with this function safer - though using them presupposes one remembers, at the time of writing the query, that there is danger and there are caveats in using this function.  I agree that we should have provided (and should now provide) sane defined behavior when one of the inputs to the function is null, instead of blowing off the issue and defining the function as being strict.  Whether that is "ignore and return the original object" or "add the key with a json null scalar value" is debatable, but either is considerably more useful than returning SQL NULL.

A great example of how we got burned by this last year: Pleroma
maintains pre-computed counters in JSONB for various types of
activities (posts, followers, followings).  Last year, another counter
was added, with a migration.  But some people did not run the
migration, because they are users, and that's what users do.  This
resulted in Pleroma blanking out the `info` structure for users as
they performed new activities that incremented that counter.  At that
time, Pleroma maintained various things like private keys used to sign
things in that JSONB column (we no longer do this because of being
burned by this several times now), which broke federation temporarily
for the affected accounts with other servers for up to a week as those
servers had to learn new public keys for those accounts (since the
original private keys were lost).

I believe that anything that can be catastrophically broken by users
not following upgrade instructions precisely is itself a serious
problem, and can lead to serious consequences.  I am sure that this is
not the only project using JSONB whose users have destroyed their own
data in such a completely preventable fashion.

Ariadne



Re: jsonb_set() strictness considered harmful to data

Christoph Moench-Tegeder
In reply to this post by Ariadne Conill
## Ariadne Conill ([hidden email]):

> Why don't we fix the database engine to not eat data when the
> jsonb_set() operation fails?

It didn't fail, it worked like SQL (you've been doing SQL for too
long when you get used to the NULL propagation, but that's still
what SQL does - check "+" for example).
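
For example:

  select 2 + null;  -- yields NULL, like any SQL operator fed an unknown value
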
And changing a function will cause fun for everyone who relies on
the current behaviour - so at least it shouldn't be done on a whim
(some might argue that a whim was what got us into this situation
in the first place).

Continuing along that thought, I'd even argue that you are
writing code which relies on properties of the data which you never
guaranteed. There is a use case for data types and constraints.
Not that I'm arguing for maximum surprise in programming, but
I'm a little puzzled when people eschew the built-in tools and
start implementing them again side by side with what's already
there.

Regards,
Christoph

--
Spare Space



Re: jsonb_set() strictness considered harmful to data

Adrian Klaver-4
In reply to this post by Ariadne Conill
On 10/18/19 3:11 PM, Ariadne Conill wrote:

> [...]
>
> A great example of how we got burned by this last year: Pleroma
> maintains pre-computed counters in JSONB for various types of
> activities (posts, followers, followings).  Last year, another counter
> was added, with a migration.  But some people did not run the
> migration, because they are users, and that's what users do.  This

So you are more forgiving of your misstep, allowing users to run
outdated code, than of running afoul of Postgres documented behavior:

https://www.postgresql.org/docs/11/functions-json.html
" The field/element/path extraction operators return NULL, rather than
failing, if the JSON input does not have the right structure to match
the request; for example if no such element exists"

Just trying to figure why one is worse than the other.



--
Adrian Klaver
[hidden email]



Re: jsonb_set() strictness considered harmful to data

Ariadne Conill
Hello,

On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <[hidden email]> wrote:

> [...]
>
> So you are more forgiving of your misstep, allowing users to run
> outdated code, than of running afoul of Postgres documented behavior:

I'm not forgiving of either.

> https://www.postgresql.org/docs/11/functions-json.html
> " The field/element/path extraction operators return NULL, rather than
> failing, if the JSON input does not have the right structure to match
> the request; for example if no such element exists"

It is known that the extraction operators return NULL.  The problem
here is jsonb_set() returning NULL when it encounters SQL NULL.

> Just trying to figure why one is worse than the other.

Any time a user loses data, it is worse.  The preference for not
having data loss is why Pleroma uses PostgreSQL as its database of
choice, as PostgreSQL has traditionally valued durability.  If we
should not use PostgreSQL, just say so.

Ariadne




Re: jsonb_set() strictness considered harmful to data

Ariadne Conill
In reply to this post by Christoph Moench-Tegeder
Hello,

On Fri, Oct 18, 2019 at 5:57 PM Christoph Moench-Tegeder
<[hidden email]> wrote:

>
> ## Ariadne Conill ([hidden email]):
>
> > Why don't we fix the database engine to not eat data when the
> > jsonb_set() operation fails?
>
> It didn't fail, it worked like SQL (you've been doing SQL for too
> long when you get used to the NULL propagation, but that's still
> what SQL does - check "+" for example).
> And changing a function will cause fun for everyone who relies on
> the current behaviour - so at least it shouldn't be done on a whim
> (some might argue that a whim was what got us into this situation
> in the first place).

NULL propagation makes sense in the context of traditional SQL.  What
users expect from the JSONB support is for it to behave as JSON
manipulation behaves everywhere else.  It makes sense that 2 + NULL
returns NULL -- it's easily understood as a type mismatch.  It does
not make sense that jsonb_set('{}'::jsonb, '{foo}', NULL) returns NULL
because a *value* was SQL NULL.  In this case, it should, at the
least, automatically coalesce to 'null'::jsonb.
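
That is, what we are asking for is effectively:

    select jsonb_set('{}'::jsonb, '{foo}', coalesce(NULL, 'null'::jsonb));
    -- result: {"foo": null}, instead of SQL NULL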

> Continuing along that thought, I'd even argue that you are
> writing code which relies on properties of the data which you never
> guaranteed. There is a use case for data types and constraints.

There is a use case, but this frequently comes up as a question people
ask.  At some point, you have to start pondering whether the behaviour
does not make logical sense in the context that people frame the JSONB
type and its associated manipulation functions.  It is not *obvious*
that jsonb_set() will trash your data, but that is what it is capable
of doing.  In a database that is advertised as being durable and not
trashing data, even.

> Not that I'm arguing for maximum surprise in programming, but
> I'm a little puzzled when people eschew the built-in tools and
> start implementing them again side by side with what's already
> there.

If you read the safe_jsonb_set() function, all we do is coalesce any
SQL NULL to 'null'::jsonb, which is what it should be doing anyway,
and then additionally handling any *unanticipated* failure case on top
of that.  While you are arguing that we should use tools to work
around unanticipated effects (that are not even documented -- in no
place in the jsonb_set() documentation does it say "if you pass SQL
NULL to this function as a value, it will return SQL NULL"), I am
arguing that jsonb_set() shouldn't set people up for their data to be
trashed in the first place.

Even MySQL gets this right.  MySQL!  The database that everyone knows
takes your data out for a night it will never forget.  This argument
is miserable.  It does not matter to me how jsonb_set() works as long
as it does not return NULL when passed NULL as a value to set.  JSONB
columns should be treated as the complex types that they are: you
don't null out an entire hash table because someone set a key to SQL
NULL.  So, please, let us fix this.

Ariadne



Re: jsonb_set() strictness considered harmful to data

Stephen Frost
In reply to this post by Ariadne Conill
Greetings,

* Ariadne Conill ([hidden email]) wrote:

> On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <[hidden email]> wrote:
> > https://www.postgresql.org/docs/11/functions-json.html
> > " The field/element/path extraction operators return NULL, rather than
> > failing, if the JSON input does not have the right structure to match
> > the request; for example if no such element exists"
>
> It is known that the extraction operators return NULL.  The problem
> here is jsonb_set() returning NULL when it encounters SQL NULL.
>
> > Just trying to figure why one is worse than the other.
>
> Any time a user loses data, it is worse.  The preference for not
> having data loss is why Pleroma uses PostgreSQL as its database of
> choice, as PostgreSQL has traditionally valued durability.  If we
> should not use PostgreSQL, just say so.
Your contention that the documented, clear, and easily addressed
behavior of a particular strict function equates to "the database system
loses data and isn't durable" is really hurting your argument here, not
helping it.

The argument about how it's unintuitive and can cause application
developers to misuse the function (which is clearly an application bug,
but perhaps an understandable one if the function interface isn't
intuitive or is confusing) is a reasonable one and might be convincing
enough to result in a change here.

I'd suggest sticking to the latter argument when making this case.

> > > I believe that anything that can be catastrophically broken by users
> > > not following upgrade instructions precisely is itself a serious
> > > problem, and can lead to serious consequences.  I am sure that this is
> > > not the only project using JSONB whose users have destroyed their own
> > > data in such a completely preventable fashion.

Let's be clear here that the issue with the upgrade instructions was
that the user didn't follow your *application's* upgrade instructions,
and your later code wasn't written to use the function, as documented,
properly- this isn't a case of PG destroying your data.  It's fine to
contend that the interface sucks and that we should change it, but the
argument that PG is eating data because the application sent a query to
the database telling it, based on our documentation, to eat the data,
isn't appropriate.  Again, let's have a reasonable discussion here about
if it makes sense to make a change here because the interface isn't
intuitive and doesn't match what other systems do (I'm guessing it isn't
in the SQL standard either, so we unfortunately can't look to that for
help; though I'd hardly be surprised if they supported what PG does
today anyway).

As a practical response to the issue you've raised- have you considered
using a trigger to check the validity of the new jsonb?  Or, maybe, just
making the jsonb column not nullable?  With a trigger you could disallow
non-null->null transitions, for example, or if it just shouldn't ever
be null then making the column 'not null' would suffice.
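
A rough sketch of the trigger idea (hypothetical names; assumes the
users/info schema discussed upthread):

    create function forbid_info_nulling() returns trigger as $$
    begin
      -- reject any update that would blank out a previously non-null info
      if old.info is not null and new.info is null then
        raise exception 'refusing to null out users.info';
      end if;
      return new;
    end;
    $$ language plpgsql;

    create trigger users_info_guard
      before update of info on users
      for each row execute function forbid_info_nulling();
      -- EXECUTE FUNCTION is the PostgreSQL 11+ spelling; older releases
      -- spell it EXECUTE PROCEDURE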

I'll echo Christoph's comments up thread too, though in my own language-
these are risks you've explicitly accepted by using JSONB and writing
your own validation and checks (or, not, apparently) rather than using
what the database system provides.  That doesn't mean I'm against
making the change you suggest, but it certainly should become a lesson
to anyone who is considering using primarily jsonb for their storage
that it's risky to do so, because you're removing the database system's
knowledge and understanding of the data, and further you tend to end up
not having the necessary constraints in place to ensure that the data
doesn't end up being garbage- thus letting your application destroy all
the data easily due to an application bug.

Thanks,

Stephen


Re: jsonb_set() strictness considered harmful to data

Stephen Frost
In reply to this post by Ariadne Conill
Greetings,

* Ariadne Conill ([hidden email]) wrote:

> On Fri, Oct 18, 2019 at 5:57 PM Christoph Moench-Tegeder
> <[hidden email]> wrote:
> > ## Ariadne Conill ([hidden email]):
> > > Why don't we fix the database engine to not eat data when the
> > > jsonb_set() operation fails?
> >
> > It didn't fail, it worked like SQL (you've been doing SQL for too
> > long when you get used to the NULL propagation, but that's still
> > what SQL does - check "+" for example).
> > And changing a function will cause fun for everyone who relies on
> > the current behaviour - so at least it shouldn't be done on a whim
> > (some might argue that a whim was what got us into this situation
> > in the first place).
>
> NULL propagation makes sense in the context of traditional SQL.  What
> users expect from the JSONB support is for it to behave as JSON
> manipulation behaves everywhere else.  It makes sense that 2 + NULL
> returns NULL -- it's easily understood as a type mismatch.  It does
> not make sense that jsonb_set('{}'::jsonb, '{foo}', NULL) returns NULL
> because a *value* was SQL NULL.  In this case, it should, at the
> least, automatically coalesce to 'null'::jsonb.
2 + NULL isn't a type mismatch, just to be clear, it's "2 + unknown =
unknown", which is pretty reasonable, if you accept the general notion
of what NULL is to begin with.

And as such, what follows with "set this blob of stuff to include this
unknown thing ... implies ... we don't know what the result of the set
is and therefore it's unknown" isn't entirely unreasonable, but I can
agree that in this specific case, because what we're dealing with
regarding JSONB is a complex data structure, not an individual value,
that it's surprising to a developer and there can be an argument made
there that we should consider changing it.

> > Continuing along that thought, I'd even argue that you are
> > writing code which relies on properties of the data which you never
> > guaranteed. There is a use case for data types and constraints.
>
> There is a use case, but this frequently comes up as a question people
> ask.  At some point, you have to start pondering whether the behaviour
> does not make logical sense in the context that people frame the JSONB
> type and it's associated manipulation functions.  It is not *obvious*
> that jsonb_set() will trash your data, but that is what it is capable
> of doing.  In a database that is advertised as being durable and not
> trashing data, even.
Having the result of a call to a strict function be NULL isn't
"trashing" your data.

> > Not that I'm arguing for maximum surprise in programming, but
> > I'm a little puzzled when people eschew the built-in tools and
> > start implementing them again side by side with what's already
> > there.
>
> If you read the safe_jsonb_set() function, all we do is coalesce any
> SQL NULL to 'null'::jsonb, which is what it should be doing anyway,

I'm not convinced that this is at all the right answer, particularly
since we don't do that generally.  We don't return the string 'null'
when you do: NULL || 'abc', we return NULL.  There might be something we
can do here that doesn't result in the whole jsonb document becoming
NULL though.

> and then additionally handling any *unanticipated* failure case on top
> of that.  While you are arguing that we should use tools to work
> around unanticipated effects (that are not even documented -- in no
> place in the jsonb_set() documentation does it say "if you pass SQL
> NULL to this function as a value, it will return SQL NULL"), I am
> arguing that jsonb_set() shouldn't set people up for their data to be
> trashed in the first place.

The function is marked as strict, and the meaning of that is quite
clearly defined in the documentation.  I'm not against including a
comment regarding this in the documentation, to be clear, though I
seriously doubt it would actually have changed anything in this case.

Thanks,

Stephen


Re: jsonb_set() strictness considered harmful to data

Adrian Klaver-4
In reply to this post by Ariadne Conill
On 10/18/19 4:31 PM, Ariadne Conill wrote:

> [...]
> It is known that the extraction operators return NULL.  The problem
> here is jsonb_set() returning NULL when it encounters SQL NULL.

I'm not following. Your original case was:

jsonb_set(info, '{bar}', info->'foo');

where info->'foo' is equivalent to:

test=# select '{"f1":1,"f2":null}'::jsonb ->'f3';
  ?column?
----------
  NULL

So you know there is a possibility that a value extraction could return
NULL and from your wrapper that COALESCE is the way to deal with this.
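
For instance, the original migration could have been written as:

   UPDATE users SET info = jsonb_set(info, '{bar}',
                                     coalesce(info->'foo', 'null'::jsonb));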


>
>> Just trying to figure why one is worse than the other.
>
> Any time a user loses data, it is worse.  The preference for not
> having data loss is why Pleroma uses PostgreSQL as its database of
> choice, as PostgreSQL has traditionally valued durability.  If we
> should not use PostgreSQL, just say so.

There are any number of ways you can make Postgres lose data that are
not related to durability, e.g. build the following in code:

DELETE FROM some_table;

and forget the WHERE.



--
Adrian Klaver
[hidden email]



Re: jsonb_set() strictness considered harmful to data

Ariadne Conill
In reply to this post by Stephen Frost
Hello,

On Fri, Oct 18, 2019 at 6:52 PM Stephen Frost <[hidden email]> wrote:

> [...]
> Let's be clear here that the issue with the upgrade instructions was
> that the user didn't follow your *application's* upgrade instructions,
> and your later code wasn't written to use the function, as documented,
> properly- this isn't a case of PG destroying your data.  It's fine to
> contend that the interface sucks and that we should change it, but the
> argument that PG is eating data because the application sent a query to
> the database telling it, based on our documentation, to eat the data,
> isn't appropriate.  Again, let's have a reasonable discussion here about
> if it makes sense to make a change here because the interface isn't
> intuitive and doesn't match what other systems do (I'm guessing it isn't
> in the SQL standard either, so we unfortunately can't look to that for
> help; though I'd hardly be surprised if they supported what PG does
> today anyway).

Okay, I will admit that saying PG is eating data is perhaps
hyperbolic, but I will also say that the behaviour of jsonb_set()
under this type of edge case is unintuitive and frequently results in
unintended data loss.  So, while PostgreSQL is not actually eating the
data, it is putting the user in a position where they may suffer data
loss if they are not extremely careful.

Here is how other implementations handle this case:

MySQL/MariaDB:

select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
   {"a":null,"b":2,"c":3}

Microsoft SQL Server:

select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
   {"b":2,"c":3}

Both of these outcomes make sense, given the nature of JSON objects.
I am actually more in favor of what MSSQL does, however; I think that
makes the most sense of all.

I did not compare to other database systems, because using them I
found that there is a JSON_TABLE type function and then you do stuff
with that to rewrite the object and dump it back out as JSON, and it's
quite a mess.  But MySQL and MSSQL have an equivalent jsonb inline
modification function, as seen above.

> As a practical response to the issue you've raised- have you considered
> using a trigger to check the validity of the new jsonb?  Or, maybe, just
> making the jsonb column not nullable?  With a trigger you could disallow
> non-null->null transitions, for example, or if it just shouldn't ever
> be null then making the column 'not null' would suffice.

We have already mitigated the issue in a way we find appropriate to
do.  The suggestion of having a non-null constraint does seem useful
as well and I will look into that.


Ariadne



Re: jsonb_set() strictness considered harmful to data

Ariadne Conill
In reply to this post by Adrian Klaver-4
Hello,

On Fri, Oct 18, 2019 at 7:04 PM Adrian Klaver <[hidden email]> wrote:

> [...]
> I'm not following. Your original case was:
>
> jsonb_set(info, '{bar}', info->'foo');
>
> where info->'foo' is equivalent to:
>
> test=# select '{"f1":1,"f2":null}'::jsonb ->'f3';
>   ?column?
> ----------
>   NULL
>
> So you know there is a possibility that a value extraction could return
> NULL and from your wrapper that COALESCE is the way to deal with this.

You're not following because you don't want to follow.

It does not matter that info->'foo' is in my example.  That's not what
I am talking about.

What I am talking about is that jsonb_set(..., ..., NULL) returns SQL NULL.

postgres=# \pset null '(null)'
Null display is "(null)".
postgres=# select jsonb_set('{"a":1,"b":2,"c":3}'::jsonb, '{a}', NULL);
jsonb_set
-----------
(null)
(1 row)

This behaviour is basically giving an application developer a loaded
shotgun and pointing it at their feet.  It is not a good design.  It
is a design which has likely led to many users experiencing
unintentional data loss.

Ariadne



Re: jsonb_set() strictness considered harmful to data

Pavel Stehule
Hi


> What I am talking about is that jsonb_set(..., ..., NULL) returns SQL NULL.
>
> postgres=# \pset null '(null)'
> Null display is "(null)".
> postgres=# select jsonb_set('{"a":1,"b":2,"c":3}'::jsonb, '{a}', NULL);
> jsonb_set
> -----------
> (null)
> (1 row)
>
> This behaviour is basically giving an application developer a loaded
> shotgun and pointing it at their feet.  It is not a good design.  It
> is a design which has likely led to many users experiencing
> unintentional data loss.

On the other hand, the PostgreSQL design is one that can return additional information about whether the value was changed or not.

Unfortunately, it is very unlikely that the design of this function will be changed - it is simply not a bug (although I fully agree that it behaves differently than in other databases, and that for some usages it is not practical).  There will probably be some applications that need a NULL result in situations when the value was not changed or when the input value does not have the expected format.  The design used in Postgres allows later customization - you can implement the behaviour you want very simply with COALESCE (sure, you have to know what you are doing).  If Postgres implemented the design used by MySQL, there would be no possibility to react to the situation when the update is not processed.

It is not hard to implement a second function, with a different name, that behaves the way you need and expect - it is just:

CREATE OR REPLACE FUNCTION jsonb_modify(jsonb, text[], jsonb)
RETURNS jsonb AS $$
SELECT jsonb_set($1, $2, COALESCE($3, 'null'::jsonb), true);
$$ LANGUAGE sql;
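
For example:

SELECT jsonb_modify('{"a":1,"b":2}'::jsonb, '{a}', NULL);
-- returns {"a": null, "b": 2} instead of SQL NULL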

It is important to understand that JSON null is not PostgreSQL NULL.  In this case the problem is not in the PostgreSQL design, because it is consistent with everything else in PG, but in wrong expectations.  Unfortunately, there are a lot of wrong expectations, and these cannot all be covered by the Postgres design, because then Postgres would be very inconsistent software.  You can see - my function jsonb_modify is what you expect, and can work for you perfectly, but from a system perspective it is not consistent, and very strongly so.  Users should not have to learn where NULL has different behaviour or where NULL is JSON null.  Built-in functions should be consistent in Postgres.  It is Postgres, not another database.

Pavel








Re: jsonb_set() strictness considered harmful to data

David G Johnston
On Friday, October 18, 2019, Pavel Stehule <[hidden email]> wrote:

> There will probably be some applications that need a NULL result in situations when the value was not changed or when the input value does not have the expected format.  The design used in Postgres allows later customization - you can implement the behaviour you want very simply with COALESCE (sure, you have to know what you are doing).  If Postgres implemented the design used by MySQL, there would be no possibility to react to the situation when the update is not processed.

A CASE expression seems like it would work well for such detection in the rare case it is needed.  Current behavior is unsafe with minimal or no redeeming qualities.  Change it so passing in null raises an exception and make the user decide their own behavior if we don’t want to choose one for them.

David J.


Re: jsonb_set() strictness considered harmful to data

Pavel Stehule


On Sat, Oct 19, 2019 at 7:41 AM David G. Johnston <[hidden email]> wrote:

> On Friday, October 18, 2019, Pavel Stehule <[hidden email]> wrote:
>
>> There will probably be some applications that need a NULL result in situations when the value was not changed or when the input value does not have the expected format.  The design used in Postgres allows later customization - you can implement the behaviour you want very simply with COALESCE (sure, you have to know what you are doing).  If Postgres implemented the design used by MySQL, there would be no possibility to react to the situation when the update is not processed.
>
> A CASE expression seems like it would work well for such detection in the rare case it is needed.  Current behavior is unsafe with minimal or no redeeming qualities.  Change it so passing in null raises an exception and make the user decide their own behavior if we don’t want to choose one for them.

How can you do it?  Built-in functions cannot return more than one value.  NULL is one possible signal for emitting this information.

The NULL value can be a problem everywhere - and it is not consistent to raise an exception in some places and not in others.

I agree that the safe way is raising an exception on NULL.  Unfortunately, exception handling is pretty expensive in Postgres (more so in write transactions), so it should be used only when it is really necessary.







Re: jsonb_set() strictness considered harmful to data

Ariadne Conill
Hello,

On Sat, Oct 19, 2019 at 12:52 AM Pavel Stehule <[hidden email]> wrote:

> [...]
>
> I agree that the safe way is raising an exception on NULL.  Unfortunately, exception handling is pretty expensive in Postgres (more so in write transactions), so it should be used only when it is really necessary.

I would say that anything like

update whatever set column=jsonb_set(column, '{foo}', NULL)

should throw an exception.  It should do, literally, *anything* else
but blank that column.

Ariadne



Re: jsonb_set() strictness considered harmful to data

Tomas Vondra-4
In reply to this post by Ariadne Conill
On Fri, Oct 18, 2019 at 09:14:09PM -0500, Ariadne Conill wrote:

>Hello,
>
>[...]
>
>Okay, I will admit that saying PG is eating data is perhaps
>hyperbolic,

My experience is that using such hyperbole is pretty detrimental, even
when one is trying to make a pretty sensible case. The problem is that
people often respond with similarly hyperbolic claims, particularly when
you hit a nerve. And that's exactly what happened here, because we're
*extremely* sensitive about data corruption issues, so when you claim
PostgreSQL is "eating data" people are likely to jump on you, beating
you with the documentation stick. It's unfortunate, but it's also
entirely predictable.

>but I will also say that the behaviour of jsonb_set()
>under this type of edge case is unintuitive and frequently results in
>unintended data loss.  So, while PostgreSQL is not actually eating the
>data, it is putting the user in a position where they may suffer data
>loss if they are not extremely careful.
>
>Here is how other implementations handle this case:
>
>MySQL/MariaDB:
>
>select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
>   {"a":null,"b":2,"c":3}
>
>Microsoft SQL Server:
>
>select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
>   {"b":2,"c":3}
>
>Both of these outcomes make sense, given the nature of JSON objects.
>I am actually more in favor of what MSSQL does, however; I think that
>makes the most sense of all.
>

I do mostly agree with this. The json[b]_set behavior seems rather
surprising, and I think I've seen a couple of cases running into exactly
this issue. I've solved that with a simple CASE, but maybe changing the
behavior would be better. That's unlikely to be back-patchable, though,
so maybe a better option is to create non-strict wrappers. But that
does not work when the user is unaware of the behavior :-(
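
Something like this, reusing the users/info example from upthread:

  UPDATE users
     SET info = CASE WHEN info->'foo' IS NULL THEN info
                     ELSE jsonb_set(info, '{bar}', info->'foo')
                END;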

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

