Improper use about DatumGetInt32

Improper use about DatumGetInt32

Hou, Zhijie
Hi

In contrib/bloom/blutils.c:277, I found that DatumGetInt32 is used to fetch a value of type uint32.
Would it be more appropriate to use DatumGetUInt32 here?

The patch is attached.

Best regards,
houzj




Attachment: 0001-DatumGetUInt32.patch (1K)

Re: Improper use about DatumGetInt32

Bharath Rupireddy
On Mon, Sep 21, 2020 at 6:47 AM Hou, Zhijie <[hidden email]> wrote:
>
> In contrib/bloom/blutils.c:277, I found that DatumGetInt32 is used to fetch a value of type uint32.
> Would it be more appropriate to use DatumGetUInt32 here?
>

Makes sense. +1 for the patch. That said, I don't think the existing
code misbehaves. Assuming the hash functions return uint32,
DatumGetInt32() converts that uint32 result to int32, and the
assignment to hashVal converts the int32 back to uint32, so hashVal
ends up holding the original value. A small experiment [1] demonstrates
this.

    uint32        hashVal;

    hashVal = DatumGetInt32(FunctionCall1Coll(&state->hashFn[attno],
                                              state->collations[attno],
                                              value));

It would be good to run any test cases or test suites (if they exist)
that hit this part of the code, just to ensure we don't break anything.

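[1]
#include <stdio.h>

int main(void)
{
    /* 3U keeps the arithmetic unsigned; 3 * 1024 * 1024 * 1024 overflows int */
    unsigned int u = 3U * 1024 * 1024 * 1024;

    printf("%u\n", u);
    int i = u;              /* uint32 -> int32: negative on typical platforms */
    printf("%d\n", i);
    unsigned int u1 = i;    /* int32 -> uint32: the original value comes back */
    printf("%u\n", u1);

    return 0;
}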
Output of the above snippet:
3221225472
-1073741824
3221225472
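
The same round trip can be sketched with stand-ins for the Datum conversion macros. Everything named sketch_* below, and the Datum typedef itself, is a simplified assumption made for illustration rather than the definitions from postgres.h. The narrowing conversion to int32 is implementation-defined in C, although it wraps on the usual two's-complement platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum: an unsigned, pointer-sized integer. */
typedef uintptr_t Datum;

/* Simplified stand-ins for the real conversion macros (assumption). */
static Datum sketch_UInt32GetDatum(uint32_t x) { return (Datum) x; }
static int32_t sketch_DatumGetInt32(Datum d) { return (int32_t) d; }
static uint32_t sketch_DatumGetUInt32(Datum d) { return (uint32_t) d; }

/* Existing pattern: fetch as int32, then store into a uint32 variable. */
uint32_t hash_via_signed(uint32_t hash)
{
    uint32_t hashVal = (uint32_t) sketch_DatumGetInt32(sketch_UInt32GetDatum(hash));

    return hashVal;
}

/* Proposed pattern: fetch as uint32 directly. */
uint32_t hash_via_unsigned(uint32_t hash)
{
    return sketch_DatumGetUInt32(sketch_UInt32GetDatum(hash));
}

/* Both paths agree, including for values above INT32_MAX. */
void check_round_trip(void)
{
    assert(hash_via_signed(3221225472u) == 3221225472u);
    assert(hash_via_signed(3221225472u) == hash_via_unsigned(3221225472u));
}
```

Under these assumptions the two fetch styles are interchangeable for the value that ends up in hashVal, so the patch would be a readability change rather than a behavior change.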

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Improper use about DatumGetInt32

Robert Haas
In reply to this post by Hou, Zhijie
On Sun, Sep 20, 2020 at 9:17 PM Hou, Zhijie <[hidden email]> wrote:
> In contrib/bloom/blutils.c:277, I found that DatumGetInt32 is used to fetch a value of type uint32.
> Would it be more appropriate to use DatumGetUInt32 here?

Typically, the DatumGetBlah() function that you pick should match the
SQL data type that the function is returning. So if the function
returns pg_catalog.int4, which corresponds to the C data type int32,
you would use DatumGetInt32. There is no SQL type corresponding to the
C data type uint32, so I'm not sure why we even have DatumGetUInt32.
I'm sort of suspicious that there's some fuzzy thinking going on
there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improper use about DatumGetInt32

Tom Lane
Robert Haas <[hidden email]> writes:
> Typically, the DatumGetBlah() function that you pick should match the
> SQL data type that the function is returning. So if the function
> returns pg_catalog.int4, which corresponds to the C data type int32,
> you would use DatumGetInt32. There is no SQL type corresponding to the
> C data type uint32, so I'm not sure why we even have DatumGetUInt32.

xid?

                        regards, tom lane



Re: Improper use about DatumGetInt32

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2020-09-21 14:08:22 -0400, Robert Haas wrote:
> There is no SQL type corresponding to the C data type uint32, so I'm
> not sure why we even have DatumGetUInt32.  I'm sort of suspicious that
> there's some fuzzy thinking going on there.

I think we mostly use it for the few places where we currently expose
data as a signed integer at the SQL level but internally treat it as
unsigned data. There aren't a lot of those, but pg_class.relpages is
one example. There may also be places where we use it for functions
that can be created but not called from SQL (using the INTERNAL type).

- Andres



Re: Improper use about DatumGetInt32

Robert Haas
On Mon, Sep 21, 2020 at 3:53 PM Andres Freund <[hidden email]> wrote:
> On 2020-09-21 14:08:22 -0400, Robert Haas wrote:
> > There is no SQL type corresponding to the C data type uint32, so I'm
> > not sure why we even have DatumGetUInt32.  I'm sort of suspicious that
> > there's some fuzzy thinking going on there.
>
> I think we mostly use it for the few places where we currently expose
> data as a signed integer on the SQL level, but internally actually treat
> it as unsigned data.

So why is the right solution to that not DatumGetInt32() + a cast to uint32?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improper use about DatumGetInt32

Tom Lane
Robert Haas <[hidden email]> writes:
> On Mon, Sep 21, 2020 at 3:53 PM Andres Freund <[hidden email]> wrote:
>> I think we mostly use it for the few places where we currently expose
>> data as a signed integer on the SQL level, but internally actually treat
>> it as unsigned data.

> So why is the right solution to that not DatumGetInt32() + a cast to uint32?

You're ignoring the xid use-case, for which DatumGetUInt32 actually is
the right thing.  I tend to agree though that if the SQL argument is
of a signed type, the least API-abusing answer is a signed DatumGetXXX
macro followed by whatever cast you need.
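
The "signed fetch, then cast" pattern can be sketched as follows for a value that is exposed as int4 at the SQL level but treated as unsigned internally (pg_class.relpages was mentioned upthread). The Datum typedef and the sketch_* helpers are simplified assumptions for illustration, not the real macros, and pages_from_int4_datum is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum: an unsigned, pointer-sized integer. */
typedef uintptr_t Datum;

/* Simplified stand-ins for the real conversion macros (assumption). */
static Datum sketch_Int32GetDatum(int32_t x) { return (Datum) x; }
static int32_t sketch_DatumGetInt32(Datum d) { return (int32_t) d; }

/*
 * The SQL type is int4, so fetch it with the signed accessor, and only
 * then cast to the unsigned type used internally.
 */
uint32_t pages_from_int4_datum(Datum d)
{
    int32_t as_sql_int4 = sketch_DatumGetInt32(d);

    return (uint32_t) as_sql_int4;
}

/* A negative int4 maps back to the large unsigned count it stands for. */
void check_signed_fetch_then_cast(void)
{
    assert(pages_from_int4_datum(sketch_Int32GetDatum(-1073741824)) == 3221225472u);
    assert(pages_from_int4_datum(sketch_Int32GetDatum(42)) == 42u);
}
```

The accessor then documents the SQL-level type, while the explicit cast documents the internal reinterpretation.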

                        regards, tom lane



Re: Improper use about DatumGetInt32

Ashutosh Bapat
On Wed, Sep 23, 2020 at 1:41 AM Tom Lane <[hidden email]> wrote:

>
> Robert Haas <[hidden email]> writes:
> > On Mon, Sep 21, 2020 at 3:53 PM Andres Freund <[hidden email]> wrote:
> >> I think we mostly use it for the few places where we currently expose
> >> data as a signed integer on the SQL level, but internally actually treat
> >> it as unsigned data.
>
> > So why is the right solution to that not DatumGetInt32() + a cast to uint32?
>
> You're ignoring the xid use-case, for which DatumGetUInt32 actually is
> the right thing.
There is DatumGetTransactionId(), which should be used instead.
That made me check whether there is a PG_GETARG_TRANSACTIONID(); there
is, but it is defined only in xid.c. So pg_xact_commit_timestamp(),
pg_xact_commit_timestamp_origin() and pg_get_multixact_members() use
PG_GETARG_UINT32. IMO those should be changed to use
PG_GETARG_TRANSACTIONID. That would require moving
PG_GETARG_TRANSACTIONID somewhere outside xid.c; maybe fmgr.h, where
the other PG_GETARG_* macros are.
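
A rough sketch of what a shared, type-specific argument macro buys compared with a bare uint32 fetch. All SKETCH_* and Sketch* names, the call-info struct and the Datum typedef are hypothetical stand-ins invented for this example; they are not the definitions in xid.c or fmgr.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Datum;         /* stand-in for PostgreSQL's Datum */
typedef uint32_t TransactionId;  /* xids are 32-bit unsigned values */

/* Hypothetical, heavily simplified replacement for the call-info struct. */
typedef struct SketchCallInfo
{
    Datum args[4];
} SketchCallInfo;

#define SKETCH_GETARG_DATUM(fcinfo, n)  ((fcinfo)->args[n])
#define SketchDatumGetTransactionId(d)  ((TransactionId) (d))

/* Type-specific accessor: call sites name the SQL type they expect. */
#define SKETCH_GETARG_TRANSACTIONID(fcinfo, n) \
    SketchDatumGetTransactionId(SKETCH_GETARG_DATUM(fcinfo, n))

TransactionId first_arg_as_xid(const SketchCallInfo *fcinfo)
{
    return SKETCH_GETARG_TRANSACTIONID(fcinfo, 0);
}

/* Values above INT32_MAX are legitimate xids and must survive the fetch. */
void check_xid_fetch(void)
{
    SketchCallInfo fcinfo = { { (Datum) 4000000000u } };

    assert(first_arg_as_xid(&fcinfo) == 4000000000u);
}
```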

> I tend to agree though that if the SQL argument is
> of a signed type, the least API-abusing answer is a signed DatumGetXXX
> macro followed by whatever cast you need.
>

I looked for uses of PG_GETARG_UINT32(), which is the counterpart
of DatumGetUInt32(), and found some buggy usages apart from the ones
listed above that can be converted to PG_GETARG_TRANSACTIONID.
normal_rand(), for example, returns a huge number of rows and takes
forever if a negative first argument is passed to it. Someone could
misuse that for a DoS attack, or a user could simply pass a negative
value by accident and have the query run forever.
explain analyze select count(*) from normal_rand(-1000000, 1.0, 1.0);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12.50..12.51 rows=1 width=8) (actual time=2077574.718..2077574.719 rows=1 loops=1)
   ->  Function Scan on normal_rand  (cost=0.00..10.00 rows=1000 width=0) (actual time=1005176.149..1729994.366 rows=4293967296 loops=1)
 Planning Time: 0.346 ms
 Execution Time: 2079034.835 ms
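
The row count in the plan above is what -1000000 becomes when it is read as a 32-bit unsigned value. A minimal sketch of that arithmetic, together with one possible guard; the function names are hypothetical and the guard is an illustration, not the code from the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Row count seen when a SQL int4 argument is read as uint32. */
uint32_t rows_seen_as_uint32(int32_t sql_arg)
{
    return (uint32_t) sql_arg;  /* well-defined: reduced modulo 2^32 */
}

/* Hypothetical guard: read the argument as signed and reject negatives. */
bool num_tuples_is_valid(int32_t sql_arg)
{
    return sql_arg >= 0;
}

void check_normal_rand_example(void)
{
    /* 2^32 - 1000000, matching rows=4293967296 in the plan above */
    assert(rows_seen_as_uint32(-1000000) == 4293967296u);
    assert(!num_tuples_is_valid(-1000000));
    assert(num_tuples_is_valid(1000));
}
```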

get_raw_page() does a similar thing, but the effect is not as dangerous:
SELECT octet_length(get_raw_page('test1', 'main', -1)) AS main_1;
  ERROR:  block number 4294967295 is out of range for relation "test1"
The same applies to bt_page_stats() and bt_page_items().

Please find attached patches to correct those.

There's also the Oracle-compatible chr(), which uses PG_GETARG_UINT32()
but (accidentally?) reports negative inputs correctly, because
it filters out very large values and reports them using %d. It's
arguable whether we should change that, so I have left it untouched.
But I think we should change that as well and get rid of
PG_GETARG_UINT32 altogether. This will prevent any future misuse.


--
Best Wishes,
Ashutosh Bapat

Attachments:
0001-Handle-negative-number-of-tuples-passed-to-normal_ra.patch (3K)
0002-Negative-block-number-passed-to-functions-in-pageins.patch (9K)

Re: Improper use about DatumGetInt32

Álvaro Herrera
On 2020-Sep-23, Ashutosh Bapat wrote:

> > You're ignoring the xid use-case, for which DatumGetUInt32 actually is
> > the right thing.
>
> There is DatumGetTransactionId() which should be used instead.
> That made me search if there's PG_GETARG_TRANSACTIONID() and yes it's
> there but only defined in xid.c. So pg_xact_commit_timestamp(),
> pg_xact_commit_timestamp_origin() and pg_get_multixact_members() use
> PG_GETARG_UINT32. IMO those should be changed to use
> PG_GETARG_TRANSACTIONID. That would require moving
> PG_GETARG_TRANSACTIONID somewhere outside xid.c; maybe fmgr.h where
> other PG_GETARG_* are.

Hmm, yeah, I think this would be a good idea.

> get_raw_page() also does a similar thing, but the effect is not as dangerous
> SELECT octet_length(get_raw_page('test1', 'main', -1)) AS main_1;
>   ERROR:  block number 4294967295 is out of range for relation "test1"
> Similarly for bt_page_stats() and bt_page_items()

Hmm, but page numbers above signed INT_MAX are valid.  So this would
prevent reading all legitimate pages past that.
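
One way to reconcile the two concerns, sketched under the assumption that the SQL-level argument is widened to bigint (which the patches in this thread do not do): accept a 64-bit signed value and range-check it before narrowing. The function name is hypothetical, and SKETCH_MAX_BLOCK_NUMBER is meant to mirror MaxBlockNumber (0xFFFFFFFE) as defined in block.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror MaxBlockNumber; 0xFFFFFFFF is the invalid marker. */
#define SKETCH_MAX_BLOCK_NUMBER 0xFFFFFFFEu

/*
 * Convert a bigint-style argument into a 32-bit block number.
 * Returns false (leaving *out untouched) for negative or too-large input.
 */
bool block_number_from_int64(int64_t blkno, uint32_t *out)
{
    if (blkno < 0 || blkno > (int64_t) SKETCH_MAX_BLOCK_NUMBER)
        return false;

    *out = (uint32_t) blkno;
    return true;
}

void check_block_number_range(void)
{
    uint32_t blk = 0;

    /* Negative input is rejected instead of wrapping to 4294967295. */
    assert(!block_number_from_int64(-1, &blk));

    /* Blocks above INT_MAX stay reachable. */
    assert(block_number_from_int64(INT64_C(3000000000), &blk));
    assert(blk == 3000000000u);
}
```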




Re: Improper use about DatumGetInt32

Ashutosh Bapat


On Fri, 16 Oct 2020 at 19:26, Alvaro Herrera <[hidden email]> wrote:
> On 2020-Sep-23, Ashutosh Bapat wrote:
>
> > > You're ignoring the xid use-case, for which DatumGetUInt32 actually is
> > > the right thing.
> >
> > There is DatumGetTransactionId() which should be used instead.
> > That made me search if there's PG_GETARG_TRANSACTIONID() and yes it's
> > there but only defined in xid.c. So pg_xact_commit_timestamp(),
> > pg_xact_commit_timestamp_origin() and pg_get_multixact_members() use
> > PG_GETARG_UINT32. IMO those should be changed to use
> > PG_GETARG_TRANSACTIONID. That would require moving
> > PG_GETARG_TRANSACTIONID somewhere outside xid.c; maybe fmgr.h where
> > other PG_GETARG_* are.
>
> Hmm, yeah, I think this would be a good idea.

Patch 0003 does that.

> > get_raw_page() also does similar thing but the effect is not as dangerous
> > SELECT octet_length(get_raw_page('test1', 'main', -1)) AS main_1;
> >   ERROR:  block number 4294967295 is out of range for relation "test1"
> > Similarly for bt_page_stats() and bt_page_items()
>
> Hmm, but page numbers above signed INT_MAX are valid.  So this would
> prevent reading all legitimate pages past that.


According to https://www.postgresql.org/docs/12/datatype-numeric.html,
these functions shouldn't accept values higher than INT_MAX, since such
values are outside the range of the integer data type. But maybe it's a
convenient way to avoid using bigint. Anyway, those changes are kept
separately in the 0002 patch, which can be discarded as a whole. For now
I am keeping it in the set.

--
Best Wishes,
Ashutosh

Attachments:
0003-Extern-alize-PG_GETARG_TRANSACTIONID-and-PG_RETURN_T.patch (5K)
0001-Handle-negative-number-of-tuples-passed-to-normal_ra.patch (3K)
0002-Negative-block-number-passed-to-functions-in-pageins.patch (9K)