CHAR vs NVARCHAR vs TEXT performance

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

CHAR vs NVARCHAR vs TEXT performance

Rob-3
Hi Guys,

I wanted to get some thoughts about a type-specific performance problem
we hit
through our application tier.

The full conversation is here:
https://github.com/npgsql/npgsql/issues/2283

Basically, if a table exists with a PK which is CHAR(n) and a query is
sent with
VARCHAR or CHAR then it uses an Index Scan. If the query is sent with
TEXT as the
type then postgresql casts the column to TEXT (rather than the value to
CHAR) and
it does a Seq Scan.

So far this has only showed itself on npgsql (I've been unable to
reproduce on
other drivers), I think it's because npgsql only sends TEXT whereas
other drivers
tend to send VARCHAR (other drivers includes the official JDBC driver).

I guess the root question is: is TEXT supposed to be identical to
VARCHAR in all scenarios?

Thanks,
Rob


Reply | Threaded
Open this post in threaded view
|

Re: CHAR vs NVARCHAR vs TEXT performance

Tom Lane-2
Rob <[hidden email]> writes:
> Basically, if a table exists with a PK which is CHAR(n) and a query is
> sent with VARCHAR or CHAR then it uses an Index Scan. If the query is
> sent with TEXT as the type then postgresql casts the column to TEXT
> (rather than the value to CHAR) and it does a Seq Scan.

Yeah, this is an artifact of the fact that text is considered a
"preferred type" so it wins out in the parser's choice of which
type to promote to.  See

https://www.postgresql.org/docs/current/typeconv-oper.html

> I guess the root question is: is TEXT supposed to be identical to
> VARCHAR in all scenarios?

It's not for this purpose, because varchar isn't a preferred type.

FWIW, my recommendation for this sort of thing is almost always
to not use CHAR(n).  The use-case for that datatype pretty much
disappeared with the last IBM Model 029 card punch.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: CHAR vs NVARCHAR vs TEXT performance

Rob-3
I agree in principle, however in this particular scenario it's not
our schema so we're a little reluctant to migrate the types etc.

We're in a bit of a bad place because the combination of NHibernate
+ npgsql3/4 + this table = seqScans everywhere. Basically when npgsql
changed their default type for strings from VARCHAR to TEXT it caused
this behaviour.

I suppose the follow up question is: should drivers
default to sending types that are preferred by postgres (i.e. TEXT)
rather than compatible types (VARCHAR). If so, is there a reason why
the JDBC driver doesn't send TEXT  (possibly a question for the JDBC
guys rather than here)?

Thanks,
Rob

On 2019-04-30 00:16, Thomas Munro wrote:

> On Tue, Apr 30, 2019 at 5:44 AM Tom Lane <[hidden email]> wrote:
>> FWIW, my recommendation for this sort of thing is almost always
>> to not use CHAR(n).  The use-case for that datatype pretty much
>> disappeared with the last IBM Model 029 card punch.
>
> +1 on the recommendation for PostgreSQL.
>
> I do think it's useful on slightly more recent IBM technology than the
> 029 though.  It's been a few years since I touched it, but DB2 manuals
> and experts in this decade recommended fixed size types in some
> circumstances, and they might in theory be useful on any
> in-place-update system (and maybe us in some future table AM?).  For
> example, you can completely exclude the possibility of having to spill
> to another page when updating (DB2 DBAs measure and complain about
> rate of 'overflow' page usage which they consider failure and we
> consider SOP), you can avoid wasting space on the length (at the cost
> of wasting space on trailing spaces, if the contents vary in length),
> you can get O(1) access to fixed sized attributes (perhaps even
> updating single attributes).  These aren't nothing, and I've seen DB2
> DBAs get TPS improvements from that kind of stuff.  (From memory this
> type of thing was also a reason to think carefully about which tables
> should use compression, because the fixed size space guarantees went
> out the window.).



Reply | Threaded
Open this post in threaded view
|

Re: CHAR vs NVARCHAR vs TEXT performance

Steve Crawford

> On Tue, Apr 30, 2019 at 5:44 AM Tom Lane <[hidden email]> wrote:
>> FWIW, my recommendation for this sort of thing is almost always
>> to not use CHAR(n).  The use-case for that datatype pretty much
>> disappeared with the last IBM Model 029 card punch.
...



Perhaps the "tip" on the character datatype page (https://www.postgresql.org/docs/11/datatype-character.html) should be updated as the statement "There is no performance difference among these three types..." could easily lead a reader down the wrong path. The statement may be true if one assumes the planner is able to make an optimal choice but clearly there are cases that prevent that. If the situation is better explained elsewhere in the documentation then just a link to that explanation may be all that is needed.

Cheers,
Steve