Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

tutiluren
Even though I highly appreciate that PostgreSQL, a database software which doesn't cost money, exists *at all*, that fact is oftentimes overshadowed by a small but important number of very frustrating issues which I consider to largely ruin the overall "experience" of using PostgreSQL. I'd call them "almost show-stopping". I realize very well that not everyone "has the same priorities" (clearly not) or the expertise, will and free time to work on a certain "area" of the overall project/ecosystem, but these issues/bugs/limitations are so problematic to me that I have to express them directly to the PostgreSQL developers:

1. All non-ANSI characters are turned into "?"s for application_name. My administration panel is thus full of gibberish such as: "?a??? ???? ? ?m??? ??i???u?" and it always looks as if something is awfully broken. I would not dare to show it to a CEO or other important person, as they'd just go: "We're switching to IBM, effective immediately. Throw this open source rubbish out at once!" The explanation I've heard for this is that it's basically a security issue, as it's possible to set the application_name before the something-something (safe Unicode handler code?) has kicked in, but I have no problems with setting the application_name to Unicode characters *after* the database connection has already been fully established, in a separate query, as I already do, and I doubt that anyone else would have, either. So that explanation, while probably technically true, doesn't seem to make any sense.

2. pg_dump misinterprets non-ANSI values for the "--exclude-*" options (at least the --exclude-table-data one, which is the one I've tested) on Windows, resulting in it being impossible to make more "sophisticated" backups of PostgreSQL databases; it's either all or nothing. Other programs, including my own test scripts and commands, are perfectly able to use any Unicode character sent from/through both cmd.exe and PHP CLI, but not pg_dump, so the idea that "Windows it at fault" here just doesn't seem true. (Although I don't doubt for a second that it often *is* the case... Microsoft is not a nice entity in any way.) I spent a lot of time and efforts experimenting with and asking about this, but eventually gave up and concluded that it was yet another bug in an open source project "only" on Windows with no real/pressing interest in fixing it. For me, this means that I lose a ton of fresh data every day, or have to make *gigantic* backups. (I have several huge "temporary debug log" tables whose data have zero long-term value but tons of short-term value.) It makes me feel crippled and excluded in an uncomfortable manner.

3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial. I want my familiar PostgreSQL, only not require it to be manually and separately installed on the machine where it is to run as part of some "application". If I could just "embed" it, this would allow me to create a single EXE which I can simply put on a different machine to run my entire "system" which otherwise takes *tons* of tedious, error-prone manual labor to install, set up and maintain. Of course, this is probably much easier said than done, but I don't understand why PG's architecture necessarily dictates that PG must be a stand-alone, separate thing. Or rather, why some "glue" cannot enable it to be used just like SQLite from a *practical* perspective, even if it still is a "server-client model" underneath the hood. (Which doesn't matter at all to me, nor should it matter to anyone else.)

4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!

5. Ever since my early days with PG in the mid-2000s, I've tried numerous times to read the manual, wikis and comments for the configuration files, specifically the performance directives, and asked many, many times for help about that, yet never been able to figure out what they want me to enter for all the numerous options. At this point, it isn't me being lazy/stupid; it's objectively very difficult to understand all of that. The practical end result of this is that I've always gone back to using the untouched default configuration file (except for the logging-related options), which, especially in the past on FreeBSD, *severely* crippled my PG database to not even come close to taking advantage of the full power of the hardware. Instead, it felt like I was using maybe 1% of the machine's power, even with a proper database design and indexes and all of that stuff, simply because the default config was so "conservative" and it couldn't be just set to "use whatever resources are available". I wish so much for PG to have a mode where it self-tunes itself as needed, over time, based on the actual workload, or at least allowed some kind of abstract "performance mode" such as: "you are allowed to use significant system resources, PG", or: "You are one of my most important applications. Just use as much power as you currently need, but at least save about 10% for the rest of the system, will you?" Maybe this is also harder than it sounds to accomplish, but for somebody like me who has zero funding, I cannot hire some professional to sit down with me and fine-tune my system for $899/hour. Also, besides the purely monetary issue, there are serious privacy implications with that scene. I wouldn't want an outsider to have intimate knowledge of my database system, which more than probably is a requirement for them to be able to do their job properly.

I'm sorry if any of the above sounds insulting/"entitled". These are the main things which truly bother me about PG and the "PG ecosystem", and I'd love to hear some first-hand comments on them. At least point 1 and 2 seem like they would be almost trivial to fix, at least compared to the rest.
Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Guyren Howe
On Sep 14, 2020, 13:22 -0700, [hidden email], wrote:
3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial

I can’t for the life of me imagine how you arrived at this. SQLite is very capable indeed. Its dialect of SQL is (deliberately) very similar to Postgres, featuring such niceties as recursive CTEs and window functions, and it can handle heavy use and multi-terabyte databases if you need (cf bedrockdb).
Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Guyren Howe
In reply to this post by tutiluren
Regarding indexes, I recommend the website use-the-index-luke.com. That guy’s other website about modern SQL is also great.

Regarding self-tuning, it was actually part of the original vision for relational databases that they would do that, but I’m not aware of any modern SQL database that does, although there are third-party tools that will eg offer index suggestions (I don’t know which is best, so I won’t propose any).
On Sep 14, 2020, 13:22 -0700, [hidden email], wrote:
4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!
Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Ravi Krishna-6
In reply to this post by Guyren Howe
I can’t for the life of me imagine how you arrived at this. SQLite is very capable indeed.
>Its dialect of SQL is (deliberately) very similar to Postgres, featuring such niceties as
>recursive CTEs and window functions, and it can handle heavy use and multi-terabyte
> databases if you need (cf bedrockdb).


But it is still a single user database in the sense that concurrent access to a table is not
handled well.  I use it for single user embedded applications


Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Tom Lane-2
In reply to this post by tutiluren
[hidden email] writes:
> 1. All non-ANSI characters are turned into "?"s for application_name.

Yeah, that's hard to do much with unfortunately.  We cannot assume that
all databases in an installation share the same encoding, so for globally
visible strings like application_name, the only safe solution is to
restrict them to ASCII.

On the other hand, the very same thing could be said of database names
and role names, yet we have never worried much about whether those were
encoding-safe when viewed from databases with different encodings, nor
have there been many complaints about the theoretical unsafety.  So maybe
this is just overly anal-retentive and we should drop the restriction,
or at least pass through data that doesn't appear to be invalidly
encoded.

> 2. pg_dump misinterprets non-ANSI values for the "--exclude-*" options
> (at least the --exclude-table-data one, which is the one I've tested) on
> Windows, resulting in it being impossible to make more "sophisticated"
> backups of PostgreSQL databases; it's either all or nothing.

TBH, I'm going to throw that back on you as probably pilot error.  There's
no reason for such cases not to work if you're running the terminal window
in the same encoding that pg_dump thinks it's using.

> I spent a lot of time and efforts experimenting with and asking about
> this, but eventually gave up and concluded that it was yet another bug
> in an open source project "only" on Windows with no real/pressing
> interest in fixing it.

Yeah, I remember that discussion.  It being open source cuts two ways:
problems get fixed by people who have the motivation to find a workable
fix.  Not being a Windows user, I don't particularly care about whether
this case is broken or not, and even if I cared more, I do not have
the resources to figure it out myself.  There are other people around the
project who do Windows, but you evidently have not managed to persuade
them that this is something they should spend time on, either.

> 3. The ability to embed PG to run in an automatic, quiet manner as part
> of something else.

That would be a packaging problem for the "something else" to figure out,
no?  There are enough platform-specific issues about "automatic" that
I don't see it as very practical for the core Postgres project to take
on, in any case.  I'm aware of some packagings, such as Postgres.app for
macOS (https://postgresapp.com), that might be closer to what you are
looking for than the core project ever could be.

> 4. There is no built-in means to have PG manage (or even suggest)
> indexes on its own.

Indeed, and that does not seem to me like something that ought to be
built in.  There is at least one "index advisor" plug-in floating
around, which perhaps could help you.

> 5. Ever since my early days with PG in the mid-2000s, I've tried
> numerous times to read the manual, wikis and comments for the
> configuration files, specifically the performance directives, and asked
> many, many times for help about that, yet never been able to figure out
> what they want me to enter for all the numerous options. At this point,
> it isn't me being lazy/stupid; it's objectively very difficult to
> understand all of that.

We do the best we can as far as the documentation goes; if you have
concrete suggestions about how to improve that, we'll surely listen.

> The practical end result of this is that I've always gone back to using
> the untouched default configuration file (except for the logging-related
> options), which, especially in the past on FreeBSD, *severely* crippled
> my PG database to not even come close to taking advantage of the full
> power of the hardware.

You seem to be wishing that the default configuration was designed to
try to commandeer the whole machine.  It's deliberate project policy
that that not be so --- and given that a couple of bullet points up,
you were arguing for Postgres to be easily usable as an invisible
part of some other app, it doesn't seem like you really want it to
be so either.

There is a lot of info out there about Postgres tuning, and yes some
of it is contradictory, because one size doesn't fit all.  People
have different goals about how they want the system to act, and the
proper settings may vary across platforms or PG versions, and there's a
fair amount of just plain different opinions.  But I think you may be
overcomplicating it.  IMO there's not that much you have to adjust to
start with --- maybe just increase shared_buffers and possibly work_mem
--- and then tweak other settings only when you find yourself running into
that limit.  If you can figure out our logging settings then you can
figure out the rest of this, too.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

raf-6
On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane <[hidden email]> wrote:

> [hidden email] writes:
> > 1. All non-ANSI characters are turned into "?"s for application_name.
>
> Yeah, that's hard to do much with unfortunately.  We cannot assume that
> all databases in an installation share the same encoding, so for globally
> visible strings like application_name, the only safe solution is to
> restrict them to ASCII.
>
> On the other hand, the very same thing could be said of database names
> and role names, yet we have never worried much about whether those were
> encoding-safe when viewed from databases with different encodings, nor
> have there been many complaints about the theoretical unsafety.  So maybe
> this is just overly anal-retentive and we should drop the restriction,
> or at least pass through data that doesn't appear to be invalidly
> encoded.

Perhaps recode database/role names from the source
database's encoding into utf8, and then recode from utf8
to the destination database's encoding?

For "globally visible strings", maybe recode to the
client_encoding setting, or a new encoding setting for
this purpose since client_encoding seems to be linked
to the database that the client is connected to. I'd've
thought that the application name would arrive encoded
as client_encoding (which defaults to the database
encoding). Maybe globally visible strings need to be
recoded on arrival from the client_encoding to utf8 (or
a "server_encoding" or "global_encoding" setting) so
they are always stored in a known encoding so they can
be recoded as necessary when viewed via connections to
other databases using a different encoding.

Just some thoughts. If they don't make any sense, feel
free to ignore them. :-)

> > 5. Ever since my early days with PG in the mid-2000s, I've tried
> > numerous times to read the manual, wikis and comments for the
> > configuration files, specifically the performance directives, and asked
> > many, many times for help about that, yet never been able to figure out
> > what they want me to enter for all the numerous options. At this point,
> > it isn't me being lazy/stupid; it's objectively very difficult to
> > understand all of that.
>
> We do the best we can as far as the documentation goes; if you have
> concrete suggestions about how to improve that, we'll surely listen.

I thought the documentation on postgres performance tuning was fine.
Here's what I got from it (for a dedicated database server host):

  shared_buffers = 1GB # Should be 1/4 of RAM
  work_mem = 16MB # Should be bigger to do more sorts in-memory but it's per sort per user so not too big
  maintenance_work_mem = 128MB # Should be <= 256MB to make vacuum fast without taking away too much RAM from other tasks
  min_wal_size = 480MB # Should be at least 3 * 10 * 16MB to avoid too many checkpoints slowing down writes
  checkpoint_completion_target = 0.9 # Should be 0.9 if checkpoint_segments was increased to spread out checkpoint writes
  random_page_cost = 4.0 # Should be 4.0 for HDD, 1.5-2.5 for SSD, 1.0-1.01 if db fits entirely in RAM
  effective_cache_size = 3GB # Should be 1/2 to 3/4 of RAM

It's the "Should..." comments that matter. And it might be out of date...
Actually, it is out of date. The comment for checkpoint_completion_target
refers to checkpoint_segments which no longer exists (since 9.5) so
disregard that.

cheers,
raf



Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Tom Lane-2
raf <[hidden email]> writes:
> On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane <[hidden email]> wrote:
>> On the other hand, the very same thing could be said of database names
>> and role names, yet we have never worried much about whether those were
>> encoding-safe when viewed from databases with different encodings, nor
>> have there been many complaints about the theoretical unsafety.  So maybe
>> this is just overly anal-retentive and we should drop the restriction,
>> or at least pass through data that doesn't appear to be invalidly
>> encoded.

> Perhaps recode database/role names from the source
> database's encoding into utf8, and then recode from utf8
> to the destination database's encoding?

A lot of people seem to believe that transcoding through utf8
is 100% safe.  They're wrong :-( --- the Japanese, at least,
have reason not to trust it, because of the existence of multiple
incompatible conversion standards.  And you're still left with the
question of what to do when the destination encoding hasn't
got the character.

Moreover, this is all moderately expensive unless the encodings in
question are already utf8 or latin1.  So if we go this way I'd
prefer to do it as I said above -- just drop or question-mark-ize
any characters that don't pass validation in the recipient DB.
That's fairly cheap and it will work perfectly in the typical case
where the whole cluster is on one encoding anyway.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Bruce Momjian
In reply to this post by Tom Lane-2
On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane wrote:

> [hidden email] writes:
> > 1. All non-ANSI characters are turned into "?"s for application_name.
>
> Yeah, that's hard to do much with unfortunately.  We cannot assume that
> all databases in an installation share the same encoding, so for globally
> visible strings like application_name, the only safe solution is to
> restrict them to ASCII.
>
> On the other hand, the very same thing could be said of database names
> and role names, yet we have never worried much about whether those were
> encoding-safe when viewed from databases with different encodings, nor
> have there been many complaints about the theoretical unsafety.  So maybe
> this is just overly anal-retentive and we should drop the restriction,
> or at least pass through data that doesn't appear to be invalidly
> encoded.

I think the issue is that role and database names are controlled by
privileged users, while application_name is not.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Bruce Momjian
In reply to this post by tutiluren
On Mon, Sep 14, 2020 at 10:22:31PM +0200, [hidden email] wrote:

> 4. There is no built-in means to have PG manage (or even suggest) indexes on
> its own. Trying to figure out what indexes to create/delete/fine-tune, and
> determine all the extremely complex rules for this art (yes, I just called
> index management an *art*, because it is!), is just utterly hopeless to me. It
> never gets any easier. Not even after many years. It's the by far worst part of
> databases to me (combined with point five). Having to use third-party solutions
> ensures that it isn't done in practice, at least for me. I don't trust, nor do
> I want to deal with, external software and extensions in my databases. I still
> have nightmares from PostGIS, which I only keep around, angrily, out of
> absolute necessity. I fundamentally don't like third-party add-ons to things,
> but want the core product to properly support things. Besides, this (adding/
> managing indexes) is not even some niche/obscure use-case, but something which
> is crucial for basically any nontrivial database of any kind!

I think you did a very good job of explaining your issues.  I think the
underlying problem is that Postgres is targeting a wide market, and your
use-case for a more limited or self-contained database doesn't fit many
of those markets.  Also, PostGIS is one of the most complex extensions,
so adding simpler ones should not be as hard.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Tom Lane-2
In reply to this post by Bruce Momjian
Bruce Momjian <[hidden email]> writes:
> On Mon, Sep 14, 2020 at 05:39:57PM -0400, Tom Lane wrote:
>> On the other hand, the very same thing could be said of database names
>> and role names, yet we have never worried much about whether those were
>> encoding-safe when viewed from databases with different encodings, nor
>> have there been many complaints about the theoretical unsafety.  So maybe
>> this is just overly anal-retentive and we should drop the restriction,
>> or at least pass through data that doesn't appear to be invalidly
>> encoded.

> I think the issue is that role and database names are controlled by
> privileged users, while application_name is not.

That's certainly an argument against a completely laissez-faire approach,
but if we filtered invalidly-encoded data on the reading side, it seems
like we would be in good enough shape.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Joshua Drake-2
In reply to this post by tutiluren
Howdy,

First let me say thanks for the feedback! It is rare that we receive detailed "user feedback" on these lists so it is good to hear from the outside world. I am only going to address a few things as others have addressed the rest.
 
2. pg_dump misinterprets non-ANSI values for the "--exclude-*" options (at least the --exclude-table-data one, which is the one I've tested) on Windows, resulting in it being impossible to make more "sophisticated" backups of PostgreSQL databases; it's either all or nothing. Other programs, including my own test scripts and commands, are perfectly able to use any Unicode character sent from/through both cmd.exe and PHP CLI, but not pg_dump, so the idea that "Windows it at fault" here just doesn't seem true. (Although I don't doubt for a second that it often *is* the case... Microsoft is not a nice entity in any way.) I spent a lot of time and efforts experimenting with and asking about this, but eventually gave up and concluded that it was yet another bug in an open source project "only" on Windows with no real/pressing interest in fixing it. For me, this means that I lose a ton of fresh data every day, or have to make *gigantic* backups. (I have several huge "temporary debug log" tables whose data have zero long-term value but tons of short-term value.) It makes me feel crippled and excluded in an uncomfortable manner.


I have to agree that pg_dump is largely a step child backup program. It has consistently been found over the years to be lacking in a number of areas. Unfortunately, working on pg_dump isn't sexy and it is difficult to get volunteers or even paid resources to do such a thing. The real solution for pg_dump is a complete refactor which includes pg_dumpall and it is not a small undertaking. It should be noted that it is also a less and less used program. On our team it is normally used for only very specific needs (grabbing a schema) and we use binary backups or logical replication to receive specific data.

 
3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial. I want my familiar PostgreSQL, only not require it to be manually and separately installed on the machine where it is to run as part of some "application". If I could just "embed" it, this would allow me to create a single EXE which I can simply put on a different machine to run my entire "system" which otherwise takes *tons* of tedious, error-prone manual labor to install, set up and maintain. Of course, this is probably much easier said than done, but I don't understand why PG's architecture necessarily dictates that PG must be a stand-alone, separate thing. Or rather, why some "glue" cannot enable it to be used just like SQLite from a *practical* perspective, even if it still is a "server-client model" underneath the hood. (Which doesn't matter at all to me, nor should it matter to anyone else.)


This is really using the wrong tool for the job type of issue. PG was never designed for such a scenario.
 
4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!

I think you are looking at this from a very windows centric way. Open Source has its origins from the Unix paradigm where each tool was designed to solve one type of problem and you used multiple tools to create a "solution". Though we have strayed from that on some items due to the evolving nature of software needs, that is still at our core and for good reason. Having tools, flags etc... to do such things (including your point #3) creates complexity best left to "vendors" not the software project.
 

5. Ever since my early days with PG in the mid-2000s, I've tried numerous times to read the manual, wikis and comments for the configuration files, specifically the performance directives, and asked many, many times for help about that, yet never been able to figure out what they want me to enter for all the numerous options. At this point, it isn't me being lazy/stupid; it's objectively very difficult to understand all of that.

This is absolutely true. The Postgresql documentation is FANTASTIC if you already understand what is going on or you need a reference. We have improved this a bit in 13 with the glossary but we still don't have definitive "5 steps to make your PostgreSQL server fast" and that is very much because it is a complicated question and it takes a lot of knowledge to do it correctly.
 
The practical end result of this is that I've always gone back to using the untouched default configuration file (except for the logging-related options), which, especially in the past on FreeBSD, *severely* crippled my PG database to not even come close to taking advantage of the full power of the hardware. Instead, it felt like I was using maybe 1% of the machine's power, even with a proper database design and indexes and all of that stuff, simply because the default config was so "conservative" and it couldn't be just set to "use whatever resources are available".

Not to be unkind but this does seem lazy. There are literally hundreds of "how to make postgres go fast", "how to optimize postgres" if you take 15 minutes to Google. It is true that the project (outside of the wiki) doesn't have much information in the official documentation but that doesn't mean that the information is not available.
 
I wish so much for PG to have a mode where it self-tunes itself as needed, over time, based on the actual workload, or at least allowed some kind of abstract "performance mode" such as: "you are allowed to use significant system resources, PG", or: "You are one of my most important applications. Just use as much power as you currently need, but at least save about 10% for the rest of the system, will you?" Maybe this is also harder than it sounds to accomplish, but for somebody like me who has zero funding, I cannot hire some professional to sit down with me and fine-tune my system for $899/hour.

See my comment about Google. The information is out there and easy to find. There is a lot of fresh and free content right here:

 
I hope you find the information you are looking for. There is a very helpful community on Discord and Slack as well:


JD

Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Fabio Ugo Venchiarutti-2

On 21/09/2020 17:53, Joshua Drake wrote:

> 3. The ability to embed PG to run in an automatic, quiet manner as part
> of something else. I know about SQLite, but it's extremely limited to
> the point of being virtually useless IMO, which is why I cannot use that
> for anything nontrivial. I want my familiar PostgreSQL, only not require
> it to be manually and separately installed on the machine where it is to
> run as part of some "application". If I could just "embed" it, this
> would allow me to create a single EXE which I can simply put on a
> different machine to run my entire "system" which otherwise takes *tons*
> of tedious, error-prone manual labor to install, set up and maintain. Of
> course, this is probably much easier said than done, but I don't
> understand why PG's architecture necessarily dictates that PG must be a
> stand-alone, separate thing. Or rather, why some "glue" cannot enable it
> to be used just like SQLite from a *practical* perspective, even if it
> still is a "server-client model" underneath the hood. (Which doesn't
> matter at all to me, nor should it matter to anyone else.)

It depends what you mean by "embedded".
If you want sqlite's linked library approach, where the database calls
literally run your process' address space, then that's a no go, as
postgres is a multi-user database server with its own process hierarchy.


However, postgres also is a rather agnostic command that does not detach
from the parent's terminal/stdio unless instructed to, so nothing stops
your bespoke application from launching and managing its own postmaster
as a directly managed child process - started as part of application
initialisation - and having that listen on a local socket only available
to the application itself; this is what we implemented in some of our
installations where postgres is enslaved to the cluster control system -
the configuration is re-generated at every restart (IIRC Patroni does
something similar minus the unix socket part).


A plethora of systems are built around the notion of programs calling
other programs and managing the process' life cycle. The limiting factor
to such architecture tends to be the OS's process control semantics and
API (notoriously weaker or more contrived on non-unix-like OSs), but
that's not postgres' fault.





--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology

--


Notice:
This email is confidential and may contain copyright material of
members of the Ocado Group. Opinions and views expressed in this message
may not necessarily reflect the opinions and views of the members of the
Ocado Group.

If you are not the intended recipient, please notify us
immediately and delete all copies of this message. Please note that it is
your responsibility to scan this message for viruses.

References to the
"Ocado Group" are to Ocado Group plc (registered in England and Wales with
number 7098618) and its subsidiary undertakings (as that expression is
defined in the Companies Act 2006) from time to time. The registered office
of Ocado Group plc is Buildings One & Two, Trident Place, Mosquito Way,
Hatfield, Hertfordshire, AL10 9UL.


Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Jonathan Strong
Joshua - adding to what Fabio said (with which I agree wholeheartedly!) -

It will be worthwhile for you to itemize the attributes and needed features for making a final decision about the architecture of the application you plan to deploy.

While you are familiar with PostgreSQL and like its features, it may not wind up being the best match for what you are trying to accomplish. I'm not sure your concerns about SQLite are true roadblocks. If you are looking to deploy an executable supported by one or more DLLs but no external processes, you won't find many mature and truly embeddable DBMSs out there. SQLite is widely used and well supported, and some of the functionality you'd normally consider putting in stored procedures might fit in with the idea of creating custom functions. A few more steps than creating a PostgreSQL stored procedure or function, but perhaps this approach would make SQLite a reasonable choice for you. See:


Back to the question about attributes and features of the DBMS, and Fabio's question to you re: your definition of an embedded database. An embedded database is typically used by a single user and is only available within the context of the current application and user / memory space. It should also be self-tuning and self-managing, as a typical end user can't and shouldn't be expected to also handle any DBA tasks. Several databases are considered "embedded", but the criteria used for this label may not match your own. If you can spell out your specific expectations (e.g., the app and database can be deployed in a single exe, etc.), this will help tremendously in homing in on the right choice.

- Jon

    

Jonathan Strong

CIO / CTO / Consultant

P: 609-532-1715 E: [hidden email]

Quora Top Writer



On Tue, Sep 22, 2020 at 10:18 AM Fabio Ugo Venchiarutti <[hidden email]> wrote:

On 21/09/2020 17:53, Joshua Drake wrote:
> 3. The ability to embed PG to run in an automatic, quiet manner as part
> of something else. I know about SQLite, but it's extremely limited to
> the point of being virtually useless IMO, which is why I cannot use that
> for anything nontrivial. I want my familiar PostgreSQL, only not require
> it to be manually and separately installed on the machine where it is to
> run as part of some "application". If I could just "embed" it, this
> would allow me to create a single EXE which I can simply put on a
> different machine to run my entire "system" which otherwise takes *tons*
> of tedious, error-prone manual labor to install, set up and maintain. Of
> course, this is probably much easier said than done, but I don't
> understand why PG's architecture necessarily dictates that PG must be a
> stand-alone, separate thing. Or rather, why some "glue" cannot enable it
> to be used just like SQLite from a *practical* perspective, even if it
> still is a "server-client model" underneath the hood. (Which doesn't
> matter at all to me, nor should it matter to anyone else.)

It depends what you mean by "embedded".
If you want sqlite's linked library approach, where the database calls
literally run your process' address space, then that's a no go, as
postgres is a multi-user database server with its own process hierarchy.


However, postgres also is a rather agnostic command that does not detach
from the parent's terminal/stdio unless instructed to, so nothing stops
your bespoke application from launching and managing its own postmaster
as a directly managed child process - started as part of application
initialisation - and having that listen on a local socket only available
to the application itself; this is what we implemented in some of our
installations where postgres is enslaved to the cluster control system -
the configuration is re-generated at every restart (IIRC Patroni does
something similar minus the unix socket part).


A plethora of systems are built around the notion of programs calling
other programs and managing the process' life cycle. The limiting factor
to such architecture tends to be the OS's process control semantics and
API (notoriously weaker or more contrived on non-unix-like OSs), but
that's not postgres' fault.





--
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology

--


Notice:
This email is confidential and may contain copyright material of
members of the Ocado Group. Opinions and views expressed in this message
may not necessarily reflect the opinions and views of the members of the
Ocado Group.

If you are not the intended recipient, please notify us
immediately and delete all copies of this message. Please note that it is
your responsibility to scan this message for viruses.

References to the
"Ocado Group" are to Ocado Group plc (registered in England and Wales with
number 7098618) and its subsidiary undertakings (as that expression is
defined in the Companies Act 2006) from time to time. The registered office
of Ocado Group plc is Buildings One & Two, Trident Place, Mosquito Way,
Hatfield, Hertfordshire, AL10 9UL.


Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

tutiluren
In reply to this post by tutiluren

Sep 21, 2020, 7:53 PM by [hidden email]:
I have to agree that pg_dump is largely a step child backup program. It has consistently been found over the years to be lacking in a number of areas. Unfortunately, working on pg_dump isn't sexy and it is difficult to get volunteers or even paid resources to do such a thing. The real solution for pg_dump is a complete refactor which includes pg_dumpall and it is not a small undertaking. It should be noted that it is also a less and less used program. On our team it is normally used for only very specific needs (grabbing a schema) and we use binary backups or logical replication to receive specific data.
Huh? Are you saying that there is another, superior way to back up PostgreSQL databases other than pg_dump? I re-read the manual on it just now, but didn't see a single word about it being "legacy" or "deprecated" or even that there's any other way to do it. What do you mean?

3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial. I want my familiar PostgreSQL, only not require it to be manually and separately installed on the machine where it is to run as part of some "application". If I could just "embed" it, this would allow me to create a single EXE which I can simply put on a different machine to run my entire "system" which otherwise takes *tons* of tedious, error-prone manual labor to install, set up and maintain. Of course, this is probably much easier said than done, but I don't understand why PG's architecture necessarily dictates that PG must be a stand-alone, separate thing. Or rather, why some "glue" cannot enable it to be used just like SQLite from a *practical* perspective, even if it still is a "server-client model" underneath the hood. (Which doesn't matter at all to me, nor should it matter to anyone else.)


This is really using the wrong tool for the job type of issue. PG was never designed for such a scenario.
I hate the "wrong tool for the job" argument. It assumes that everyone has infinite time, energy and brain capacity to learn endless redundant tools just to "use the right tool for the job" rather than "use what you actually know". I know PG. I don't know SQLite. They are very different. So obviously, I want to use PG.

What exactly makes PG unsuitable for this? I don't get it. But at the same time, I also realize that it's not going to happen at this point. The entire concept of a desktop computer appears to be phased out as we speak...

4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!

I think you are looking at this from a very windows centric way. Open Source has its origins from the Unix paradigm where each tool was designed to solve one type of problem and you used multiple tools to create a "solution". Though we have strayed from that on some items due to the evolving nature of software needs, that is still at our core and for good reason. Having tools, flags etc... to do such things (including your point #3) creates complexity best left to "vendors" not the software project.
While I understand what you mean, and even agree in theory, in practice, this always results in crappy third-party solutions which I don't want to deal with. PostGIS, for example, forces me to use "postgis" for its schema instead of "PostGIS" just because they arrogantly didn't construct their internal queries properly. "Little" things like that.

The practical end result of this is that I've always gone back to using the untouched default configuration file (except for the logging-related options), which, especially in the past on FreeBSD, *severely* crippled my PG database to not even come close to taking advantage of the full power of the hardware. Instead, it felt like I was using maybe 1% of the machine's power, even with a proper database design and indexes and all of that stuff, simply because the default config was so "conservative" and it couldn't be just set to "use whatever resources are available".

Not to be unkind but this does seem lazy. There are literally hundreds of "how to make postgres go fast", "how to optimize postgres" if you take 15 minutes to Google. It is true that the project (outside of the wiki) doesn't have much information in the official documentation but that doesn't mean that the information is not available.
Hundreds of crappy, outdated, confusing, badly written "web tutorialz" are worth nothing. A couple of clear, unambiguous documentation paragraphs are worth their (metaphorical) weight in gold.

Claiming that "the information is out there" is just hand-waving. It's shifting the burden to the user to actively hunt for information, and very likely be misled by all the garbage articles out there. I learned some horrible practices early on from "web tutz" and it took me many years to unlearn that stuff.

I know that writing documentation isn't fun, but it's necessary. Also, my overall point was to not even have to deal with the specifics, but just be able to tell PG with a single config option that it's allowed to "use most of the machine's resources".

I wish so much for PG to have a mode where it self-tunes itself as needed, over time, based on the actual workload, or at least allowed some kind of abstract "performance mode" such as: "you are allowed to use significant system resources, PG", or: "You are one of my most important applications. Just use as much power as you currently need, but at least save about 10% for the rest of the system, will you?" Maybe this is also harder than it sounds to accomplish, but for somebody like me who has zero funding, I cannot hire some professional to sit down with me and fine-tune my system for $899/hour.

See my comment about Google. The information is out there and easy to find.
I guess I'm the worst idiot in the world, then, who can't DuckDuckGo (Google is evil) it even after 15 years.

Seriously, I didn't type my feedback "for fun". It may be difficult for very intelligent people to understand (as often is the case, because you operate on a whole different level), but the performance-related PostgreSQL configuration options are a *nightmare* to me and many others. I spent *forever* reading about them and couldn't make any sense of it all. Each time I tried, I would give up, frustrated and angry, with no real clue what "magic numbers" it wanted.

It's quite baffling to me how this can be so difficult for you all to understand. Even if we disregard the sheer intelligence factor, it's clear that users of PG don't have the same intimate knowledge of PG's internals as the PG developers, nor could possibly be expected to.

As mentioned, I kept going back to the default configuration over and over again. Anyone who doesn't is either a genius or pretends/thinks that they understand it. (Or I'm extremely dumb.)

Very often, I get the feeling that things like that are the way they are on purpose. Work security and whatnot. But it's very frustrating for people like me who can't afford to buy help and don't have the enormous brain capacity necessary to comprehend the complex relations between the numerous performance-related config options. It really is that difficult.

Discord and Slack
Those modern services don't even let me load them. It's the same thing with everything these days: "verify with phone", "we've detected suspicious activity", "fake error message", etc.
Reply | Threaded
Open this post in threaded view
|

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Adrian Klaver-4
On 9/22/20 3:28 PM, [hidden email] wrote:

>
> Sep 21, 2020, 7:53 PM by [hidden email]:
>
>     I have to agree that pg_dump is largely a step child backup program.
>     It has consistently been found over the years to be lacking in a
>     number of areas. Unfortunately, working on pg_dump isn't sexy and it
>     is difficult to get volunteers or even paid resources to do such a
>     thing. The real solution for pg_dump is a complete refactor which
>     includes pg_dumpall and it is not a small undertaking. It should be
>     noted that it is also a less and less used program. On our team it
>     is normally used for only very specific needs (grabbing a schema)
>     and we use binary backups or logical replication to receive specific
>     data.
>
> Huh? Are you saying that there is another, superior way to back up
> PostgreSQL databases other than pg_dump? I re-read the manual on it just
> now, but didn't see a single word about it being "legacy" or
> "deprecated" or even that there's any other way to do it. What do you mean?

It is not deprecated or legacy and it is still used by many. The issue
is that people try to do 'partial' dumps without reading the docs on
what those switches actually do and the potential conflicts therein. For
large installations there it is often better to use some form of
replication(binary and/or logical) to maintain a continuous backup.
pg_dump can take a long time on very big databases, so it may not keep up.


>
>     This is really using the wrong tool for the job type of issue. PG
>     was never designed for such a scenario.
>
> I hate the "wrong tool for the job" argument. It assumes that everyone
> has infinite time, energy and brain capacity to learn endless redundant
> tools just to "use the right tool for the job" rather than "use what you
> actually know". I know PG. I don't know SQLite. They are very different.
> So obviously, I want to use PG.

This comes down to what your definition of embedded is?  A matter of
determining whether we are talking apples or oranges.


>
> What exactly makes PG unsuitable for this? I don't get it. But at the
> same time, I also realize that it's not going to happen at this point.
> The entire concept of a desktop computer appears to be phased out as we
> speak...

>
> While I understand what you mean, and even agree in theory, in practice,
> this always results in crappy third-party solutions which I don't want
> to deal with. PostGIS, for example, forces me to use "postgis" for its
> schema instead of "PostGIS" just because they arrogantly didn't
> construct their internal queries properly. "Little" things like that.

Huh? A schema is just a name space, why does it matter how the extension
chooses to define it? I mean you could have number of permutations of
postgis.

>
>         The practical end result of this is that I've always gone back
>         to using the untouched default configuration file (except for
>         the logging-related options), which, especially in the past on
>         FreeBSD, *severely* crippled my PG database to not even come
>         close to taking advantage of the full power of the hardware.
>         Instead, it felt like I was using maybe 1% of the machine's
>         power, even with a proper database design and indexes and all of
>         that stuff, simply because the default config was so
>         "conservative" and it couldn't be just set to "use whatever
>         resources are available".
>
>
>     Not to be unkind but this does seem lazy. There are literally
>     hundreds of "how to make postgres go fast", "how to optimize
>     postgres" if you take 15 minutes to Google. It is true that the
>     project (outside of the wiki) doesn't have much information in the
>     official documentation but that doesn't mean that the information is
>     not available.
>
> Hundreds of crappy, outdated, confusing, badly written "web tutorialz"
> are worth nothing. A couple of clear, unambiguous documentation
> paragraphs are worth their (metaphorical) weight in gold.

The problem is the 'unambiguous' part. There are so many ways Postgres
is used, writing a configuration doc that suited everyone would have so
many if/and/or/buts that it would define ambiguity. If you want that
information post an email here with the information on your server specs
and proposed usage.

>
> Claiming that "the information is out there" is just hand-waving. It's
> shifting the burden to the user to actively hunt for information, and
> very likely be misled by all the garbage articles out there. I learned
> some horrible practices early on from "web tutz" and it took me many
> years to unlearn that stuff.
>
> I know that writing documentation isn't fun, but it's necessary. Also,
> my overall point was to not even have to deal with the specifics, but
> just be able to tell PG with a single config option that it's allowed to
> "use most of the machine's resources".

That would entail building an AI into  the code that would deal with all
the possible OS(versions), Postgres(versions), hardware permutations.

>
>         I wish so much for PG to have a mode where it self-tunes itself
>         as needed, over time, based on the actual workload, or at least
>         allowed some kind of abstract "performance mode" such as: "you
>         are allowed to use significant system resources, PG", or: "You
>         are one of my most important applications. Just use as much
>         power as you currently need, but at least save about 10% for the
>         rest of the system, will you?" Maybe this is also harder than it
>         sounds to accomplish, but for somebody like me who has zero
>         funding, I cannot hire some professional to sit down with me and
>         fine-tune my system for $899/hour.
>
>
>     See my comment about Google. The information is out there and easy
>     to find.
>
> I guess I'm the worst idiot in the world, then, who can't DuckDuckGo
> (Google is evil) it even after 15 years.
>
> Seriously, I didn't type my feedback "for fun". It may be difficult for
> very intelligent people to understand (as often is the case, because you
> operate on a whole different level), but the performance-related
> PostgreSQL configuration options are a *nightmare* to me and many
> others. I spent *forever* reading about them and couldn't make any sense
> of it all. Each time I tried, I would give up, frustrated and angry,
> with no real clue what "magic numbers" it wanted.
>
> It's quite baffling to me how this can be so difficult for you all to
> understand. Even if we disregard the sheer intelligence factor, it's
> clear that users of PG don't have the same intimate knowledge of PG's
> internals as the PG developers, nor could possibly be expected to.
>
> As mentioned, I kept going back to the default configuration over and
> over again. Anyone who doesn't is either a genius or pretends/thinks
> that they understand it. (Or I'm extremely dumb.)

Does your server runs to your satisfaction with the default settings?
If not what issues do you see?

>
> Very often, I get the feeling that things like that are the way they are
> on purpose. Work security and whatnot. But it's very frustrating for
> people like me who can't afford to buy help and don't have the enormous
> brain capacity necessary to comprehend the complex relations between the
> numerous performance-related config options. It really is that difficult.
>
>     Discord and Slack
>
> Those modern services don't even let me load them. It's the same thing
> with everything these days: "verify with phone", "we've detected
> suspicious activity", "fake error message", etc.


--
Adrian Klaver
[hidden email]