parallel query in 9.6.6 + oracle_fdw -> ERROR: invalid cache ID: 41

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

parallel query in 9.6.6 + oracle_fdw -> ERROR: invalid cache ID: 41

Christian Mair
Hi!

I've found a problem in either PostgreSQL 9.6.6 or oracle_fdw 2.0.0.

Background:

I was testing a setup on a current CentOS 7 system with PostgreSQL 9.6.6
installed from the PGDG repositories and oracle_fdw 2.0.0 installed via pgxn
install. Everything went absolutely fine until I enabled the parallel query
feature (I set max_worker_processes and max_parallel_workers_per_gather both to
8).

Bug:

Whenever a session has performed a query on a foreign table, any subsequent
query on a local table big enough to use the parallel query feature exits with
an error:

     ERROR:  invalid cache ID: 41
     CONTEXT:  parallel worker

Whenever a session has NOT performed a query on a foreign table, parallel
queries are executed all OK. No problem.

The problem is not related to what kind of foreign query was executed.

Reproducability is 100%!

Some debugging work:

I've seen this was reported some time ago here:

https://www.postgresql.org/message-id/29113.1488910563%40sss.pgh.pa.us

I've followed Tom Lanes advice and did - on the same machine - a debug enabled
build of the 9.6.6 source code with

./configure --enable-cassert --enable-debug CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer" --prefix=/home/centos/pgbuild

and changed the ERROR in PANIC in src/backend/utils/cache/syscache.c.

I then redid the pgxn install oracle_fdw with paths pointing to the debug
build.  The problem was easily reproduced. A count(*) on a bigish table with 4
workers produced 4 core dumps!

Below is the back trace from gdb from each of the cores.

Let me know if there's anything else I can do or check!

Thanks :)
Chris.



[centos@asia data]$ ls -l core*
-rw-------. 1 centos centos 152059904 Nov 19 17:43 core.30430
-rw-------. 1 centos centos 152059904 Nov 19 17:43 core.30431
-rw-------. 1 centos centos 152059904 Nov 19 17:43 core.30432
-rw-------. 1 centos centos 152059904 Nov 19 17:43 core.30433

[centos@asia data]$ gdb ../bin/postgres core.30430
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/centos/pgbuild/bin/postgres...done.
[New LWP 30430]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: bgworker: parallel worke'.
Program terminated with signal 6, Aborted.
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 libaio-0.3.109-13.el7.x86_64
(gdb) bt
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
#1  0x00007f16a0f4e8e8 in abort () from /lib64/libc.so.6
#2  0x00000000008094b4 in errfinish (dummy=dummy@entry=0) at elog.c:557
#3  0x000000000080aea2 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x9d9965 "invalid cache ID: %d") at elog.c:1378
#4  0x00000000007ffd82 in SearchSysCacheList (cacheId=cacheId@entry=41, nkeys=nkeys@entry=2, key1=key1@entry=139734905138463, key2=<optimized out>, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:1210
#5  0x00007f169161a59f in _PG_init () at oracle_fdw.c:709
#6  0x000000000080c476 in internal_load_library (libname=<optimized out>, libname@entry=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:276
#7  0x000000000080c708 in RestoreLibraryState (start_address=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:741
#8  0x00000000004e72cf in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1069
#9  0x000000000069b0bf in StartBackgroundWorker () at bgworker.c:742
#10 0x00000000006a70bd in do_start_bgworker (rw=rw@entry=0x1f3b8d0) at postmaster.c:5612
#11 0x00000000006a722a in maybe_start_bgworkers () at postmaster.c:5809
#12 0x00000000006a7cf3 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4990
#13 <signal handler called>
#14 0x00007f16a1007783 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000006a80e5 in ServerLoop () at postmaster.c:1683
#16 0x00000000006a93c0 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x1f13cc0) at postmaster.c:1327
#17 0x0000000000626905 in main (argc=1, argv=0x1f13cc0) at main.c:228
(gdb) quit

[centos@asia data]$ gdb ../bin/postgres core.30431
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/centos/pgbuild/bin/postgres...done.
[New LWP 30431]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: bgworker: parallel worke'.
Program terminated with signal 6, Aborted.
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 libaio-0.3.109-13.el7.x86_64
(gdb) bt
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
#1  0x00007f16a0f4e8e8 in abort () from /lib64/libc.so.6
#2  0x00000000008094b4 in errfinish (dummy=dummy@entry=0) at elog.c:557
#3  0x000000000080aea2 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x9d9965 "invalid cache ID: %d") at elog.c:1378
#4  0x00000000007ffd82 in SearchSysCacheList (cacheId=cacheId@entry=41, nkeys=nkeys@entry=2, key1=key1@entry=139734905138463, key2=<optimized out>, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:1210
#5  0x00007f169161a59f in _PG_init () at oracle_fdw.c:709
#6  0x000000000080c476 in internal_load_library (libname=<optimized out>, libname@entry=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:276
#7  0x000000000080c708 in RestoreLibraryState (start_address=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:741
#8  0x00000000004e72cf in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1069
#9  0x000000000069b0bf in StartBackgroundWorker () at bgworker.c:742
#10 0x00000000006a70bd in do_start_bgworker (rw=rw@entry=0x1f3b730) at postmaster.c:5612
#11 0x00000000006a722a in maybe_start_bgworkers () at postmaster.c:5809
#12 0x00000000006a7cf3 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4990
#13 <signal handler called>
#14 0x00007f16a1007783 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000006a80e5 in ServerLoop () at postmaster.c:1683
#16 0x00000000006a93c0 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x1f13cc0) at postmaster.c:1327
#17 0x0000000000626905 in main (argc=1, argv=0x1f13cc0) at main.c:228
(gdb) quit

[centos@asia data]$ gdb ../bin/postgres core.30432
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/centos/pgbuild/bin/postgres...done.
[New LWP 30432]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: bgworker: parallel worke'.
Program terminated with signal 6, Aborted.
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 libaio-0.3.109-13.el7.x86_64
(gdb) bt
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
#1  0x00007f16a0f4e8e8 in abort () from /lib64/libc.so.6
#2  0x00000000008094b4 in errfinish (dummy=dummy@entry=0) at elog.c:557
#3  0x000000000080aea2 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x9d9965 "invalid cache ID: %d") at elog.c:1378
#4  0x00000000007ffd82 in SearchSysCacheList (cacheId=cacheId@entry=41, nkeys=nkeys@entry=2, key1=key1@entry=139734905138463, key2=<optimized out>, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:1210
#5  0x00007f169161a59f in _PG_init () at oracle_fdw.c:709
#6  0x000000000080c476 in internal_load_library (libname=<optimized out>, libname@entry=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:276
#7  0x000000000080c708 in RestoreLibraryState (start_address=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:741
#8  0x00000000004e72cf in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1069
#9  0x000000000069b0bf in StartBackgroundWorker () at bgworker.c:742
#10 0x00000000006a70bd in do_start_bgworker (rw=rw@entry=0x1f3b590) at postmaster.c:5612
#11 0x00000000006a722a in maybe_start_bgworkers () at postmaster.c:5809
#12 0x00000000006a7cf3 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4990
#13 <signal handler called>
#14 0x00007f16a1007783 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000006a80e5 in ServerLoop () at postmaster.c:1683
#16 0x00000000006a93c0 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x1f13cc0) at postmaster.c:1327
#17 0x0000000000626905 in main (argc=1, argv=0x1f13cc0) at main.c:228
(gdb) quit

[centos@asia data]$ gdb ../bin/postgres core.30433
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/centos/pgbuild/bin/postgres...done.
[New LWP 30433]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: bgworker: parallel worke'.
Program terminated with signal 6, Aborted.
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 libaio-0.3.109-13.el7.x86_64
(gdb) bt
#0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
#1  0x00007f16a0f4e8e8 in abort () from /lib64/libc.so.6
#2  0x00000000008094b4 in errfinish (dummy=dummy@entry=0) at elog.c:557
#3  0x000000000080aea2 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x9d9965 "invalid cache ID: %d") at elog.c:1378
#4  0x00000000007ffd82 in SearchSysCacheList (cacheId=cacheId@entry=41, nkeys=nkeys@entry=2, key1=key1@entry=139734905138463, key2=<optimized out>, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:1210
#5  0x00007f169161a59f in _PG_init () at oracle_fdw.c:709
#6  0x000000000080c476 in internal_load_library (libname=<optimized out>, libname@entry=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:276
#7  0x000000000080c708 in RestoreLibraryState (start_address=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:741
#8  0x00000000004e72cf in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1069
#9  0x000000000069b0bf in StartBackgroundWorker () at bgworker.c:742
#10 0x00000000006a70bd in do_start_bgworker (rw=rw@entry=0x1f3b390) at postmaster.c:5612
#11 0x00000000006a722a in maybe_start_bgworkers () at postmaster.c:5809
#12 0x00000000006a7cf3 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4990
#13 <signal handler called>
#14 0x00007f16a1007783 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000006a80e5 in ServerLoop () at postmaster.c:1683
#16 0x00000000006a93c0 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x1f13cc0) at postmaster.c:1327
#17 0x0000000000626905 in main (argc=1, argv=0x1f13cc0) at main.c:228
(gdb) quit




--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Reply | Threaded
Open this post in threaded view
|

Re: parallel query in 9.6.6 + oracle_fdw -> ERROR: invalid cache ID: 41

Tom Lane-2
Chris Mair <[hidden email]> writes:
> Whenever a session has performed a query on a foreign table, any subsequent
> query on a local table big enough to use the parallel query feature exits with
> an error:
>      ERROR:  invalid cache ID: 41
>      CONTEXT:  parallel worker

Hm, syscache 41 is PROCNAMEARGSNSP in 9.6, if I counted right.

> (gdb) bt
> #0  0x00007f16a0f4d1f7 in raise () from /lib64/libc.so.6
> #1  0x00007f16a0f4e8e8 in abort () from /lib64/libc.so.6
> #2  0x00000000008094b4 in errfinish (dummy=dummy@entry=0) at elog.c:557
> #3  0x000000000080aea2 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x9d9965 "invalid cache ID: %d") at elog.c:1378
> #4  0x00000000007ffd82 in SearchSysCacheList (cacheId=cacheId@entry=41, nkeys=nkeys@entry=2, key1=key1@entry=139734905138463, key2=<optimized out>, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:1210
> #5  0x00007f169161a59f in _PG_init () at oracle_fdw.c:709
> #6  0x000000000080c476 in internal_load_library (libname=<optimized out>, libname@entry=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:276
> #7  0x000000000080c708 in RestoreLibraryState (start_address=0x7f16a1bfdde8 <Address 0x7f16a1bfdde8 out of bounds>) at dfmgr.c:741
> #8  0x00000000004e72cf in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1069

Apparently, oracle_fdw is trying to do a procedure lookup in its _PG_init
function.  This is a horrible idea: it assumes that _PG_init is invoked
inside a transaction, which is wrong if the library is preloaded, for
example.  (I'd bet that adding oracle_fdw to shared_preload_libraries
would fail badly, though perhaps not with this exact error message.)

So I'd call this an oracle_fdw bug.  It needs to postpone what it's
doing here to the first normal FDW function call in a session.

                        regards, tom lane


--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Reply | Threaded
Open this post in threaded view
|

Re: parallel query in 9.6.6 + oracle_fdw -> ERROR: invalid cache ID: 41

Christian Mair
Hi,

> So I'd call this an oracle_fdw bug.  It needs to postpone what it's
> doing here to the first normal FDW function call in a session.

Thanks a lot for looking so quickly into this!

I've opened an issue with oracle_fdw:

https://github.com/laurenz/oracle_fdw/issues/215

Thanks,
Chris.



--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general