BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

The following bug has been logged on the website:

Bug reference:      16722
Logged by:          Sergey Koposov
Email address:      [hidden email]
PostgreSQL version: 11.9
Operating system:   debian
Description:        

Hi,

When ingesting a billion or so rows into a table that has several array columns
(see schema below), at some point all of the backends doing the ingestion hang
at 100% CPU usage. Investigating, I think I can trace this to the limit of 2^32
toasted records per table; see the full gdb backtrace of a hung backend at the
bottom. When the problem occurs, the table has 272 million records.
Essentially, the hang happens in GetNewOidWithIndex, called by
toast_save_datum.
While I understand that the limit on the number of toast OIDs is there to stay,
the behaviour of PG near the limit is IMO a bug (or at least non-trivial).
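To see why this shows up as a hang rather than an error, here is a minimal back-of-the-envelope model (illustrative Python, not PostgreSQL source): GetNewOidWithIndex draws candidate OIDs and probes the toast table's index until it finds an unused one. If candidate OIDs are modeled as uniform draws from the 2^32 OID space (a simplification; the real allocator advances a shared counter), the expected number of index probes per toast chunk grows like 1/(1 - f) as the used fraction f approaches 1, and diverges at full saturation:

```python
def expected_probes(used: int, oid_space: int = 2**32) -> float:
    """Expected index probes to find a free OID, modeling each candidate
    as a uniform draw from the OID space (a simplifying assumption)."""
    free = oid_space - used
    if free <= 0:
        return float("inf")  # no free OID left: the retry loop can never succeed
    return oid_space / free

# At 50% saturation each toast insert needs ~2 probes; at 99.99% it needs
# ~10,000 probes, each one a buffered index scan -- which looks like a hang.
for pct in (0.5, 0.99, 0.9999):
    used = int(pct * 2**32)
    print(f"{pct:.2%} full -> ~{expected_probes(used):.0f} probes")
```

Under this model the backend never errors out; it just spends unbounded time in the OID-collision loop, which matches the 100% CPU symptom.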

I would much rather see an error message than hanging backends.
I see a similar report, BUG #14137, but it didn't seem to receive any reply.

If it matters, the ingestion into the table is done by 10 parallel backends
running COPY. (I don't know whether that affects the way OIDs are allocated
for toast objects.)

Thanks,
        Sergey Koposov

GDB output
*****************
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
No locals.
#1  0x00000000006b3f42 in PGSemaphoreLock (sema=0x7f607fe1cbb8)
    at pg_sema.c:316
        errStatus = <optimized out>
        __func__ = "PGSemaphoreLock"
#2  0x000000000071ea14 in LWLockAcquire (lock=0x7f689257bf20,
    mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1244
        mustwait = <optimized out>
        proc = 0x7f68adeb46b0
        result = true
        extraWaits = 0
        __func__ = "LWLockAcquire"
#3  0x0000000000700440 in StartBufferIO (buf=buf@entry=0x7f6083a2e640,
    forInput=forInput@entry=true) at bufmgr.c:3887
        buf_state = <optimized out>
#4  0x000000000070204d in BufferAlloc (foundPtr=0x7fff36ee18db,
strategy=0x0,
    blockNum=7605075, forkNum=MAIN_FORKNUM, relpersistence=<optimized out>,

    smgr=0x130f380) at bufmgr.c:1251
        newPartitionLock = 0x7f607fe24080
        oldTag = {rnode = {spcNode = 20206528, dbNode = 0, relNode =
8688143},
          forkNum = MAIN_FORKNUM, blockNum = 8469712}
        oldPartitionLock = <optimized out>
        buf = 0x7f6083a2e640
        oldHash = <optimized out>
        oldFlags = <optimized out>
        buf_id = <optimized out>
        newTag = {rnode = {spcNode = 1663, dbNode = 16818,
            relNode = 2229524624}, forkNum = MAIN_FORKNUM, blockNum =
7605075}
        newHash = 4182877260
        valid = false
        buf_state = 0
#5  ReadBuffer_common (smgr=0x130f380, relpersistence=<optimized out>,
    forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=7605075,
    mode=RBM_NORMAL, strategy=0x0, hit=0x7fff36ee198f) at bufmgr.c:745
        bufHdr = <optimized out>
        bufBlock = <optimized out>
        found = true
        isExtend = false
        isLocalBuf = <optimized out>
        __func__ = "ReadBuffer_common"
#6  0x00000000007023e1 in ReadBufferExtended (reln=0x7f607fd9bc08,
    forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=7605075,
    mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at
bufmgr.c:664
        hit = false
        buf = <optimized out>
        __func__ = "ReadBufferExtended"
#7  0x00000000007024eb in ReadBuffer (blockNum=<optimized out>,
    reln=<optimized out>) at bufmgr.c:596
No locals.
#8  ReleaseAndReadBuffer (buffer=buffer@entry=20373756,
    relation=<optimized out>, blockNum=<optimized out>) at bufmgr.c:1545
        bufHdr = <optimized out>
#9  0x00000000004c6db2 in index_fetch_heap (scan=0x0,
scan@entry=0x136e098)
    at indexam.c:595
        prev_buf = 20373756
        tid = 0x136e0fc
        all_dead = false
        got_heap_tuple = <optimized out>
#10 0x00000000004c6e00 in index_getnext (scan=0x136e098,
    direction=direction@entry=ForwardScanDirection) at indexam.c:691
        heapTuple = <optimized out>
        tid = <optimized out>
#11 0x00000000004c6299 in systable_getnext
(sysscan=sysscan@entry=0x136dec8)
    at genam.c:425
        htup = <optimized out>
        __func__ = "systable_getnext"
#12 0x000000000050cd09 in GetNewOidWithIndex (relation=0x7f607fd9bc08,
    indexId=310933857, oidcolumn=1) at catalog.c:371
        newOid = 2744075972
        scan = 0x136dec8
        key = {sk_flags = 0, sk_attno = 1, sk_strategy = 3, sk_subtype = 0,

          sk_collation = 100, sk_func = {fn_addr = 0x7b6320 <oideq>,
            fn_oid = 184, fn_nargs = 2, fn_strict = true, fn_retset = false,

            fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x13453c0,
            fn_expr = 0x0}, sk_argument = 2744075972}
        collides = <optimized out>
#13 0x00000000004c3063 in toast_save_datum (value=20557421, oldexternal=0x0,

    options=0, rel=<optimized out>, rel=<optimized out>) at
tuptoaster.c:1574
        toastrel = 0x7f607fd9bc08
        toastidxs = 0x136bba0
        toasttup = 0x7f607fd98540
        t_values = {300189817, 0, 140734114962192}
        t_isnull = {false, false, false}
        result = 0xfffffffffffffe00
        chunk_data = {hdr = {vl_len_ = "\340\000\000",
            vl_dat = 0x7fff36ee1b14 "\001"},
          data =
"\340\000\000\000\001\000\000\000\000\000\000\000\274\002\000\000\b\000\000\000\001\000\000\000̥\203@\225\002\365@z\210,A/<gA\a\233}AYf\210Aɀ\215AД\226A.U)?\320\325.?\242aA?~\311:?i\215\211?\334\020\203?\000\000\000\000\272
\354@\222\202ي\321\070\354@\356\274\001ԀM\354@\"\272o\016nf\354@Ӯ\327N\275z\354@\242\250\"K\365\223\354@\372k\252\305D\250\354@",
'\000' <repeats 1696 times>..., align_it = 224}
        chunk_size = 0
        num_indexes = 1
        validIndex = 0
#14 0x00000000004c45a2 in toast_insert_or_update (rel=0x7f607fd983a8,
    newtup=0x0, oldtup=0xb5, options=-1) at tuptoaster.c:867
        old_value = 20557421
        tupleDesc = 0x12beb88
        numAttrs = 181
        need_change = 69
        need_free = 109
        toast_action = "pp pppp ppppppxppp ", 'p' <repeats 51 times>,
"xxxxxxxxxx   ", 'x' <repeats 12 times>, 'p' <repeats 61 times>, "
ppppppxxppppppxxpppppppp", '\000' <repeats 1418 times>
        toast_isnull = {false <repeats 1600 times>}
        toast_oldisnull = {false <repeats 1600 times>}
        toast_values = {9011, 508863, 20557040, 2663, 1, 2048, 130,
20557062,
          4642831835879822414, 4629766747987641402, 1478912696, 1475380501,

          1151322629, 1152363744, 20557104, 1018397949, 4678179332087438334,

          4678191406000876033, 20557168, 1325489961962492288,
          18446744072652356428, 18446744072655684425, 1052577735,
1094195074,
          1088406363, 1106636035, 1157361664, 1100416689, 1139960709, 319,

          1100668600, 1108163953, 29, 1100071362, 1111486228, 33, 0, 0, 0,

          300, 297, 1051787629, 0, 2143289344, 2143289344, 1067470643,
          1052143608, 31, 1099325834, 1107053637, 1109951334, 1096537497,
          1083978235, 1077223218, 18446744072661571125, 1115025818,
          1101955695, 1117711694, 1081872323, 1066791225, 987449489,
          930850946, 1095769653, 1103322737, 1107459414, 1095769653,
          1103322737, 1107459414, 20374504, 20557421, 20557474, 20557527,
          20557580, 20557633, 20557686, 20557739, 20557792, 20557845,
          20557898, 20557951, 20373112, 20373032, 20573080, 20558163,
          20558204, 20558245, 20558286, 20558327, 20558368, 20558409,
          20558450, 20558491, 20558532, 20558573, 20558614, 1064299422,
          1064635873, 1064948113, 1065291023, 1065314998, 1065345062,
          1065350129, 1, 1, 4, 278, 278, 31, 16, 1074988052, 1076692416,
          1076469703, 1076226148, 1066023726, 1057373083, 1041813708,
          910775196, 897988541, 958014467, 1073709436, 1080820995,
1087871082,
          1105555510, 996563946, 990516632, 1007439877, 1065342546,
          1065343854, 1065326389, 0, 0, 0, 0, 0, 0, 0, 0, 1070595106,
          1067663254, 1068637012, 1132005523, 1120866882, 1124400175,
          1126997579, 1115152811, 1117702755, 1086250728, 1080808387,
          1082966116, 1086250960, 1080808777, 1082966483, 1089896366,
          1069203394, 987853215, 932092729, 20558872, 1123232006,
1124468680,
          20374216, 20374168, 20374120, 20374072, 20559216, 20559267,
          20374024, 20373976, 20373928, 20373464, 20373416, 20313024,
          20559936, 20559987, 0 <repeats 1427 times>}
        toast_oldvalues = {0 <repeats 1245 times>, 8599425, 0, 256,
          140734115001632, 0, 0, 0, 0, 8602692, 0, 19089440,
140734115001632,
          19356882, 140734115001634, 13444600, 3, 19356880, 13444600,
8796705,
          16967, 8796705, 16967, 0 <repeats 23 times>, 140734115003392,
          140734115003376, 10204964, 10204968, 0, 140087678209897, 0, 0,
          140734115002240, 0, 0, 0, 0, 0, 140734115002192,
          0 <repeats 11 times>, 32, 0, 0, 0, 0, 4294967295, 140087678216602,

          0, 0, 4294967296, 18446744073709551615, 5, 0, 140087678216602, 0,

          10204964, 0, 6, 10204968, 0, 0, 0, 0, 0, 0, 0, 0, 206158430232,
          140734115003968, 140734115003776, 0 <repeats 125 times>,
          140087698075212, 0, 1037797632, 7, 9, 140087700155240,
          140087698075212, 10204964, 2090588023, 1, 9, 140087700155240,
          140087698077630, 140734115003664, 140734115003456,
140087695826704,
          140087695833544, 140734115003728, 32665437, 140734115003712,
          4277816, 0, 140087700155384, 140087700264280, 4607254,
          140087695834744, 4274192, 4294967296, 4294967581, 1,
          140087700155384, 140734115003872, 140087700276488,
140734115003912,
          140087700275632, 1, 140087698078061, 0, 140087700155384, 1, 0, 1,

          140087700275632, 140734115003680, 140734115003664, 1037797632,
          4495133, 0, 140087700276488, 140734115003728, 140734115003712,
          2090588023, 4607254, 4294967295, 140087678425001, 140087695833544,

          140087700264280, 165776, 140087681730168, 131072, 140087678425001,

          18446744073709547520, 140087678400681, 0, 19492864,
140734115005392,
          19327088, 165776, 13428184, 0, 27, 27, 0, 19327104,
140087698096119,
          4294967297, 0, 18446744073709551504, 140087695833544, 28,
19327104,
          140734115004064, 140087698122005, 0, 0, 0, 0, 140734115004064,
          19328986, 140087678216602, 140087700275632, 108, 8796505...}
        toast_oldexternal = {0x0 <repeats 181 times>,
          0x833781 <hash_search_with_hash_value+193>, 0x3c, 0x100,
          0x7fff36eec5e0, 0x0, 0x0, 0x0, 0x0, 0x834444 <string_hash+36>,
0x7b,
          0x1234820, 0x7fff36eec5e0, 0x124eee3, 0x7fff36000000,
          0xcd25f8 <ConfigureNamesString+4408>, 0x0, 0x0, 0x6aa00000000,
          0x1100000009, 0xffffffff00540100, 0xffffffff1a5d09cb,
          0xffffffff9b26ada0, 0xffffffff9bd60520, 0xffffffff9ccf30a0,
          0xffffffff9da4c3a0, 0xffffffff9e9c9da0, 0xffffffff9f971aa0,
          0xffffffffa085ba20, 0xffffffffa176fca0, 0xffffffffa2659c20,
          0xffffffffa37bc8a0, 0xffffffffa44eb8a0, 0xffffffffa53ffb20,
          0xffffffffa6256020, 0xffffffffa727c620, 0xffffffffa82a2c20,
          0xffffffffa8ebf8a0, 0xffffffffaa00d3a0, 0xffffffffaad51520,
          0xffffffffabe9f020, 0xffffffffacc76c20, 0xffffffffadc9d220,
          0xffffffffaea74e20, 0xffffffffafa079a0, 0xffffffffb0873020,
          0xffffffffb192d0a0, 0xffffffffb2704ca0, 0xffffffffb372b2a0,
          0xffffffffb4502ea0, 0xffffffffb5495a20, 0xffffffffb63010a0,
          0xffffffffb73276a0, 0xffffffffb80ff2a0, 0xffffffffb91258a0,
          0xffffffffb9efd4a0, 0xffffffffbae90020, 0xffffffffbbd8f120,
          0xffffffffbcdb5720, 0xffffffffbdb8d320, 0xffffffffbeb1fea0,
          0xffffffffbf98b520, 0xffffffffc09b1b20, 0xffffffffc1789720,
          0xffffffffc27afd20, 0xffffffffc3587920, 0xffffffffc451a4a0,
          0xffffffffc5385b20, 0xffffffffc63ac120, 0xffffffffc758d6a0,
          0xffffffffc7da09a0, 0xffffffffca162690, 0xffffffffca975990,
          0xffffffffcbd11e90, 0xffffffffcc773b90, 0xffffffffcdb10090,
          0xffffffffce605810, 0xffffffffcf90e290, 0xffffffffd06e5e90,
          0xffffffffd1721610, 0xffffffffd1fb3210, 0xffffffffd269fe20,
          0xffffffffd36329a0, 0xffffffffd449e020, 0xffffffffd51e21a0,
          0xffffffffd542fd90, 0xffffffffd5dfe010, 0xffffffffd64eac20,
          0xffffffffd6fe03a0, 0xffffffffd82e8e20, 0xffffffffd8f99520,
          0xffffffffda0e7020, 0xffffffffdaebec20, 0xffffffffdbe517a0,
          0xffffffffdccbce20, 0xffffffffddc4f9a0, 0xffffffffdeb4eaa0,
          0xffffffffdfae1620, 0xffffffffe094cca0, 0xffffffffe17248a0,
          0xffffffffe26b7420, 0xffffffffe3522aa0, 0xffffffffe45490a0,
          0xffffffffe5320ca0, 0xffffffffe63dad20, 0xffffffffe71b2920,
          0xffffffffe81454a0, 0xffffffffe8fb0b20, 0xffffffffe9fd7120,
          0xffffffffeadaed20, 0xffffffffebdd5320, 0xffffffffecbacf20,
          0xffffffffedb3faa0, 0xffffffffee9ab120, 0xffffffffef8167a0,
          0xfffffffff09f7d20, 0xfffffffff16149a0, 0xfffffffff27f5f20,
          0xfffffffff34a6620, 0xfffffffff45f4120, 0xfffffffff5210da0,
          0xfffffffff63f2320, 0xfffffffff700efa0, 0xfffffffff81f0520,
          0xfffffffff8e0d1a0, 0xfffffffff9fee720, 0xfffffffffac0b3a0,
          0xfffffffffbe803a0, 0xfffffffffc7baba0, 0xfffffffffdc7bb70,
          0x370c620, 0x4295820, 0x550a820, 0x6093a20, 0x7308a20, 0x7e91c20,

          0x9106c20, 0x9c8fe20, 0xaf04e20, 0xbb21aa0, 0xcd03020, 0xd91fca0,

          0xeb01220, 0xf71dea0, 0x10992ea0, 0x1151c0a0, 0x127910a0,
          0x1331a2a0, 0x1458f2a0, 0x1523eb90, 0x1638c690, 0x1703cd90,
          0x1818a890, 0x18e3af90, 0x19f88a90, 0x1ac39190, 0x1be1a710,
          0x1cacae10, 0x1dc18910, 0x1e8c9010, 0x1fa16b10, 0x206c7210,
          0x21814d10, 0x224c5410, 0x23612f10, 0x242c3610, 0x254a4b90,
          0x260c1810, 0x272a2d90, 0x27f53490, 0x290a0f90, 0x29d51690,
          0x2ae9f190, 0x2bb4f890, 0x2cc9d390, 0x2d94da90, 0x2ea9b590,
          0x2f74bc90, 0x30899790, 0x315dd910, 0x3272b410, 0x333dbb10,
          0x34529610, 0x351d9d10, 0x36327810, 0x36fd7f10, 0x381b9490,
          0x38dd6110, 0x39fb7690, 0x3abd4310, 0x3bdb5890...}
        toast_sizes = {0, 0, 9, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 41, 0, 0,
0,
          3, 0 <repeats 49 times>, 53 <repeats 12 times>, 27, 27, 27,
          41 <repeats 12 times>, 0 <repeats 61 times>, 9, 0, 0, 81, 81, 81,

          81, 51, 51, 81, 81, 81, 81, 144, 144, 51, 51, 0 <repeats 1427
times>}
        toast_free = {false <repeats 68 times>, true,
          false <repeats 11 times>, true, true, true,
          false <repeats 76 times>, true, true, true, true, false, false,
          true, true, true, true, true, true, false <repeats 1429 times>}
        toast_delold = {false <repeats 1600 times>}
#15 0x00000000004b8a87 in heap_multi_insert (relation=0x7f607fe1cbb8,
    tuples=0x0, tuples@entry=0x1341428, ntuples=0, ntuples@entry=22, cid=0,

    options=0, bistate=0x7fffffff, bistate@entry=0x12c9710) at
heapam.c:2730
        xid = 1708296130
        heaptuples = 0x136dc08
        scratch = {
          data =
"\002/\004\000\311\a\265\000\006\b\030\000\063#\000\000\277\303\a\000\023\062\064\065\061p325\000\000\000b\n\000\000\001\000\000\000\000\000\tPSF\000\000\000\000\000\000g\022\240\276\361\246n@\342\273\327E^1@@\372\036\231N\a\"\256N7A\240DF\355dC\226\000\000\000(\000\000\000\004\001\000\003\001\274\002\000\000\005\b\000\000\000\001\020L\264CB0V\313>B\004\033\002\001\000\000\000J\n\254<\000\000\000\000\230a\311\357p;\354@\001\252\367\061LF\354@\003",
'\000' <repeats 119 times>..., force_align_d = 2.9948299028864589e-305,
          force_align_i64 = 50955530654723842}
        page = 0x1 <error: Cannot access memory at address 0x1>
#16 0x0000000000586edf in CopyFromInsertBatch
(cstate=cstate@entry=0x12c92d0,
    estate=estate@entry=0x12d0e80, mycid=mycid@entry=0,
    hi_options=hi_options@entry=0,
    resultRelInfo=resultRelInfo@entry=0x128fde8,
    myslot=myslot@entry=0x12d2db8, bistate=0x12c9710, nBufferedTuples=22,
    bufferedTuples=0x1341428, firstBufferedLineNo=2663) at copy.c:2973
        oldcontext = 0x128c110
        i = <optimized out>
        save_cur_lineno = 2684
#17 0x000000000058c77e in CopyFrom (cstate=cstate@entry=0x12c92d0)
    at copy.c:2815
        slot = 0x12d2db8
        skip_tuple = false
        loaded_oid = 0
        tuple = 0x136cbf8
        tupDesc = 0x12beb88
        values = 0x12d34b8
        nulls = 0x1329c78
        resultRelInfo = <optimized out>
        saved_resultRelInfo = 0x0
        estate = 0x12d0e80
        mtstate = 0x12c9500
        econtext = 0x12d1090
        myslot = 0x12d2db8
        oldcontext = 0x128c110
        errcallback = {previous = 0x0,
          callback = 0x586d00 <CopyFromErrorCallback>, arg = 0x12c92d0}
        mycid = 0
        hi_options = 0
        bistate = 0x12c9710
        processed = 2683
        useHeapMultiInsert = true
        nBufferedTuples = <optimized out>
        prev_leaf_part_index = -1
        bufferedTuples = 0x1341428
        bufferedTuplesSize = 67408
        firstBufferedLineNo = 2663
        __func__ = "CopyFrom"
#18 0x000000000058cb53 in DoCopy (pstate=0x128c220, stmt=<optimized out>,
    stmt_location=0, stmt_len=2354, processed=0x7fff36ef14b8) at
copy.c:996
        cstate = 0x12c92d0
        is_from = true
        pipe = <optimized out>
        rel = 0x7f607fd983a8
        relid = 310933852
        query = 0x0
        __func__ = "DoCopy"
#19 0x0000000000731258 in standard_ProcessUtility (pstmt=0x1234560,
    queryString=0x1233030 "COPY decals9
(release,brickid,brickname,objid,brick_primary,maskbits,fitbits,type,ra,dec,ra_ivar,dec_ivar,bx,by,dchisq,ebv,mjd_min,mjd_max,ref_cat,ref_id,pmra,pmdec,parallax,pmra_ivar,pmdec_ivar,paral"...,

    context=PROCESS_UTILITY_TOPLEVEL, params=0xffffffffffffffff,
queryEnv=0x0,
    dest=0x7fffffff, completionTag=0x7fff36ef1740 "") at utility.c:551
        processed = 19943856
        pstate = 0x128c220
        __func__ = "standard_ProcessUtility"
#20 0x000000000072e5ba in PortalRunUtility (portal=0x12f4150,
pstmt=0x1234560,
    isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>,
    dest=0x1234670, completionTag=<optimized out>) at pquery.c:1178
        utilityStmt = <optimized out>
        snapshot = <optimized out>
#21 0x000000000072f027 in PortalRunMulti (portal=portal@entry=0x12f4150,
    isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x1234670,

    altdest=altdest@entry=0x1234670,
    completionTag=completionTag@entry=0x7fff36ef1740 "") at pquery.c:1331
        pstmt = 0x1234560
        active_snapshot_set = false
        stmtlist_item = 0x12bc400
#22 0x000000000072fbd7 in PortalRun (portal=0x12f4150,
    count=9223372036854775807, isTopLevel=<optimized out>,
    run_once=<optimized out>, dest=0x1234670, altdest=0x1234670,
    completionTag=0x7fff36ef1740 "") at pquery.c:799
        save_exception_stack = 0x7fff36ef19b0
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {1, -6064393298424732441, 19645008,

              140734115026752, 19874128, 2, 6064270506639647975,
              -6064395009656331033}, __mask_was_saved = 0, __saved_mask =
{
              __val = {0, 140734115026655, 0, 9981816, 8602692, 1, 19882336,

                10328646, 19874128, 10328646, 1, 1, 112, 19644960, 19080960,

                9895294}}}}
        result = <optimized out>
        nprocessed = <optimized out>
        saveTopTransactionResourceOwner = 0x1285838
        saveTopTransactionContext = 0x13050a0
        saveActivePortal = 0x0
        saveResourceOwner = 0x1285838
        savePortalContext = 0x0
        saveMemoryContext = 0x13050a0
        __func__ = "PortalRun"
#23 0x000000000072be17 in exec_simple_query (
    query_string=0x7f607fe1cbb8 "\001") at postgres.c:1145
        parsetree = 0x12bc220
        portal = 0x12f4150
        commandTag = 0x7fff36ef1740 ""
        completionTag =
"\000\335'\001\000\000\000\000\b\375'\001\002\000\000\000\000\000\000\000\002\000\000\000Q\000\000\000\000\000\000\000\060\060#\001\000\000\000\000\b\375'\001\000\000\000\000\b\375'\001",
'\000' <repeats 11 times>
        plantree_list = 0x1234670
        receiver = 0x1234670
        format = 0
        dest = DestRemote
        parsetree_item = 0x12bc250
        msec_str =
"\000\335'\001\000\000\000\000\b\375'\001\002\000\000\000\000\000\000\000\002\000\000\000Q\000\000\000\000\000\000"
        __func__ = "exec_simple_query"
#24 0x000000000072dade in PostgresMain (argc=1, argv=0x1233030,
    dbname=0x127fd20 "wsdb", username=0x127fd08 "koposov") at
postgres.c:4193
        query_string = 0x1233030 "COPY decals9
(release,brickid,brickname,objid,brick_primary,maskbits,fitbits,type,ra,dec,ra_ivar,dec_ivar,bx,by,dchisq,ebv,mjd_min,mjd_max,ref_cat,ref_id,pmra,pmdec,parallax,pmra_ivar,pmdec_ivar,paral"...
        firstchar = 19083312
        input_message = {
          data = 0x1233030 "COPY decals9
(release,brickid,brickname,objid,brick_primary,maskbits,fitbits,type,ra,dec,ra_ivar,dec_ivar,bx,by,dchisq,ebv,mjd_min,mjd_max,ref_cat,ref_id,pmra,pmdec,parallax,pmra_ivar,pmdec_ivar,paral"...,

          len = 2356, maxlen = 4096, cursor = 2356}
        local_sigjmp_buf = {{__jmpbuf = {2523489864, -6064395710111914777,

              19397896, 19397896, 0, 19382096, 6064270506719339751,
              -6064395007743597337}, __mask_was_saved = 1, __saved_mask =
{
              __val = {0, 19397920, 140734115028992, 47, 19080960, 1024,
                19398080, 19397896, 8711789, 0, 140734115027568,
                140734115027664, 6479486, 13500160, 8610716, 19086720}}}}
        send_ready_for_query = false
        __func__ = "PostgresMain"
#25 0x000000000047e812 in BackendRun (port=0x128b780) at postmaster.c:4356
        ac = 1
        secs = 658859760
        usecs = 545108
        i = 1
        av = 0x127fdc0
        maxac = <optimized out>
#26 BackendStartup (port=0x128b780) at postmaster.c:4028
        bn = 0x127bf50
        pid = 0
#27 ServerLoop () at postmaster.c:1707
        rmask = {fds_bits = {128, 0 <repeats 15 times>}}
        selres = <optimized out>
        now = <optimized out>
        readmask = {fds_bits = {248, 0 <repeats 15 times>}}
        last_lockfile_recheck_time = <optimized out>
        last_touch_time = <optimized out>
        __func__ = "ServerLoop"
#28 0x00000000006c39a3 in PostmasterMain (argc=argc@entry=3,
    argv=argv@entry=0x122d3c0) at postmaster.c:1380
        opt = <optimized out>
        status = <optimized out>
        userDoption = <optimized out>
        listen_addr_saved = true
        i = <optimized out>
        output_config_variable = <optimized out>
        __func__ = "PostmasterMain"
#29 0x000000000047f748 in main (argc=3, argv=0x122d3c0) at main.c:228
No locals.
quit
A debugging session is active.

        Inferior 1 [process 115811] will be detached.

Quit anyway? (y or n) Detaching from program: /opt/pgsql11/bin/postgres,
process 115811
***************************************
SCHEMA:
************************************

                              Table "koposov.decals9"
              Column               |        Type        | Collation |
Nullable | Default
-----------------------------------+--------------------+-----------+----------+---------
 release                           | smallint           |           |      
  |
 brickid                           | integer            |           |      
  |
 brickname                         | text               |           |      
  |
 objid                             | integer            |           |      
  |
 brick_primary                     | smallint           |           |      
  |
 maskbits                          | smallint           |           |      
  |
 fitbits                           | smallint           |           |      
  |
 type                              | text               |           |      
  |
 ra                                | double precision   |           |      
  |
 dec                               | double precision   |           |      
  |
 ra_ivar                           | real               |           |      
  |
 dec_ivar                          | real               |           |      
  |
 bx                                | real               |           |      
  |
 by                                | real               |           |      
  |
 dchisq                            | real[]             |           |      
  |
 ebv                               | real               |           |      
  |
 mjd_min                           | double precision   |           |      
  |
 mjd_max                           | double precision   |           |      
  |
 ref_cat                           | text               |           |      
  |
 ref_id                            | bigint             |           |      
  |
 pmra                              | real               |           |      
  |
 pmdec                             | real               |           |      
  |
 parallax                          | real               |           |      
  |
 pmra_ivar                         | real               |           |      
  |
 pmdec_ivar                        | real               |           |      
  |
 parallax_ivar                     | real               |           |      
  |
 ref_epoch                         | real               |           |      
  |
 gaia_phot_g_mean_mag              | real               |           |      
  |
 gaia_phot_g_mean_flux_over_error  | real               |           |      
  |
 gaia_phot_g_n_obs                 | smallint           |           |      
  |
 gaia_phot_bp_mean_mag             | real               |           |      
  |
 gaia_phot_bp_mean_flux_over_error | real               |           |      
  |
 gaia_phot_bp_n_obs                | smallint           |           |      
  |
 gaia_phot_rp_mean_mag             | real               |           |      
  |
 gaia_phot_rp_mean_flux_over_error | real               |           |      
  |
 gaia_phot_rp_n_obs                | smallint           |           |      
  |
 gaia_phot_variable_flag           | smallint           |           |      
  |
 gaia_astrometric_excess_noise     | real               |           |      
  |
 gaia_astrometric_excess_noise_sig | real               |           |      
  |
 gaia_astrometric_n_obs_al         | smallint           |           |      
  |
 gaia_astrometric_n_good_obs_al    | smallint           |           |      
  |
 gaia_astrometric_weight_al        | real               |           |      
  |
 gaia_duplicated_source            | smallint           |           |      
  |
 gaia_a_g_val                      | real               |           |      
  |
 gaia_e_bp_min_rp_val              | real               |           |      
  |
 gaia_phot_bp_rp_excess_factor     | real               |           |      
  |
 gaia_astrometric_sigma5d_max      | real               |           |      
  |
 gaia_astrometric_params_solved    | smallint           |           |      
  |
 flux_g                            | real               |           |      
  |
 flux_r                            | real               |           |      
  |
 flux_z                            | real               |           |      
  |
 flux_w1                           | real               |           |      
  |
 flux_w2                           | real               |           |      
  |
 flux_w3                           | real               |           |      
  |
 flux_w4                           | real               |           |      
  |
 flux_ivar_g                       | real               |           |      
  |
 flux_ivar_r                       | real               |           |      
  |
 flux_ivar_z                       | real               |           |      
  |
 flux_ivar_w1                      | real               |           |      
  |
 flux_ivar_w2                      | real               |           |      
  |
 flux_ivar_w3                      | real               |           |      
  |
 flux_ivar_w4                      | real               |           |      
  |
 fiberflux_g                       | real               |           |      
  |
 fiberflux_r                       | real               |           |      
  |
 fiberflux_z                       | real               |           |      
  |
 fibertotflux_g                    | real               |           |      
  |
 fibertotflux_r                    | real               |           |      
  |
 fibertotflux_z                    | real               |           |      
  |
 apflux_g                          | real[]             |           |      
  |
 apflux_r                          | real[]             |           |      
  |
 apflux_z                          | real[]             |           |      
  |
 apflux_resid_g                    | real[]             |           |      
  |
 apflux_resid_r                    | real[]             |           |      
  |
 apflux_resid_z                    | real[]             |           |      
  |
 apflux_blobresid_g                | real[]             |           |      
  |
 apflux_blobresid_r                | real[]             |           |      
  |
 apflux_blobresid_z                | real[]             |           |      
  |
 apflux_ivar_g                     | real[]             |           |      
  |
 apflux_ivar_r                     | real[]             |           |      
  |
 apflux_ivar_z                     | real[]             |           |      
  |
 apflux_masked_g                   | real[]             |           |          | 
 apflux_masked_r                   | real[]             |           |          | 
 apflux_masked_z                   | real[]             |           |          | 
 apflux_w1                         | real[]             |           |          | 
 apflux_w2                         | real[]             |           |          | 
 apflux_w3                         | real[]             |           |          | 
 apflux_w4                         | real[]             |           |          | 
 apflux_resid_w1                   | real[]             |           |          | 
 apflux_resid_w2                   | real[]             |           |          | 
 apflux_resid_w3                   | real[]             |           |          | 
 apflux_resid_w4                   | real[]             |           |          | 
 apflux_ivar_w1                    | real[]             |           |          | 
 apflux_ivar_w2                    | real[]             |           |          | 
 apflux_ivar_w3                    | real[]             |           |          | 
 apflux_ivar_w4                    | real[]             |           |          | 
 mw_transmission_g                 | real               |           |          | 
 mw_transmission_r                 | real               |           |          | 
 mw_transmission_z                 | real               |           |          | 
 mw_transmission_w1                | real               |           |          | 
 mw_transmission_w2                | real               |           |          | 
 mw_transmission_w3                | real               |           |          | 
 mw_transmission_w4                | real               |           |          | 
 nobs_g                            | smallint           |           |          | 
 nobs_r                            | smallint           |           |          | 
 nobs_z                            | smallint           |           |          | 
 nobs_w1                           | smallint           |           |          | 
 nobs_w2                           | smallint           |           |          | 
 nobs_w3                           | smallint           |           |          | 
 nobs_w4                           | smallint           |           |          | 
 rchisq_g                          | real               |           |          | 
 rchisq_r                          | real               |           |          | 
 rchisq_z                          | real               |           |          | 
 rchisq_w1                         | real               |           |          | 
 rchisq_w2                         | real               |           |          | 
 rchisq_w3                         | real               |           |          | 
 rchisq_w4                         | real               |           |          | 
 fracflux_g                        | real               |           |          | 
 fracflux_r                        | real               |           |          | 
 fracflux_z                        | real               |           |          | 
 fracflux_w1                       | real               |           |          | 
 fracflux_w2                       | real               |           |          | 
 fracflux_w3                       | real               |           |          | 
 fracflux_w4                       | real               |           |          | 
 fracmasked_g                      | real               |           |          | 
 fracmasked_r                      | real               |           |          | 
 fracmasked_z                      | real               |           |          | 
 fracin_g                          | real               |           |          | 
 fracin_r                          | real               |           |          | 
 fracin_z                          | real               |           |          | 
 anymask_g                         | smallint           |           |          | 
 anymask_r                         | smallint           |           |          | 
 anymask_z                         | smallint           |           |          | 
 allmask_g                         | smallint           |           |          | 
 allmask_r                         | smallint           |           |          | 
 allmask_z                         | smallint           |           |          | 
 wisemask_w1                       | smallint           |           |          | 
 wisemask_w2                       | smallint           |           |          | 
 psfsize_g                         | real               |           |          | 
 psfsize_r                         | real               |           |          | 
 psfsize_z                         | real               |           |          | 
 psfdepth_g                        | real               |           |          | 
 psfdepth_r                        | real               |           |          | 
 psfdepth_z                        | real               |           |          | 
 galdepth_g                        | real               |           |          | 
 galdepth_r                        | real               |           |          | 
 galdepth_z                        | real               |           |          | 
 nea_g                             | real               |           |          | 
 nea_r                             | real               |           |          | 
 nea_z                             | real               |           |          | 
 blob_nea_g                        | real               |           |          | 
 blob_nea_r                        | real               |           |          | 
 blob_nea_z                        | real               |           |          | 
 psfdepth_w1                       | real               |           |          | 
 psfdepth_w2                       | real               |           |          | 
 psfdepth_w3                       | real               |           |          | 
 psfdepth_w4                       | real               |           |          | 
 wise_coadd_id                     | text               |           |          | 
 wise_x                            | real               |           |          | 
 wise_y                            | real               |           |          | 
 lc_flux_w1                        | real[]             |           |          | 
 lc_flux_w2                        | real[]             |           |          | 
 lc_flux_ivar_w1                   | real[]             |           |          | 
 lc_flux_ivar_w2                   | real[]             |           |          | 
 lc_nobs_w1                        | smallint[]         |           |          | 
 lc_nobs_w2                        | smallint[]         |           |          | 
 lc_fracflux_w1                    | real[]             |           |          | 
 lc_fracflux_w2                    | real[]             |           |          | 
 lc_rchisq_w1                      | real[]             |           |          | 
 lc_rchisq_w2                      | real[]             |           |          | 
 lc_mjd_w1                         | double precision[] |           |          | 
 lc_mjd_w2                         | double precision[] |           |          | 
 lc_epoch_index_w1                 | smallint[]         |           |          | 
 lc_epoch_index_w2                 | smallint[]         |           |          | 
 sersic                            | real               |           |          | 
 sersic_ivar                       | real               |           |          | 
 shape_r                           | real               |           |          | 
 shape_r_ivar                      | real               |           |          | 
 shape_e1                          | real               |           |          | 
 shape_e1_ivar                     | real               |           |          | 
 shape_e2                          | real               |           |          | 
 shape_e2_ivar                     | real               |           |          | 


Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Fujii Masao-4


On 2020/11/17 3:15, PG Bug reporting form wrote:

> The following bug has been logged on the website:
>
> Bug reference:      16722
> Logged by:          Sergey Koposov
> Email address:      [hidden email]
> PostgreSQL version: 11.9
> Operating system:   debian
> Description:
>
> Hi,
>
> When ingesting a billion or so rows in the table that has some array columns
> (see schema below),  at some point all the backends doing the ingestion hang
> with 100% CPU usage. When investigating, I think I can trace this to the
> limit of 2^32 toasted records per table.

Yes, this is the limit on the number of out-of-line values in a toast table.
https://wiki.postgresql.org/wiki/TOAST#Total_table_size_limit


> See the gdb full backtrace  of
> hanged backends in the bottom. When the problem occurs, it has 272 mill
> records.
> Basically the hanging happens in GetNewOidWithIndex called by
> toast_save_datum.
> While I understand the limit for the toast number is there to stay, but the
> behaviour of PG next to the limit is IMO a bug (or at least non-trivial) .
>
> I would rather prefer to see an error-message as opposed to backends
> hanging.

To emit an error, we would need to verify that there is no unused OID left
for the toast table, and that check would take very long. So I'm not sure
whether emitting an error message would really improve the current situation.

OTOH it might be a good idea to emit a warning message (e.g., every time we
fail to find an unused OID among the most recent one million OIDs), or to
report the issue as a wait event, or something similar while the record
insertion is hanging because of the toast limit, so that we can easily
detect the issue.
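
For readers following along, the hang is in the retry loop of
GetNewOidWithIndex: it keeps advancing the OID counter and probing the toast
table's index until an unused value turns up. Here is a minimal Python
simulation of that loop (not PostgreSQL's actual code; the real probe is a
btree index lookup, and the id space is 2^32 rather than 16):

```python
def get_new_oid(counter, used, nslots):
    """Advance a wrapping counter until an unused slot is found.

    Simulates GetNewOidWithIndex: when nearly every value in the space
    is already taken, this loop spins for a very long time before it
    returns -- the 100%-CPU hang reported in this thread.
    """
    while True:
        oid = counter % nslots       # the counter wraps, like the OID counter
        counter += 1
        if oid not in used:          # in PostgreSQL: a btree index probe
            return oid, counter

# Tiny id space with a single free slot: the loop must walk almost the
# whole space before finding it.
used = {i for i in range(16) if i != 13}
oid, _ = get_new_oid(0, used, 16)
print(oid)  # -> 13
```

With all 16 slots taken, the loop would never terminate, which is exactly
the behavior seen at the 2^32 toast limit.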

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Magnus Hagander-2
On Wed, Nov 18, 2020 at 2:18 AM Fujii Masao <[hidden email]> wrote:

>
>
>
> On 2020/11/17 3:15, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      16722
> > Logged by:          Sergey Koposov
> > Email address:      [hidden email]
> > PostgreSQL version: 11.9
> > Operating system:   debian
> > Description:
> >
> > Hi,
> >
> > When ingesting a billion or so rows in the table that has some array columns
> > (see schema below),  at some point all the backends doing the ingestion hang
> > with 100% CPU usage. When investigating, I think I can trace this to the
> > limit of 2^32 toasted records per table.
>
> Yes, this is the limitation of the number of out-of-line values in toast.
> https://wiki.postgresql.org/wiki/TOAST#Total_table_size_limit
>
>
> > See the gdb full backtrace  of
> > hanged backends in the bottom. When the problem occurs, it has 272 mill
> > records.
> > Basically the hanging happens in GetNewOidWithIndex called by
> > toast_save_datum.
> > While I understand the limit for the toast number is there to stay, but the
> > behaviour of PG next to the limit is IMO a bug (or at least non-trivial) .
> >
> > I would rather prefer to see an error-message as opposed to backends
> > hanging.
>
> To emit an error, we would need to verify that there is no unused OID left
> for the toast table, and that check would take very long. So I'm not sure
> whether emitting an error message would really improve the current situation.
>
> OTOH it might be a good idea to emit a warning message (e.g., every time we
> fail to find an unused OID among the most recent one million OIDs), or to
> report the issue as a wait event, or something similar while the record
> insertion is hanging because of the toast limit, so that we can easily
> detect the issue.

This definitely sounds like something that's worth putting out as a
wait event. Even before you start traversing millions of OIDs it might
gradually start to show up, and being able to monitor that would
definitely be useful.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Andres Freund
Hi,

On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote:
> This definitely sounds like something that's worth putting out as a
> wait event. Even before you start traversing millions of OIDs it might
> gradually start to show up, and being able to monitor that would
> definitely be useful.

I don't think this is likely to work well as a wait event. All the index
traversals etc will do IO, acquire locks, etc, which will overwrite the
wait event and reset it to nothing once done.

Greetings,

Andres Freund



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Magnus Hagander-2
On Wed, Nov 18, 2020 at 8:27 PM Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote:
> > This definitely sounds like something that's worth putting out as a
> > wait event. Even before you start traversing millions of OIDs it might
> > gradually start to show up, and being able to monitor that would
> > definitely be useful.
>
> I don't think this is likely to work well as a wait event. All the index
> traversals etc will do IO, acquire locks, etc, which will overwrite the
> wait event and reset it to nothing once done.

Oh meh. Yeah, I didn't think about the actual implementation of doing it :/

That said, I'd really prefer to see *some* kind of counter that could
get people noticing this a bit earlier, rather than just bugging them
in the logfile once it's gone over a threshold.  A statistics counter
maybe, but I guess that'd have to be tracked at a per-table level to
be really useful, which would make it potentially fairly expensive to
keep around...

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Andres Freund
Hi,

On 2020-11-18 23:12:10 +0100, Magnus Hagander wrote:
> That said, I'd really prefer to see *some* kind of counter that could
> get people noticing this a bit earlier, rather than just bugging them
> in the logfile once it's gone over a threshold.  A statistics counter
> maybe, but I guess that'd have to be tracked at a per-table level to
> be really useful, which would make it potentially fairly expensive to
> keep around...

I'd rather spend effort to solve the underlying issue. It's a bit more
work, but not crazily so, I think. Even the pg_upgrade pieces aren't
that hard to solve.

Greetings,

Andres Freund



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2020-11-18 23:12:10 +0100, Magnus Hagander wrote:
>> That said, I'd really prefer to see *some* kind of counter that could
>> get people noticing this a bit earlier, rather than just bugging them
>> in the logfile once it's gone over a threshold.  A statistics counter
>> maybe, but I guess that'd have to be tracked at a per-table level to
>> be really useful, which would make it potentially fairly expensive to
>> keep around...

> I'd rather spend effort to solve the underlying issue. It's a bit more
> work, but not crazily so, I think. Even the pg_upgrade pieces aren't
> that hard to solve.

Yeah.  If we're going to put work into this, widening the IDs used
to identify toast values seems like the right work to be doing.

Having said that, I had imagined that we might never have to fix it,
because if your table is big enough that it has a problem of this
ilk then likely you want to partition it anyway.  And partitioning
solves the problem since each partition has its own toast table.

                        regards, tom lane



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Sergey KOPOSOV-3
On Wed, 2020-11-18 at 19:02 -0500, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2020-11-18 23:12:10 +0100, Magnus Hagander wrote:
> > > That said, I'd really prefer to see *some* kind of counter that could
> > > get people noticing this a bit earlier, rather than just bugging them
> > > in the logfile once it's gone over a threshold.  A statistics counter
> > > maybe, but I guess that'd have to be tracked at a per-table level to
> > > be really useful, which would make it potentially fairly expensive to
> > > keep around...
> > I'd rather spend effort to solve the underlying issue. It's a bit more
> > work, but not crazily so, I think. Even the pg_upgrade pieces aren't
> > that hard to solve.
>
> Yeah.  If we're going to put work into this, widening the IDs used
> to identify toast values seems like the right work to be doing.
>
> Having said that, I had imagined that we might never have to fix it,
> because if your table is big enough that it has a problem of this
> ilk then likely you want to partition it anyway.  And partitioning
> solves the problem since each partition has its own toast table.

I can say that for my case of very large static clustered tables,
partitioning doesn't really offer much benefit, only overhead (to the best
of my understanding at least), so I'm not using it.

      Sergey

PS The workaround that I used for this particular dataset was to expand the
few arrays in the table into separate columns (as all the arrays had static
lengths), so that there are no toasted values at all.


Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2020-11-18 19:02:34 -0500, Tom Lane wrote:
> Having said that, I had imagined that we might never have to fix it,
> because if your table is big enough that it has a problem of this
> ilk then likely you want to partition it anyway.  And partitioning
> solves the problem since each partition has its own toast table.

A few billion rows isn't that much anymore, and partitioning has a fair
number of restrictions (plus this only needs 4 billion toasted fields,
not 4 billion rows). More importantly, to hit problems around this, one
doesn't even have to have all those rows in a table at once -
performance will suffer once the oid counter has wrapped around,
especially if there are longer sequences of assigned values.


> Yeah.  If we're going to put work into this, widening the IDs used
> to identify toast values seems like the right work to be doing.

To outline, here's what I think the two major pieces to get there are:

1) Make toast oid assignment independent of the oid counter. The easiest
   way likely is to also create a serial and use that. That alone
   improves the situation considerably, because it takes much longer
   to wrap around in each toast table.

   The overhead of the additional WAL records isn't nothing, but
   compared to maintaining a btree it's not likely to be measurable.

2) Widen non pg-upgraded toast tables to have 64bit chunk_id field. To
   reference chunks >= 2^32, add VARTAG_ONDISK64, which is only used
   when needed.
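
Point 1 can be pictured like this (a schematic Python sketch with invented
names; the real change would involve a sequence-like relation per toast
table plus WAL logging, none of which is modeled here):

```python
class ToastTable:
    """Per-table 64-bit chunk-id counter, replacing the single shared
    32-bit OID counter.  Because the counter only ever increases, ids
    never repeat within a table, so no ahead-of-time uniqueness probe
    of the index is needed; the ids also come out densely packed,
    which is friendlier to the supporting btree index."""

    def __init__(self):
        self.next_chunk_id = 0

    def next_id(self):
        cid = self.next_chunk_id
        self.next_chunk_id += 1
        return cid

# Two toast tables advance independently: the potential contention point
# moves from system-wide (one shared OID counter) to table-wide.
a, b = ToastTable(), ToastTable()
ids_a = [a.next_id() for _ in range(3)]
print(ids_a, b.next_id())  # -> [0, 1, 2] 0
```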

Greetings,

Andres Freund



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2020-11-18 19:02:34 -0500, Tom Lane wrote:
>> Yeah.  If we're going to put work into this, widening the IDs used
>> to identify toast values seems like the right work to be doing.

> To outline, here's what I think the two major pieces to get there are:

> 1) Make toast oid assignment independent of the oid counter. The easiest
>    way likely is to also create a serial and use that. That alone
>    improves the situation considerably, because it takes much longer to
>    to wrap around in each toast table.

I'm not really on board with creating an actual sequence object for each
toast table.  I don't think they are efficient enough.

I had imagined internally widening the OID counter to 64 bits, of which
ordinary OID usage would just discard the high 32 bits, but for TOAST
chunk IDs we could use the whole value.  Now, compared to your idea
that would result in using the wider toast-pointer format more often,
but I think there are ways around that.  For example, we could do

(1) Grab the next 64-bit OID.
(2) Probe the toast table to see if the low 32 bits correspond to
a value in use in that toast table.  If not, just use the low 32 bits.
(3) Otherwise, use the whole 64 bits and the wider pointer format.
(We could do another uniqueness probe here, but I think it's unnecessary,
since the 64-bit counter will never wrap around.)

Since the existing OID assignment mechanism requires a uniqueness
probe already, this isn't any more expensive than what we do now.
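
The three steps above can be sketched as follows (a Python illustration
under assumed names, not an actual patch; low32_in_use stands in for the
toast-index uniqueness probe, and the "wide" tag stands in for a
VARTAG_ONDISK64-style pointer):

```python
NEXT_OID_64 = (1 << 32) + 5   # hypothetical widened 64-bit OID counter

def low32_in_use(existing_ids, low32):
    """Stand-in for probing the toast table's index: is any existing
    chunk id equal to this value in its low 32 bits?"""
    return any((oid & 0xFFFFFFFF) == low32 for oid in existing_ids)

def assign_chunk_id(existing_ids):
    """Steps (1)-(3): grab the next 64-bit value; use just the low 32
    bits if they are free, otherwise keep all 64 bits and the wider
    pointer format.  No retry loop is needed, since the 64-bit counter
    never wraps around in practice."""
    global NEXT_OID_64
    oid64 = NEXT_OID_64
    NEXT_OID_64 += 1
    low = oid64 & 0xFFFFFFFF
    if low32_in_use(existing_ids, low):
        return ("wide", oid64)    # wider toast-pointer format
    return ("narrow", low)        # existing 32-bit pointer format

existing = {5, 6}                 # chunk ids already present in the table
print(assign_chunk_id(existing))  # low 32 bits = 5 collide -> wide
print(assign_chunk_id(existing))  # low 32 bits = 6 collide -> wide
print(assign_chunk_id(existing))  # low 32 bits = 7 free    -> narrow
```

Like the mechanism described above, this costs exactly one uniqueness
probe per assignment, the same as the current code path.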

                        regards, tom lane



Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

Andres Freund
Hi,

On 2020-11-18 20:17:55 -0500, Tom Lane wrote:
> I'm not really on board with creating an actual sequence object for each
> toast table.  I don't think they are efficient enough.

I'm not convinced that's true. Sure, nextval() could stand to be cheaper
- but on the other hand, this would allow us to entirely avoid doing
ahead-of-time uniqueness probes. Needing one extra WAL record every
SEQ_LOG_VALS/32 toast insertions seems likely to be a win over doing
at least SEQ_LOG_VALS-1 additional uniqueness index probes.

It's not like the oid counter is free of contention - one benefit of
what I suggest would be that the potential contention point would move
from system wide to table wide.

I think reducing the rate of oid counter advancement is a significant
benefit, moving the point of where we have to care about efficiency
problems around system oid assignment costs quite a bit further into the
future.

I think there's also some small potential benefits around indexing in
having densely allocated toast chunk ids for each toast table, over the
sparser allocation we have right now.

Greetings,

Andres Freund