Ignite cannot consume the WAL and release the OS buffer when persistence is enabled

  • dvlcis  · 7 years ago

    I have an Ignite server with 128 GB of RAM, and I have enabled persistence to keep my data safe.

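    According to the official documentation, my understanding is this: when persistence is enabled, Ignite first saves data changes to the OS buffer (which I observe as buff/cache in the output of the Linux command `free -mh`) and then writes them to the WAL. The checkpoint process periodically processes the WAL records, frees the disk space of the WAL segments it has processed, and releases the OS buffer that was used. Please correct me if I am wrong.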

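    In my tests, however, as soon as Ignite starts handling traffic the OS buffer grows quickly. When I check the WAL directory, I see many WAL segments being generated in sequence, with a total size almost equal to buff/cache.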

    [root@Redis1 apache-ignite]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        109G        995M        1.7G        109G
    Swap:          127G          0B        127G
    

    Within just a few minutes, the free column drops quickly while buff/cache grows:

    [root@Redis1 apache-ignite]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         15G         85G        995M         25G        108G
    Swap:          127G          0B        127G
    

    The WAL size and the number of segments also keep growing, and they are almost the same as the size of buff/cache.
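One way to put a number on "almost the same size" is to total the files under the WAL directory and compare the result with buff/cache. The following is a minimal sketch, not part of the original post: the class name `WalDirSize` and its method are hypothetical, and the default path `/wal/ebc` is simply the `walPath` from the configuration in this question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: counts the files under a directory and sums their sizes.
class WalDirSize {
    /** Returns {fileCount, totalBytes} for all regular files under dir (recursive). */
    static long[] countAndBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            List<Path> files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
            long bytes = 0;
            for (Path f : files) {
                bytes += Files.size(f);
            }
            return new long[] { files.size(), bytes };
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /wal/ebc is the walPath used in this post; pass another path as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "/wal/ebc");
        if (!Files.isDirectory(dir)) {
            System.out.println("Not a directory: " + dir);
            return;
        }
        long[] result = countAndBytes(dir);
        System.out.printf("%d files, %.1f GiB%n", result[0], result[1] / (1024.0 * 1024 * 1024));
    }
}
```

Running it at intervals next to `free -mh` shows whether the two numbers really move together.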

    I checked the Ignite log; the checkpoint process is triggered every 3 minutes:

    [05:30:05,818][INFO][db-checkpoint-thread-#107][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=9428aebc-f2b0-4d33-bed6-fb9a1ad49848, startPtr=FileWALPointer [idx=341, fileOff=50223036, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=860ms, walCpRecordFsyncDuration=245ms, pages=89627, reason='timeout']
    [05:30:22,429][INFO][db-checkpoint-thread-#107][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=9428aebc-f2b0-4d33-bed6-fb9a1ad49848, pages=89627, markPos=FileWALPointer [idx=341, fileOff=50223036, len=420491], walSegmentsCleared=0, markDuration=1288ms, pagesWrite=844ms, fsync=15767ms, total=17899ms]
    

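    But in the output of `free -mh`, the memory is not given back: buff/cache keeps growing with the traffic and does not shrink even after I stop the traffic. If I keep sending traffic, free memory keeps falling until only a few hundred megabytes are left: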

    [root@Redis1 apache-ignite]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         16G        370M        971M        108G        107G
    Swap:          127G          0B        127G
    

    When this happens (free memory exhausted?), all of my services that depend on Ignite stop handling new requests, because Ignite itself hangs.
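Note that in the `free -mh` outputs above, the `free` column falls from 109G to 370M while the `available` column stays at 107G to 109G, so the two columns measure different things. The sketch below reads the same counters from `/proc/meminfo`, which is where `free` gets its numbers. The class name `MemInfo` and its methods are hypothetical; the field names (`MemFree`, `MemAvailable`, `Buffers`, `Cached`, `SReclaimable`) are the kernel's own, and `MemAvailable` is only present on Linux 3.14 and later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parses /proc/meminfo and reports free vs. available memory.
class MemInfo {
    /** Parses "Name:   12345 kB" lines into a map of field name to value in kB. */
    static Map<String, Long> parse(String text) {
        Map<String, Long> fields = new HashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts[0].isEmpty()) {
                continue;
            }
            try {
                fields.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException ignored) {
                // Skip lines whose value is not a plain number.
            }
        }
        return fields;
    }

    /** What free(1) shows as buff/cache: Buffers + Cached + SReclaimable, in kB. */
    static long buffCacheKb(Map<String, Long> m) {
        return m.getOrDefault("Buffers", 0L)
                + m.getOrDefault("Cached", 0L)
                + m.getOrDefault("SReclaimable", 0L);
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/proc/meminfo");
        if (!Files.isReadable(p)) {
            System.out.println("/proc/meminfo is not readable (not a Linux host?)");
            return;
        }
        Map<String, Long> m = parse(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        System.out.printf("free=%d kB, buff/cache=%d kB, available=%d kB%n",
                m.getOrDefault("MemFree", 0L), buffCacheKb(m), m.getOrDefault("MemAvailable", -1L));
    }
}
```

If `available` stays high while `free` approaches zero, most of the missing memory is page cache that the kernel can reclaim on demand.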

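    I also noticed that the checkpoint reason in the log is 'timeout'. I don't know whether this means that Ignite cannot process the WAL properly and release the OS cache. Is there any way to make the checkpoint work properly and free the memory?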
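To follow fields such as `reason`, `walSegmentsCleared` and `fsync` across many checkpoints, the log lines can be parsed mechanically. This is a rough sketch that assumes the line format quoted in this post; the class name `CheckpointLine` and its methods are hypothetical and not part of Ignite. As far as I know, `reason='timeout'` only indicates that the checkpoint was started because the `checkpointFrequency` interval elapsed, which would be normal rather than an error, but treat that reading as an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls individual fields out of Ignite checkpoint log lines.
class CheckpointLine {
    private static final Pattern REASON = Pattern.compile("reason='([^']*)'");

    /** Returns the numeric value of "name=<digits>" in the line, or -1 if absent. */
    static long field(String line, String name) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "=(\\d+)").matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Returns the checkpoint reason from a "Checkpoint started" line, or null if absent. */
    static String reason(String line) {
        Matcher m = REASON.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened copies of the two log lines quoted in this post.
        String started = "Checkpoint started [checkpointLockWait=0ms, pages=89627, reason='timeout']";
        String finished = "Checkpoint finished [pages=89627, walSegmentsCleared=0, "
                + "markDuration=1288ms, pagesWrite=844ms, fsync=15767ms, total=17899ms]";
        System.out.println("reason=" + reason(started));
        System.out.println("walSegmentsCleared=" + field(finished, "walSegmentsCleared"));
        System.out.println("fsync ms=" + field(finished, "fsync"));
        System.out.println("total ms=" + field(finished, "total"));
    }
}
```

Every "Checkpoint finished" line quoted in this post reports `walSegmentsCleared=0`, so that field is worth watching alongside the size of the WAL directory.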

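    My question is: what can I do to prevent Ignite from using up free memory, and keep the service available with persistence turned on? I found that if I turn persistence off, Ignite handles the load quickly and cache usage under the same traffic is less than 1 GB. But with the persistence flag enabled, the OS cache grows very rapidly and uses up all free memory, after which Ignite cannot recover and hangs.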

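    I tried many parameters: setting walMode to LOG_ONLY or BACKGROUND, setting -DIGNITE_WAL_MMAP=false in the JVM options, and setting checkpointPageBufferSize. None of them rescued my Ignite service; it still eats the OS cache until it is exhausted.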

    https://apacheignite.readme.io/docs/write-ahead-log
    https://apacheignite.readme.io/docs/durable-memory-tuning#section-checkpointing-buffer-size

        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <!-- 10 GB initial size. -->
                        <property name="initialSize" value="#{10L * 1024 * 1024 * 1024}"/>
                        <!-- 50 GB maximum size. -->
                        <property name="maxSize" value="#{50L * 1024 * 1024 * 1024}"/>
                        <property name="persistenceEnabled" value="true"/>
    
                        <property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
                    </bean>
                </property>
              <property name="writeThrottlingEnabled" value="true"/>
              <property name="walMode" value="LOG_ONLY"/>
              <property name="walPath" value="/wal/ebc"/>
              <property name="walArchivePath" value="/wal/ebc"/>
            </bean>
        </property>
    

    Here is my cache configuration:

    public void createLvOneTxCache() {
    
        CacheConfiguration<String, OrderInfo> cacheCfg =
                new CacheConfiguration<>("LvOneTxCache");
    
        cacheCfg.setCacheMode(CacheMode.REPLICATED);
        //cacheCfg.setStoreKeepBinary(true);
        cacheCfg.setAtomicityMode(ATOMIC);
        ebcLvOneTxCache = ignite.getOrCreateCache(cacheCfg);
    }
    

    I tried changing the parameters, but the OS cache still grows:

        <!-- Enabling Apache Ignite native persistence. -->
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <!-- 4 GB initial size. -->
                        <property name="initialSize" value="#{4L * 1024 * 1024 * 1024}"/>
                        <!-- 4 GB maximum size. -->
                        <property name="maxSize" value="#{4L * 1024 * 1024 * 1024}"/>
                        <property name="persistenceEnabled" value="true"/>
    
                        <property name="checkpointPageBufferSize" value="#{4L * 1024 * 1024 * 1024}"/>
                    </bean>
                </property>
              <property name="checkpointFrequency" value="6000"/>
              <property name="checkpointThreads" value="32"/>
              <property name="writeThrottlingEnabled" value="true"/>
              <property name="walMode" value="LOG_ONLY"/>
              <property name="walPath" value="/wal/ebc"/>
              <property name="walArchivePath" value="/wal/ebc"/>
            </bean>
        </property>
    

    The log shows that checkpoints are now triggered much more often, but the cache is still not released.

    [07:51:20,165][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=fd0c7e68-564a-4b40-9516-bb2a451869e7, startPtr=FileWALPointer [idx=23, fileOff=47849256, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=77ms, walCpRecordFsyncDuration=233ms, pages=7744, reason='timeout']
    [07:51:20,219][INFO][sys-stripe-0-#1][PageMemoryImpl] Throttling is applied to page modifications [percentOfPartTime=0.36, markDirty=16378 pages/sec, checkpointWrite=3322 pages/sec, estIdealMarkDirty=673642 pages/sec, curDirty=0.00, maxDirty=0.40, avgParkTime=21501 ns, pages: (total=7744, evicted=0, written=7744, synced=229, cpBufUsed=0, cpBufTotal=1036430)]
    [07:51:22,303][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=fd0c7e68-564a-4b40-9516-bb2a451869e7, pages=7744, markPos=FileWALPointer [idx=23, fileOff=47849256, len=420491], walSegmentsCleared=0, markDuration=317ms, pagesWrite=24ms, fsync=2114ms, total=2456ms]
    [07:51:26,117][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=d64991bc-3d2f-4f2c-8175-d7e92f46f0bf, startPtr=FileWALPointer [idx=25, fileOff=35951286, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=49ms, walCpRecordFsyncDuration=200ms, pages=7605, reason='timeout']
    [07:51:28,612][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=d64991bc-3d2f-4f2c-8175-d7e92f46f0bf, pages=7605, markPos=FileWALPointer [idx=25, fileOff=35951286, len=420491], walSegmentsCleared=0, markDuration=266ms, pagesWrite=23ms, fsync=2472ms, total=2761ms]
    [07:51:32,118][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=07246861-57ae-4ef5-8419-cb7710d2f72d, startPtr=FileWALPointer [idx=27, fileOff=38042090, len=420491], checkpointLockWait=6ms, checkpointLockHoldTime=60ms, walCpRecordFsyncDuration=185ms, pages=7186, reason='timeout']
    [07:51:32,121][INFO][service-#232][PageMemoryImpl] Throttling is applied to page modifications [percentOfPartTime=0.24, markDirty=10738 pages/sec, checkpointWrite=2757 pages/sec, estIdealMarkDirty=310976 pages/sec, curDirty=0.00, maxDirty=0.07, avgParkTime=358945 ns, pages: (total=7186, evicted=0, written=896, synced=0, cpBufUsed=565, cpBufTotal=1036430)]
    [07:51:34,534][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=07246861-57ae-4ef5-8419-cb7710d2f72d, pages=7186, markPos=FileWALPointer [idx=27, fileOff=38042090, len=420491], walSegmentsCleared=0, markDuration=257ms, pagesWrite=29ms, fsync=2387ms, total=2679ms]
    [07:51:38,169][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=44e6870a-e370-4bd3-8ad9-8252abb0acd3, startPtr=FileWALPointer [idx=29, fileOff=44462293, len=420491], checkpointLockWait=0ms, checkpointLockHoldTime=76ms, walCpRecordFsyncDuration=210ms, pages=7529, reason='timeout']
    [07:51:40,668][INFO][db-checkpoint-thread-#108][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=44e6870a-e370-4bd3-8ad9-8252abb0acd3, pages=7529, markPos=FileWALPointer [idx=29, fileOff=44462293, len=420491], walSegmentsCleared=0, markDuration=303ms, pagesWrite=24ms, fsync=2475ms, total=2802ms]
    
    
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        107G        995M        3.5G        109G
    Swap:          127G          0B        127G
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        107G        995M        3.5G        109G
    Swap:          127G          0B        127G
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        107G        995M        3.5G        109G
    Swap:          127G          0B        127G
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        105G        995M        5.6G        109G
    Swap:          127G          0B        127G
    

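    When I stop the traffic that updates the cache, I can see the OS cache being released, but very slowly; it takes a long time. The checkpoint frequency is 6 seconds. How can I make this faster?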

    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        104G        995M        6.5G        109G
    Swap:          127G          0B        127G
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        104G        995M        6.3G        109G
    Swap:          127G          0B        127G
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        104G        995M        6.3G        109G
    Swap:          127G          0B        127G
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        106G        995M        4.6G        109G
    Swap:          127G          0B        127G
    [root@Redis1 node00-296a5110-74ad-45e0-bf9c-5c075a4f5fdf]# free -mh
                  total        used        free      shared  buff/cache   available
    Mem:           125G         14G        106G        995M        4.4G        109G
    
    1 answer

  •   Mitya XMitya  · 7 years ago

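    It is perfectly fine for the OS to cache disk data; this is explained well in "Linux ate my RAM". If your kernel supports it, you can also set the amount of memory to keep free, which may reduce pauses when Ignite allocates new memory blocks.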
