  • OTM 5.5 performance problems? Check STATISTICS_LEVEL in database.

    Long, but hopefully worth it. Chris, do you think this should also be duplicated in the database forum? If so, let me know, and I'll post it there as well.

    After we upgraded from OTM 5.0 to 5.5 CU4, we experienced performance problems on the database server. We're running the certified Linux version on a server with 24 GB of RAM and four dual-core processors; a fairly robust system.

    I wanted to use huge pages to dedicate a LOT of the machine's memory to the database buffer cache (about 10 GB initially). My reasoning was that accessing data from RAM is quicker than accessing it from disk (and that still stands, BTW).

    So, the users get on the system, and they're complaining about performance; even moving from screen to screen is slow. Looking at the system with the Linux top utility, I noticed something strange that I hadn't seen before: SYSTEM utilization was much higher than user utilization, meaning the machine was busy doing something other than processing our data (i.e., housekeeping).

    Here’s an example of the top session (note the sy column):

    top - 09:51:56 up 19 days, 19:45, 2 users, load average: 8.97, 8.97, 8.48
    Tasks: 255 total, 11 running, 244 sleeping, 0 stopped, 0 zombie
    Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
    Cpu1 : 3.0% us, 96.7% sy, 0.0% ni, 0.0% id, 0.3% wa, 0.0% hi, 0.0% si
    Cpu2 : 5.0% us, 95.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
    Cpu3 : 3.0% us, 97.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
    Cpu4 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
    Cpu5 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
    Cpu6 : 2.3% us, 96.7% sy, 0.0% ni, 0.0% id, 1.0% wa, 0.0% hi, 0.0% si
    Cpu7 : 1.3% us, 98.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
    Mem: 24953744k total, 24934296k used, 19448k free, 73048k buffers
    Swap: 36861100k total, 23132k used, 36837968k free, 11840736k cached

    As you can see, the system is busy doing stuff, but NOT what we want it to be doing.
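If you want to spot this pattern at a glance, a few lines of awk will flag the offenders. This is just a sketch, assuming `Cpu` lines in the format top printed above; in practice you'd pipe in `top -b -n 1` instead of the sample here-doc.

```shell
#!/bin/sh
# Flag any CPU where system time exceeds a threshold (percent).
# Reads top-style "CpuN : x% us, y% sy, ..." lines on stdin.
flag_high_sys() {
  awk -v limit="$1" '
    /^Cpu/ {
      pct = $5                 # e.g. "97.7%" (the field before "sy,")
      sub(/%/, "", pct)
      if (pct + 0 > limit) print $1 " is spending " pct "% in system time"
    }'
}

# Sample lines from the capture above; normally: top -b -n 1 | flag_high_sys 90
flag_high_sys 90 <<'EOF'
Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 3.0% us, 50.0% sy, 0.0% ni, 0.0% id, 0.3% wa, 0.0% hi, 0.0% si
EOF
```

Anything the script prints is time the kernel is burning on your behalf rather than time spent running Oracle.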

    So, after a few iterations that I won’t go through in detail (disabling huge pages, changing some database parameters, etc.), our Linux guru, John Poff, looked closely at what was happening. His findings were quite interesting. He found that one of the Oracle system processes was calling a Linux system call, gettimeofday(). And it was calling it a LOT! How many times? Well, over a 30-second period, an Oracle process called this function more than 505,000 times. That’s a LOT, and it was affecting our performance in a BIG way.

    So, off to MetaLink I go, searching for Linux and gettimeofday, and I find a few hits. It’s associated with a database parameter called STATISTICS_LEVEL. The OTM 5.5 documentation says this parameter should be set to ALL, which I had done, both as outlined in the documentation and in the sample init.ora file delivered in the scripts directory. This parameter also affects another parameter, TIMED_OS_STATISTICS, which degrades performance even more. The MetaLink TARs say that STATISTICS_LEVEL should only be set to ALL when you’re investigating something specific; otherwise, leave it at the default, which is TYPICAL.
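For reference, a minimal sketch of the fix; the exact form depends on whether you run from a plain init.ora or an spfile:

```sql
-- In init.ora, set the parameter back to its default:
--   statistics_level = typical

-- Or, if you use an spfile:
ALTER SYSTEM SET statistics_level = TYPICAL SCOPE = BOTH;
```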

    So, after a database parameter change and a bounce of the entire system, this recommended parameter turned out to be the source of all our grief. Once our users got back on, I did NOT see the behavior I had seen before on the database server.

    I hope others can avoid the turmoil that John, Beth, I, and others endured in discovering this tidbit. It took us almost two weeks to get to the bottom of it. I don’t know, maybe we’re slow. But the ironic part is that Oracle recommended setting this parameter to the non-default value. If you follow the documentation, you’ll probably experience performance problems on your database tier. I’m guessing it may affect other platforms as well, but I’m not 100% certain.

    Thanks...Steve Hughes

  • #2
    Re: OTM 5.5 performance problems? Check STATISTICS_LEVEL in database.

    Steve,

    I've linked this to both the installation and database sections. I really appreciate this. While I haven't seen this issue directly, I have no doubt that it will help us (and others) in the future.

    Also - could you post the version of Linux that you're running on? Red Hat or Oracle Linux? 4.0 or 5.0? Also - I'm assuming it is the 64-bit version, right?

    Thanks!
    --Chris



    • #3
      Re: OTM 5.5 performance problems? Check STATISTICS_LEVEL in database.

      We're running 32-bit Red Hat Linux 4.0. Here's the header information when I log in:

      Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
      Kernel 2.6.9-42.ELhugemem on an i686

      No, we're not using the 64-bit version; we're using the hugemem kernel, which allows you to define a large, contiguous segment of memory for the SGA. It can be over 2 GB, the "normal" 32-bit limit.
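For anyone setting this up, the arithmetic for sizing the huge-page pool is simple. A sketch, assuming the usual 2 MB huge page on x86 (check `Hugepagesize` in /proc/meminfo on your box):

```shell
#!/bin/sh
# How many huge pages are needed to cover an SGA of a given size?
# Round up so the pool is never short by a fraction of a page.
pages_for_sga() {
  sga_mb=$1
  page_mb=$2
  echo $(( (sga_mb + page_mb - 1) / page_mb ))
}

pages_for_sga 10240 2   # a 10 GB SGA with 2 MB pages -> 5120
```

The result goes into the vm.nr_hugepages sysctl before the instance starts; huge pages need contiguous physical memory, which is why they're usually reserved at boot.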

      Hope this helps.

      Thanks...Steve Hughes



      • #4
        Re: OTM 5.5 performance problems? Check STATISTICS_LEVEL in database.

        I have also seen the same issue at another client site using 5.5. I suspected it had to do with a non-standard RH build but couldn’t pinpoint it. The only thing I could find was that the umask was not set correctly. Can you find out whether you have a firewall or SELinux enabled that may be causing this?
        If my post was helpful please click on the Thanks! button

        MavenWire Hosting Admin
        15 years of OTM experience



        • #5
          Re: OTM 5.5 performance problems? Check STATISTICS_LEVEL in database.

          Nick:

          Yes, we are behind a firewall, and we have a fully configured DMZ, so we've got firewall rules in place that allow connectivity between the servers.

          Another thing that often gets overlooked is data communication between servers. I believe some NICs default to half duplex instead of full duplex, so check that as well.
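A quick way to check is to parse `ethtool` output. A sketch, assuming the usual `Duplex: Full` / `Duplex: Half` line that ethtool prints (the here-doc below stands in for `ethtool eth0` so the example is self-contained):

```shell
#!/bin/sh
# Warn if a NIC is negotiating half duplex. Feed it ethtool output on stdin.
check_duplex() {
  awk -F': ' '/Duplex/ {
    if ($2 == "Full") print "OK: full duplex"
    else              print "WARN: " $2 " duplex"
  }'
}

# Normally: ethtool eth0 | check_duplex
check_duplex <<'EOF'
	Speed: 100Mb/s
	Duplex: Half
EOF
```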

          Hmm, other things. The recompile-invalid-objects script doesn't always work, especially on a few VPD_LOGON objects (I think they're triggers, if I remember correctly). Recompile them manually; that should help some as well.
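The manual recompile is straightforward; a sketch (the trigger name below is illustrative, not the actual OTM object name):

```sql
-- See what is invalid first:
SELECT owner, object_name, object_type
  FROM dba_objects
 WHERE status = 'INVALID';

-- Then recompile each object by hand, e.g.:
ALTER TRIGGER glogowner.vpd_logon COMPILE;
```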

          And then you probably need to increase your JVM memory settings on your app server and web server. I've seen thrashing garbage collection bring a server to its knees because too little memory was allocated to the JVM (it was only 256M). I've seen plenty of references to this here, so I'm sure you've got that covered.
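As a sketch only (the variable name and sizes are illustrative, not OTM defaults): setting the initial and maximum heap to the same value avoids resize churn, and -verbose:gc makes thrashing collections visible in the logs.

```shell
# Illustrative JVM settings; tune the sizes to your own workload.
JAVA_OPTS="-Xms1024m -Xmx1024m -verbose:gc"
export JAVA_OPTS
```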

          And, obviously, check the last time database statistics were collected, and the STATISTICS_LEVEL parameter that was the main reason for this thread.

          And, finally, don't forget to create a startup database trigger to pin certain packages in memory, especially large packages that are executed frequently. This is very easy to do, let me know if you need a code example.
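Since it comes up often, here is a minimal sketch of such a trigger. The package names are placeholders (pick your own hot packages from V$DB_OBJECT_CACHE), and DBMS_SHARED_POOL must be installed (dbmspool.sql):

```sql
CREATE OR REPLACE TRIGGER pin_hot_packages
  AFTER STARTUP ON DATABASE
BEGIN
  -- 'P' pins a package; the names here are placeholders, not OTM-specific
  DBMS_SHARED_POOL.KEEP('SYS.STANDARD',      'P');
  DBMS_SHARED_POOL.KEEP('SYS.DBMS_STANDARD', 'P');
END;
/
```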

          Thanks...Steve Hughes



          • #6
            Re: OTM 5.5 performance problems? Check STATISTICS_LEVEL in database.

            Hi Steve,

            I was actually referring to the software firewall and SELinux on the RH server.

            Thanks,
            Nick
            If my post was helpful please click on the Thanks! button

            MavenWire Hosting Admin
            15 years of OTM experience



            • #7
              Re: OTM 5.5 performance problems? Check STATISTICS_LEVEL in database.

              This is interesting. We have the same config for STATISTICS_LEVEL, but we are not seeing the high system utilization. Just the same, I have changed the parameters per Oracle's (and this post's) recommendations.

              Another source of high system utilization is the *paging* rate on RHEL4. I am planning a move to huge pages this evening; it seems to be the best way to pin the SGA in RH.

              The symptom is heavy paging in response to DB requests. Here's a vmstat that shows what I mean:

              procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
              r b swpd free buff cache si so bi bo in cs us sy id wa
              25 2 3068524 14288 616 4221600 406 0 2333 494 2024 3107 11 72 9 8
              39 1 3069136 14136 332 4221196 2 24 114 55 1067 646 1 99 0 0
              1 17 3156908 15228 592 4155300 198 4926 1325 5036 2014 902 2 84 1 13
              0 25 3161944 18308 800 4152288 69 236 169 286 1576 497 1 1 2 96
              0 21 3185408 17772 1780 4144732 1091 2034 2320 2489 1698 1524 3 5 15 78
              0 18 3186784 20228 2196 4150308 278 2034 1367 2405 1814 917 6 3 6 85
              0 16 3186908 19876 2468 4162684 460 790 2270 1151 3581 3309 23 3 8 65
              0 19 3186960 20924 2464 4170324 204 2171 2203 2354 1842 898 9 2 9 81
              4 7 3174388 25804 2064 4178432 2247 267 11999 784 2138 1876 25 8 17 50
              0 3 3160332 18980 2048 4176600 3894 0 7509 547 4632 5829 38 6 23 32
              5 4 3140384 18356 1340 4172824 5746 0 9578 2553 3804 4232 28 7 35 31
              0 4 3115104 20428 1636 4191620 6768 0 12222 429 3390 4052 36 8 29 28

              A high number of blocked processes, a lot of blocks swapped in, and a lot of waiting. Ugh.

              The /proc/sys/vm/swappiness parameter provided brief relief from the paging, but over time the SGA was still paged out. It's very frustrating to watch the system page horribly even though we have ample memory. RHEL4 seems very aggressive about pushing infrequently used pages out to make room for filesystem buffers (which are of little use on a dedicated DB server anyway).
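For the record, the knob lives in /etc/sysctl.conf. The value below is illustrative; the range is 0-100, lower means the kernel is less eager to swap anonymous pages out, and the RHEL4 default is 60:

```
# /etc/sysctl.conf
vm.swappiness = 10
```

Apply with `sysctl -p`, or write the value into /proc/sys/vm/swappiness for a live change.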

              On our test systems I have moved to huge pages and set the filesystemio_options parameter to SETALL, which tells Oracle to bypass the filesystem buffers and write directly to disk. I believe this will be more efficient than double buffering (SGA + filesystem cache) and will reduce memory pressure at the OS level as well. So far I have seen no ill effects from the changes, and I'm planning the same change in PROD this evening.
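A sketch of the change, assuming an spfile; filesystemio_options is a static parameter, so it only takes effect after the next restart:

```sql
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE;
```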

              Here is an excellent link for RHEL optimization.

              Tuning and Optimizing RHEL for Oracle 9i and 10g Databases (Red Hat Enterprise Linux, 4, 3, 2.1 - redhat, x86-64)
