Event Diagnostics Servlet - Logging and identifying event-related performance problems


    This might help if you're having performance issues that show up in the Event Diagnostics Servlet.

    One of the limitations we've hit is that the servlet only displays current data, with no history.

    To build up historical event diagnostics statistics, you can:



    1. Create a shell script to save the event diagnostics data at 15-minute intervals
    2. Have the shell script also generate a log file (CSV format) for easier analysis, e.g. see this event_log.last_7_days.zip


    Note: this means the event diagnostics stats in the servlet will be reset every 15 minutes, but because each snapshot is saved to a file you'll still have the same diagnostic data, plus the history.
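
    The save-and-log approach described above could look roughly like the sketch below. This is not the contents of build_event_log.sh: the function names append_snapshot and prune_before are made up for illustration, fetching the snapshot from the servlet is left out, and the sketch assumes the first CSV column is SERVER_TIME in a sortable "YYYY-MM-DD HH:MM:SS" format.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the attached build_event_log.sh.
# Assumption: column 1 is SERVER_TIME as "YYYY-MM-DD HH:MM:SS",
# so plain string comparison orders rows chronologically.

# append_snapshot SNAPSHOT_CSV LOG_CSV
# Appends the data rows of one snapshot to the rolling log,
# copying the header only when the log is missing or empty.
append_snapshot() {
    snapshot=$1
    log=$2
    if [ ! -s "$log" ]; then
        head -n 1 "$snapshot" > "$log"
    fi
    tail -n +2 "$snapshot" >> "$log"
}

# prune_before LOG_CSV CUTOFF
# Keeps the header plus every row whose SERVER_TIME is >= CUTOFF.
prune_before() {
    log=$1
    cutoff=$2
    tmp="$log.tmp.$$"
    # Concatenating "" forces a string (not numeric) comparison.
    awk -F, -v cutoff="$cutoff" 'NR == 1 || ($1 "") >= (cutoff "")' "$log" > "$tmp" &&
        mv "$tmp" "$log"
}
```

    Calling append_snapshot after each 15-minute capture and then prune_before with a cutoff seven days back would keep a rolling log of the kind described; how the cutoff is computed depends on the date command available on your system.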




    To get you started:


    1. Download the sample shell script for Unix environments (build_event_log.sh), attached as build_event_log.zip
    2. sudo to the otm user and create the script in /users/otm (/users/glog in OTM 5.0)
    3. Update the environment variables in the script to match your app (if you have OTM 5.0, ensure that you also change the 'awk' line)
    4. Give the script execute permission, e.g.
    Code:
    chmod 700 build_event_log.sh
    5. Test that the script works from the command line, i.e.
    Code:
    ./build_event_log.sh
    6. Check the output at (your OTM app path)/logs/event_log.last7days.csv. If it works, add a cron job to run the script every 15 minutes (change the home path below for OTM 5.0), e.g.
    Code:
    0,15,30,45 * * * * /users/otm/build_event_log.sh > /users/otm/build_event_log.log 2>&1
    7. Grant read and execute permissions on the directories (your OTM app path) and (your OTM app path)/logs to make the file easier to download, e.g.
    Code:
    chmod 755 (your OTM app path)
    chmod 755 (your OTM app path)/logs
    8. Whenever you need historical event diagnostics stats, just download "event_log.last7days.csv" from the standard logs folder - how sweet is that! Use WinSCP or a similar download tool
    9. Double-click the CSV file on your desktop and it'll open in Excel
    10. From here you can apply filters and build PivotCharts to suit your angle of analysis. A good place to start is SERVER_TIME under Axis Fields, QUEUE under Legend Fields and Sum of MAX_WAIT_TIME under Values, initially filtering the server time to Today and the queues to agentPlanning, agentUtility, batch, execution and publishWait
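
    If you'd rather get a quick look on the server before downloading anything, the PivotChart starting point in step 10 can be approximated with awk. This is only a sketch: sum_wait_by_queue is a made-up helper name, and it assumes the CSV has a header row containing SERVER_TIME, QUEUE and MAX_WAIT_TIME, with no quoted fields or embedded commas.

```shell
#!/bin/sh
# Hypothetical helper, not part of the attached scripts.
# Assumption: first line is a header naming SERVER_TIME, QUEUE and
# MAX_WAIT_TIME; fields are plain comma-separated (no quoting).

# sum_wait_by_queue CSV_FILE [TIME_PREFIX]
# Prints "QUEUE,sum of MAX_WAIT_TIME" per queue, optionally limited to
# rows whose SERVER_TIME starts with TIME_PREFIX (e.g. one day).
sum_wait_by_queue() {
    csv=$1
    prefix=${2:-}
    awk -F, -v prefix="$prefix" '
        NR == 1 {
            # Map header names to column numbers.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        prefix == "" || index($(col["SERVER_TIME"]), prefix) == 1 {
            total[$(col["QUEUE"])] += $(col["MAX_WAIT_TIME"])
        }
        END {
            for (q in total) print q "," total[q]
        }
    ' "$csv" | sort
}
```

    For example, sum_wait_by_queue event_log.last7days.csv with a second argument matching how today's date appears in SERVER_TIME gives one line per queue, which is roughly what the chart shows for Today.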


    From here:

    - Check the MAX_PROCESSING_EVENT for the affected times to see which event most needs optimising or correcting
    - Check TOTAL_THROUGHPUT in case the cause is simply higher load, in which case increasing the threads might help
    - Compare with any errors in the PROBLEM table and with your agent run durations in other logs to spot agent workflow issues
    - If you want more than the last 7 days in your log, do a one-off run using the other included script, e.g.
    Code:
    ./build_once_off.sh 90    # for the last 90 days, assuming you have already captured that data
    - And then take it from there - hope this helps someone!
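
    As a rough sketch of the first two checks in the list above, the following awk filter lists the intervals where MAX_WAIT_TIME went over a limit, along with the MAX_PROCESSING_EVENT and TOTAL_THROUGHPUT for that interval. The helper name slow_intervals is made up, the limit is in whatever units MAX_WAIT_TIME uses in your log, and it assumes a header row with those column names and no commas inside field values.

```shell
#!/bin/sh
# Hypothetical helper, not part of the attached scripts.
# Assumption: header row names SERVER_TIME, QUEUE, MAX_WAIT_TIME,
# MAX_PROCESSING_EVENT and TOTAL_THROUGHPUT; no commas inside values.

# slow_intervals CSV_FILE LIMIT
# Prints the rows whose MAX_WAIT_TIME is greater than LIMIT.
slow_intervals() {
    csv=$1
    limit=$2
    awk -F, -v limit="$limit" '
        BEGIN { OFS = "," }
        NR == 1 {
            # Map header names to column numbers.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Adding 0 forces a numeric comparison.
        ($(col["MAX_WAIT_TIME"]) + 0) > (limit + 0) {
            print $(col["SERVER_TIME"]), $(col["QUEUE"]), \
                  $(col["MAX_WAIT_TIME"]), $(col["MAX_PROCESSING_EVENT"]), \
                  $(col["TOTAL_THROUGHPUT"])
        }
    ' "$csv"
}
```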
    Last edited by samson.kaing; March 25th, 2014, 07:17. Reason: Fixed for versions higher than OTM 5.0