Monitor Weblogic Stuck Threads - WLST Script

In this post, we share a WebLogic stuck thread monitoring script that automatically sends an email notification and creates heap and thread dumps when the stuck thread count exceeds a threshold (currently set to 10, but it can be modified).

It is written in WLST and shell script and is driven by a properties file. For security, it uses the WebLogic config file and key file for WLST authentication, so no clear-text password is stored.

If you do not know how to create the WebLogic config file and key file, you can read this post and come back here.

This script has been tested on WebLogic 11g and 12c.

Let us not get stuck here like this bear. 🙂 Let's move on.

 

 

Salient Features of this Script

  • Triggers an email alert when the stuck thread count exceeds the threshold
  • Creates a heap dump and thread dumps when stuck threads are identified and the threshold is exceeded
  • Eliminates duplicate heap dump and thread dump creation by monitoring the count growth
  • Uses the jstack and jmap utilities to capture detailed thread and heap dumps
  • Logs in to the domain securely with a config file and key file, NO CLEAR TEXT PASSWORD
  • Monitors all the running managed servers in the domain. Even if you add a new managed server in the future, there is no need to update or modify the script.
  • Works with WebLogic 11g and 12c
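
The duplicate-elimination idea from the list above can be sketched as a small shell function. This is only an illustration of the logic, not the exact code used in the scripts in this post: the function name should_dump and its three arguments are our own assumptions. It succeeds only when the count is above the threshold and has also grown since the last recorded alert.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Usage: should_dump COUNT THRESHOLD STATEFILE
# Returns 0 only when COUNT exceeds THRESHOLD and is also higher than
# the count recorded at the previous alert; it then records COUNT.
should_dump() {
    local count=$1 threshold=$2 statefile=$3 last=0

    # At or below the threshold: nothing to do.
    [ "$count" -gt "$threshold" ] || return 1

    # Read the last recorded count if the state file exists and is non-empty.
    if [ -s "$statefile" ]; then
        last=$(cat "$statefile")
    fi

    # Same or lower count than last time: treat it as a duplicate.
    [ "$count" -gt "$last" ] || return 1

    echo "$count" > "$statefile"
    return 0
}
```

With a helper like this, a caller would take dumps only when `should_dump "$count" 10 /tmp/somestatefile` succeeds, so a count that stays flat between two cron runs does not produce a second set of dumps.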

 

How to use this script to monitor stuck threads

This script can be invoked on demand or scheduled in crontab to run every 5 or 10 minutes, depending on your requirements and server capacity.

Here is the line you have to add to crontab after downloading the scripts and putting them in place.

*/5 * * * * /path/to/workspace/weblogic_monitor_stuck.sh > /path/to/workspace/wlsstuckmonitor.log 2>&1

 

Downloading the Scripts and Getting Ready.

First, we need to decide where to place these files. This directory is the workspace in which all three files will live.

Create a directory or choose an existing one, and copy all of the following files into it under their respective names. All three files must be present in the same directory.

The file names must not be changed, because the scripts refer to one another by name.
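
Since everything depends on the three files sitting side by side under their exact names, a quick pre-flight check can save some debugging. The sketch below is our own addition and not part of the scripts in this post. The function name check_workspace is hypothetical, while the three file names are the ones used in the sections that follow.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Usage: check_workspace DIR
# Prints every expected file that is missing from DIR and
# returns 1 if at least one of them is missing.
check_workspace() {
    local dir=$1 missing=0 f
    for f in input.properties wls_monitor_stuck.py weblogic_monitor_stuck.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

You could run `check_workspace /path/to/workspace` once before adding the crontab entry.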

 

The Input Properties file

Save this file as input.properties in the same directory as the other two files.

The script creates a few temporary files in the /tmp directory. The output.file entries are meant to control their names, although the scripts shown in this post write to fixed paths such as /tmp/fileStuck.txt and /tmp/fileHogging.txt.

domain.name=TestDomain
admin.url=t3://mytestwls.com:7101
config.file=/home/oracle/script/adminConfig.secure
key.file=/home/oracle/script/adminKey.secure
output.file1=/tmp/applist
output.file2=/tmp/datasourcelist
output.file3=/tmp/Serverstats
emailID=[email protected]
DOMAINDIR=/opt/app/domains/TestDomain
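
The wrapper script further down reads emailID and DOMAINDIR from this file with grep and awk. As a hedged alternative, here is a minimal sketch of a lookup function that matches the key exactly and splits only at the first "=", so a value that itself contains "=" is not cut short. The name get_prop is our own and does not appear in the original scripts.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Usage: get_prop KEY FILE
# Prints the value of the first line in FILE that starts with "KEY=".
get_prop() {
    local key=$1 file=$2
    # index(...) == 1 means the line begins with "KEY=";
    # substr then returns everything after that first "=".
    awk -v k="$key" '
        index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }
    ' "$file"
}
```

For example, `get_prop DOMAINDIR input.properties` would print /opt/app/domains/TestDomain for the file shown above.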

The Python WLST Script

Save this file as wls_monitor_stuck.py in the same directory as the other two files.

def usage():
  print "Usage:"
  print "java weblogic.WLST /home/oracle/script/wls_monitor_stuck.py"

def conn(URL,Configfile,Keyfile):
        try:
           connect(userConfigFile=Configfile, userKeyFile=Keyfile, url=URL)
        except:
           print 'UNABLE TO CONNECT TO ADMIN SERVER RUNNING AT ' + URL
           print 'PLEASE CHECK THE LOGIN CREDENTIALS AND IF THE ADMIN SERVER IS RUNNING'
           print dumpStack()
           exit()

def getStatus(server):
   cd('/ServerLifeCycleRuntimes/' + server.getName())
   return cmo.getState()

def getHealth(server):
   cd('/ServerRuntimes/' + server.getName())
   tState = cmo.getHealthState().getState()
   if (tState == 0):
     return 'OK'

def getThreadstat(server, type):
   # Use an absolute path: getHealth() has already moved the current location.
   cd('/ServerRuntimes/' + server.getName() + '/ThreadPoolRuntime/ThreadPoolRuntime')
   if (type == 'S'):
     return int(cmo.getStuckThreadCount())
   elif (type == 'H'):
     return int(cmo.getHoggingThreadCount())

def monitorReport():
    servers = cmo.getServers()
    domainRuntime()

    for msrvr in servers:
       mName  = msrvr.getName()
       mState = ''
       hState = ''
       sCnt   = 0
       hCnt   = 0

       if (mName != 'AdmSvr'):
         mState = getStatus(msrvr)
         if (mState == 'RUNNING'):
           hState = getHealth(msrvr)
           sCnt   = getThreadstat(msrvr, 'S')
           hCnt   = getThreadstat(msrvr, 'H')
           print >>fileStuck, '%s %s %5d' %(mName, "=", sCnt)
           print >>fileHogging, '%s %s %5d' %(mName, "=", hCnt)

if __name__== "main":

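   from java.io import FileInputStream
   from java.util import Properties
   import sys
   import os
   import getopt
   import datetime

   # Directory that holds this script; input.properties is expected next to it.
   cwd=os.path.dirname(os.path.realpath(__file__))

   # os.path.join adds the path separator that plain concatenation would omit.
   propInputStream = FileInputStream(os.path.join(cwd, 'input.properties'))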
   configProps = Properties()
   configProps.load(propInputStream)
   domainName=configProps.get('domain.name')
   URL=configProps.get('admin.url')
   Configfile=configProps.get('config.file')
   Keyfile=configProps.get('key.file')
   now = datetime.datetime.now()

   redirect("/dev/null",'false')
   fileStuck = open("/tmp/fileStuck.txt", 'w')
   fileHogging = open("/tmp/fileHogging.txt", 'w')
   print "URL is " +URL
   print "Configfile is " +Configfile
   print "Keyfile is " +Keyfile

   if os.path.exists(Configfile) and os.path.exists(Keyfile):
      print "CONNECTING TO THE ADMIN SERVER RUNNING AT " +URL
   else:
      print "UNABLE TO READ USER KEY AND CONFIG FILES " +Configfile+ " AND " +Keyfile
      sys.exit(2)

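   conn(URL,Configfile,Keyfile)
   monitorReport()

   # Close the report files so their contents are flushed before the wrapper reads them.
   fileStuck.close()
   fileHogging.close()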
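
Each line that the WLST script writes to /tmp/fileStuck.txt and /tmp/fileHogging.txt has the form "name = count", which is why the wrapper script picks fields 1 and 3. As a quick way to inspect those files by hand, here is a minimal sketch that lists only the servers above a given threshold. The function name servers_over_threshold is our own assumption and is not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Usage: servers_over_threshold FILE THRESHOLD
# Prints "name count" for every "name = count" line whose count
# is greater than THRESHOLD.
servers_over_threshold() {
    local file=$1 threshold=$2
    # "+ 0" forces a numeric comparison rather than a string one.
    awk -v t="$threshold" '$2 == "=" && $3 + 0 > t + 0 { print $1, $3 }' "$file"
}
```

For example, `servers_over_threshold /tmp/fileStuck.txt 10` would print one line per managed server that currently reports more than 10 stuck threads.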

The Wrapper Shell Script

Save this file as weblogic_monitor_stuck.sh in the same directory as the other two files.

#!/bin/bash
now=$(date +"%Y-%m-%d")
# Absolute path of the directory holding this script, so it still resolves
# if the working directory changes later.
BASEDIR=$(cd "$(dirname "$0")" && pwd)
recp=$(grep emailID "$BASEDIR/input.properties" | awk -F "=" '{print $2}')
HN=$(hostname -f)
DOM=$(grep DOMAINDIR "$BASEDIR/input.properties" | awk -F "=" '{print $2}')
source "$DOM/bin/setDomainEnv.sh"
java weblogic.WLST -skipWLSModuleScanning "$BASEDIR/wls_monitor_stuck.py"

echo "Stuck Threads" > /tmp/allThreadsDetails.txt
cat /tmp/fileStuck.txt >> /tmp/allThreadsDetails.txt
cat /tmp/fileHogging.txt  >> /tmp/allThreadsDetails.txt

###################
takedump()
{
        # Number of thread dumps to take.
        LOOP=6
        # Interval in seconds between thread dumps.
        INTERVAL=30
        DIRECTORY="/tmp/threaddump"
        mkdir -p "$DIRECTORY"
        # PID of the managed server JVM (second column of the ps output).
        PID=$(ps -feww | grep $MGRS | grep "Dweblogic.Name" | grep -v grep | awk '{print $2}')
        # JSPA holds the path of the java binary; jstack and jmap are assumed
        # to sit in the same bin directory.
        JBIN=$(dirname "$JSPA")
        for ((i=1; i <= LOOP; i++))
        do
                "$JBIN/jstack" -l "$PID" > "$DIRECTORY/threaddump_L_$(date +%d%b%Y_%H%M%S).log"
                "$JBIN/jstack" -F "$PID" > "$DIRECTORY/threaddump_F_$(date +%d%b%Y_%H%M%S).log"
                echo "thread dump #" $i
                if [ $i -lt $LOOP ]; then
                        echo "Sleeping..."
                        sleep $INTERVAL
                fi
        done
        "$JBIN/jmap" -dump:format=b,file="$DIRECTORY/heapJMap_MgrSvr_pid${PID}_$(date +%d%b%Y_%H%M%S).bin" "$PID"
}

###################
file="/tmp/fileStuck.txt"
while IFS= read -r line
do
        MGRS=$(echo "$line" | awk '{print $1}')
        COU1=$(echo "$line" | awk '{print $3}')
        JSPA=$(ps -feww | grep $MGRS | grep "Dweblogic.Name" | grep -v grep | awk '{print $8}')
        $JSPA -version
        if [ "$COU1" -gt 10 ]; then
                echo "Stuck threads available in $MGRS in $HN. So taking dumps......"
                # Start the counter at 0 if the file is missing or empty.
                if [ ! -s /tmp/fileStuckConter.txt ]; then
                        echo "0" > /tmp/fileStuckConter.txt
                fi
                COUTMP=$(cat /tmp/fileStuckConter.txt)
                # Only act when the count has grown since the last alert.
                if [ "$COUTMP" -lt "$COU1" ]; then
                        echo "$COU1" > /tmp/fileStuckConter.txt
                        takedump
                        echo | mail -s "Stuck threads occurred in $HN" $recp
                fi
        fi
done <"$file"

file2="/tmp/fileHogging.txt"
while IFS= read -r line
do
        MGRS=$(echo "$line" | awk '{print $1}')
        COU2=$(echo "$line" | awk '{print $3}')
        JSPA=$(ps -feww | grep $MGRS | grep "Dweblogic.Name" | grep -v grep | awk '{print $8}')
        $JSPA -version
        if [ "$COU2" -gt 10 ]; then
                echo "Hogging threads available in $MGRS in $HN. So taking dumps......"
                # Start the counter at 0 if the file is missing or empty.
                if [ ! -s /tmp/fileHogCounter.txt ]; then
                        echo "0" > /tmp/fileHogCounter.txt
                fi
                COUTMP2=$(cat /tmp/fileHogCounter.txt)
                # Only act when the count has grown since the last alert.
                if [ "$COUTMP2" -lt "$COU2" ]; then
                        echo "$COU2" > /tmp/fileHogCounter.txt
                        takedump
                        echo | mail -s "Hogging threads occurred in $HN" $recp
                fi
        fi
done <"$file2"

 

Credits: These scripts were created by Mohan Babu Vunnam. We @middlewareinventory thank him for sharing them with us and with the rest of the world.

Hope it helps.

If you have any questions, ask in the comments section. We will get back to you as soon as possible, or you can join our WhatsApp group for immediate assistance and support.
