A Shell Script for Automatic VPS Monitoring

Posted by 林继 in VPS Knowledge on 2011-11-18 21:42:04

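This script monitors a VPS's system load and the memory and CPU usage of its web programs. When the system load or a program's memory usage reaches a preset threshold, the script restarts that program; when a single php-cgi process uses too much CPU, the script kills that process directly. The goal is to reduce unexpected downtime caused by the server running out of resources.
And yes, this script is an updated version of the earlier v1. Since further updates are likely, it has been moved to a GitHub gist for simple version control.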
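
The threshold check described above can be tried in isolation. The following is a minimal sketch and not part of the published script: the function name load_too_high and the limit of 6 are assumptions chosen for illustration, and /proc/loadavg is assumed to exist (Linux).

```shell
#!/bin/bash
# Hypothetical helper: prints 1 if $1 (a load average) is greater than
# $2 (a limit), otherwise 0. awk is used because the shell's own [ ]
# test cannot compare decimal numbers.
load_too_high() {
    awk -v load="$1" -v max="$2" 'BEGIN { print ((load + 0 > max + 0) ? 1 : 0) }'
}

# On Linux the 1-minute load average is the first field of /proc/loadavg.
if [ -r /proc/loadavg ]; then
    read -r current_load _ < /proc/loadavg
    if [ "$(load_too_high "$current_load" 6)" = "1" ]; then
        echo "load $current_load is above the limit"
    fi
fi
```

The same comparison pattern works for the memory threshold, since both values are decimals.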

1. Usage:

    git clone https://gist.github.com/1216837.git gist-1216837
    vi gist-1216837/sys-mon.sh    # adjust the memory, CPU and other thresholds
    mkdir -p /var/script
    mv gist-1216837/sys-mon.sh /var/script
    chmod a+x /var/script/sys-mon.sh

Set it to run once a minute:

    crontab -e
    # add this entry:
    * * * * * /bin/bash /var/script/sys-mon.sh

    vi /usr/local/LuNamp/cmd/crontab_cmd.sh
    # add the same entry:
    * * * * * /bin/bash /var/script/sys-mon.sh
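
Because the script sleeps several times while it stops and starts services, a run can still be active when cron starts the next one a minute later. One way to guard against that is a lock around the cron command. This is a minimal sketch under stated assumptions: run_locked and the lock path /tmp/sys-mon.lock are hypothetical names, not part of the original script.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command only if no earlier run still holds
# the lock. mkdir is atomic, so the directory doubles as a simple lock.
run_locked() {
    local lock_dir="$1"
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    local status=0
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Assumed usage in place of the plain cron command:
# run_locked /tmp/sys-mon.lock /bin/bash /var/script/sys-mon.sh
```

A lock directory left behind by a crashed run has to be removed by hand. On systems with util-linux, flock -n offers the same protection without that drawback.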

2. The shell script

It is best to open the address below to see the latest version.

https://gist.github.com/1216837

The following version is for LUM.

Requirements:

  1. curl (on Ubuntu, simply run apt-get install curl)
  2. A correct system time, which can be set as follows:

    date 121120442011
    # format: MMDDhhmmYYYY (month, day, hour, minute, year)

Installation:

  1. vi /var/script/sys-mon.sh
  2. Press i
  3. Paste the script (listed below)
  4. Press Esc
  5. Type :wq and press Enter
  6. chmod +x /var/script/sys-mon.sh
  7. crontab -e
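
Before installing, it may help to confirm that the commands the script relies on are present. This is a minimal sketch: missing_commands is a hypothetical helper name, and bc and awk are checked in addition to curl because the script calls them.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on PATH, one per line.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing=$(missing_commands curl bc awk)
if [ -n "$missing" ]; then
    echo "missing commands: $missing" >&2
fi
```

If nothing is printed, all three commands were found.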

    #!/bin/bash
    #====================================================================
    # sys-mon.sh
    #
    # Copyright (c) 2011, WangYan <webmaster@wangyan.org>
    # All rights reserved.
    # Distributed under the GNU General Public License, version 3.0.
    #
    # Monitor system mem and load, if too high, restart some service.
    #
    # See: https://wangyan.org/blog/sys-mon-shell-script.html
    # Modify By wtCoder
    # blog: http://blog.wtcoder.net/
    # V 0.5.1, Date: 2011-12-11
    #====================================================================
    # Environment
    PATH=/sbin:/bin:/usr/sbin:/usr/bin

    # Names of the programs to monitor
    NAME_LIST="httpd nginx mysql php-cgi"

    # Maximum CPU usage (%) allowed for a single child process
    PID_CPU_MAX="25"

    # Maximum memory usage (%) allowed for all processes of one program
    PID_MEM_SUM_MAX="95"

    # Maximum system load
    SYS_LOAD_MAX="6"

    # Log file
    LOG_PATH="/var/log/sys-mon.log"

    # Timestamp format
    DATA_TIME=$(date +"%Y-%m-%d-%H:%M:%S")

    # URL used to test whether the site is up
    MY_URL="http://xxx/"

    # Directory of the program control scripts
    CMD_PATH="/usr/local/LuNamp/cmd/"

    # PHP-CGI control script
    FPM_PATH="/usr/local/php_fcgi/sbin/php-fpm"

    # Network interface the server uses
    NETWORK_CARD="eth0"

    # Address that receives notifications
    EMAIL="xxx@139.com"

    # URL of a third-party mail relay, called with GET in the form
    # ?mailTo=xxx@gmail.com&title=xxx&content=xxx (keep the content short).
    # Uncomment it when SENDMAIL_TYPE is 1.
    # SENDMAIL_URL="http://blog.wtcoder.net/mail/send_mail.php?"

    # How to send mail: 0 = system mail, 1 = third-party relay
    SENDMAIL_TYPE="0"

    #=============================
    # $1 string Monitor Name
    # $2 string start OR stop
    #=============================
    proControle()
    {
        OPERATION=$2
        case $1 in
            "php-cgi")
                ${FPM_PATH} ${OPERATION}
                ;;
            "mysql")
                ${CMD_PATH}mysql-${OPERATION}
                ;;
            "httpd")
                ${CMD_PATH}apache-${OPERATION}
                ;;
            "nginx")
                ${CMD_PATH}nginx-${OPERATION}
                ;;
        esac
    }

    for NAME in $NAME_LIST
    do
        PID_CPU_SUM="0";PID_MEM_SUM="0"
        # grep -v grep keeps the grep process itself out of the list
        PID_LIST=`ps aux | grep $NAME | grep -v grep | grep -v root`
        IFS_TMP="$IFS";IFS=$'\n'
        for PID in $PID_LIST
        do
            PID_NUM=`echo $PID | awk '{print $2}'`
            PID_CPU=`echo $PID | awk '{print $3}'`
            PID_MEM=`echo $PID | awk '{print $4}'`
    #       echo "$NAME: PID_NUM($PID_NUM) PID_CPU($PID_CPU) PID_MEM($PID_MEM)"
            PID_CPU_SUM=`echo "$PID_CPU_SUM + $PID_CPU" | bc`
            PID_MEM_SUM=`echo "$PID_MEM_SUM + $PID_MEM" | bc`
            if [ `echo "$PID_CPU >= $PID_CPU_MAX" | bc` -eq 1 ];then
                # First try to kill the child process that uses too much CPU
                if [[ "$NAME" = "php-cgi" || "$NAME" = "php-fpm" || "$NAME" = "httpd" ]];then
                    sleep 5
                    # Sample the CPU usage again; empty if the process has exited
                    PID_CPU=`ps -p $PID_NUM -o %cpu= | tr -d ' '`
                    if [ -n "$PID_CPU" ] && [ `echo "$PID_CPU >= $PID_CPU_MAX" | bc` -eq 1 ];then
                        kill $PID_NUM && echo "${DATA_TIME}: kill ${NAME}($PID_NUM) successful (CPU:$PID_CPU)" | tee -a $LOG_PATH
                    fi
                else
                    echo "${DATA_TIME}: [WARNING!] ${NAME}($PID_NUM) cpu usage is too high! (CPU:$PID_CPU)" | tee -a $LOG_PATH
                fi
            fi
        done
        IFS="$IFS_TMP"
        SYS_LOAD=`uptime | awk '{print $(NF-2)}' | sed 's/,//'`
        SYS_MON="CPU:$PID_CPU_SUM------MEM:$PID_MEM_SUM------LOAD:$SYS_LOAD"
    #   echo -e "$NAME: $SYS_MON\n"
        SYS_LOAD_TOO_HIGH=`awk 'BEGIN{print('$SYS_LOAD'>'$SYS_LOAD_MAX')}'`
        PID_MEM_SUM_TOO_HIGH=`awk 'BEGIN{print('$PID_MEM_SUM'>'$PID_MEM_SUM_MAX')}'`
        # Load is still too high: restart the main process
        if [[ "$SYS_LOAD_TOO_HIGH" = "1" || "$PID_MEM_SUM_TOO_HIGH" = "1" ]];then
            proControle $NAME stop
            sleep 5
            for i in `seq 3`
            do
                if [ `pgrep $NAME | wc -l` = "0" ];then
                    echo "$DATA_TIME: Stop $NAME successful! ($SYS_MON)" | tee -a $LOG_PATH
                    break
                else
                    echo "${DATA_TIME}: [WARNING!] Stop $NAME failed[$i]! ($SYS_MON)" | tee -a $LOG_PATH
                    pkill $NAME && killall $NAME
                fi
            done
            proControle $NAME start
            sleep 5
            for ii in `seq 3`
            do
                if [ `pgrep $NAME | wc -l` != "0" ];then
                    echo "$DATA_TIME: Start $NAME successful!" | tee -a $LOG_PATH
                    break
                else
                    echo "${DATA_TIME}: [WARNING!] Start $NAME failed[$ii]! ($SYS_MON)" | tee -a $LOG_PATH
                    proControle $NAME start
                    sleep 5
                fi
            done
            # Send mail only if the program is still not running
            if [ `pgrep $NAME | wc -l` = "0" ];then
                echo "${DATA_TIME}: [ERROR!] Start $NAME failed! ($SYS_MON)" | mail -s "Start $NAME failed" $EMAIL
            fi
        fi
    done

    STATUS_CODE=`curl -o /dev/null -s -w '%{http_code}' "$MY_URL"`
    if [ "$STATUS_CODE" != "200" ];then
        sleep 3
        ipaddr=`ifconfig ${NETWORK_CARD} |grep "inet addr"| cut -f 2 -d ":"|cut -f 1 -d " "`
        STATUS_CODE=`curl -o /dev/null -s -w '%{http_code}' "$MY_URL"`
        if [ "$STATUS_CODE" != "200" ];then
            echo "${DATA_TIME}: [WARNING!] Website Downtime! ($SYS_MON)" | tee -a $LOG_PATH
            if [ "$SENDMAIL_TYPE" = "0" ];then
                echo "${DATA_TIME}: [WARNING!] Website Downtime! ($SYS_MON)" | mail -s "Website Downtime" $EMAIL
            else
                TITLE="${DATA_TIME}--WARNING--IP:${ipaddr}--Website_Down"
                SEND_URL="${SENDMAIL_URL}mailTo=${EMAIL}&title=${TITLE}&content=${SYS_MON}"
                SEND_STATUS=`curl -o /dev/null -s -w '%{http_code}' "$SEND_URL"`
                if [ "$SEND_STATUS" != "200" ];then
                    # Retry once if the relay did not answer with 200
                    curl -o /dev/null -s "$SEND_URL"
                fi
            fi
        fi
    fi

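Notes

1. Every program listed in NAME_LIST must have a script in /etc/init.d that supports the stop and start operations. This applies to the script that calls /etc/init.d/$NAME; the LUM version uses the control scripts under CMD_PATH instead.
2. PID_CPU_MAX is the CPU usage of a single process and applies only to php-fpm or httpd.
3. PID_MEM_SUM_MAX is the combined memory usage of all of that program's processes, not the system's total memory usage.
4. EMAIL: a mail alert is sent only after a program has failed to start.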
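
To make note 3 concrete, the per-program memory figure is a sum of the %MEM column over all processes of one program. This is a minimal sketch of that sum: sum_mem_for is a hypothetical helper that matches command names exactly, whereas the script itself filters ps aux output with grep.

```shell
#!/bin/bash
# Hypothetical helper: read "command %mem" pairs on stdin and print the
# total %MEM of the processes whose command name equals $1.
sum_mem_for() {
    awk -v name="$1" '$1 == name { total += $2 } END { printf "%.1f\n", total }'
}

# Assumed usage with procps ps; each trailing "=" suppresses that
# column's header, so only data rows reach the helper:
# ps -e -o comm= -o pmem= | sum_mem_for nginx
```

The result is what gets compared against PID_MEM_SUM_MAX, so a value of 95 means one program's processes together hold 95% of physical memory.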

    #!/bin/bash
    #====================================================================
    # sys-mon.sh
    #
    # Copyright (c) 2011, WangYan <webmaster@wangyan.org>
    # All rights reserved.
    # Distributed under the GNU General Public License, version 3.0.
    #
    # Monitor system mem and load, if too high, restart some service.
    #
    # See: https://wangyan.org/blog/sys-mon-shell-script.html
    #
    # V 2, since 2011-09-14
    #====================================================================

    # Names of the services to monitor
    NAME_LIST="php-fpm mysql nginx"

    # Maximum CPU usage (%) allowed for a single process
    PID_CPU_MAX="20"

    # Maximum memory usage (%) allowed
    SYS_MEM_MAX="90"

    # Maximum system load allowed
    SYS_LOAD_MAX="5"

    # Log file
    LOG_PATH="/var/log/autoreboot.log"

    # Timestamp format
    DATA_TIME=$(date +"%y-%m-%d %H:%M:%S")

    # Your email address
    EMAIL="webmaster@wangyan.org"

    # Your website url
    MY_URL="https://wangyan.org/blog"

    #====================================================================

    for NAME in $NAME_LIST
    do
        SYS_CPU_SUM="0";SYS_MEM_SUM="0"
        # grep -v grep keeps the grep process itself out of the list
        PID_LIST=`ps aux | grep $NAME | grep -v grep | grep -v root`

        IFS_TMP="$IFS";IFS=$'\n'
        for PID in $PID_LIST
        do
            PID_NUM=`echo $PID | awk '{print $2}'`
            PID_CPU=`echo $PID | awk '{print $3}'`
            PID_MEM=`echo $PID | awk '{print $4}'`
    #       echo $NAME: $PID_NUM $PID_CPU $PID_MEM

            # The CPU sum is only used in the log messages further down
            SYS_CPU_SUM=`echo "$SYS_CPU_SUM + $PID_CPU" | bc`
            SYS_MEM_SUM=`echo "$SYS_MEM_SUM + $PID_MEM" | bc`

            # Compare numerically with bc; [[ a > b ]] would compare strings
            if [ "$NAME" = "php-fpm" ] && [ `echo "$PID_CPU > $PID_CPU_MAX" | bc` -eq 1 ];then
                kill $PID_NUM && echo "$DATA_TIME kill $PID_NUM successful (CPU:$PID_CPU)" | tee -a $LOG_PATH
            fi
        done
        IFS="$IFS_TMP"

        SYS_LOAD=`uptime | awk '{print $(NF-2)}' | sed 's/,//'`
        MEM_COMPARE=`awk 'BEGIN{print('$SYS_MEM_SUM'>'$SYS_MEM_MAX')}'`
        LOAD_COMPARE=`awk 'BEGIN{print('$SYS_LOAD'>'$SYS_LOAD_MAX')}'`
    #   echo -e "$NAME: CPU_SUM:$SYS_CPU_SUM MEM_SUM:$SYS_MEM_SUM SYS_LOAD:$SYS_LOAD\n"

        # Request the site up to three times and stop at the first 200
        for ((i=0;i<3;i++))
        do
            STATUS_CODE=`curl -o /dev/null -s -w '%{http_code}' "$MY_URL"`
            if [ "$STATUS_CODE" = "200" ];then
                break
            fi
        done

        if [[ "$MEM_COMPARE" = "1" || "$LOAD_COMPARE" = "1" || "$STATUS_CODE" = "502" ]];then
            /etc/init.d/$NAME stop
            if [ "$?" = "0" ];then
                echo "$DATA_TIME Stop $NAME successful (MEM:$SYS_MEM_SUM CPU:$SYS_CPU_SUM LOAD:$SYS_LOAD)" | tee -a $LOG_PATH
            else
                echo "$DATA_TIME Stop $NAME [failed] (MEM:$SYS_MEM_SUM CPU:$SYS_CPU_SUM LOAD:$SYS_LOAD)" | tee -a $LOG_PATH
                sleep 3
                pkill $NAME
            fi
            /etc/init.d/$NAME start
            if [ "$?" = "0" ];then
                echo "$DATA_TIME Start $NAME successful" | tee -a $LOG_PATH
            else
                echo "$DATA_TIME Start $NAME [failed]" | tee -a $LOG_PATH
                echo "$DATA_TIME Start $NAME failed" | mail -s "Start $NAME failed" $EMAIL
            fi
        fi

    done
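
The site check in the script retries the request a few times before acting on the result. The pattern can be separated into a small reusable helper. This is a minimal sketch under stated assumptions: retry, site_is_up and RETRY_DELAY are hypothetical names, and the 10-second timeout passed to curl is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to $1 times and return 0 as soon
# as it succeeds. RETRY_DELAY (seconds, default 1) is the pause between
# attempts.
retry() {
    local attempts="$1"
    shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "${RETRY_DELAY:-1}"
        fi
    done
    return 1
}

# Hypothetical check: succeeds only when the URL answers with HTTP 200.
# -m 10 makes curl give up after 10 seconds instead of hanging.
site_is_up() {
    [ "$(curl -o /dev/null -s -m 10 -w '%{http_code}' "$1")" = "200" ]
}

# Assumed usage:
# retry 3 site_is_up "$MY_URL" || echo "site is down"
```

Adding a timeout matters here because the script runs from cron every minute, and a hung request would otherwise hold up the whole run.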

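The script is not hard to follow. For an explanation of how it works, see 《Linux 进程自动监控shell脚本》 ("A shell script for automatic Linux process monitoring").
Sources: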
https://wangyan.org/blog/sys-mon-shell-script.html
http://www.sky.la/2010/12/linux-automatic-recovery-of-resources-to-handle-the-load-monitor-script.html
