A detailed explanation of the sar command (in English)

Posted: 2007-6-08 22:43 | Author: seanhe

sar -- system activity reporter
sar(ADM) provides information that can help you understand how system resources are being used on your system. This information can help you solve and avoid serious performance problems on your system.

The individual sar options are described on the sar(ADM) manual page.

For systems with an SCO SMP License, mpsar(ADM) reports systemwide statistics, and cpusar(ADM) reports per-CPU statistics.

The following table summarizes the functionality of each sar, mpsar, and cpusar option that reports an aspect of system activity:

sar, cpusar, and mpsar options

Option  Activity reported
-a  file access operations
-A  summarize all reports
-b  buffer cache
-B  copy buffers
-c  system calls
-d  block devices including disks and all SCSI peripherals
-F  floating point activity (mpsar only)
-g  serial I/O including overflows and character block usage
-h  scatter-gather and physical transfer buffers
-I  inter-CPU interrupts (cpusar and mpsar only)
-j  interrupts serviced per CPU (cpusar only)
-L  latches
-m  System V message queue and semaphores
-n  namei cache
-O  asynchronous I/O (AIO)
-p  paging
-q  run and swap queues
-Q  processes locked to CPUs (cpusar and mpsar only)
-r  unused memory and swap
-R  process scheduling
-S  SCSI request blocks
-u  CPU utilization (default option for all sar commands)
-v  kernel tables
-w  paging and context switching
-y  terminal driver including hardware interrupts

http://docsrv.caldera.com:8507/usr/share/doc/OpenServer/en/PERFORM/tool_sar.html

answer replied on: 2003-06-11 13:05:28

How do I quickly determine the cause of a performance problem?

--------------------------------------------------------------------------------

Keywords
openserver 5.0.0 5.0.2 5.0.4 5.0.5 5.0.6 5.0 performance problem troubleshoot bottleneck sar slow system guide activity memory CPU time wio usr bottleneck uw7 unixware unixware7 uw ou ou8 openunix openunix8 711 7.1.1 caldera 713 7.1.3 analysis analyse
Release
SCO OpenServer Enterprise System Release 5.0.5, 5.0.6, 5.0.7
SCO OpenServer Desktop System Release 5.0.5, 5.0.6, 5.0.7
SCO OpenServer Enterprise System Release 5.0.2, 5.0.4
SCO OpenServer Desktop System Release 5.0.2, 5.0.4
UnixWare 7 Release 7.1.1, 7.1.3
Caldera Open UNIX Release 8.0.0

Problem
My system is running slowly. How do I quickly determine the most likely cause of the bottleneck?

Solution
The following is a two-step guide to quickly finding the bottleneck on a system with slow performance.

Note
Running the same sar command twice as a "before and after" test (once before the system slowdown and once during it) makes it easier to see differences in the sar data.
Characters after the '#' sign indicate commands to be run.

STEP 1
------

Check the general system activity:

# sar 1 5
09:35:13    %usr    %sys    %wio   %idle (-u)
09:35:14      17       0       0      83
09:35:15       5       0       0      95
09:35:16       5       0       0      95
09:35:17       5       0       0      95
09:35:18       5       1       0      94

Average        7       0       0      92

This command should always be run first to get a general idea of the primary
location of the bottleneck.

If %usr is high, then the system is waiting for user commands to finish
(e.g. sort, data gathering/processing programs).

CAUSES: Non-interactive programs running unnecessarily at peak hours,
slow CPU, not enough CPUs, unnecessary programs running, inefficient
third party programs running, daemons processing data, bad nice values.

If %sys is high, then the system is waiting for kernel driver calls to
complete (e.g. hardware issues, spurious interrupts, third party drivers).

CAUSES: Inefficient third party drivers, bad hardware causing spurious
interrupts, slow CPU, not enough CPUs.

If BOTH %usr and %sys are high, then the system is waiting for all types
of system calls, whether they are user generated or kernel generated.

CAUSES: Slow CPU, not enough CPUs.

If %wio is high, the system is waiting for the disk subsystem to retrieve data.

CAUSES: Not enough disk cache (NBUF/NHBUF), slow hard drive system, not
enough memory, memory leak from process, process grabbing too much memory.
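
The STEP 1 reading can be sketched as a small script. This is only an illustration: the article never says what counts as "high", so the 40 percent cutoff below is an arbitrary assumption, and the function names are hypothetical. The sample text is the "sar 1 5" output shown above.

```python
# Sample text copied from the "sar 1 5" output shown above.
SAR_U_OUTPUT = """\
09:35:13    %usr    %sys    %wio   %idle (-u)
09:35:14      17       0       0      83
09:35:15       5       0       0      95
09:35:16       5       0       0      95
09:35:17       5       0       0      95
09:35:18       5       1       0      94

Average        7       0       0      92
"""

# Assumption: "high" is not defined in the article. 40 percent is an
# arbitrary illustrative cutoff, not a documented value.
HIGH_PCT = 40


def parse_average(text):
    """Return the Average row of sar -u output as a dict of ints."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Average":
            usr, sys_, wio, idle = (int(v) for v in fields[1:5])
            return {"usr": usr, "sys": sys_, "wio": wio, "idle": idle}
    raise ValueError("no Average line found")


def classify(avg, high=HIGH_PCT):
    """Return the STEP 2 branches ("usr", "sys", "both", "wio") that apply."""
    branches = []
    usr_high = avg["usr"] >= high
    sys_high = avg["sys"] >= high
    if usr_high and sys_high:
        branches.append("both")
    elif usr_high:
        branches.append("usr")
    elif sys_high:
        branches.append("sys")
    if avg["wio"] >= high:
        branches.append("wio")
    return branches


if __name__ == "__main__":
    average = parse_average(SAR_U_OUTPUT)
    print(average, classify(average))
```

For the sample above, no branch applies because the system is mostly idle.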

STEP 2
------

IF %usr IS HIGH:

1) Check for processes consuming too much CPU time:

# ps -el | more
  F S    UID   PID  PPID  C PRI NI     ADDR   SZ  TTY       TIME CMD
71 S      0     0     0  0  95 20 fb117000    0    ?   00:00:01 sched
20 S      0     1     0  0  66 20 fb117158  148    ?   00:00:00 init
.
.
.
20 S      0   347     1  0  76 24 fb119db0  312    ?   00:00:00 snmpd
20 S     17   349     1  1  66 20 fb119f08  156    ?   01:05:53 deliver
20 S      0   413   410  0  75 20 fb11a060  128    ?   00:00:00 lockd

(WCHAN removed to fit in screen)
Check the C and TIME values. If the TIME value (minutes:seconds:100ths of
a second) is unusually high and C is positive for a specific process, then
that process could be taxing the system.  In the above example, the deliver
daemon is processing leftover admin mail.  The mail can be removed.
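
A minimal sketch of the same check in Python, assuming the ps listing has been saved as text with the WCHAN column removed as above. The helper names are hypothetical. TIME is compared field by field, so the sketch does not depend on which units the fields represent.

```python
# Rows copied from the "ps -el" listing above (WCHAN column removed).
PS_OUTPUT = """\
  F S    UID   PID  PPID  C PRI NI     ADDR   SZ  TTY       TIME CMD
 71 S      0     0     0  0  95 20 fb117000    0    ?   00:00:01 sched
 20 S      0     1     0  0  66 20 fb117158  148    ?   00:00:00 init
 20 S      0   347     1  0  76 24 fb119db0  312    ?   00:00:00 snmpd
 20 S     17   349     1  1  66 20 fb119f08  156    ?   01:05:53 deliver
 20 S      0   413   410  0  75 20 fb11a060  128    ?   00:00:00 lockd
"""


def parse_ps(text):
    """Turn the listing into one dict per process, keyed by column name."""
    lines = text.splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) == len(header):  # skip "." filler or damaged lines
            rows.append(dict(zip(header, fields)))
    return rows


def time_key(value):
    """Make a TIME string such as 01:05:53 sortable, field by field."""
    return tuple(int(part) for part in value.split(":"))


def busy_candidates(rows):
    """Processes with a positive C value, largest accumulated TIME first."""
    active = [row for row in rows if int(row["C"]) > 0]
    return sorted(active, key=lambda row: time_key(row["TIME"]), reverse=True)


if __name__ == "__main__":
    for row in busy_candidates(parse_ps(PS_OUTPUT)):
        print(row["PID"], row["TIME"], row["CMD"])
```

With the listing above, the only candidate is the deliver daemon.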

2) Check for system call activity:

# sar -c 1 5
SCO_SV tuvok 3.2v5.0.5 i80386    06/21/2001

09:55:08 scall/s sread/s swrit/s  fork/s  exec/s  rchar/s  wchar/s (-c)
09:55:09    1216      67      12    0.99    0.99   178441     3988
09:55:10     147      31       6    0.00    0.00   168723     8421
09:55:11      74      27       4    0.00    0.00   163644     3342
09:55:12     245      37       6    0.00    0.00   171821     8928
09:55:13     151      29       4    0.00    0.00   163770     3468

Average      367      38       6    0.20    0.20   169280     5629

Check before and after for system calls, forks, execs, etc.  If system
calls are high, it indicates one or more of the following:

   - Programs are suddenly being used more actively.
   - More programs in general are being run on the system.

Use ps to check whether this activity is necessary. If forks/execs or
reads/writes are specifically high, check for programs that may be making
those calls heavily.
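
A "before and after" comparison of sar -c data could be sketched as follows. The article shows only one capture, so the quietest and busiest intervals of that capture stand in for the two runs. The factor of 2 used to flag a column is an assumption, not a documented threshold, and the function names are hypothetical.

```python
# Rows copied from the "sar -c 1 5" output above (banner line omitted).
SAR_C_OUTPUT = """\
09:55:08 scall/s sread/s swrit/s  fork/s  exec/s  rchar/s  wchar/s (-c)
09:55:09    1216      67      12    0.99    0.99   178441     3988
09:55:10     147      31       6    0.00    0.00   168723     8421
09:55:11      74      27       4    0.00    0.00   163644     3342
09:55:12     245      37       6    0.00    0.00   171821     8928
09:55:13     151      29       4    0.00    0.00   163770     3468

Average      367      38       6    0.20    0.20   169280     5629
"""


def parse_rows(text):
    """Map each row label (timestamp or "Average") to its column values."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()[1:-1]  # drop the timestamp and the "(-c)" tag
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        rows[fields[0]] = dict(zip(names, (float(v) for v in fields[1:])))
    return rows


def increased(before, after, factor=2.0):
    """List columns that grew by at least `factor` (an assumed cutoff)."""
    flagged = []
    for name, new in after.items():
        old = before[name]
        if (old == 0 and new > 0) or (old > 0 and new / old >= factor):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    rows = parse_rows(SAR_C_OUTPUT)
    # Stand-ins for two separate runs: quietest vs. busiest interval.
    print(increased(rows["09:55:11"], rows["09:55:09"]))
```

In real use, `before` and `after` would be the Average rows of two separate sar runs.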

IF %sys IS HIGH:

If you have a multiprocessing system, run the following command to see if
any device is sending thousands of interrupts and slowing down the CPU.

For OpenServer5:

# sar -j 1 5

For UnixWare7/Open UNIX 8:

# sar -P ALL 1 5

Check programs that could be accessing the tape drive, third party smart
boards, or other non-disk drivers.

IF BOTH USR AND SYS ARE HIGH:

1) Check the run queue:

# sar -q 1 5

SCO_SV lunasco 3.2v5.0.4 Pentium    06/21/2001

10:46:29 runq-sz %runocc swpq-sz %swpocc (-q)
10:46:30     3.0     100
10:46:31
10:46:32     1.0     100
10:46:33     1.0     100
10:46:34     1.0     100

Average      1.5     100

Normally the average run queue size (runq-sz) should be less than 3 on a taxed
system. If the average is constantly higher than that, the processes are not
being serviced quickly enough; the CPU could be to blame.  Either increase the
CPU speed or add CPUs, if possible.
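
The run queue check can be sketched in a few lines. The limit of 3 comes from the text above; the function names are hypothetical. Intervals where sar printed no values (such as 10:46:31) are skipped.

```python
# Rows copied from the "sar -q 1 5" output above (banner line omitted).
SAR_Q_OUTPUT = """\
10:46:29 runq-sz %runocc swpq-sz %swpocc (-q)
10:46:30     3.0     100
10:46:31
10:46:32     1.0     100
10:46:33     1.0     100
10:46:34     1.0     100

Average      1.5     100
"""

RUNQ_LIMIT = 3.0  # limit quoted in the article


def runq_samples(text):
    """Collect runq-sz for every interval that reported a value."""
    samples = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2 and fields[0] != "Average":
            samples.append(float(fields[1]))
    return samples


def cpu_suspected(samples, limit=RUNQ_LIMIT):
    """True when the mean run queue size is above the limit."""
    return bool(samples) and sum(samples) / len(samples) > limit


if __name__ == "__main__":
    values = runq_samples(SAR_Q_OUTPUT)
    print(values, cpu_suspected(values))
```

The mean of the four reported intervals is 1.5, which matches the Average line and stays under the limit.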

IF %wio IS HIGH:

1) Check for greedy processes:

# ps -el | more
  F S    UID   PID  PPID  C PRI NI     ADDR   SZ  TTY        TIME CMD
71 S      0     0     0  0  95 20 fb117000    0    ?    00:00:01 sched
20 S      0     1     0  0  66 20 fb117158  148    ?    00:00:00 init
71 S      0     2     0  0  95 20 fb1172b0    0    ?    00:00:00 vhand
71 S      0     3     0  0  95 20 fb117408    0    ?    00:00:16 bdflush
71 S      0     4     0  0  95 20 fb117560    0    ?    00:00:00 kmdaemon
71 S      0     5     1  0  95 20 fb1176b8    0    ?    00:00:18 htepi_daemon
.
.
.
20 S      0   252     1  0  76 20 fb118830  152    ?    00:00:00 cron
20 S      0   354     1  0  76 24 fb118988 233504    ?    00:00:03 report
20 S      0   496     1  0  76 24 fb118ae0  200    ?    00:00:00 calserver

Check the SZ value to see if a process is either grabbing too much memory or
not freeing it up when needed.  In the above example, the "report" program is
grabbing a lot of memory.
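
One way to spot such a process automatically is to compare each SZ value with the typical non-zero SZ in the listing. This is a rough sketch: the factor of 10 is an assumption chosen for illustration, and the helper names are hypothetical.

```python
from statistics import median

# Rows copied from the "ps -el" listing above (WCHAN column removed).
PS_OUTPUT = """\
  F S    UID   PID  PPID  C PRI NI     ADDR   SZ  TTY        TIME CMD
 71 S      0     0     0  0  95 20 fb117000    0    ?    00:00:01 sched
 20 S      0     1     0  0  66 20 fb117158  148    ?    00:00:00 init
 71 S      0     2     0  0  95 20 fb1172b0    0    ?    00:00:00 vhand
 71 S      0     3     0  0  95 20 fb117408    0    ?    00:00:16 bdflush
 71 S      0     4     0  0  95 20 fb117560    0    ?    00:00:00 kmdaemon
 71 S      0     5     1  0  95 20 fb1176b8    0    ?    00:00:18 htepi_daemon
 20 S      0   252     1  0  76 20 fb118830  152    ?    00:00:00 cron
 20 S      0   354     1  0  76 24 fb118988 233504    ?    00:00:03 report
 20 S      0   496     1  0  76 24 fb118ae0  200    ?    00:00:00 calserver
"""

FACTOR = 10  # assumption: "much larger than typical" means 10 times the median


def parse_ps(text):
    """Turn the listing into one dict per process, keyed by column name."""
    lines = text.splitlines()
    header = lines[0].split()
    return [
        dict(zip(header, line.split()))
        for line in lines[1:]
        if len(line.split()) == len(header)
    ]


def memory_hogs(rows, factor=FACTOR):
    """Commands whose SZ exceeds `factor` times the median non-zero SZ."""
    nonzero = [int(row["SZ"]) for row in rows if int(row["SZ"]) > 0]
    if not nonzero:
        return []
    typical = median(nonzero)
    return [row["CMD"] for row in rows if int(row["SZ"]) > factor * typical]


if __name__ == "__main__":
    print(memory_hogs(parse_ps(PS_OUTPUT)))
```

Kernel processes with an SZ of 0 are left out of the median so they do not drag the typical size down to zero.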

2) Check the amount of useable memory:

# sar -r 1 5

SCO_SV tuvok 3.2v5.0.5 i80386    06/21/2001

10:34:13   freemem   freeswp availrmem availsmem (-r)
10:34:14      8262    389120     28765     56421
10:34:15      8262    389120     28765     56421
10:34:16      8262    389120     28765     56421
10:34:17      8262    389120     28765     56421
10:34:18      8262    389120     28765     56421

Average       8262    389120     28765     56421

If freemem (listed in 4K pages) is below 500 and freeswp keeps changing, the
system is paging in data because it can't fit what it needs in memory.
If this is happening all the time, increase RAM.
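
The freemem rule can be sketched as below, using the 4K page size and the 500-page floor from the text. Treating "dynamic" as "freeswp differs between intervals" is an interpretation, and the function names are hypothetical.

```python
# Rows copied from the "sar -r 1 5" output above (banner line omitted).
SAR_R_OUTPUT = """\
10:34:13   freemem   freeswp availrmem availsmem (-r)
10:34:14      8262    389120     28765     56421
10:34:15      8262    389120     28765     56421
10:34:16      8262    389120     28765     56421
10:34:17      8262    389120     28765     56421
10:34:18      8262    389120     28765     56421

Average       8262    389120     28765     56421
"""

PAGE_KIB = 4          # freemem is listed in 4K pages
FREEMEM_FLOOR = 500   # pages, from the article


def samples(text):
    """Return (freemem, freeswp) for every interval, skipping the Average."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) == 5 and fields[0] != "Average":
            rows.append((int(fields[1]), int(fields[2])))
    return rows


def pages_to_mib(pages):
    """Convert a count of 4K pages to MiB."""
    return pages * PAGE_KIB / 1024


def paging_suspected(rows):
    """Low freemem in some interval plus a freeswp value that changes."""
    low_memory = any(freemem < FREEMEM_FLOOR for freemem, _ in rows)
    swap_changing = len({freeswp for _, freeswp in rows}) > 1
    return low_memory and swap_changing


if __name__ == "__main__":
    rows = samples(SAR_R_OUTPUT)
    print(pages_to_mib(rows[0][0]), paging_suspected(rows))
```

In the sample, 8262 free pages is roughly 32 MiB and freeswp never moves, so nothing is flagged.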

Note
In OpenServer sar -r reports the amount of swap space on disk. In UnixWare7/Open UNIX 8, it reports the swap space in virtual memory (RAM plus swap).

3) Check the disk i/o usage:

# sar -b 1 5

SCO_SV tuvok 3.2v5.0.5 i80386    06/21/2001

10:37:17 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s (-b)
10:37:18       0       0       0       0       0       0       0       0
10:37:19       0       0       0       0       0       0       0       0
10:37:20       0      60     100       0       1     100       0       0
10:37:21       0      -1     100       0       0       0       0       0
10:37:22       0      56     100       0       1     100       0       0

Average        0      24     100       0       0     100       0       0

If %rcache is continuously < 95 and/or %wcache is < 90, then the system is
having to go to the hard drive to load the disk cache. Increase the disk
cache by increasing NBUF by 50 percent and adjusting NHBUF appropriately.
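
The cache rule and the 50 percent increase can be sketched as follows. The thresholds of 95 and 90 come from the text; the starting NBUF value of 3000 in the demo is a made-up input, and the function names are hypothetical. The sketch looks only at the Average row.

```python
# Rows copied from the "sar -b 1 5" output above (banner line omitted).
SAR_B_OUTPUT = """\
10:37:17 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s (-b)
10:37:18       0       0       0       0       0       0       0       0
10:37:19       0       0       0       0       0       0       0       0
10:37:20       0      60     100       0       1     100       0       0
10:37:21       0      -1     100       0       0       0       0       0
10:37:22       0      56     100       0       1     100       0       0

Average        0      24     100       0       0     100       0       0
"""


def average_row(text):
    """Return the Average row as a dict keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()[1:-1]  # drop the timestamp and the "(-b)" tag
    for line in lines[1:]:
        fields = line.split()
        if fields[0] == "Average":
            return dict(zip(names, (int(v) for v in fields[1:])))
    raise ValueError("no Average line found")


def cache_too_small(avg):
    """Apply the article's rule: %rcache < 95 and/or %wcache < 90."""
    return avg["%rcache"] < 95 or avg["%wcache"] < 90


def suggested_nbuf(current):
    """Increase NBUF by 50 percent, as the article recommends."""
    return current + current // 2


if __name__ == "__main__":
    avg = average_row(SAR_B_OUTPUT)
    print(cache_too_small(avg))
    print(suggested_nbuf(3000))  # 3000 is a hypothetical current NBUF
```

Using the Average row avoids being misled by idle intervals, where the hit rates are reported as 0 simply because no reads or writes took place.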

For UnixWare7/Open UNIX 8.0.0, there are the additional kernel tunables:

"FDFLUSHR", this is the interval in seconds to check the need to write the
buffer cache and file pages to disk.  The default is 1.

"NAUTOUP", this is the number of seconds between filesystem updates.  The
default is 60 seconds.  Increasing NAUTOUP can improve performance, but also
increases the risk of data loss should a system crash occur.

Also, check "sar -d".  A %busy figure averaging > 50% can indicate a disk
bottleneck.  The "avserv" column shows service time after a request has arrived
at the disk.  The "await" column shows the average wait time for an I/O request
to be serviced. The "avque" column shows the average length of the wait queue
for an I/O request.

If the disk performance appears to be acceptable, but you have unacceptable
levels of CPU time devoted to waiting for I/O, then you may have a memory
bottleneck.

Note
Make sure you do not increase NBUF by so much that you run out of regular general purpose memory (check with sar -r).

If neither memory nor the disk cache is a problem, check the disk i/o
system as you may need a RAID system or a faster host adapter system.

Note
In general, ensure that the server has the latest patches, host bus adapter drivers, and network drivers installed. These are available from:

          ftp://ftp.sco.com/pub

          http://www.sco.com/download

Note
There are other tools available to assist in identifying the processes and analysing the output of sar.
These are:

rtpm(1M) for UnixWare7/Open UNIX 8.0.0

"top" available from Skunkware at http://www.sco.com/skunkware

"hog" package available from Skunkware, above.

SarCheck available from http://www.sarcheck.com

See Also
sar(ADM), vmstat(C), idtune(ADM)

      TA # 117424, "How can I analyze "on-the-spot" performance data with
SarCheck?"

      TA # 114622, "How do I install and run SarCheck?"

      http://www.aplawrence.com/Unixart/slow.html

--------------------------------------------------------------------------------

http://stage.caldera.com/ns-search/TAEXT/os/116222.shtml?NS-search-set=/3ee66/aaaa002GWe66d1a&NS-doc-offset=0&

swwin replied on: 2003-06-11 13:57:16

answer, could you translate this into Chinese?