Writer intro: Scott Hayes is president of Database-Guys Inc., a performance tools and consulting company specializing in DB2 UDB on Unix platforms. He is also a member of the IBM DB2 Gold Consultant Group and a DB2 UDB Advanced Certified Technical Expert. Scott has worked with DB2/AIX since V1, and his diverse experiences have taught him a number of DB2 UDB best practices. A frequent speaker at IDUG and DB2 technical conferences, Scott has published a number of articles in DB2-related magazines.
Database performance woes? Don't panic. This database first-aid plan will help you find and repair almost any performance problem.
The Top 10 At A Glance
1. Use sufficient agents for the workload.
2. Do not allow DB2 to needlessly close and open files.
3. Do not allow extended lock waits.
4. Ensure parallel I/O capabilities to the database TEMPSPACE tablespace.
5. Manage DB2 sort memory conservatively and don't mask sort problems with large SORTHEAPs.
6. Analyze table access activity and identify tables with unusually high rows read per transaction or overflow counts.
7. Analyze the performance characteristics of each tablespace, and seek to improve the performance of the tablespaces with the slowest read times, longest write times, highest physical I/O read rates, worst hit ratios, and access attributes that are inconsistent with expectations.
8. Create multiple buffer pools, and make purposeful assignments of tablespaces to buffer pools such that access attributes are shared.
9. Examine DB2 UDB SQL Event Monitor information to discover which SQL statements are consuming the largest proportions of computing resources, and take corrective actions.
10. Reevaluate configuration and physical design settings once high-cost SQL is eliminated.
Every few weeks or so, we get performance calls from DBAs in distress. "Our Web site is crawling," they bemoan. "We're losing customers, and the situation is dire. Can you help?" To answer these calls, I've developed an analytics process for my consulting company that lets us rapidly find the cause of performance problems and develop remedial actions and tuning suggestions. Rarely do these callers inquire about fees or costs - they're only interested in stopping the bleeding. When a DB2 or e-business application isn't performing up to expectations, the entire organization and financial bottom line measurably suffers.
To help DB2 DBAs avoid performance fires and achieve high performance on their own, I've summarized a troubleshooting process for our clients, customers, and fellow DB2 professionals. The top 10 performance tips for e-business OLTP applications in DB2 UDB for Unix, Windows, and OS/2 environments are listed in "The Top 10 At A Glance" above and explained in detail in the sections that follow.
Before you start, make sure the monitor switches are turned on. If they aren't, you won't have access to the performance information you need. To turn the monitor switches on, issue the command:
db2 "update monitor switches using lock ON sort ON bufferpool ON uow ON table ON statement ON"
Make sure there are enough DB2 agents to handle the workload. To find out, issue the command
db2 "get snapshot for database manager"
and look for the following lines:
High water mark for agents registered = 7
High water mark for agents waiting for a token = 0
Agents registered = 7
Agents waiting for a token = 0
Idle agents = 5
Agents assigned from pool = 158
Agents created from empty pool = 7
Agents stolen from another application = 0
High water mark for coordinating agents = 7
Max agents overflow = 0
If either Agents waiting for a token or Agents stolen from another application is greater than zero, increase the number of agents available to the database manager ( MAXAGENTS and/or MAX_COORDAGENTS , as appropriate).
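For example, a sketch of raising the agent ceilings (the values shown are placeholders and should be sized to your workload, not taken as recommendations):

db2 "update dbm cfg using MAXAGENTS 500 MAX_COORDAGENTS 500"

Because these are database manager configuration parameters, the change may not take effect until the instance is restarted.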
DB2 tries to be a good citizen within the constraints of operating system resources. One of its "good citizen" acts is putting a ceiling, or upper limit, on the maximum number of files open at any one time. The MAXFILOP database configuration parameter stipulates the maximum number of files that DB2 can have open concurrently. After it reaches that point, DB2 will start closing and opening its tablespace files (including raw devices). Opening and closing files slows SQL response times and burns CPU cycles. To find out if DB2 is closing files, issue the command
db2 "get snapshot for database on DBNAME"
and look for the line that reads:
Database files closed = 0
If files are being closed, increase the value of MAXFILOP until the opening and closing stops. Use the command:
db2 "update db cfg for DBNAME using MAXFILOP N"
The default value for LOCKTIMEOUT is -1 , which means that there will be no lock timeouts - a situation that can be catastrophic for OLTP applications. Nevertheless, I all too frequently find DB2 users running with LOCKTIMEOUT = -1 . Set LOCKTIMEOUT to a very short value, such as 10 or 15 seconds: when one application holds locks for an extended period, other applications queue up behind it, and the lock waits can snowball into an avalanche that stalls the whole workload.
First, check the value of LOCKTIMEOUT with this command
db2 "get db cfg for DBNAME"
and look for the line containing this text:
Lock timeout (sec) (LOCKTIMEOUT) = -1
If the value is -1 , consider changing it to 15 seconds by using the following command (be sure to consult with the application developers or your vendor first to make sure the application is prepared to handle lock timeouts):
db2 "update db cfg for DBNAME using LOCKTIMEOUT 15"
You should also monitor the number of lock waits, lock wait time, and amount of lock list memory in use. Issue the command:
db2 "get snapshot for database on DBNAME"
Look for the following lines:
Locks held currently = 0
Lock waits = 0
Time database waited on locks (ms) = 0
Lock list memory in use (Bytes) = 576
Deadlocks detected = 0
Lock escalations = 0
Exclusive lock escalations = 0
Agents currently waiting on locks = 0
Lock Timeouts = 0
If the Lock list memory in use (Bytes) exceeds 50 percent of the defined LOCKLIST size, then increase the number of 4K pages in the LOCKLIST database configuration.
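LOCKLIST is sized in 4K pages. A sketch of the change, with an illustrative value only:

db2 "update db cfg for DBNAME using LOCKLIST 200"

Keeping lock list usage comfortably below the limit also helps avoid lock escalations, in which DB2 trades many row locks for a single table lock and concurrency suffers.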
Temporary tablespaces should have at least three containers on three different disk drives in order to help DB2 perform parallel I/O and improve performance for sorts, hash joins, and other database activities that use TEMPSPACE .
To find out how many containers your temporary tablespace has, issue the command:
db2 "list tablespaces show detail"
Look for the TEMPSPACE tablespace definition similar to this example:
Tablespace ID = 1
Name = TEMPSPACE1
Type = System managed space
Contents = Temporary data
State = 0x0000
  Detailed explanation: Normal
Total pages = 1
Useable pages = 1
Used pages = 1
Free pages = Not applicable
High water mark (pages) = Not applicable
Page size (bytes) = 4096
Extent size (pages) = 32
Prefetch size (pages) = 96
Number of containers = 3
Notice that Number of containers has the value 3, and that Prefetch size is three times Extent size . For best parallel I/O performance, it is important for Prefetch size to be a multiple of Extent size . The multiple should be equal to the number of containers.
To find the definitions for the containers, issue the command:
db2 "list tablespace containers for 1 show detail"
The 1 refers to tablespace ID #1 , which is TEMPSPACE1 in the example just given.
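If the snapshot shows only one container, one remedy is to create a new system temporary tablespace spread across several drives and drop the old one. The statement below is a sketch; the container paths are hypothetical and should point to directories on separate physical disks:

db2 "create temporary tablespace TEMPSPACE2 managed by system using ('/disk1/tmp', '/disk2/tmp', '/disk3/tmp') extentsize 32 prefetchsize 96"

Once the new tablespace exists, the original TEMPSPACE1 can be dropped; DB2 requires that at least one system temporary tablespace with a 4K page size remain at all times.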
OLTP applications should not be performing large sorts. They are too costly in terms of CPU, I/O, and elapsed time and will slow down any OLTP application. Therefore, the default SORTHEAP size of 256 4K pages (1MB) should be more than adequate. You should also know the number of sort overflows and the number of sorts per transaction.
Issue the command
Db2 "get snapshot for database on DBNAME"
and look for the following lines:
Total sort heap allocated = 0
Total sorts = 1
Total sort time (ms) = 8
Sort overflows = 0
Active sorts = 0
Commit statements attempted = 3
Rollback statements attempted = 0

Let transactions = Commit statements attempted + Rollback statements attempted
Let SortsPerTX = Total sorts / transactions
Let PercentSortOverflows = Sort overflows * 100 / Total sorts
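Plugging the sample numbers into those formulas: transactions = 3 + 0 = 3, SortsPerTX = 1 / 3 (roughly 0.33), and PercentSortOverflows = 0 * 100 / 1 = 0 percent - a healthy result by the thresholds that follow.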
If PercentSortOverflows ((Sort overflows * 100) / Total sorts ) is greater than 3 percent, there may be serious and unexpected sort problems in the application SQL. Because the very presence of overflows indicates that large sorts are occurring, finding zero sort overflows, or at least a percentage less than one, would be ideal.
If excessive sort overflows are present, the "band aid" solution is to increase the size of SORTHEAP . However, doing so only masks the real performance problem. Instead, you should identify the SQL that is causing the sorts and change the SQL, indexes, or clustering to avoid or reduce the sort cost.
If SortsPerTX is greater than 5 (as a rule of thumb), the number of sorts per transaction may be high. Some application transactions perform dozens of small composite sorts (which do not overflow and have very short durations), but consume excessive CPU. When SortsPerTX is high, my experience indicates that these machines are typically CPU bound. Identifying the SQL that is causing the sorts and improving the access plans (via indexes, clustering, or SQL changes) is paramount to improving transaction throughput rates.
For each table, identify how many rows DB2 is reading for each transaction. You must issue two commands:
1. db2 "get snapshot for database on DBNAME"
2. db2 "get snapshot for tables on DBNAME"
After you issue the first command, determine how many transactions have occurred by taking the sum of Commit statements attempted plus Rollback statements attempted , as in the sort calculation earlier.
After issuing the second command, divide the number of rows read by the number of transactions ( RowsPerTX ). OLTP applications should typically read one to 20 rows from each table per transaction. If you discover that hundreds or thousands of rows are being read for each transaction, scans are taking place and indexes may need to be created. (Sometimes simply running runstats with distribution and detailed indexes all provides a cure.)
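For example, a sketch using the table that appears in the sample output below (substitute your own schema and table names):

db2 "runstats on table INST1.DGI_SALES_LOGS_TB with distribution and detailed indexes all"

Fresh statistics give the optimizer the information it needs to choose index access over a scan; remember to rebind static packages afterward so the new statistics are actually used.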
Sample output from "get snapshot for tables on DBNAME" follows:
Snapshot timestamp = 09-25-2000 4:47:09.970811
Database name = DGIDB
Database path = /fs/inst1/inst1/NODE0000/SQL00001/
Input database alias = DGIDB
Number of accessed tables = 8

Table List
Table Schema = INST1
Table Name = DGI_SALES_LOGS_TB
Table Type = User
Rows Written = 0
Rows Read = 98857
Overflows = 0
Page Reorgs = 0
A high number of Overflows probably means you need to reorganize the table. Overflows occur when an update makes a row too wide to fit on its original page, forcing DB2 to relocate the row and leave a pointer behind; reading such a row then costs an extra page access.
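A reorganization sketch for the same sample table:

db2 "reorg table INST1.DGI_SALES_LOGS_TB"

Follow the reorg with a fresh runstats, as shown earlier, so the optimizer sees the reorganized table.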
A tablespace snapshot can be extremely valuable to understanding what data is being accessed and how. To get one, issue the command:
db2 "get snapshot for tablespaces on DBNAME"
For each tablespace, answer the following questions: What is the average read time (ms)? What is the average write time (ms)? How much physical I/O is occurring, and is it asynchronous (prefetched) or synchronous (random)? What is the buffer pool hit ratio, and are these access attributes consistent with your expectations for the data?
Then compare across all tablespaces: Which have the slowest read times and the longest write times? Which have the highest physical I/O read rates and the worst hit ratios? Those are the tablespaces where tuning effort pays off first.
For each tablespace, make sure that the prefetch size is equal to the extent size multiplied by the number of containers. Issue the command:
db2 "list tablespaces show detail"
The prefetch size can be altered for a given tablespace if necessary. Container definitions can be checked by using the command
db2 "list tablespace containers for N show detail"
in which N is the tablespace ID number.
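For example, if a tablespace has three containers and an extent size of 32 pages, a statement like the following (the tablespace name is illustrative) brings the prefetch size into line:

db2 "alter tablespace USERSPACE1 prefetchsize 96"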
All too often I find DB2 UDB sites where the machines have 2, 4, or 8GB of memory, yet the DB2 database has one buffer pool, IBMDEFAULTBP , which is only 16MB in size!
If this is the case at your site, create a buffer pool for the SYSCATSPACE catalog tablespace, one for the TEMPSPACE tablespace, and at least two more buffer pools: BP_RAND and BP_SEQ . Tablespaces that are accessed randomly should be assigned to a buffer pool with random objectives, BP_RAND . Tablespaces that are accessed sequentially (with asynchronous prefetch I/O) should be assigned to a buffer pool with sequential objectives, BP_SEQ . You can create additional buffer pools depending on performance objectives for certain transactions; for example, you could make a buffer pool large enough to store an entire "hot," or very frequently accessed, table. When large tables are involved, some DB2 users have great success placing the indexes for important tables into a dedicated index buffer pool, BP_IX .
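A minimal sketch of such a layout follows; the pool names match the conventions above, but the sizes (specified in pages - 25,000 4K pages is roughly 100MB) are placeholders that must be tuned to your memory and workload, and the tablespace name is illustrative:

db2 "create bufferpool BP_RAND size 25000"
db2 "create bufferpool BP_SEQ size 25000"
db2 "alter tablespace USERSPACE1 bufferpool BP_RAND"

On older DB2 UDB releases, newly created buffer pools are not used until the database is deactivated and reactivated, so schedule the change for a maintenance window.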
Buffer pools that are too small result in excessive, unnecessary, physical I/O. Buffer pools that are too large put a system at risk for operating system paging and consume unnecessary CPU cycles managing the overallocated memory. Somewhere between "too small" and "too large" lies the size that is just right. The right size exists where the point of diminishing returns is reached. If you're not using a tool to automate the diminishing returns analysis, you should scientifically test buffer pool performance (hit ratios, I/O times, physical I/O read rates) at incremental sizes until an optimum size is reached. Because businesses constantly change and grow, the "optimum size" decision should be reevaluated periodically.
One bad SQL statement can ruin your whole day. Time and time again I've seen a single, relatively simple SQL statement bring a finely tuned database and machine to its knees. For many of these statements, there isn't a DB2 UDB configuration parameter under the sun (or in the doc) that can make right the high cost of an errant SQL statement.
Making matters worse, the DBA's hands are frequently tied: You can't change the SQL because it's provided by an application vendor (such as SAP , PeopleSoft , or Siebel ). This leaves the DBA three courses of action:
1. Change or add indexes
2. Change clustering
3. Change catalog statistics.
What's more, today's robust applications are made up of hundreds or thousands of different SQL statements. These statements are executed at varying rates of frequency depending on application functionality and the business needs du jour. A SQL statement's true cost is the resource cost to execute it once multiplied by the number of times it is executed.
The monumental task that confronts each DBA is the challenge of identifying the SQL statements with the highest "true cost," and working to reduce the costs of these statements.
You can find out the resource cost to execute a SQL statement once from native DB2 Explain utilities, a number of tools from third-party vendors, or the DB2 UDB SQL Event Monitor data. But the frequency of statement execution can only be learned through careful and time-consuming analysis of DB2 UDB SQL Event Monitor data.
In researching problem SQL statements, the standard procedure used by DBAs is:
1. Create an SQL Event Monitor that writes to file (a complete, hypothetical example of steps 1 through 5 appears after this procedure):
$> db2 "create event monitor SQLCOST for statements write to ..."
2. Activate the event monitor (be sure ample free disk space is available):
$> db2 "set event monitor SQLCOST state = 1"
3. Let the application run.
4. Deactivate the event monitor:
$> db2 "set event monitor SQLCOST state = 0"
5. Use the DB2-supplied db2evmon tool to format the raw SQL Event Monitor data (hundreds of megabytes of free disk space may be required depending on SQL throughput rates):
$> db2evmon -db DBNAME -evm SQLCOST > sqltrace.txt
6. Browse through the formatted file scanning for unusually large cost numbers, a time-consuming process:
$> more sqltrace.txt
7. Undertake a more complete analysis of the formatted file that attempts to identify unique statements (independent of literal values), each unique statement's frequency (how many times it occurred), and the aggregate of its total CPU, sort, and other resource costs. Such a thorough analysis could take a week or more on just a 30-minute sample of application SQL activity.
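As a concrete sketch of steps 1 through 5, assuming the monitor writes to the hypothetical directory /tmp/sqlcost (which must exist, and have ample free space, before the monitor is activated):

$> mkdir /tmp/sqlcost
$> db2 "connect to DBNAME"
$> db2 "create event monitor SQLCOST for statements write to file '/tmp/sqlcost'"
$> db2 "set event monitor SQLCOST state = 1"
(run the application workload)
$> db2 "set event monitor SQLCOST state = 0"
$> db2evmon -db DBNAME -evm SQLCOST > sqltrace.txt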
To reduce the time it takes to identify the SQL statements with the highest true costs, you can draw on many sources of available information and on tools that automate the Event Monitor analysis.
Optimum performance requires not only eliminating high-cost SQL statements, but also making sure that the appropriate physical infrastructure is in place. Peak performance results when all the tuning knobs are set just right, memory is allocated to pools and heaps effectively, and I/O is evenly balanced across disks. Although it takes time to measure and make adjustments, the DBA who follows these 10 suggestions will be very successful at satisfying internal and external DB2 customers. As e-businesses change and grow, even the best-administered database will need regular fine-tuning. The DBA's job is never done!