Saturday, March 19, 2011

DBA at Snake Mobile

Please enjoy a blog about an Oracle DBA’s typical day and how he reacts to the issues. Characters and situation in this story are fictitious. If you see any resemblance it is just coincidental and is used for educational purpose.

It was another Thursday morning at Snake Mobile and Oracle DBAs were enjoying Chai with their colleagues at Chandni Chowk Café in their company Cafeteria. Then all of a sudden Manmeet Singh got a call from his manager, Bothering Gupta about an unhappy email from an end user. He explained that one of the customer report is running slow and he want Singh to look into it.Manmeet started gathering all his tools and sticky notes to take a look at what is happening within the database. He IMed the customer to get details about the job and what time it runs daily. He also enquired about expected run time and last run time. Customer informed him that it use to finish in 30 minutes, but in last 3 days it is taking 1 hr 15 minutes to complete.

Using Oracle Grid Control Manmeet found the sql query and its SQL_ID. The sql was a 170 line complicated query with UNION, left and inner joins with several sub selects and inline views. He grabbed another cup of newly brewed Charbucks Coffee from the coffee machine to get over the shock of this tsunami sql.

He tried to see if this sql has more than 1 execution plan. So he used following sql
SQL > select * from table(dbms_xplan.display_awr('csu1z5zqf7sbc')); -- csu1z5zqf7sbc is the SQL_ID

This query listed details of all execution plans available with plan hash value in a nicely formatted way. He found 3 different execution plans for the above sql. But he is not sure which one is the good one and which one is used by system now. So he need to look at some other view. So some more queries from his arsenal. He need to find SNAP_ID, query execution time, number of records processed

SQL> select a.snap_id,a.begin_interval_time,b.executions_total,b.executions_delta,b.elapsed_time_delta,
round((elapsed_time_delta / 1000 / executions_delta),3) avgelapsetimems,
b.rows_processed_total,b.rows_processed_delta,
round((cpu_time_delta / 1000 / executions_delta),3) avgcputimems
from sys.wrh$_sqlstat b, sys.wrm$_snapshot a
where executions_delta > 0 and a.snap_id = b.snap_id and a.dbid = b.dbid
and b.instance_number=a.instance_number
and sql_id ='csu1z5zqf7sbc' and a.begin_interval_time > sysdate - 10
order by a.snap_id,a.begin_interval_time;

SQL> select distinct dbid, sql_id, plan_hash_value from dba_hist_sql_plan where sql_id='csu1z5zqf7sbc';
DBID SQL_ID PLAN_HASH_VALUE
---------- ------------- ---------------
2797229286 csu1z5zqf7sbc 1007430168
2797229286 csu1z5zqf7sbc 1942751973
2797229286 csu1z5zqf7sbc 3495545440
SQL> select distinct a.snap_id,a.plan_hash_value from sys.wrh$_sql_plan a, v$database b where a.dbid = b.dbid and a.sql_id = 'csu1z5zqf7sbc';
SNAP_ID PLAN_HASH_VALUE
---------- ---------------

16850 1007430168
16970 1942751973
17114 3495545440

SQL> select * from sys.wrm$_snapshot where snap_id in (16850,16970,17114);

SNAP_ID DBID INSTANCE_NUMBER STARTUP_TIME
---------- ---------- --------------- ---------------------------------------------------------------------------
BEGIN_INTERVAL_TIME END_INTERVAL_TIME
--------------------------------------------------------------------------- ---------------------------------------------------------------------------
FLUSH_ELAPSED SNAP_LEVEL STATUS ERROR_COUNT BL_MOVED SNAP_FLAG
--------------------------------------------------------------------------- ---------- ---------- ----------- ---------- ----------
16850 2797229286 1 22-JUL-10 10.56.01.000 PM
07-MAR-11 07.00.15.751 PM 07-MAR-11 08.00.28.204 PM
+00000 00:00:17.8 1 0 0 0 0
16970 2797229286 1 22-JUL-10 10.56.01.000 PM
12-MAR-11 07.00.31.943 PM 12-MAR-11 08.00.44.143 PM
+00000 00:00:15.9 1 0 0 0 0

He noticed that something has changed with statistics for the tables used in this complicated query in last 10 days. He can find current values of statistics from DBA_TABLES view. But what about history. He used following query to find that detail. He decided to use a bind variable in his query as there are 6 separate tables involved in the slow running sql from same schema.

SQL> select a.savtime,a.flags,a.rowcnt,a.blkcnt,a.avgrln,a.samplesize,to_char(a.analyzetime,'DD-MON-YY HH24:MI:SS') as anlzetime from sys.WRI$_OPTSTAT_tab_history a, dba_objects b where a.obj#=b.object_id and b.owner='TCSDBOWNER' and b.object_name='&table_name' order by 1;
Enter value for table_name: EXTRA_FIELD

Hurray, he found a sudden change in ROWCNT value for 2 of the big tables used in the query. That explains the reason for slowness.

BONUS QUESTION: Manmeet Singh is planning to lock table stats with the statistics values which gave him a good execution plan. Will it work and is it a good approach?

No comments:

Post a Comment