undo tablespace keeps growing, why can't I reclaim space?

Discussion:

(too old to reply)

h***@hotmail.com

2005-08-13 20:40:39 UTC

Hi,

ora 9.2.06, AIX 5.2 ML06, IBM Power5, 1 CPU, EMC storage

We have a transaction which consumes about 2-3 gigs undo space in our
test environment. When we run the batch in production, the transaction
was pretty hungry for undo space; it needed more than 40 gigs...Strange
indeed. I checked v$rollstat during the transaction, and the largest
undo segment never grew beyond 3 gigagytes, just as I expected. Why the
continuing claims for more undo??

The job consists of the following building blocks: (sort of a
destructive load)
truncate target_table
insert into target_table (select * from source_join /* (without append
hints)*/
commit;
update target_table set column_x where column_y = some_value;
commit;

I noticed that the IO operations were very slow on the machine (the
infamous EAGAIN message "performance degradation may be seen" showed up
in lgwr and dbw0 traces).

Now, here's my question: could the slow i/o have something to do with
the continuing claims for new undo space in the undo tablespace? Could
it be that, due to the slow io, SMON just could not find the time to
tidy up a bit?

Now that the IO issue is fixed (we set up Concurrent IO), I have to be
sure that this problem doesn't show up again.

BTW: undo_retention = 1200. I'm pretty sure this is irrelevant,though.

Any thoughts on this would be very much appreciated.

I know the tricks and techniques to limit the amount of undo and/or
redo generation. The goal of this question is to gather more wisdom
about transaction processing and the decisions made by Oracle in the
process of freeing undo space.

Thanks in advance.

Sybrand Bakker

2005-08-13 21:51:04 UTC

Permalink

Post by h***@hotmail.com
Now, here's my question: could the slow i/o have something to do with
the continuing claims for new undo space in the undo tablespace? Could
it be that, due to the slow io, SMON just could not find the time to
tidy up a bit?

You need to use v$waitstat to monitor what your process is waiting
for.
Provided you didn't set up dictionary managed tablespaces your
assumption is highly unlikely and probably just a shot in the dark.

Other than that, one could question why you don't merge the insert and
update into one statement, assuming the *real* update statement (which
you obviously didn't disclose) is as simple as the pseudo statement
you provided.

--
Sybrand Bakker, Senior Oracle DBA

h***@hotmail.com

2005-08-14 16:07:16 UTC

Permalink

Post by Sybrand Bakker
You need to use v$waitstat to monitor what your process is waiting
for.

I have never used v$waitstat to see what is going on with undo
segments, so please clarify.

Post by Sybrand Bakker
Provided you didn't set up dictionary managed tablespaces your
assumption is highly unlikely and probably just a shot in the dark.

Yes, it was a shot in the dark, therefore I asked for the internals of
undo segment handling, to get get a better understanding of what was
going on, an explanation of the difference between the bytes reported
in v$rollstat and dba_segments.

Post by Sybrand Bakker
Other than that, one could question why you don't merge the insert and
update into one statement, assuming the *real* update statement (which
you obviously didn't disclose) is as simple as the pseudo statement
you provided.

Thanks for the hint, but I know how to work around this problem.
Indeed, the *real* update is not as simple.

Sybrand Bakker

2005-08-14 17:34:06 UTC

Permalink

Post by h***@hotmail.com

Post by Sybrand Bakker
You need to use v$waitstat to monitor what your process is waiting
for.

I have never used v$waitstat to see what is going on with undo
segments, so please clarify.

Post by Sybrand Bakker
Provided you didn't set up dictionary managed tablespaces your
assumption is highly unlikely and probably just a shot in the dark.

Thanks for the hint, but I know how to work around this problem.
Indeed, the *real* update is not as simple.

First of all: I don't agree with your analysis the undo segments are
the cause of your problem.
The only way you can tell that is by querying v$waitstat, which will
you show you what the session is waiting for. Might be undo segment,
but might be not. Basically your claim is unsubstantiated.

Secondly using v$waitstat you would see whether your session is
waiting for undo segment headers and/or undo segment blocks.

Thirdly: undo segments were designed to work like temporary segments.
Ie they never shrink. This is to be expected as Oracle and several
Oracle gurus have been advising on not using the optimal clause of the
rollback segment for many years, for the reason extent allocation and
deallocation is an expensive operation. As disk is cheap nowadays, the
space allocated usually shouldn't be a problem.

Fourthly: your transaction may show an inefficient execution path, or
(implicitly) modify many, many indexes. As you are pretty scarce on
details, crystall balls would be needed to determine why you
transaction is 40G. Oracle however, will online additional undo
segments as needed, and those actions are being logged in the alert
log.

Your query about the relationship between v$rollstat and dba_segments
can, as you didn't provide any specifics, not be answered.

Finally: my impression is you set up undo segments and forget about
them. Any attempt to try to be 'smarter' than Oracle is probably
likely to succeed in some circumstances, and fail in other.

--
Sybrand Bakker, Senior Oracle DBA

h***@hotmail.com

2005-08-15 08:37:55 UTC

Permalink

Post by Sybrand Bakker
First of all: I don't agree with your analysis the undo segments are
the cause of your problem.

If you've read carefully, you would conclude that is not my analysis.
I'm looking for an explanation for the excessive use of undo space.
When you take a look at the statspack snippets, it's pretty hard to
deny a problem with IO.
(snapshot taken of a 2hr period)

Total Wait Avgwait
Waits
Event Waits Timeouts Time (s) (ms)
/txn
---------------------------- ------------ ---------- ---------- ------
--------
log file sync 5,050,014 1,063 4,994 1
3,942.2
db file parallel write 20,083 0 737 37
15.7
db file sequential read 200,944 0 626 3
156.9
control file parallel write 11,128 0 523 47
8.7
log file parallel write 5,055,040 0 387 0
3,946.2
db file scattered read 35,818 0 249 7
28.0
async disk IO 40,930 0 225 5
32.0
direct path read 17,884 0 85 5
14.0

Given the number of commits (1277) you must at least share the part of
my analysis that there *is* an IO problem.

Post by Sybrand Bakker
Might be undo segment, but might be not. Basically your claim is
unsubstantiated.

What claim? My only claim is that the batch consumes an unacceptable
amount of undo AND that there is an IO problem. I wonder whether those
symptoms *could* be related. I didn't say they are related.

Post by Sybrand Bakker
Thirdly: undo segments were designed to work like temporary segments.
Ie they never shrink. This is to be expected as Oracle and several
Oracle gurus have been advising on not using the optimal clause of the
rollback segment for many years, for the reason extent allocation and
deallocation is an expensive operation.

Have you ever read "practical Oracle 8i", by Jonathan Lewis? Page 146
clearly states that you should aim to keep the optimal size as small as
possible.
But that's off-topic. BTW: automatic undo management won't let you use
the optimal clause.

As disk is cheap nowadays, the

Post by Sybrand Bakker
space allocated usually shouldn't be a problem.

Space IS a problem, because undo tablespaces need backing up.

Post by Sybrand Bakker
Fourthly: your transaction may show an inefficient execution path, or
(implicitly) modify many, many indexes. As you are pretty scarce on
details, crystall balls would be needed to determine why you
transaction is 40G. Oracle however, will online additional undo
segments as needed, and those actions are being logged in the alert
log.

Only difference is the machine and the IO characteristics. Database
layout, tables and indexes are all the same, so I don't see the need
for crystal balls. I've never seen a transaction span multiple undo
segments.

Post by Sybrand Bakker
Your query about the relationship between v$rollstat and dba_segments
can, as you didn't provide any specifics, not be answered.

When I query v$rollstat and Oracle reports 3g from column "rssize",
then why does dba_segments report 41g as the size of that particular
segment? I used v$rollstat.usn to get to dba_rollback_segs.segment_name
to get to dba_segments.bytes.

Post by Sybrand Bakker
Finally: my impression is you set up undo segments and forget about
them. Any attempt to try to be 'smarter' than Oracle is probably
likely to succeed in some circumstances, and fail in other.

Assumption is the mother of all screwups...

By now, it must be clear to you that I'm not trying to solve a problem,
I'm trying to obtain knowledge about extension and shrinkage of
automatically managed undo segments. Under what circumstances does
Oracle postpone or cancel freeing space from those segments?

HTH,
Hans

hpuxrac

2005-08-15 13:46:50 UTC

Permalink

Hans I tried but couldn't follow all of your explanation.

If you have more than one commit, each commit causes the beginning of a
new transaction.

Each new transaction can go to a different undo segment.

Do you have the same volume of data in test and production? Do you
have the same indexes in place?

It seems unlikely that something that takes 2-3 gig in test will take
40 gig in production unless there are big differences in your
environment that don't seeem clear to me yet.

Arun Mathur

2005-08-15 14:26:46 UTC

Permalink

Hi Hans,

Instead of an insert followed by an update, why not do something like
this:

1) create table target_table_copy as (<select clause to include the
update and inserts>)
2) apply any necessary constraints to target_table_copy
3) drop target_table
4) rename target_table_copy to target_table

Regards,
Arun

h***@hotmail.com

2005-08-15 16:14:05 UTC

Permalink

@hpuxrac: I don't follow the logic either...You're right, I should have
been more specific. The complete batch comprises of three transactions.
Those 3 transactions consume 3g in PAT. That same batch w/ the same
three transactions w/ the same data and indexes, everything the same
except on PROD, comsumes 41G.

The only difference is, that during the run on production, I increased
undo, because it was runningout of space. As per a contribution in
another thread in by mr. Howard J Rogers, I have learned that, when
there is space left in the undo tablespace, Oracle will claim it.

@Arun Mathur: thanks for your suggestion, but, again, I know how to
avoid the problem; direct path insert should do the job. It reportedly
generates not as much undo.

What's done, is done. I cannot rerun the batch for quite some time, but
I would like to know the mechanics and internals used by Oracle when it
decides to reclaim space from existing undo segments.

Perhaps a deep look into the good old concepts manual could shed some
light on this issue.

h***@hotmail.com

2005-08-17 20:07:31 UTC

Permalink

***@hotmail.com schreef:

Well guys (and girls), strange as it is: now that the IO system has
been configured properly, the same batch ( I repeat: the exact same
batch) runs in 50 minutes and consumes less than 4 gigs of undo space.

My theory is: Oracle will use not reuse undo space when there is space
available. Also, when IO due to cleaning up of undo space doesn't get
through or simply is not performed fast enough, Oracle will postpone
these activities until some time in the future.

While I write this down, I'm having more difficulties believing this.

Open invitation on the real guru's: please explain how this mechanism
works.

Daniel Fink

2005-08-17 20:23:23 UTC

Permalink

Post by h***@hotmail.com
Well guys (and girls), strange as it is: now that the IO system has
been configured properly, the same batch ( I repeat: the exact same
batch) runs in 50 minutes and consumes less than 4 gigs of undo space.
My theory is: Oracle will use not reuse undo space when there is space
available. Also, when IO due to cleaning up of undo space doesn't get
through or simply is not performed fast enough, Oracle will postpone
these activities until some time in the future.
While I write this down, I'm having more difficulties believing this.
Open invitation on the real guru's: please explain how this mechanism
works.

It has been awhile since I dove under the covers of AUM, but I'll see
what I can remember and what I can theorize.

The space management algorithm for AUM is to use available unallocated
space before reusing space from it's own segment or another segment
(extent stealing). This is one reason why autoextend on UNDO tablespace
datafiles and a high undo_retention setting can be 'dangerous'.

SMON is responsible for cleaning up the UNDO tablespace, offlining
segments that are not needed, updating expiry tables (internal
structures), etc. It has been the contention of several reasonably
smart people that on some systems SMON gets bogged down with all of
it's chores and can't keep up. I've personally never seen an I/O
subsystem issue cause this, but it is a possibility.

I'm not sure I would use the term postpone, though. I think that SMON
does part of the work, then has to do some other work. Eventually it
gets back to the original work and completes it. This probably happens
within seconds or minutes. Unfortunately, more work keeps piling up...

I think there is also a bug where SMON is cleaning up a segment at the
same time a user session is assigned to it. At this point, it goes
'pear shaped' to quote a friend, and may crash the instance. I can't
recall the exact bug, though.

AUM introduces additional work and complexity for SMON. It seems when
SMON is stressed with other operations, AUM can become very
problematic.

Cheers,
Dan

Doug

2005-08-17 22:01:20 UTC

Permalink

"I think there is also a bug where SMON is cleaning up a segment at the
same time a user session is assigned to it. At this point, it goes
'pear shaped' to quote a friend, and may crash the instance. I can't
recall the exact bug, though. "

I hit this problem - see

http://groups-beta.google.com/group/comp.databases.oracle.server/browse_frm/thread/bf37c94eec10d729/1306bb3cc568faee?lnk=st&q=Doug+Burns+SMU&rnum=1#1306bb3cc568faee

and

http://doug.burns.tripod.com/oracle/index.blog?entry_id=1076486

Cheers,

Doug

h***@hotmail.com

2005-08-18 07:25:13 UTC

Permalink

Post by Daniel Fink
I've personally never seen an I/O
subsystem issue cause this, but it is a possibility.

There's a first time for evertything.

Thanks for the explanation, Dan.

Sybrand Bakker

2005-08-17 20:31:02 UTC

Permalink

Post by h***@hotmail.com
Open invitation on the real guru's: please explain how this mechanism
works.

The unneeded invective is quite clear.
Please rest assured I won't address any other of your posts, as you
appear to claim to know everything better.

--
Sybrand Bakker, Senior Oracle DBA

h***@hotmail.com

2005-08-18 07:51:24 UTC

Permalink

@sybrand:
where did I "appear to claim to know everything better"?
You shouldn't take all posts personally....

It is not that I don't value your opinion. At times, when I succeed in
ignoring the negative energy your posts drown in, they contain useful
information. In this thread however, they don't (especially the
assertion that I'm trying to outsmart Oracle is unfounded, totally
ridiculous and unnecessary).

That's all.

@doug: Thx, useful links.