Discussion:
What causes ORA-12547: TNS:lost contact?
Stan Brown
2006-07-28 11:38:22 UTC
Last night, on an instance that has been up for over 100 days, and on a
machine with an uptime over 400 days (HP-UX 10.20 and Oracle 7.3.4.5.0),
I started getting:

ERROR: ORA-12547: TNS:lost contact

intermittently on attempts to connect to the instance. I also got some
errors about the TNS path? Sorry, I don't have that one handy.

From an OS perspective, the machine seems healthy (not out of process table
entries or anything like that, I don't believe).

What should I look for to troubleshoot this problem?
--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
-- Benjamin Franklin
Anthony
2006-07-28 17:46:51 UTC
I remember that in the old days (meaning 3 years ago), some versions
of Oracle did not behave properly if they stayed up for more than one
year. Your instance has only been up for about 100 days, so that should
be okay. But 400 days of machine uptime is long enough. I would say
reboot and start from fresh on a yearly basis is sound management practice.
Sybrand Bakker
2006-07-29 08:55:50 UTC
Post by Anthony
I would say reboot and start from fresh
on a yearly basis is sound management practice.
So you never patch or upgrade Oracle? Now *that* is sound management
practice!! (apart from the fairy tales you have already told us)

--
Sybrand Bakker, Senior Oracle DBA
Stan Brown
2006-07-31 12:16:01 UTC
Post by Sybrand Bakker
Post by Anthony
I would say reboot and start from fresh
on a yearly basis is sound management practice.
So you never patch or upgrade Oracle? Now *that* is sound management
practice!! (apart from the fairy tales you have already told us)
Actually, I only apply patches for identifiable problems with our specific,
and very static, usage of Oracle.

It's also inside a tightly protected network (to preempt the security
questions).

I recognize this may not be "Best Accepted Practice". But it has worked well
for us over 10+ years, so I think it's been a fairly good plan. Also, it lets
me concentrate my time on issues management perceives are more important.

BTW, I looked at ipcs on this machine, and it does not appear to be a shared
memory (et al.) issue that is causing this problem.
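
For anyone who wants to repeat the ipcs check, here is a rough sketch of how
it could be scripted. It assumes `ipcs -mb` prints the System V style columns
(T, ID, KEY, MODE, OWNER, GROUP, SEGSZ); the function name and the sample rows
are made up for illustration and are not output from this machine.

```python
from collections import defaultdict


def shared_memory_by_owner(ipcs_output: str) -> dict:
    """Sum shared memory segment sizes (bytes) per owner.

    Assumes `ipcs -mb` style rows: T ID KEY MODE OWNER GROUP SEGSZ.
    """
    totals = defaultdict(int)
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Segment rows start with "m" and end with a numeric size.
        if len(fields) >= 7 and fields[0] == "m" and fields[-1].isdigit():
            totals[fields[4]] += int(fields[-1])
    return dict(totals)


# Made-up sample rows, not taken from the machine in this thread.
SAMPLE = """\
T      ID     KEY        MODE        OWNER     GROUP      SEGSZ
Shared Memory:
m       0 0x411c0209 --rw-rw-rw-      root      root        348
m     201 0x0c6629c9 --rw-r-----    oracle       dba   52428800
m     202 0x0c6629ca --rw-r-----    oracle       dba   10485760
"""

if __name__ == "__main__":
    for owner, size in sorted(shared_memory_by_owner(SAMPLE).items()):
        print(f"{owner}: {size / 2**20:.1f} MB")
```

Comparing the per-owner totals against the SGA sizes you expect is one way to
spot segments left behind by a dead instance.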

Any suggestions as to where else to look?
joel garry
2006-07-28 22:14:44 UTC
Figure out why processes are being given the boot-rear. This is almost
certainly on the OS side. Ask on an HP-UX group why processes might
suddenly have trouble starting up.

Look in the syslog for symptoms of memory fragmentation. It could just
be that 400 days is too many. I know, I hate the MS-think about rebooting
to fix problems, too.

Check ps for zombies or a large number of processes that shouldn't be
there.

How much swap space do you have, primary and secondary? You might have
just tipped past a limit. Check listener.log for errors, too.

Check $ORACLE_HOME/otrace/admin for big files (if you've accidentally
turned on tracing, you could be expending lots of resources writing to
here - I think it was 7.3.2 where they delivered this turned on!).
Look for an environment variable called EPC_DISABLED and make sure it
is set to TRUE.

Run netstat -a and see if something is not freeing ports. It could be that
just a few things dying has cascaded into lots of things retrying while
TCP times out.

Check hardware.

Get HP support involved to use the tools to really dig into this.

Upgrade to a supported database.
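
The ps check in the list above could be scripted roughly like this. It is
only a sketch: it assumes `ps -ef` output where the first two columns are UID
and PID and where zombies show up as `<defunct>`. The function name and the
sample rows are invented for illustration.

```python
from collections import Counter


def summarize_ps(ps_output: str):
    """Return (zombie_pids, process_count_per_user) from `ps -ef` text."""
    zombies = []
    per_user = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header and anything without a numeric PID in column 2.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        per_user[fields[0]] += 1
        if "<defunct>" in line:
            zombies.append(int(fields[1]))
    return zombies, per_user


# Invented sample rows, not taken from the machine in this thread.
SAMPLE = """\
     UID   PID  PPID  C    STIME TTY       TIME COMMAND
    root     1     0  0  Jun 20  ?         0:05 init
  oracle  4321     1  0 11:02:13 ?         0:00 oracleORCL (LOCAL=NO)
  oracle  4400  4321  0 11:05:40 ?         0:00 <defunct>
"""

if __name__ == "__main__":
    zombie_pids, counts = summarize_ps(SAMPLE)
    print("zombies:", zombie_pids)
    print("processes per user:", dict(counts))
```

To run it against a live system, feed it the text captured from `ps -ef`, for
example via `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`.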

jg
--
@home.com is bogus.
http://www.crystalinks.com/shc.html
Stan Brown
2006-07-29 00:51:10 UTC
Thanks.

This somewhat confirms what I was suspecting.

I've already looked at the size of the process table vs maxprocs tuning in
the kernel, and used Glance to look at almost everything that it can display.
Swap space is fine.

Memory fragmentation I will try to figure out how to check.

Thanks, again.
HansF
2006-07-28 22:28:12 UTC
Post by Stan Brown
Last night on an instance that has been up for over 100 days, and on a
machine with an uptime over 400 days (HP-UX 10.20 and Oracle 7.3.4.5.0,
One of the more obtuse ones I'd seen with HP-UX 10.xx was ... so many
instances 'up' that the system started hitting swap-related situations.
That is, the swapping was so severe that the system was hitting an
undiscovered boundary condition that *caused* corruption in the various
SGAs and other intermittent, inexplicable errors.

Of course, that only started happening when the customer installed the
13th instance on a 512M RAM machine.

The customer, and their primary consulting partner who advised them to
install on that machine (and who also happened to be a major competitor to
Oracle, but 'never had a conflict of interest'), hauled me on the carpet
to explain why 'Oracle was corrupting their data'.


Since then, my primary advice on HP-UX 10 is: identify your *real* memory
requirement and compare that to the actual RAM.
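
As a back-of-the-envelope illustration of that comparison: 512 MB split across
13 instances is under 40 MB each, even before the OS takes its share. A small
sketch, where the 64 MB per instance and the 48 MB OS reserve are hypothetical
figures and only the 512 MB and the 13 instances come from the story above:

```python
def memory_headroom_mb(ram_mb, instance_needs_mb, os_reserve_mb=0):
    """RAM left after the OS reserve and every instance's stated need.

    A negative result means the box would have to swap to keep up.
    """
    return ram_mb - os_reserve_mb - sum(instance_needs_mb)


# 512 MB and 13 instances come from the post above; the 64 MB per
# instance and the 48 MB OS reserve are hypothetical figures.
needs = [64] * 13
print(memory_headroom_mb(512, needs, os_reserve_mb=48))  # -368
```

Anything below zero is memory the machine has to find in swap.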
--
Hans Forbrich (mailto: Fuzzy.GreyBeard_at_gmail.com)
*** Feel free to correct me when I'm wrong!
*** Top posting [replies] guarantees I won't respond.