Euro Character lost on export/import

Tosta

2004-08-10 12:29:05 UTC

Hi.

A Euro symbol character in varchar2 fields gets lost on export and import between different databases.

Source: Oracle 8i, NLS_CHARACTERSET WE8ISO8859P1
Target: Oracle 9i, NLS_CHARACTERSET AL32UTF8 (Unicode)

We're running a Java-based web app with the 8iDB and migrating to 9i. There are definitely fields with the Euro sym in
the 8iDB. They are displayed in the web app. After migration, the Euro sym has changed to a question mark.

I would simply like to know if I'm right with the following:

WE8ISO8859P1 char set doesn't support the Euro (WE8ISO8859P15 does), and Oracle "invents" its own code to represent the
Euro sym. That's the reason the 8i app can display it. On import, all texts are converted from WE8ISO8859P1 to AL32UTF8.
Since Euro is not in WE8ISO8859P1, there is no mapping for the Oracle-invented Euro char code, so no conversion is possible.

Conclusion: There is nothing we can do but live with it. If we had chosen WE8ISO8859P15 for the 8iDB in the beginning,
we wouldn't have the trouble today.

Right?

Looking forward to your comments,

Tosta.

P.S. The other problems migrating to unicode, namely the length-semantics problem, have been solved already, thanks.

Galen Boyer

2004-08-10 14:26:53 UTC

My guess is that the new client translation layer is the issue.
A `?' usually means that the client doesn't know how to display a
character. If Oracle did mangle the characters, I would expect
to see mangled characters.

Your app understood how to display characters translated from
WE8ISO8859P1. Has it been set up to understand how to display
characters translated from AL32UTF8?
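
To illustrate the point about `?': a conversion layer that meets a character the target character set lacks will typically substitute a placeholder instead of producing garbage. Below is a minimal Python sketch of that behaviour. It uses Python's own codecs as a stand-in, which is an assumption made for illustration and not Oracle's conversion code; `to_charset` is a hypothetical helper name.

```python
EURO = "\u20ac"  # Unicode code point of the Euro sign


def to_charset(text: str, charset: str) -> bytes:
    """Encode text, substituting '?' for any character the charset lacks."""
    return text.encode(charset, errors="replace")


# ISO 8859-1 has no Euro sign, so the encoder falls back to '?'.
print(to_charset("100 " + EURO, "iso-8859-1"))  # b'100 ?'
```

A literal question mark is therefore consistent with a lossy conversion step somewhere between the stored bytes and the screen; the sketch cannot tell you which layer did it.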

Post by Tosta
WE8ISO8859P1 char set doesn't support the Euro (WE8ISO8859P15
does), and Oracle "invents" its own code to represent the Euro
sym. That's the reason the 8i app can display it.

Why would Oracle "invent" a code?

Post by Tosta
On import, all texts are converted from WE8ISO8859P1 to
AL32UTF8. Since Euro is not in WE8ISO8859P1,

AL32UTF8 most definitely should include variants of European
character sets.
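
A quick way to see which of the character sets in this thread can hold a Euro sign at all is to ask an encoder. The sketch below uses Python codec names as stand-ins for the Oracle names (ISO 8859-1 for WE8ISO8859P1, ISO 8859-15 for WE8ISO8859P15, cp1252 for WE8MSWIN1252, UTF-8 for AL32UTF8); that mapping and the helper `euro_bytes` are assumptions made for illustration.

```python
EURO = "\u20ac"  # Unicode code point of the Euro sign


def euro_bytes(charset):
    """Return the bytes encoding the Euro sign in charset, or None if it has none."""
    try:
        return EURO.encode(charset)
    except UnicodeEncodeError:
        return None


for name in ("iso-8859-1", "iso-8859-15", "cp1252", "utf-8"):
    encoded = euro_bytes(name)
    print(name, encoded.hex() if encoded else "not representable")
# iso-8859-1 not representable
# iso-8859-15 a4
# cp1252 80
# utf-8 e282ac
```

So the target side is not the limiting factor: UTF-8 represents the Euro, in three bytes.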

Post by Tosta
there is no mapping for the Oracle-invented Euro char code, so
no conversion is possible.
Conclusion: There is nothing we can do but live with it. If
we had chosen WE8ISO8859P15 for the 8iDB in the beginning, we
wouldn't have the trouble today.
Right?

I doubt it.

Post by Tosta
Looking forward to your comments,
Tosta.
P.S. The other problems migrating to unicode, namely the
length-semantics problem, have been solved already, thanks.

--
Galen Boyer

Sybrand Bakker

2004-08-10 17:33:37 UTC

Post by Tosta
Hi.
A Euro symbol character in varchar2 fields gets lost on export and import between different databases.
Source: Oracle 8i, NLS_CHARACTERSET WE8ISO8859P1
Target: Oracle 9i, NLS_CHARACTERSET AL32UTF8 (Unicode)
We're running a Java-based web app with the 8iDB and migrating to 9i. There are definitely fields with the Euro sym in
the 8iDB. They are displayed in the web app. After migration, the Euro sym has changed to a question mark.
WE8ISO8859P1 char set doesn't support the Euro (WE8ISO8859P15 does), and Oracle "invents" its own code to represent the
Euro sym. That's the reason the 8i app can display it. On import, all texts are converted from WE8ISO8859P1 to AL32UTF8.
Since Euro is not in WE8ISO8859P1, there is no mapping for the Oracle-invented Euro char code, so no conversion is possible.
Conclusion: There is nothing we can do but live with it. If we had chosen WE8ISO8859P15 for the 8iDB in the beginning,
we wouldn't have the trouble today.
Right?
Looking forward to your comments,
Tosta.
P.S. The other problems migrating to unicode, namely the length-semantics problem, have been solved already, thanks.

The story is slightly different. As the ISO couldn't agree on a
location for the Euro in WE8ISO8859P1, they decided to set up a new
characterset including the Euro, WE8ISO8859P15.
As usual, Mickeysoft decided not to follow that path. Their default
characterset is currently the 1252 code page, which is not the same as
the P1 characterset: 1252 puts printable characters, including the Euro
at 0x80, in the 0x80-0x9F range that P1 reserves for control codes.
The characterset for the database should *always* match the
characterset of the O/S. The characterset of the database should have
been set to WE8MSWIN1252. The difference that matters here is the Euro:
WE8MSWIN1252 has it at 0x80, WE8ISO8859P15 has it at 0xA4, and
WE8ISO8859P1 does not have it at all.
If you had chosen WE8MSWIN1252 from the beginning you wouldn't have
had this problem.
It currently works because apparently the characterset of the database
is identical to the characterset of the webserver, i.e. both are 8 bit
charactersets, so no conversion ever takes place and the bytes pass
through unchanged. As soon as you start export and import into a
database with a different characterset, you are in trouble.
There are many notes on Metalink explaining in more detail how to
check for these issues and how to resolve them.
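
The pass-through effect described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of what exp/imp does: it assumes the Euro was written as the Windows-1252 byte 0x80, that nothing converted it on the way in, and that Python's codecs behave like the conversion layer. The helper `migrate_to_utf8` is hypothetical.

```python
EURO = "\u20ac"

# Assumption: a Windows-1252 client wrote these bytes and nothing converted them,
# so the database labelled ISO 8859-1 holds the raw byte 0x80.
stored = ("100 " + EURO).encode("cp1252")


def migrate_to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Convert stored bytes to UTF-8, trusting the declared source charset."""
    return raw.decode(declared_charset).encode("utf-8")


# Old setup: same code page on both sides, bytes pass through untouched.
before = stored.decode("cp1252")

# Migration that trusts the ISO 8859-1 label: 0x80 becomes the control
# character U+0080, not the Euro sign.
wrong = migrate_to_utf8(stored, "iso-8859-1").decode("utf-8")

# Migration that treats the bytes as Windows-1252, which is what they are.
right = migrate_to_utf8(stored, "cp1252").decode("utf-8")

print(EURO in before)  # True
print(EURO in wrong)   # False
print(EURO in right)   # True
```

In this sketch the data itself is intact; only the label on it is wrong. A real conversion may substitute a replacement character rather than U+0080, but either way the Euro is gone once the conversion trusts the ISO 8859-1 label.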

--
Sybrand Bakker, Senior Oracle DBA