473,543 Members | 1,890 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

'order by' does "wrong" with unicode-chars (german umlauts)


postgres 7.3.2

I store unicode-data in postgresql. The data is retrieved via webinterfaces,
processed with perl and then stored in postgresql (and viceversa).

All is going nice with one problem. If performing a "select * order by
field"-query the result is not what I expected.

German umlauts (ie: Ö) are stored as doublechars ("Ö" is "Ö") and only the
first char seems to be taken into account when sorting.

So it happens that the order is like:

Österreich
America
Pakistan

instead of

Amerika
Österreich
Pakistan
How to deal with this Problem ? Of course converting to latin before storing
would be a solution but we plan to offer support for many non-latin languages
later and the meaning of unicode is to get rid of all this converting-stuff
after all.

thnx,
peter

--
IT-Consulting
mag. peter pilsl
tel:+43-699-1-3574035
fax:+43-699-4-3574035
pi***@goldfisch .at
http://www.goldfisch.at


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 11 '05 #1
6 5593
On Friday 19 September 2003 13:11, peter pilsl wrote:
postgres 7.3.2

I store unicode-data in postgresql. The data is retrieved via
webinterfaces, processed with perl and then stored in postgresql (and
viceversa).

All is going nice with one problem. If performing a "select * order by
field"-query the result is not what I expected.

German umlauts (ie: Ö) are stored as doublechars ("Ö" is "Ö") and only the
first char seems to be taken into account when sorting.

So it happens that the order is like:

Österreich
America
Pakistan

instead of

Amerika
Österreich
Pakistan
I'm no expert on locales, but I think you're confusing two things.
Your character-set determines what symbols you can store.
Your locale determines sorting rules. Check the end of the postgresql.conf
file for details of what your current settings are.
How to deal with this Problem ? Of course converting to latin before
storing would be a solution but we plan to offer support for many non-latin
languages later and the meaning of unicode is to get rid of all this
converting-stuff after all.


What sorting-order do you want? You probably have options like: C, en_GB, de,
it etc.
If you are storing non-latin1 symbols as well as latin1, I can't think of what
a reasonable sort order would be.

Unfortunately, PG only supports one locale at a time, and gets set during
initdb. See the chapter on Localisation in the manuals for details.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #2
>
I'm no expert on locales, but I think you're confusing two things.
Your character-set determines what symbols you can store.
Your locale determines sorting rules. Check the end of the postgresql.conf
file for details of what your current settings are.


I dont think that this is my problem.

I get my text from a web-form, process it via perl and store it in postgreSQL
via DBI-Interface. The unicode-text appears as multibyte in perl and I got the
suspect that postgresql simply takes this multibyte-text and doesnt even
reckognize that it could be unicode.

If I store a german-umlaut-O (uppercase) to postgres and then retrieve it using
the lower-function on it I dont get a german-umlaut-o (lowercase) at all.
Only the first byte is converted to lowercase and the second is left untouched,
while in "real" unicode-lowercasing the first byte would stay untouched and the
second would change.
I still dont know how to tell postgres that the data it receives is unicode and
not just "singlebyte ".

I'll rethink my problem and post a somehow more precise question to the mainlist
then, but any comments to shorten and improve my rethinking are highly welcome.

thnx,
peter

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 11 '05 #3
On Saturday 20 September 2003 13:56, peter pilsl wrote:
I'm no expert on locales, but I think you're confusing two things.
Your character-set determines what symbols you can store.
Your locale determines sorting rules. Check the end of the
postgresql.conf file for details of what your current settings are.
I dont think that this is my problem.


Sorry - looks like the sorting part of your question threw me off track.
I get my text from a web-form, process it via perl and store it in
postgreSQL via DBI-Interface. The unicode-text appears as multibyte in perl
and I got the suspect that postgresql simply takes this multibyte-text and
doesnt even reckognize that it could be unicode.
Could be the case - try "show client_encoding " in psql to see what encoding
you are using.
If I store a german-umlaut-O (uppercase) to postgres and then retrieve it
using the lower-function on it I dont get a german-umlaut-o (lowercase) at
all. Only the first byte is converted to lowercase and the second is left
untouched, while in "real" unicode-lowercasing the first byte would stay
untouched and the second would change.
I still dont know how to tell postgres that the data it receives is unicode
and not just "singlebyte ".


If it turns out you want to change encoding to multibyte, I think you'll need
to dump an initdb again. See the chapter on localization - multi-byte
encodings for details.
--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #4
peter pilsl <pi***@goldfisc h.at> writes:
I get my text from a web-form, process it via perl and store it in postgreSQL
via DBI-Interface. The unicode-text appears as multibyte in perl and I got the
suspect that postgresql simply takes this multibyte-text and doesnt even
reckognize that it could be unicode.


If you have set the database encoding as SQL_ASCII, then that's exactly
what it will (and should) do. You need to make the database encoding
be unicode.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #5


It makes no difference if I use a database with encoding unicode:
# \l
List of databases
Name | Owner | Encoding
--------------------+--------------+-----------
<skip>
test | peter | SQL_ASCII
unicode | peter | UNICODE
unicode2 | peter | LATIN1
I tried with all these databases. The problem stays exactely the same. The
german umlaut-O appears as double-byte "ö" and when ordering it appears between
"A" and "B", cause imho only the first byte "Ã" is taking into account.

I still have no idea, if there is a problem outside postgreSQL (in perl or maybe
in the DBD::Pg-interface) or inside postgreSQL. I still struggle with serious
debugging cause I dont know how to insert "real" unicode into postgres ... Any
debugging-hints welcome also :)
Dealing with the UNICODE-database raises new problems:
unicode=# insert into test values ('österreich');
ERROR: Unicode >= 0x10000 is not supported
unicode=# show client_encoding ;
client_encoding
-----------------
UNICODE
(1 row)
thnx,
peter
Quoting Tom Lane <tg*@sss.pgh.pa .us>:
peter pilsl <pi***@goldfisc h.at> writes:
I get my text from a web-form, process it via perl and store it in

postgreSQL
via DBI-Interface. The unicode-text appears as multibyte in perl and I got

the
suspect that postgresql simply takes this multibyte-text and doesnt even
reckognize that it could be unicode.


If you have set the database encoding as SQL_ASCII, then that's exactly
what it will (and should) do. You need to make the database encoding
be unicode.

regards, tom lane


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #6
Sat, 20 Sep 2003 18:39:35 +0200
skrev pi***@goldfisch .at (peter pilsl):
It makes no difference if I use a database with encoding unicode:


ah@ahb:~$ LC_ALL=da_DK initdb
ah@ahb:~$ su postgres -c "/usr/local/pgsql/bin/createuser -ad ah"
ah@ahb:~$ createdb ah
ah@ahb:~$ psql ah

ah=# \l

List of databases
Name | Owner | Encoding
-----------+----------+-----------
ah | ah | SQL_ASCII
ah=# show client_encoding ;

client_encoding
-----------------
SQL_ASCII
(1 row)
CREATE TABLE test (f1 varchar);

INSERT INTO test VALUES ('A');
INSERT INTO test VALUES ('B');
INSERT INTO test VALUES ('AA');
INSERT INTO test VALUES ('Æ');
INSERT INTO test VALUES ('Å');
INSERT INTO test VALUES ('Ø');
INSERT INTO test VALUES ('Ä');
INSERT INTO test VALUES ('Ö');
INSERT INTO test VALUES ('Ü');

SELECT * FROM test ORDER BY f1;

t
----
A
B
Ü
Æ
Ä
Ø
Ö
Å
AA
(9 rows)
Looks OK to me ;-)
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
1838
by: yurps | last post by:
Hello, I installed both, 1st mysql 4.1.7 but heard there was a change with password hashing so now got 4.0.9 Basically when I choose a page with php_info() I can load it fine, but when I call a php page with mysql connections I get an Window NT prompt, which I type either mysql or window username/password combinations and always get "Wrong...
2
2369
by: Asfand Yar Qazi | last post by:
Hi, xmllint --valid gives error output on some code (included below:) file "Shotgun_Robot.dtd" (don't ask) <!ELEMENT Image (#PCDATA) > <!-- #PCDATA == image path --> <!-- x/y values default to 0, w/h values default to image w/h--> <!ATTLIST Image
18
5694
by: day | last post by:
I know I've seen this issue described before, but I can't find it, or the solution now that I need it. I have some css-specified floating divs that contain images or text. The text divs have a background color and text color that differ from the rest of the page, and I'd like them to be positioned from the right edge of the rightmost image...
6
4310
by: wukexin | last post by:
Help me, good men. I find mang books that introduce bit "mang header files",they talk too bit,in fact it is my too fool, I don't learn it, I have do a test program, but I have no correct doing result in any way. Who can help me, I thank you very very much. list.cpp(main program)...
2
6890
by: joewhitehair | last post by:
Using the XSD.exe tool, I created a number of classes from my XSD file. When I generate the WSDL for my web service, the schema does not have the proper Occurance constraints for the attributes. In this example, the MessageID element should always occur (minOccur=1, maxOccur=1), but the WSDL generates minOccur=0. I can't pinpoint why this is...
2
2027
by: ~~~ .NET Ed ~~~ | last post by:
It is not the first time I see this happen. I am using VS.NET 2003 with .NET Framework 1.1. In this particular situation I have a custom user control in a windows form. There is a member variable declared with the fully qualified class name (some.namespace.mycontrolclass). Then in the InitializeComponent() it is instantiated and there is a...
42
3389
by: Holger | last post by:
Hi guys Tried searching for a solution to this, but the error message is so generic, that I could not get any meaningfull results. Anyways - errormessage: ---------------------------------------------------- TypeError: addFile() takes exactly 1 argument (2 given) ----------------------------------------------------
15
2005
by: David White | last post by:
The size of a struct can be affected by compiler packing. Suppose you need it to be a specific value for some reason (e.g., in firmware). How can you get the compiler to generate an error for the wrong size rather than assert it at run-time? Here is one way, but I don't know if it's guaranteed to work on any compiler: 1/(sizeof(struct...
1
8005
by: Java Guy | last post by:
I'm trying to view a web page. IE tells me there are (Java?) errors on the page. Here they are: Line: 15 Char: 7 Error: Wrong number of arguments or invalid propert assignment Code: 0 URL: http://(address.of.my.webcam):port/LiveView.html and
2
1399
by: sukatoa | last post by:
I've invoke that .COM file using the code below, Runtime.getRuntime().exec("cmd /C start FINAL6.COM"); The program's behavior is wrong when i use this implementation.... When just double clicked, the program's flow is correct... is this a bug?
0
7412
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7355
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
7746
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7356
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
0
7697
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
5889
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
0
3394
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
1
1830
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
979
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.