
UTF-8 Test

Posted: Thu Jan 25, 2007 7:00 pm
by akula65
Я уже забыл об этом.
Превосходно!

Peachy!

Posted: Fri Jan 26, 2007 9:19 am
by Spooky

Posted: Fri Jan 26, 2007 9:31 am
by heftig
This forum's character set is ISO-8859-1, not UTF-8.

Text encoded as UTF-8:
Я уже забыл об Ñ

Posted: Fri Jan 26, 2007 1:20 pm
by akula65
Heh heh. I didn't really expect any replies to this post.

The goal of the test was to see whether I could use a UTF-8 editor (Yudit, http://www.yudit.org/ ) to generate Russian text in Cyrillic characters as Unicode, then post it and have it display properly on the DBB. If you look at the page source, you can see that the Russian text I posted is not in ISO-8859-1 encoding but is in fact embedded as Unicode (as defined here: http://www.unicode.org/charts/PDF/U0400.pdf ), and this BB is smart enough to display it properly.

ISO-8859-1 does not include Cyrillic characters, by the way.

Yudit gives me a scratchpad where I can easily generate Russian text in Cyrillic and then cut and paste that text into posts on this BB and into applications like OpenOffice, etc. You can see the motivation for wanting a proper Unicode solution in these old DBB posts:

http://descentbb.net/viewtopic.php?t=6169
http://descentbb.net/viewtopic.php?t=8117

(Don't bother trying to read the non-Unicode Russian in the old posts. It has been mangled during BB upgrades, restores from backups, or something along those lines.)

Posted: Fri Jan 26, 2007 3:27 pm
by heftig
phpBB didn't do anything.

Your browser converted any characters not in the ISO-8859-1 set (such as Cyrillic) into character entities when it submitted the form.
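
To illustrate the effect (a rough sketch only, not what any browser literally runs), Python's "xmlcharrefreplace" error handler does the same kind of substitution when text is encoded to ISO-8859-1. The function name submit_as_latin1 and the sample string are made up for this example.

```python
def submit_as_latin1(text: str) -> bytes:
    """Encode text as ISO-8859-1; characters outside that set
    become numeric character references such as &#1071;."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# "ü" and "ß" exist in ISO-8859-1 and stay single bytes;
# the Cyrillic "Я" (U+042F, decimal 1071) does not, so it becomes an entity.
sample = "Grüße, Я"
encoded = submit_as_latin1(sample)
print(encoded)
```

The umlaut and sharp s go through as single ISO-8859-1 bytes, while the Cyrillic letter is sent as plain ASCII text spelling out its code point.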

See what happens to my last post if you switch the display from ISO-8859-1 to UTF-8. The UTF-8 part will become readable, the umlauts in the ISO-8859-1 part will become unreadable, and the Cyrillic portion of the ISO-8859-1 part will not change (because it is encoded using entities).
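
The same experiment can be sketched in Python, assuming a hypothetical page that mixes all three kinds of content: an umlaut stored as an ISO-8859-1 byte, a Cyrillic letter stored as UTF-8 bytes, and a Cyrillic letter stored as a character entity. The variable names are made up for this example, and html.unescape stands in for the browser resolving entities.

```python
import html

# Hypothetical page bytes: ISO-8859-1 umlaut + UTF-8 Cyrillic + entity.
page = "ü".encode("iso-8859-1") + "Я".encode("utf-8") + b"&#1071;"

# Displayed as ISO-8859-1: the umlaut is fine, the UTF-8 bytes turn into junk.
as_latin1 = page.decode("iso-8859-1")

# Displayed as UTF-8: the UTF-8 part is readable, but the lone umlaut byte
# is not valid UTF-8 and shows up as a replacement character.
as_utf8 = page.decode("utf-8", errors="replace")

# The entity is plain ASCII, so it resolves to the same letter either way.
print(html.unescape(as_latin1))
print(html.unescape(as_utf8))
```

In both views the entity resolves to the Cyrillic letter, because its bytes are ASCII and mean the same thing in either character set.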