UTF-8 Test

Testing area for the DBB members and staff!
akula65
DBB Ace
Posts: 365
Joined: Mon Sep 20, 2004 6:34 pm
Location: Virginia


Post by akula65 »

Я уже забыл об этом. (I have already forgotten about this.)
Превосходно! (Excellent!)

Peachy!
Spooky
DBB Ace
Posts: 251
Joined: Tue Apr 25, 2006 2:27 pm

Post by Spooky »

heftig
DBB Ace
Posts: 138
Joined: Mon Jun 05, 2006 9:55 pm
Location: Germany

Post by heftig »

This forum's character set is ISO-8859-1, not UTF-8.

Text encoded as UTF-8:
Я уже забыл об Ñ
akula65
DBB Ace
Posts: 365
Joined: Mon Sep 20, 2004 6:34 pm
Location: Virginia

Post by akula65 »

Heh heh. I didn't really expect any replies to this post.

The goal of the test was to see whether I could use a UTF-8 editor (Yudit, http://www.yudit.org/ ) to generate Russian text in Cyrillic characters in Unicode format, then post it and have it display properly on the DBB. If you look at the page source for this page, you can see that the Russian text I posted is not in ISO-8859-1 encoding but is in fact embedded as Unicode (as defined here: http://www.unicode.org/charts/PDF/U0400.pdf ), and this BB is smart enough to display it properly.

By the way, ISO-8859-1 does not include Cyrillic characters.
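
A quick way to check that claim, sketched in Python. The helper name `fits_iso_8859_1` is made up for this illustration; it simply tries to encode a string as ISO-8859-1 and reports whether that worked.

```python
def fits_iso_8859_1(text: str) -> bool:
    """Return True if every character in text has an ISO-8859-1 byte value."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # At least one character (e.g. a Cyrillic letter) is outside the set.
        return False
    return True


print(fits_iso_8859_1("Peachy!"))        # plain ASCII
print(fits_iso_8859_1("Grüße"))          # Western European letters
print(fits_iso_8859_1("Превосходно!"))   # Cyrillic
```

The first two strings encode fine; the Cyrillic one does not, which is why it cannot be stored as raw ISO-8859-1 bytes.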

Yudit gives me a scratchpad where I can easily generate Russian text in Cyrillic and then cut and paste that text into posts on this BB and into applications such as OpenOffice. You can see the motivation for wanting a proper Unicode solution in these old DBB posts:

http://descentbb.net/viewtopic.php?t=6169
http://descentbb.net/viewtopic.php?t=8117

(Don't bother trying to read the non-Unicode Russian in the old posts. It has been mangled during BB upgrades, restores from backups, or something along those lines.)
heftig
DBB Ace
Posts: 138
Joined: Mon Jun 05, 2006 9:55 pm
Location: Germany

Post by heftig »

phpBB didn't do anything.

Your browser converted any characters not in the ISO-8859-1 set (such as Cyrillic) into character entities when it submitted the form.
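
A rough Python imitation of that step, for anyone who wants to see it. This is only a sketch: `simulate_form_submit` is a hypothetical name, and Python's `xmlcharrefreplace` error handler stands in for whatever the browser actually does internally.

```python
import html


def simulate_form_submit(text: str, page_charset: str = "iso-8859-1") -> bytes:
    """Encode text in the page charset; anything that does not fit
    becomes a decimal numeric character reference like &#1071;."""
    return text.encode(page_charset, errors="xmlcharrefreplace")


submitted = simulate_form_submit("Я ü")
print(submitted)

# The board stores and serves those bytes unchanged. The reader's browser
# decodes them as ISO-8859-1 and then resolves the entities when rendering.
displayed = html.unescape(submitted.decode("iso-8859-1"))
print(displayed)
```

The Cyrillic letter goes out as pure ASCII (`&#1071;`), while the umlaut goes out as a single ISO-8859-1 byte. The forum software never has to know anything about Cyrillic.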

See what happens to my last post if you switch the display from ISO-8859-1 to UTF-8. The UTF-8 part will become readable, the umlauts in the ISO-8859-1 part will become unreadable, and the Cyrillic portion of the ISO-8859-1 part will not change (because it is encoded using entities).
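
The same experiment can be approximated in Python. The three sample parts are stand-ins I picked for illustration (one Cyrillic letter, one umlaut, one entity), not the exact bytes of my post.

```python
# Three kinds of content sitting in the same page:
utf8_part = "Я".encode("utf-8")          # raw UTF-8 bytes (two bytes)
latin1_part = "ü".encode("iso-8859-1")   # raw ISO-8859-1 byte (one byte)
entity_part = b"&#1071;"                 # Cyrillic written as an entity (ASCII)

page = utf8_part + b" | " + latin1_part + b" | " + entity_part

# Display set to ISO-8859-1: the UTF-8 part turns into two junk characters.
as_latin1 = page.decode("iso-8859-1")

# Display set to UTF-8: the UTF-8 part reads correctly, but the lone
# umlaut byte is not valid UTF-8 and becomes a replacement character.
as_utf8 = page.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

In both decodings the entity part comes out as the same ASCII text, so the browser renders it identically either way.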