Home Contact Us Sitemap News Forums Japanese Lessons Fan Art Fan Fiction Download
Main Menu
  Community
News
Forums (ext.)
Log In
Register Account
Resources
Fan Art
Fan Fiction
Stories
Japanese Lessons
Cutest Anime Girl
Downloads
Links
Sitemap
Documentation
All Anime World FAQ
Site Search
Affiliates
Link Us!
Impressum
 
Catnip Box
  ###DROPBOX###  
Toolbox
   Print View
 Text Only
 Bookmark us!
Call a dog and they come, call
a cat and they take a message
and get back to you later.
Oldwrench, chat 13th Apr

 
Affiliates
 




Firefox 3
- More Cool Sites -
 

  Delphi + ZeosLib + MySQL + UTF8 HowTo

After fiddling around with all these components I finally made a working database connection with my server. It appears to me, that all connections made to MySQL 4.1 default to the "latin1" encoding no matter what the database contains. To work with utf8 you have to tell Zeos to set up certain server variables. You can do this by executing the SQL command: "SET character_set_connection=utf8; SET character_set_results=utf8;" Starting that moment, MySQL expects to get ALL commands in utf8 and will also return all datasets in utf8! You can put these variables into your TZConnection.Properties String list:

 

character_set_connection=utf8
character_set_results=utf8

Hint: If you want Mysql to use 100% utf8 internally, just add the following to your TZConnection.Properties:

 

character_set_client=utf8
character_set_connection=utf8
character_set_database=utf8
character_set_results=utf8
character_set_server=utf8
character_set_system=utf8
collation_connection=utf8_general_ci
collation_database=utf8_general_ci
collation_server=utf8_general_ci
Codepage=utf8

Now that MySQL is returning everything in utf8 you have to make sure to use WideString variables ALWAYS.

For NON-BLOB fields you can use the following:

 

VAR MyString : WideString;
VAR DBString : UTF8String;
[...]
DBString := MyTable.FieldByName('title').AsString;
MyString := UTF8Decode(DBString);

BLOBs are returned as a stream, this also applies to TEXT fields however since you have stored them in utf8 (using UTF8Encode()) you still need to decode them with UTF8Decode(). I'm using the (shareware) DIUnicode tool for utf8 stream manipulation but that's EUR 20.- although it comes with a nifty library of WideString utility functions like StrReplaceIW() If you don't want to spend money you can download the libiconv conversion for free and write a stream conversion tool yourself.

Now that you're working with WideChars which are UNICODE two byte encoded you have to stay the f.... away from all and any library function that doesn't end in a "W". Most notably you will not be able to read ResourceStrings from your pascal code. LoadString() is using AnsiString internally which automatically converts WideString to CP Windows 1252 and back to WideString to fit your variable (or whatever your system default codepage is). That doesn't create much of a problem unless you're using characters which are definitely NOT in your CP - which is the very reason for using utf8 and unicode in the first place.

If you haven't been playing with UNICODE before you will trip over another problem: The VCL is written using AnsiString's. Which means, all controls is converting WideString's to AnsiString's when you write something like: [i]MyControl.Caption := CoolCyrillicWideString;[/i] it will create a caption like: [i]???? ?? ??. 24??? ??????[/i]

Fortunately there is a VCL library available which deals with the problem: TNT Controls. You have to use these controls whenever you want to have a control I18N aware.

Another reason for using UNICODE is speed. Windows NT, 2000, XP, and Vista is using UNICODE internally. All AnsiString functions are just wrapper functions around the corresponding UNICODE functions. Omitting these wrapper functions can speed up your application notably when used properly.

Now that your data is on your server you should make sure to always put MySQL in utf8 mode especially when using php to access your data which is common work in web based CMS or CRM systems. Always use the following:

$res = mysql_query('SHOW VARIABLES LIKE "character_set_results"');
if ($row = mysql_fetch_row($res) {
  $oldcharset = $row[1];
} else {
  $oldcharset = 'latin1';
}
mysql_query("SET character_set_results = 'utf8'");

[...insert your code here...]

mysql_query("SET character_set_results = '".$oldcharset."');
?>

If your php code is not embedded into a forum or CMS/CRM system you can of course omit querying the original encoding. The code above will allow you to receive your database content in utf8 but it will not allow you to query data using utf8! You have to set character_set_connection to utf8 as well but since php doesn't properly support utf8 I prefer to work around that problem using a uid auto_increment value for row selection.

Have fun and give me some feedback to: sakura (at) allanime (dot) org. All additions and comments (and flames) welcome.

 

 



All articles by Olray and Sakura

 Fantasy Name Generator
 UTF-8 howto
 TYPO3 scaling images by code
 TYPO3 Power-Linking
 TYPO3 Rotating Images

 
Privacy Policy | Terms & Conditions