Dealing with UTF-8 Character Encoding In PHP and MySQL

Posted: August 25, 2011

The other week, a colleague of mine sent me an interesting article about UTF-8 encoding throughout the web stack. The post explains that you have to force each part of your web application and its supporting software into UTF-8, because each one otherwise falls back to its own default encoding. Most of the time, we assume that setting the charset of our database tables to UTF-8 and declaring the HTML document as UTF-8 encoded is enough. However, you should also specify that the data transferred between your application and the database is encoded as UTF-8.

Here’s a summary of the points to be aware of:

1. Code Editor Preferences – Make sure the code editor you are using opens and saves your code files as UTF-8. A lot of editors default to a native Windows or Mac encoding, which can cause non-ASCII characters to be stored as different bytes and displayed incorrectly.
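One way to double-check what your editor actually saved is to test the file's bytes from PHP. The following is a minimal sketch, not part of the original article: the function names is_utf8_file and has_utf8_bom are hypothetical, and the check relies on preg_match returning false when the /u modifier is given an invalid UTF-8 subject.

```php
<?php
// Hypothetical helper: true when the file's bytes form valid UTF-8.
function is_utf8_file(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // With the /u modifier, preg_match() returns false for invalid UTF-8
    // and 1 for valid input (the empty pattern always matches).
    return preg_match('//u', $bytes) === 1;
}

// Hypothetical helper: true when the file starts with a UTF-8 byte order mark.
// A BOM in front of "<?php" counts as output and can break later header() calls.
function has_utf8_bom(string $path): bool
{
    $bytes = file_get_contents($path);
    return $bytes !== false && strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}
```

A file saved as Latin1 with an accented character in it would fail the first check, which is usually a sign that the editor preference is still wrong.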

2. Create MySQL Tables with charset=utf8 and collate=utf8_unicode_ci – MySQL defaults to the Latin1 character set and collation, so you have to define both the charset and the collation when you create your tables.
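As an illustration of step 2, here is a sketch of a table definition held in a PHP string. The table and column names are made up for the example; only the DEFAULT CHARSET and COLLATE options come from the tip above. Keep in mind that MySQL's utf8 charset stores at most three bytes per character, so if you need characters outside the Basic Multilingual Plane (emoji, for instance), utf8mb4 is the charset to look at instead.

```php
<?php
// Hypothetical table and column names, for illustration only.
// The table options at the end are what step 2 is about.
$createTable = <<<SQL
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body  TEXT NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
SQL;

// Print the statement so it can be pasted into a MySQL client,
// or pass $createTable to your database library's query function.
echo $createTable, PHP_EOL;
```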

3. Tell MySQL that the .sql File is UTF-8 – If you are using the SQL pane in something like phpMyAdmin, you may not have to worry about this step. However, if you are using MySQL's command line or a different third-party MySQL client, you have to tell MySQL that the .sql file is encoded in UTF-8. If you don't, MySQL will fall back to Latin1 and corrupt your data. For a command line example, refer to rentzsch's post.

4. Exporting Tables – If you're transferring tables between two databases, make sure to pass the --default-character-set=utf8 option to mysqldump.
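Steps 3 and 4 both come down to the same client option, --default-character-set=utf8, which the mysql and mysqldump command line tools accept. The sketch below only builds the command strings in PHP so you can inspect them before running anything. The function names, the database name and the file names are hypothetical, and it assumes your credentials come from a MySQL option file rather than the command line.

```php
<?php
// Hypothetical helpers: they build command strings and execute nothing.

// Import a UTF-8 encoded .sql file (step 3).
function import_command(string $database, string $sqlFile): string
{
    return sprintf(
        'mysql --default-character-set=utf8 %s < %s',
        escapeshellarg($database),
        escapeshellarg($sqlFile)
    );
}

// Export a database as UTF-8 (step 4).
function dump_command(string $database, string $outFile): string
{
    return sprintf(
        'mysqldump --default-character-set=utf8 %s > %s',
        escapeshellarg($database),
        escapeshellarg($outFile)
    );
}

// Placeholder names; replace with your own database and files.
echo import_command('blog', 'schema.sql'), PHP_EOL;
echo dump_command('blog', 'backup.sql'), PHP_EOL;
```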

5. Specify the Database Connection Encoding in your Application – A lot of developers forget to include a line of code that tells their web application to connect to the database using UTF-8. In PHP, you can use the mysql_set_charset function like this:

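// Open the connection (the fourth argument forces a new link).
$link = mysql_connect('localhost', 'username', 'password', TRUE);
// Use UTF-8 for everything sent over this connection.
mysql_set_charset('utf8', $link);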
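The mysql_* functions were removed in PHP 7.0, so on current PHP versions the same idea has to be expressed with mysqli or PDO. Here is a minimal mysqli sketch of this step. The function name connect_utf8 is hypothetical, and the host, credentials and database name you pass in are placeholders.

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to UTF-8.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_error) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }
    // set_charset() also tells the client library about the encoding,
    // which a raw "SET NAMES utf8" query does not.
    if (!$link->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }
    return $link;
}
```

You would call it once at startup, for example $link = connect_utf8('localhost', 'username', 'password', 'mydb'); and then reuse $link for every query.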

6. Specify Encoding in HTTP Headers – Include some PHP code at the top of your application, before any output is sent, to specify the encoding in the HTTP headers:

header('Content-Type: text/html; charset=utf-8');

7. Specify Encoding in HTML – Include a meta tag in the <head></head> of your HTML documents that defines the encoding as UTF-8:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

8. Specify Encoding in HTML Forms – If your application uses forms that insert or update data in the database, you should also specify accept-charset="UTF-8" in the form tag:

<form name="webform" method="post" action="formscript.php" accept-charset="UTF-8">

Dealing with character encodings can be very tricky, especially if your web application uses different languages with different character sets (e.g. Japanese). However, if you make sure the encoding is set at each step, it can save you a lot of headaches down the road.

Thanks to rentzsch for their informative post on How To Use UTF-8 Throughout Your Web Stack.
