utf-8 handling in Koha

Dobrica Pavlinusic
While migrating to a new version of Koha, we hit bug 6554 [1]. Our problem
is similar, but in our case we don't use localized templates. Instead, we
have utf-8 characters inside MySQL which get double utf-8 encoded before
they are sent to the browser.

1: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=6554
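
To make the symptom concrete, here is a minimal sketch of how double
encoding happens in Perl. It is an illustration only, not code from Koha
or from the patch, and the sample string is made up:

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string, as it should look after being read from the database.
# (Sample value, chosen only because it contains two non-ASCII characters.)
my $name = "\x{161}ki\x{107}";

# Correct: encode exactly once on the way out.
my $once = encode('UTF-8', $name);

# Bug: the already-encoded bytes are treated as Latin-1 characters
# and encoded a second time, so each non-ASCII byte becomes two bytes.
my $twice = encode('UTF-8', $once);

printf "characters:    %d\n", length $name;
printf "encoded once:  %d bytes\n", length $once;
printf "encoded twice: %d bytes\n", length $twice;
```

The twice-encoded bytes are what the browser shows as two garbage
characters in place of each accented letter.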

Below is a short summary of the changes from the patch:

This patch tries to clean up utf-8 handling in Koha.

The current implementation (mostly commented out in this patch)
uses a heuristic to guess which strings need decoding from utf-8
to binary representation. It doesn't support utf-8 characters
in templates and has problems with utf-8 data from the database.

With these changes, Koha perl code always uses utf-8 encoding
correctly. All incoming data from the database is already
correctly marked as utf-8, and decoding of utf-8 is required
only for data from Zebra and XSLT transfers, which don't set the utf-8 flag.
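
As a rough sketch of what decoding at that boundary looks like (the $raw
value below is a hypothetical stand-in for bytes returned by Zebra or an
XSLT transformation, not real output from either):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw result: valid utf-8 octets, but the utf-8 flag is not set,
# so perl sees 10 separate bytes rather than 9 characters.
my $raw = "Knji\xc5\xbenica";

# Decode once at the boundary so the rest of the code works with characters.
my $text = decode('UTF-8', $raw);

printf "raw:     flag=%d length=%d\n", (utf8::is_utf8($raw)  ? 1 : 0), length $raw;
printf "decoded: flag=%d length=%d\n", (utf8::is_utf8($text) ? 1 : 0), length $text;
```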

For output, the standard perl :utf8 layer is used, which removes various
"wide character" warnings as a side-effect.
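
A small sketch of the output side, using an in-memory filehandle as a
stand-in for STDOUT so the result can be inspected. In a CGI script the
equivalent would be binmode(STDOUT, ':utf8'). This illustrates the idea
and is not the code from the patch:

```perl
use strict;
use warnings;

# Character string with one non-ASCII character (sample value).
my $title = "Knji\x{17e}nica";

# In-memory handle with the :utf8 layer applied. Without the layer,
# printing this string would trigger a "Wide character" warning.
my $out = '';
open(my $fh, '>:utf8', \$out) or die "cannot open in-memory handle: $!";
print {$fh} $title;
close($fh);

printf "%d characters written as %d bytes\n", length $title, length $out;
```

With the layer in place, encoding happens in exactly one spot, so the rest
of the code never needs to encode strings by hand before printing.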

I would love to hear your thoughts on this approach. So far, I know that
it breaks CGI::Session (which is documented as a known bug in its
documentation), so after the first reload, library names, shelves and other
data returned from the session aren't encoded correctly.

I would also need to check whether this change affects LDAP, Z39.50 encoding
and the SIP server, but before I start down this road, do you see any reasons
not to pursue it? Compatibility with older perl versions might be
one reason. I'm running perl v5.10.1 from Debian squeeze.

Dobrica Pavlinusic               2share!2flame            [hidden email]
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin