Initially I thought this would be an easy job: the database, table and column collations were all set to utf8_general_ci, and I knew MySQL is good at comparing basic Latin characters against their accented derivatives. Well, it wasn't. Although the following query returns true (1), there is no way to ask MySQL for the transliterated value, i.e. to obtain the left-hand string from the right-hand one.
SELECT 'staia' = 'şţăîâ';
So I defined a function that takes advantage of this comparison behaviour to perform the transliteration. What I came up with might end up on thedailywtf.com, but despite its apparent stupidity it does its job very well. So here's the function (or as a gist on github):
DELIMITER $$
DROP FUNCTION IF EXISTS `transliterate` $$
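CREATE FUNCTION `transliterate` (original VARCHAR(128)) RETURNS VARCHAR(128)
-- A characteristic such as DETERMINISTIC is required when binary logging is enabled.
DETERMINISTIC NO SQL
BEGIN
DECLARE translit VARCHAR(128) DEFAULT '';
DECLARE len INT(3) DEFAULT 0;
DECLARE pos INT(3) DEFAULT 1;
DECLARE letter CHAR(1);
SET len = CHAR_LENGTH(original);
-- Walk the input one character at a time.
WHILE (pos <= len) DO
SET letter = SUBSTRING(original, pos, 1);
-- utf8_general_ci treats an accented letter as equal to its base letter,
-- so the first matching branch replaces it with the plain Latin letter.
CASE TRUE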
WHEN letter = 'a' THEN SET letter = 'a';
WHEN letter = 'b' THEN SET letter = 'b';
WHEN letter = 'c' THEN SET letter = 'c';
WHEN letter = 'd' THEN SET letter = 'd';
WHEN letter = 'e' THEN SET letter = 'e';
WHEN letter = 'f' THEN SET letter = 'f';
WHEN letter = 'g' THEN SET letter = 'g';
WHEN letter = 'h' THEN SET letter = 'h';
WHEN letter = 'i' THEN SET letter = 'i';
WHEN letter = 'j' THEN SET letter = 'j';
WHEN letter = 'k' THEN SET letter = 'k';
WHEN letter = 'l' THEN SET letter = 'l';
WHEN letter = 'm' THEN SET letter = 'm';
WHEN letter = 'n' THEN SET letter = 'n';
WHEN letter = 'o' THEN SET letter = 'o';
WHEN letter = 'p' THEN SET letter = 'p';
WHEN letter = 'q' THEN SET letter = 'q';
WHEN letter = 'r' THEN SET letter = 'r';
WHEN letter = 's' THEN SET letter = 's';
WHEN letter = 't' THEN SET letter = 't';
WHEN letter = 'u' THEN SET letter = 'u';
WHEN letter = 'v' THEN SET letter = 'v';
WHEN letter = 'w' THEN SET letter = 'w';
WHEN letter = 'x' THEN SET letter = 'x';
WHEN letter = 'y' THEN SET letter = 'y';
WHEN letter = 'z' THEN SET letter = 'z';
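-- Without an ELSE branch, MySQL raises "Case not found for CASE statement"
-- for digits, spaces and punctuation; the empty block keeps them unchanged.
ELSE BEGIN END;
END CASE;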
SET translit = CONCAT(translit, letter);
SET pos = pos + 1;
END WHILE;
RETURN translit;
END $$
DELIMITER ;
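Once the function exists, calling it is a plain SELECT. The two calls below are only a sketch: they assume the connection and the database both use utf8 with utf8_general_ci, as in my setup, and that every input character is a letter. Under those assumptions the first call should give back the left-hand string from the comparison query above.

```sql
-- Assumption: connection charset/collation match the setup described in the post.
SET NAMES utf8 COLLATE utf8_general_ci;

-- Lowercase accented input.
SELECT transliterate('şţăîâ') AS plain;

-- Uppercase accented input: utf8_general_ci is also case-insensitive, so each
-- character matches a lowercase WHEN branch and the result comes out lowercase.
SELECT transliterate('ŞŢĂÎÂ') AS plain_from_upper;
```

Because every branch assigns a lowercase literal, the function lowercases as a side effect. If case matters, the result would have to be re-cased separately.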
2 comments:
I think MySQL calls it "collation" :
http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html
Collation is related to what I was trying to do, but it does a different thing. It takes care that result sets are ordered the way they should be in the respective language.
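For what it's worth, collation also decides which characters compare as equal, and that is exactly what the function relies on. A minimal sketch, assuming a utf8 connection (the SET NAMES line is there only to make the example self-contained):

```sql
-- Assumption: the server still accepts the utf8 charset name and its collations.
SET NAMES utf8 COLLATE utf8_general_ci;

-- Accent-insensitive collation: the post reports this comparison as true (1).
SELECT 'staia' = 'şţăîâ' COLLATE utf8_general_ci AS equal_general_ci;

-- Binary collation compares the code points themselves, so the strings differ (0).
SELECT 'staia' = 'şţăîâ' COLLATE utf8_bin AS equal_bin;
```

So the comparison result comes from the collation, but MySQL still offers no way to ask a collation for the base letter itself, hence the function.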