Can you explain when these methods are useful?
It is very simple to read text in any encoding, and UTF-8 is the simplest one.
Hi Erel,
Suppose that we want to send a string of non-Latin characters (mixed with numeric data) via POST (or GET) to a PHP script on a remote server.
Suppose also that we want to guarantee, in some manner, the integrity of that data, so we decide to Base64-encode it (or perhaps also encrypt it internally, add a CRC, etc.).
So we must break the UTF strings into single bytes (to keep the encoding/decoding process consistent).
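A minimal sketch of the receiving PHP side, assuming the client sends the Base64 of the string's UTF-8 bytes together with a CRC-32 computed over those same bytes. The function `verify_payload` and this payload layout are hypothetical; `base64_encode`, `base64_decode` and `crc32` are standard PHP functions. Base64 and the CRC both operate on raw bytes, so the check only passes if both sides agree on exactly which bytes were encoded.

```php
<?php
// Hypothetical receiver-side check: $b64 is the Base64 of the UTF-8 bytes,
// $crc is the CRC-32 the sender computed over those same bytes.
function verify_payload(string $b64, int $crc): ?string
{
    // Strict mode: reject characters outside the Base64 alphabet.
    $bytes = base64_decode($b64, true);
    if ($bytes === false) {
        return null; // not valid Base64
    }
    // crc32() works on the raw bytes, so it matches only if both sides
    // hashed the same byte sequence.
    if (crc32($bytes) !== $crc) {
        return null; // integrity check failed
    }
    return $bytes; // the original UTF-8 string
}

// Simulate the sender in PHP for "ΔW" (UTF-8 bytes 206, 148, 87).
$text = "\xCE\x94W";
$payload = base64_encode($text); // "zpRX"
$checksum = crc32($text);

var_dump(verify_payload($payload, $checksum) === $text); // bool(true)
```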
For example, say you try to send the string "ΔW" (uppercase Delta, W) to a PHP script.
On Android, "Δ" has the character code 916 (U+0394). If you call something like "ΔW".getbytes("UTF-8"), you get the 3-byte array [-50, -108, 87], because in B4A the Byte type is signed.
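If the B4A app sends these signed values as plain numbers, a PHP script can map them back to unsigned bytes by masking with 0xFF (two's complement) and rebuild the original string with `pack`. A minimal sketch, assuming PHP 7.4 or later (for the arrow function); `signed_to_unsigned` and `bytes_to_string` are hypothetical helper names.

```php
<?php
// Hypothetical helper: map signed bytes (-128..127, as in B4A/Java) to
// unsigned bytes (0..255, as PHP's unpack("C*") reports them).
function signed_to_unsigned(array $bytes): array
{
    // -50 & 0xFF == 206, -108 & 0xFF == 148; values 0..127 are unchanged.
    return array_map(fn (int $b): int => $b & 0xFF, $bytes);
}

// Hypothetical helper: rebuild the byte string from unsigned byte values.
function bytes_to_string(array $unsigned): string
{
    return pack("C*", ...$unsigned);
}

$fromB4A = [-50, -108, 87];               // "ΔW" bytes as B4A reports them
$unsigned = signed_to_unsigned($fromB4A); // [206, 148, 87]
var_dump(bytes_to_string($unsigned) === "\xCE\x94W"); // bool(true)
```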
If you do the same on the PHP side:
array_slice(unpack("C*", "\0"."ΔW"), 1) returns the array [206, 148, 87], because PHP has no signed byte type and the "C" format reads each byte as an unsigned value.
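The `"\0"` prefix and `array_slice` only serve to turn the 1-based array that `unpack` returns into a 0-based one; `array_values` does the same more directly. `unpack` also has a signed format code, `c`, which yields the same values B4A shows. A short sketch:

```php
<?php
$s = "\xCE\x94W"; // "ΔW" encoded as UTF-8

// unpack() numbers its results from 1; array_values() renumbers from 0.
$unsigned = array_values(unpack("C*", $s)); // unsigned char: [206, 148, 87]
$signed   = array_values(unpack("c*", $s)); // signed char:   [-50, -108, 87]

print_r($unsigned);
print_r($signed);
```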
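Both arrays describe the same three bytes (206 - 256 = -50, 148 - 256 = -108), but in the situation described, if you send encoded numeric data mixed with UTF strings, the two sides interpret those bytes differently and the transmission is corrupted.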
But if you use these two functions before encoding or after decoding, everything goes smoothly.
Perhaps the problem could be resolved if there were a sub, something like str.getchars("UTF8").
Of course, an analogous procedure (data conditioning) could instead be done on the PHP side, leaving the B4A code intact.
Example PHP code for such conversions:
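```php
// Returns a UTF-8 string from an array of 16-bit character codes (Android/Java chars).
// Handles only codes below 2048, i.e. characters that take 1 or 2 bytes in UTF-8.
function android2utf($a){
    $s = "";
    foreach ($a as $r) {
        if ($r < 128) { // ASCII: one byte
            $s .= chr($r);
        } else { // two bytes: lead byte 192 + ($r >> 6), then 128 + ($r % 64)
            $z = intval($r / 64);
            $s .= chr($z + 192) . chr($r - ($z - 2) * 64);
        }
    }
    return $s;
}

// Returns an array of 16-bit character codes from a UTF-8 string.
// Handles only 1- and 2-byte UTF-8 sequences (codes below 2048).
function utf2android($s){
    $m = strlen($s);
    $i = 0;
    $r = array();
    while ($i < $m) {
        if (ord($s[$i]) < 128) { // ASCII: one byte
            $r[] = ord($s[$i]);
            $i++;
        } else { // two bytes: (lead - 192) * 64 + (next - 128)
            $r[] = (ord($s[$i]) - 194) * 64 + ord($s[$i + 1]);
            $i += 2;
        }
    }
    return $r;
}
```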
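The two functions above only handle character codes below 2048 (U+0800); a code such as 8364 (€, U+20AC) needs three UTF-8 bytes. A minimal sketch of a generalization, assuming the input holds Java-style 16-bit char codes (0 to 65535) and well-formed UTF-8. Surrogate pairs are not combined, so a character outside the BMP would be encoded as two separate 3-byte sequences, which is not valid UTF-8. The names `codes_to_utf8` and `utf8_to_codes` are hypothetical.

```php
<?php
// Hypothetical generalization of android2utf(): encode 16-bit char codes
// (0..65535) as UTF-8, using 1, 2 or 3 bytes per code.
function codes_to_utf8(array $codes): string
{
    $s = "";
    foreach ($codes as $c) {
        if ($c < 0x80) {            // 1 byte:  0xxxxxxx
            $s .= chr($c);
        } elseif ($c < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
            $s .= chr(0xC0 | ($c >> 6))
                . chr(0x80 | ($c & 0x3F));
        } else {                    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            $s .= chr(0xE0 | ($c >> 12))
                . chr(0x80 | (($c >> 6) & 0x3F))
                . chr(0x80 | ($c & 0x3F));
        }
    }
    return $s;
}

// Hypothetical generalization of utf2android(): decode 1- to 3-byte UTF-8
// sequences back into char codes. Assumes well-formed input.
function utf8_to_codes(string $s): array
{
    $codes = [];
    $n = strlen($s);
    $i = 0;
    while ($i < $n) {
        $b = ord($s[$i]);
        if ($b < 0x80) {            // single byte (ASCII)
            $codes[] = $b;
            $i += 1;
        } elseif ($b < 0xE0) {      // lead byte of a 2-byte sequence
            $codes[] = (($b & 0x1F) << 6) | (ord($s[$i + 1]) & 0x3F);
            $i += 2;
        } else {                    // lead byte of a 3-byte sequence
            $codes[] = (($b & 0x0F) << 12)
                     | ((ord($s[$i + 1]) & 0x3F) << 6)
                     | (ord($s[$i + 2]) & 0x3F);
            $i += 3;
        }
    }
    return $codes;
}

var_dump(codes_to_utf8([916, 87, 8364]) === "\xCE\x94W\xE2\x82\xAC"); // bool(true)
var_dump(utf8_to_codes("\xCE\x94W\xE2\x82\xAC") === [916, 87, 8364]); // bool(true)
```

For codes below 2048 these produce the same results as android2utf() and utf2android(); the bit masks simply make the lead-byte and continuation-byte layout explicit.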