Android Question string to Byte Array

Rusty

Well-Known Member
Licensed User
Longtime User
I am having a problem converting strings with "extended" characters to an encryptable byte array.
B4X:
...
Sub Globals
    Dim conv As ByteConverter
End Sub

Sub Activity_Create(FirstTime As Boolean)
    conv.LittleEndian = True
    Dim PaddedString As String = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnoá      "
    Log("PaddedString " & PaddedString.Length & "   >" & PaddedString & "<")
    Dim data() As Byte = conv.StringToBytes(PaddedString, "UTF8")
    Log("Data Len before " & data.Length)
    If data.Length Mod 8 <> 0 Then
        Dim tmpdata(PaddedString.Length) As Byte
        Log("data len " & data.Length)
        conv.ArrayCopy(data, 0, tmpdata, 0, PaddedString.Length - 1)
        data = tmpdata
    End If
    Log("Data after: " & data.Length)
End Sub
...
To be encrypted, the data must be a multiple of 8 bytes long (I believe).
When I convert a 48-character string of standard ASCII characters to a byte array, it returns a 48-byte array.
When I convert a string of ASCII characters with, for example, "á" embedded in it, it returns a 49-byte array. Each additional "á" (extended character) adds one more byte to the array, which in turn causes encryption to fail (because the array size is no longer a multiple of 8 bytes).
What is the rule for these "extended characters" when the byte converter converts them to bytes?
And how can I ensure the byte array is a multiple of 8 bytes in size?
Thanks,
Rusty
 

Rusty

B4X:
Dim C As Cipher

...code from above

data = C.Encrypt(data, kg.Key, True)
When I run the code, I get the following error:
Erel, I don't understand how the library pads this automatically. For years we've had to pad the string to an 8-byte boundary for this to work.
The question remains, why does the Spanish character increase the size of the byte array?
Thanks, ideas appreciated
 

James Chamblin

Active Member
Licensed User
Longtime User
In UTF-8 encoding, each character is encoded with 1 to 4 bytes. The standard (ASCII) letters, numbers and punctuation are encoded with a single byte. Most special characters, like á, are encoded using multiple bytes; á (U+00E1) takes two bytes. That is why you get an extra byte.
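A quick way to see this is to compare the byte counts directly. This is only a sketch, assuming the same ByteConverter library as in the first post and that "á" is stored as the single precomposed character U+00E1:
B4X:
Dim conv As ByteConverter
Dim plain() As Byte = conv.StringToBytes("a", "UTF8")
Dim accented() As Byte = conv.StringToBytes("á", "UTF8")
Log("a: " & plain.Length & " byte(s)")     ' 1 byte (0x61)
Log("á: " & accented.Length & " byte(s)")  ' 2 bytes (0xC3 0xA1)

So String.Length counts characters, while the array length counts encoded bytes; the two only match for plain ASCII text.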
 

Rusty

Thanks, James.
I still don't understand why Erel indicated:
"You don't need to do anything special. The data will be padded by the encryption library."
Is this clear to you?
Rusty
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
The correct answer is that it depends on the selected algorithm and whether padding is applied automatically or not (for example, it is added with AES/CBC/PKCS5Padding, which is the algorithm used by B4XEncryption).
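As a rough sketch of the automatic-padding case, assuming the B4XEncryption library is referenced in the project (EncryptText, DecryptText and the password parameter are illustrative names, not taken from the code above):
B4X:
' Encrypts a string of any length; no manual padding is done here.
Sub EncryptText(text As String, password As String) As Byte()
    Dim c As B4XCipher
    Return c.Encrypt(text.GetBytes("UTF8"), password)
End Sub

' Reverses EncryptText and rebuilds the original string.
Sub DecryptText(encrypted() As Byte, password As String) As String
    Dim c As B4XCipher
    Dim b() As Byte = c.Decrypt(encrypted, password)
    Return BytesToString(b, 0, b.Length, "UTF8")
End Sub

With this approach the input length does not need to be a multiple of the block size, so a 49-byte array from a string containing "á" should be accepted as-is.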

Anyway, you can use ByteConverter.ArrayCopy to add the padding by copying the data you get from GetBytes to a larger array.
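A minimal sketch of that manual padding, assuming zero bytes are acceptable as filler (PadToBlock is a hypothetical helper name, not part of any library):
B4X:
' Returns data unchanged if its length is already a multiple of blockSize,
' otherwise a copy extended with zero bytes up to the next multiple.
Sub PadToBlock(data() As Byte, blockSize As Int) As Byte()
    Dim remainder As Int = data.Length Mod blockSize
    If remainder = 0 Then Return data
    Dim conv As ByteConverter
    ' New arrays are zero-filled, so the extra bytes act as padding.
    Dim padded(data.Length + blockSize - remainder) As Byte
    ' Copy ALL of the encoded bytes (data.Length), not the character count.
    conv.ArrayCopy(data, 0, padded, 0, data.Length)
    Return padded
End Sub

For example, data = PadToBlock(data, 8) would turn the 49-byte array into a 56-byte array. The size is computed from data.Length rather than from the string's Length, because the two differ as soon as the text contains multi-byte characters. After decrypting, the trailing zero bytes would need to be stripped again before converting back to a string.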
 