Android Question Preventing Non-Latin Alphabet Text Being Entered

Azhar

Active Member
Licensed User
Longtime User
Hi

I have an editTextBox control where the user can add or edit text, and I need to limit the entry to 200 characters. This is easy with Latin scripts (a, b, c, d...), but if I enter text in another script such as Arabic, the text length routine fails and I am unable to determine the string length.

Is there a workaround for this, either to count the characters accurately in non-Latin alphabets or to prevent text in these scripts from being entered?

Thanks,

Azhar
 

Azhar

Hi Erel,

What I think is happening is that the accent symbols (diacritics) are being included in the character count, e.g. for ٱلْحَمْدُ لِلَّٰهِ the length is 18.
If the accent symbols weren't included, the string length count would be correct.
The accent symbols are to help non-native speakers pronounce the words correctly.
So I'll try your suggestion.

I prevented users from pasting Arabic into an edit text box by checking the character code of each character and rejecting any character whose code falls outside the ASCII range.
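
A check of this kind can be sketched roughly as follows. This is plain Java (the platform B4A builds on) rather than B4X, and the class and method names are hypothetical, not taken from the actual app. It assumes "out of range" means any character code above 127.

```java
final class AsciiOnlyFilter {
    // True when every UTF-16 code unit of text is within 7-bit ASCII (0..127).
    static boolean isAsciiOnly(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy of text with every non-ASCII code unit dropped.
    static String stripNonAscii(String text) {
        StringBuilder kept = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c <= 127) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // "Hi " followed by an Arabic word (written as escapes) and "!"
        String pasted = "Hi \u0645\u0631\u062D\u0628\u0627!";
        System.out.println(isAsciiOnly(pasted));   // prints false
        System.out.println(stripNonAscii(pasted)); // prints "Hi !"
    }
}
```

Note that a filter this strict also rejects accented Latin letters such as é, so it only suits input that is meant to be plain ASCII.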

Thanks.
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
UTF-32 will not help here, as this is not an encoding issue. There are actually 18 "characters" (code points) here. This is similar to the case of complex emojis, where a single displayed emoji is actually made of multiple emojis.

B4X:
Dim s As String = "ٱلْحَمْدُ لِلَّٰهِ"
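' UTF-32LE stores every code point as exactly 4 bytes.
Dim b() As Byte = s.GetBytes("UTF-32LE")
Dim LengthWithoutAccents As Int
For i = 0 To b.Length - 1 Step 4
    ' BytesToInt is a helper sub that is not shown in this post. It is expected
    ' to combine the 4 little-endian bytes starting at index i into an Int.
    Dim CodePoint As Int = BytesToInt(b, i)
    ' Skip the accent marks: code points 1611 to 1621 (U+064B to U+0655).
    If CodePoint >= 1611 And CodePoint <= 1621 Then Continue
    LengthWithoutAccents = LengthWithoutAccents + 1
Next
Log(LengthWithoutAccents)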

Code points: https://www.ssec.wisc.edu/~tomw/java/unicode.html#x0600
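
As a possible variation on the same idea, the accent marks can be recognised by their Unicode general category (non-spacing mark, Mn) instead of a fixed code point range. The sketch below is plain Java rather than B4X, and the class and method names are hypothetical. It assumes the sample phrase is encoded as the 18 code points written out as escapes in `main`. A category test also skips marks that lie outside 1611 to 1621, such as U+0670 (decimal 1648), so its result can differ from the range-based loop.

```java
final class MarkFreeLength {
    // Counts the code points in s, skipping Unicode non-spacing marks (category Mn).
    static int lengthWithoutMarks(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            if (Character.getType(codePoint) != Character.NON_SPACING_MARK) {
                count++;
            }
            // Advance by 1 or 2 UTF-16 units depending on the code point.
            i += Character.charCount(codePoint);
        }
        return count;
    }

    public static void main(String[] args) {
        // The sample phrase from this thread, written as escapes (assumed encoding).
        String s = "\u0671\u0644\u0652\u062D\u064E\u0645\u0652\u062F\u064F"
                 + " \u0644\u0650\u0644\u0651\u064E\u0670\u0647\u0650";
        System.out.println(s.codePointCount(0, s.length())); // prints 18
        System.out.println(lengthWithoutMarks(s));           // prints 9
    }
}
```

The count of 9 includes the space between the two words.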
 