B4J Question How to remove symbols from a string?

Diceman

Active Member
Licensed User
I want to write a function that quickly removes invalid characters from a string so that I am left with a valid identifier name. The identifier should start with a letter, followed by one or more letters, digits, "_" or spaces. It should also allow Unicode letters.

I can check to see if the string has a valid identifier name using:

B4X:
'Return True if aIdentName is a valid name for an identifier otherwise return False
Sub IsValidIdentifierName(aIdentName As String) As Boolean
    Dim ma As Matcher
    ma = Regex.Matcher("^[a-zA-Z][a-zA-Z0-9\_\ ]+$", aIdentName)
    If ma.Find Then
        Log("String is ok " & aIdentName)
        Return True
    Else
        Log("String not ok " & aIdentName)
        Return False
    End If
End Sub

But if it has invalid symbols in the string, how do I quickly remove them?
And how do I allow for Unicode letters in the string?

TIA
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
Remove all characters except English letters, digits and the underscore:
B4X:
Dim s As String = Regex.Replace("[^a-zA-Z0-9_]", "c23f23;',./\']234", "")
Log(s)

Diceman said: "And how do I allow for Unicode letters in the string?"
Every character is a Unicode character. You need to better define "Unicode letters" and see whether there is a regex class that matches what you are looking for.
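
As a rough sketch of that last point: in B4J, Regex patterns are handled by Java's regular expression engine, which supports Unicode property classes such as \p{L} (any letter, in any script) and \p{Nd} (any decimal digit). Assuming "Unicode letters" means "any character in the Unicode letter category", the replace above could be adapted along these lines. CleanIdentifierName is a hypothetical helper name, and stripping leading non-letters is an assumption based on the rule that the identifier must start with a letter.

```b4x
'Hypothetical helper: keeps only letters (any script), decimal digits,
'"_" and spaces, then drops leading characters up to the first letter.
Sub CleanIdentifierName(aIdentName As String) As String
    'Assumption: \p{L} = any Unicode letter, \p{Nd} = any decimal digit
    Dim s As String = Regex.Replace("[^\p{L}\p{Nd}_ ]", aIdentName, "")
    'Remove leading digits, underscores and spaces so the name starts with a letter
    s = Regex.Replace("^[^\p{L}]+", s, "")
    Return s
End Sub
```

For example, Log(CleanIdentifierName("42 Größe_1;',./")) should log Größe_1. The same classes could be used in the validation pattern, for instance "^\p{L}[\p{L}\p{Nd}_ ]+$", if the check should accept non-English letters as well.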
 