Android Code Snippet Get data type from 1D string array

This is mainly needed for if the data needs to be sorted eg in a flexible table.
Lightly tested and seems to work fine, and is quite fast as well.
Will be interested if there are improvements.

B4X:
Sub GetColumnDataType(arrString() As String) As String
    
    Dim i As Int
    Dim r As Int
    Dim strOld As String
    Dim arrBytes() As Byte
    Dim bHasNumber As Boolean
    Dim iDotPos As Int
    Dim iCommaPos As Int
    Dim strDataType As String
    
    bHasNumber = False
    iDotPos = -1
    iCommaPos = -1
    
    For r = 0 To arrString.Length - 1
            
        Try
                
            If arrString(r) <> Null Then
                If arrString(r).Length > 0 Then
                    'this can make this about twice as fast if there is a sorted
                    'column and if the rows to be tested are not taken randomly
                    '-----------------------------------------------------------
                    If arrString(r) <> strOld Then 
                        strOld = arrString(r)
                        arrBytes = strOld.GetBytes("ASCII")
                        
                        For i = 0 To arrBytes.Length - 1
                            If arrBytes(i) > 57 Then
                                'Log("> 57: |" & arrString(i) & "|")
                                Return "T"
                            Else 'If arrBytes(i) > 57
                                If arrBytes(i) < 48 Then
                                    Select Case arrBytes(i)
                                        Case 45 '-
                                            'Log(" = minus")
                                            If i > 0 Then
                                                Return "T" 'as number can't have minus at position past zero
                                            Else
                                                strDataType = "T" 'provisional value
                                            End If
                                        Case 46 '.
                                            If iCommaPos > -1 Then
                                                'Log("dot and comma")
                                                Return "T" 'number can't have both dot and comma
                                            Else
                                                If i - iDotPos = 1 Then 'number can't have 2 consecutive dots
                                                    'Log("consecutive dots: |" & str & "|, " & i & " - " & iDotPos)
                                                    Return "T"
                                                Else
                                                    iDotPos = i
                                                End If
                                            End If
                                        Case 44 ',
                                            If iDotPos > -1 Then
                                                'Log("comma and dot")
                                                Return "T" 'number can't have both dot and comma
                                            Else
                                                If i - iCommaPos = 1 Then
                                                    'Log("consecutive commas")
                                                    Return "T" 'number can't have 2 consecutive commas
                                                Else
                                                    iCommaPos = i
                                                End If
                                            End If
                                    End Select
                                Else 'If arrBytes(i) < 48
                                    bHasNumber = True
                                End If 'If arrBytes(i) < 48
                            End If 'If arrBytes(i) > 57
                        Next
                        
                    End If 'If str <> strOld
                End If 'If str.Length > 0
            End If 'If str <> Null

        Catch
                
            'this won't happen with a string array, but will work if a ResultSet is passed in a similar routine
            Return "B"
                
        End Try
            
    Next
        
    If bHasNumber Then
        If iDotPos = -1 Then
            If iCommaPos = -1 Then
                Return "I" 'number and no dot or comma
            Else
                Return "R" 'number and has a comma
            End If
        Else
            Return "R" 'number and has a dot
        End If
    Else   'If bHasNumber
        If strDataType = "T" Then
            Return "T"
        Else
            Return "N"
        End If
    End If 'If bHasNumber
        
End Sub
[\CODE]

RBS
 

RB Smissaert

Well-Known Member
Licensed User
Longtime User
Yes, thanks, didn't think of that one.
Have done some testing now and it seems using the byte array is a bit faster, but of course using IsNumber is a lot simpler.
Will do some more testing.

Code would be something like this:

B4X:
Sub GetColumnDataType_2(arrString() As String) As String
  
    Dim r As Int
    Dim strOld As String
    Dim bHasNumber As Boolean
    Dim bHasDouble As Boolean
  
    For r = 0 To arrString.Length - 1
          
        Try
            If arrString(r) <> Null Then
                If arrString(r).Length > 0 Then
                    'this can make this about twice as fast if there is a sorted
                    'column and if the rows to be tested are not taken randomly
                    '-----------------------------------------------------------
                    If arrString(r) <> strOld Then
                        strOld = arrString(r)
                        If IsNumber(arrString(r)) Then
                            If arrString(r).Contains(".") Then
                                bHasNumber = True
                                bHasDouble = True
                            Else
                                If arrString(r).Contains(",") Then
                                    bHasNumber = True
                                    bHasDouble = True
                                Else
                                    bHasNumber = True
                                End If
                            End If
                        Else
                            Return ("T")
                        End If
                    End If 'If str <> strOld
                End If 'If str.Length > 0
            End If 'If str <> Null

        Catch
              
            'this won't happen with a string array, but will work if a ResultSet is passed in a similar routine
            Return "B"
              
        End Try
          
    Next

    If bHasNumber Then
        If bHasDouble Then
            Return "R"
        Else
            Return "I"
        End If
    Else
        Return "N"
    End If
      
End Sub

RBS
 

Sandman

Expert
Licensed User
Longtime User
Will be interested if there are improvements.
The older I get, the more allergic I get to if-then-else trees, especially if they are nested... :eek: (I rarely even use else nowadays.)

So my OCD kicked in and I remade your code to be more my style. I very much like to "peel away" thin cases as I like to go along, which I find makes the code not only simple to write, but also simple to understand when I later return to it. Might not be true for all coders though. (I do know that this style does irritate some forum members, who consider the principle "only one exit point in each function" important - I obviously don't agree to that. Not in an aggressive way, just in a "that's not my preferred style"-way.)

So, bottom line, this might not be an improvement, depending on your stance.

B4X:
Sub GetColumnDataType_3(arrString() As String) As String
 
    Dim r As Int
    Dim strOld As String
 
    For r = 0 To arrString.Length - 1
         
        Try
            If arrString(r) = Null Then Continue

            If arrString(r).Length = 0 Then Continue
           
            'this can make this about twice as fast if there is a sorted
            'column and if the rows to be tested are not taken randomly
            '-----------------------------------------------------------
            If arrString(r) = strOld Then Continue
           
            strOld = arrString(r)

            If Not(IsNumber(arrString(r))) Then Return ("T")

            If arrString(r).Contains(".") Or arrString(r).Contains(",") Then Return "R"

        Catch
             
            'this won't happen with a string array, but will work if a ResultSet is passed in a similar routine
            Return "B"
             
        End Try
         
    Next

    Return "I"
     
End Sub

Note: I have not tested this, not even once. It's possible a bug found its way in there, but it should be fairly simple to squash it if that's the case.

EDIT: Oh right, I should also mention that I didn't fully understand the magic values used and so I felt brave enough to cut out the last lines of code thinking, "it looks so small, surely it can't be important!" ;-)
 

RB Smissaert

Well-Known Member
Licensed User
Longtime User
The older I get, the more allergic I get to if-then-else trees, especially if they are nested... :eek: (I rarely even use else nowadays.)

So my OCD kicked in and I remade your code to be more my style. I very much like to "peel away" thin cases as I like to go along, which I find makes the code not only simple to write, but also simple to understand when I later return to it. Might not be true for all coders though. (I do know that this style does irritate some forum members, who consider the principle "only one exit point in each function" important - I obviously don't agree to that. Not in an aggressive way, just in a "that's not my preferred style"-way.)

So, bottom line, this might not be an improvement, depending on your stance.

B4X:
Sub GetColumnDataType_3(arrString() As String) As String
 
    Dim r As Int
    Dim strOld As String
 
    For r = 0 To arrString.Length - 1
        
        Try
            If arrString(r) = Null Then Continue

            If arrString(r).Length = 0 Then Continue
          
            'this can make this about twice as fast if there is a sorted
            'column and if the rows to be tested are not taken randomly
            '-----------------------------------------------------------
            If arrString(r) = strOld Then Continue
          
            strOld = arrString(r)

            If Not(IsNumber(arrString(r))) Then Return ("T")

            If arrString(r).Contains(".") Or arrString(r).Contains(",") Then Return "R"

        Catch
            
            'this won't happen with a string array, but will work if a ResultSet is passed in a similar routine
            Return "B"
            
        End Try
        
    Next

    Return "I"
    
End Sub

Note: I have not tested this, not even once. It's possible a bug found its way in there, but it should be fairly simple to squash it if that's the case.

EDIT: Oh right, I should also mention that I didn't fully understand the magic values used and so I felt brave enough to cut out the last lines of code thinking, "it looks so small, surely it can't be important!" ;-)
What the code should do is this:

If the array contains any value that is not a number > return T (meaning text)
If the array contains no values at all (where an empty string "" is considered no value) > return N (meaning null)
If the array contains a number (and not a non-number value) then if there is number that can be parsed as a double than return R (as in real as in SQLite)
otherwise return I (as integer as in SQLite).

As far as I see it your code does an early exit when it finds a number and that shouldn't happen as there could be a text value further on in the array.

I wasn't aware of the Continue B4A keyword, so thanks for that tip.

RBS
 

Sandman

Expert
Licensed User
Longtime User
As far as I see it your code does an early exit when it finds a number and that shouldn't happen as there could be a text value further on in the array.
Good point, thanks! Yeah, I can see I simplified the logic a bit much there.
 

Sandman

Expert
Licensed User
Longtime User
Noticed that B4A (and IsNumber) considers the comma not a valid decimal separator, so have to think how to handle that.

RBS
After the early exit for null-check, just add this:

B4X:
arrString(r).Replace(",", ".")

Basically a crutch to help IsNumber (and B4A) to see things our way.
 

RB Smissaert

Well-Known Member
Licensed User
Longtime User
After the early exit for null-check, just add this:

B4X:
arrString(r).Replace(",", ".")

Basically a crutch to help IsNumber (and B4A) to see things our way.
Not sure I will come across situations where there are comma's in numbers (maybe in .csv or .txt files), but I probably add a boolean array argument to the function
and that will then be returned as true if there were numbers with a comma. If that is the case then the sorting code will know this and replace the comma with a dot
when transferring to a double array that is used for the sorting.

RBS
 

RB Smissaert

Well-Known Member
Licensed User
Longtime User
Not sure I will come across situations where there are comma's in numbers (maybe in .csv or .txt files), but I probably add a boolean array argument to the function
and that will then be returned as true if there were numbers with a comma. If that is the case then the sorting code will know this and replace the comma with a dot
when transferring to a double array that is used for the sorting.

RBS
But then if I consider comma's valid in numbers it can get complex with a number having both a dot and one or more comma's:


RBS
 

emexes

Expert
Licensed User
But then if I consider comma's valid in numbers it can get complex with a number having both a dot and one or more comma's:

Not to mention comma-separated values (CSV) files. ?

Besides, it's a decimal point, and points are circular because they have a constant radius, albeit zero. At least, that's my reasoning and I'm sticking to it. ?

I have a vague recollection that some ISO standard says to use a thin-space to group digits in threes.

The Style Manual for Authors, Editors and Printers that I have in my hands here says they used to recommend commas but now they recommend spaces.

But then I found an online version which is presumably up-to-date cf my 1994 copy:

https://www.stylemanual.gov.au/gram...s#use_commas_in_numbers_with_4_or_more_digits

which seems a step backwards to me, but what would I know, I am just a pleb.
 
Last edited:

RB Smissaert

Well-Known Member
Licensed User
Longtime User
Not to mention comma-separated values (CSV) files. ?

Besides, it's a decimal point, and points are circular because they have a constant radius, albeit zero. At least, that's my reasoning and I'm sticking to it. ?

I have a vague recollection that some ISO standard says to use a thin-space to group digits in threes.

The Style Manual for Authors, Editors and Printers that I have in my hands here says they used to recommend commas but now they recommend spaces.

But then I found an online version which is presumably up-to-date cf my 1994 copy:

https://www.stylemanual.gov.au/gram...s#use_commas_in_numbers_with_4_or_more_digits

which seems a step backwards to me, but what would I know, I am just a pleb.
Having the possibility that both comma's and dots can be a thousand separator, does make it more complex and as my app is for the medical world and not the financial
world I don't think it very relevant, but have made a Sub that I think accounts for those options. Lightly tested (probably too lightly) and it seems to be working OK.

B4X:
Sub GetColumnDataType_2(arrString() As String, _
                        bAllowCommaForNumbers As Boolean, _
                        bCommaIsThousandSeparator As Boolean, _
                        bDotIsThousandSeparator As Boolean, _
                        arrNumbersHaveDot() As Boolean, _
                        arrNumbersHaveComma() As Boolean, _
                        arrNumbersHaveCommasAndDots() As Boolean) As String
    
    Dim i As Int
    Dim r As Int
    Dim strOld As String
    Dim arrBytes() As Byte
    Dim bHasNumber As Boolean
    Dim iDotPos As Int
    Dim iCommaPos As Int
    Dim strDataType As String
    
    'can't do -1 here that would make a value text if that value started with a dot or comma (0 - -1 = 1)
    iDotPos = -2
    iCommaPos = -2
    
    If bAllowCommaForNumbers Then
        
        For r = 0 To arrString.Length - 1
            
            Try
                
                If arrString(r) <> Null Then
                    If arrString(r).Length > 0 Then
                        'this can make this about twice as fast if there is a sorted
                        'column and if the rows to be tested are not taken randomly
                        '-----------------------------------------------------------
                        If arrString(r) <> strOld Then
                            strOld = arrString(r)
                            arrBytes = strOld.GetBytes("ASCII")
                        
                            For i = 0 To arrBytes.Length - 1
                                If arrBytes(i) > 57 Then
                                    'Log("> 57: |" & arrString(i) & "|")
                                    Return "T"
                                Else 'If arrBytes(i) > 57
                                    If arrBytes(i) < 48 Then
                                        Select Case arrBytes(i)
                                            Case 45 '-
                                                'Log(" = minus")
                                                If i > 0 Then
                                                    Return "T" 'as number can't have minus at position past zero
                                                Else
                                                    strDataType = "T" 'provisional value, can change to an int or double
                                                End If
                                            Case 46 '.
                                                If iCommaPos > -1 Then 'this means bAllowCommaForNumbers will be True
                                                    arrNumbersHaveCommasAndDots(0) = True
                                                    'Log("dot and comma")
                                                    If i - iCommaPos = 1 Then
                                                        Return "T" 'as can't have consecutive comma and dot
                                                    Else
                                                        iDotPos = i
                                                    End If
                                                Else
                                                    If i - iDotPos = 1 Then 'number can't have 2 consecutive dots
                                                        'Log("consecutive dots: |" & str & "|, " & i & " - " & iDotPos)
                                                        Return "T"
                                                    Else
                                                        iDotPos = i
                                                    End If
                                                End If
                                            Case 44 ',
                                                If i - iCommaPos = 1 Then
                                                    Return "T" 'can't have 2 consecutive comma's
                                                Else
                                                    If i - iDotPos = 1 Then
                                                        Return "T" 'can't have a dot followed by a comma
                                                    Else
                                                        iCommaPos = i
                                                    End If
                                                End If
                                        End Select
                                    Else 'If arrBytes(i) < 48
                                        bHasNumber = True
                                    End If 'If arrBytes(i) < 48
                                End If 'If arrBytes(i) > 57
                            Next
                        End If 'If str <> strOld
                    End If 'If str.Length > 0
                End If 'If str <> Null

            Catch
                
                'this won't happen with a string array, but will work if a ResultSet is passed in a similar routine
                Return "B"
                
            End Try
            
        Next
    
        arrNumbersHaveDot(0) = iDotPos > -1
        arrNumbersHaveComma(0) = iCommaPos > -1
        
        If bHasNumber Then
            If iDotPos = -2 Then
                If iCommaPos = -2 Then
                    Return "I" 'number and no dot or comma
                Else
                    If bCommaIsThousandSeparator = False Then
                        Return "R" 'number and has a comma
                    Else
                        Return "I"
                    End If
                End If
            Else
                If bDotIsThousandSeparator = False Then
                    Return "R" 'number and has a dot
                Else
                    Return "I"
                End If
            End If
        Else   'If bHasNumber
            If strDataType = "T" Then
                Return "T"
            Else
                Return "N"
            End If
        End If 'If bHasNumber
        
    Else 'If bAllowCommaForNumbers
        
        For r = 0 To arrString.Length - 1
            
            Try
                
                If arrString(r) <> Null Then
                    If arrString(r).Length > 0 Then
                        'this can make this about twice as fast if there is a sorted
                        'column and if the rows to be tested are not taken randomly
                        '-----------------------------------------------------------
                        If arrString(r) <> strOld Then
                            strOld = arrString(r)
                            arrBytes = strOld.GetBytes("ASCII")
                        
                            For i = 0 To arrBytes.Length - 1
                                If arrBytes(i) > 57 Then
                                    'Log("> 57: |" & arrString(i) & "|")
                                    Return "T"
                                Else 'If arrBytes(i) > 57
                                    If arrBytes(i) < 48 Then
                                        Select Case arrBytes(i)
                                            Case 45 '-
                                                'Log(" = minus")
                                                If i > 0 Then
                                                    Return "T" 'as number can't have minus at position past zero
                                                Else
                                                    strDataType = "T" 'provisional value, can change to an int or double
                                                End If
                                            Case 46 '.
                                                If i - iDotPos = 1 Then 'number can't have 2 consecutive dots
                                                    'Log("consecutive dots: |" & str & "|, " & i & " - " & iDotPos)
                                                    Return "T"
                                                Else
                                                    iDotPos = i
                                                End If
                                            Case 44 ',
                                                Return "T" 'number can't have comma
                                                iCommaPos = i
                                        End Select
                                    Else 'If arrBytes(i) < 48
                                        bHasNumber = True
                                    End If 'If arrBytes(i) < 48
                                End If 'If arrBytes(i) > 57
                            Next
                        
                        End If 'If str <> strOld
                    End If 'If str.Length > 0
                End If 'If str <> Null
    
            Catch
                
                'this won't happen with a string array, but will work if a ResultSet is passed in a similar routine
                Return "B"
                
            End Try
            
        Next
    
        arrNumbersHaveDot(0) = iDotPos > -1
        arrNumbersHaveComma(0) = iCommaPos > -1
        
        If bHasNumber Then
            If iDotPos = -2 Then
                Return "I" 'number and no dot
            Else
                Return "R" 'number and has a dot
            End If
        Else   'If bHasNumber
            If strDataType = "T" Then
                Return "T"
            Else
                Return "N"
            End If
        End If 'If bHasNumber
        
    End If 'If bAllowCommaForNumbers
        
End Sub

RBS
 

emexes

Expert
Licensed User
Having the possibility that both comma's and dots can be a thousand separator, does make it more complex

Understatement of the year. ?

In an unrelated adventure, I was just looking at https://en.wikipedia.org/wiki/Garanhuns and spotted:

1671573401923.png


where the 1.20% suggests that the decimal point is a "." but if that is the case, then that is an extremely precise population count. ?
 

emexes

Expert
Licensed User
This is mainly needed for if the data needs to be sorted eg in a flexible table.

I was wondering why you had numbers stored as strings in a table, but it just clicked that you're not talking about a database table.
 
Last edited:

RB Smissaert

Well-Known Member
Licensed User
Longtime User
I was wondering why you had numbers stored as strings in a table, but it just clicked that you're not talking about a database table.

The simplest way to determine if a column there is numeric or not, might be to look at the entire column rather than just one entry at a time: if that column only contains numeric characters +-0123456789 and one of ., and maybe e and d for exponential format, then I'd be willing to bet good money that it's numeric. If it contains any other non-qhitespace characters, then I'd bet even more that it's not. ?
>> might be to look at the entire column rather than just one entry at a time

Not sure what you mean by that. I am looking at the whole column, unless it finds an element that is text.
It is actually all working fine. The thousands separator thing is not really that important.

RBS
 

emexes

Expert
Licensed User
B4X:
Sub GetColumnDataType_2(arrString() As String, _
                        bAllowCommaForNumbers As Boolean, _
                        bCommaIsThousandSeparator As Boolean, _
                        bDotIsThousandSeparator As Boolean, _
                        arrNumbersHaveDot() As Boolean, _
                        arrNumbersHaveComma() As Boolean, _
                        arrNumbersHaveCommasAndDots() As Boolean) As String

Instead of Booleans re: Dots and Commas, maybe record the maximum number of Dots per element, and same for Commas.

If MaxDotsInElement > 1 then thousands separator must be Dot, and thus decimal point must be Comma
If MaxCommasInElement > 1 then thousands separator must be Comma, and thus decimal point must be Dot

If both maximums are > 1 then :oops: your guess is as good as mine is as good as Rnd(0, 2)

If both maximums = 0 then no problem, all numbers are integer.

If one maximum is 1 and the other is not, then probably (but not stake-your-life-on-it certainly) decimal point is that character.

ps nice touch using array to emulate BYREF ?
 

RB Smissaert

Well-Known Member
Licensed User
Longtime User
Instead of Booleans re: Dots and Commas, maybe record the maximum number of Dots per element, and same for Commas.

If MaxDotsInElement > 1 then thousands separator must be Dot, and thus decimal point must be Comma
If MaxCommasInElement > 1 then thousands separator must be Comma, and thus decimal point must be Dot

If both maximums are > 1 then :oops: your guess is as good as mine is as good as Rnd(0, 2)

If both maximums = 0 then no problem, all numbers are integer.

If one maximum is 1 and the other is not, then probably (but not stake-your-life-on-it certainly) decimal point is that character.

ps nice touch using array to emulate BYREF ?
Yes, that is an idea.
Will have a look at that, but must keep it fast. Mind you, getting a .csv or .txt into that list of arrays will take a lot longer
than determining the data type, so speed of that is less relevant really.

RBS
 

emexes

Expert
Licensed User
Will have a look at that, but must keep it fast. Mind you, getting a .csv or .txt into that list of arrays will take a lot longer
than determining the data type, so speed of that is less relevant really.

Yeah, I think we might have passed the point of maximum result-per-effort some time ago. ?

But it's an interesting challenge.
 

RB Smissaert

Well-Known Member
Licensed User
Longtime User
Yeah, I think we might have passed the point of maximum result-per-effort some time ago. ?

But it's an interesting challenge.
Will work this out in VBA now as that is a lot quicker, not having to compile all the time.

RBS
 
Top