Speed up string operations

canalrun

Well-Known Member
Licensed User
Longtime User
Hello,
I am doing heavy-duty string operations:
CharAt
IndexOf
"RemLet" (my remove letter)

Within a short string (10 characters or less) I have to extract a particular character. Using that character I have to see if it exists in another short string (10 characters or less). And, if it does I have to remove that character from the second string.

I am using the String data type. I see there is also a StringBuilder; it does not have an IndexOf, but it has a Remove. String has an IndexOf but no Remove.

For my Remove, I am basically using:
t = s.SubString2(0, pos) & s.SubString(pos+1)

the operations I'm performing take about 2 min. I would like to speed that up if possible.

Can anyone suggest a faster way to work worth short strings? The key functions are equivalents to: CharAt, IndexOf, and Remove.

Thanks,
Barry.
 

canalrun

Well-Known Member
Licensed User
Longtime User
Hello,
I am doing heavy-duty string operations:
CharAt
IndexOf
"RemLet" (my remove letter)

Within a short string (10 characters or less) I have to extract a particular character. Using that character I have to see if it exists in another short string (10 characters or less). And, if it does I have to remove that character from the second string.

I am using the String data type. I see there is also a StringBuilder; it does not have an IndexOf, but it has a Remove. String has an IndexOf but no Remove.

For my Remove, I am basically using:
t = s.SubString2(0, pos) & s.SubString(pos+1)

the operations I'm performing take about 2 min. I would like to speed that up if possible.

Can anyone suggest a faster way to work worth short strings? The key functions are equivalents to: CharAt, IndexOf, and Remove.

Thanks,
Barry.

I found Erel's post:

http://www.b4x.com/forum/bugs-wishlist/8379-byref-parameter-android.html#post46933

about creating ByRef is behavior. I will try that.

Barry.
 
Upvote 0

canalrun

Well-Known Member
Licensed User
Longtime User
Can you post your algorithm code?

Here it is. It's kind of long. The list in the outer loop has 175,000 (175 thousand) words of less than 10 characters. The inner two loops also have words of less than 10 characters. I left in my crude profiling code (tr stuff).

On my 1 GHz Vibrant, I get the following timing:
tr(2) = 127 secs
tr(3) = 107 secs
tr(4) = 35 secs

The most inner loop, tr(4), is not too bad. The middle loop minus the most inner loop, tr(3)-tr(4), must be 72 secs. this seems to be the culprit. The most outer loop minus the middle, tr(2)-tr(3), loop is 20 secs.

I have tried your trick for using ByRef in the statements that call RemLet, but that actually increased the time a little. I got the same answer so I have some confidence that I actually did it right.

Instead of a list to hold 175,000 words, at first I used a database. Using a list is quite a bit faster.

Thanks,
Barry.


B4X:
tr(2).srt = DateTime.Now

     For i=0 To ScrList.Size-1 ' for each list word
       If (i Mod 1000 = 0) Then ' update progressbar
        k = 2 + 98 * (i/ScrList.Size)
        pbrPB1.Progress = k
      End If
      
        DoEvents
      If (fSrchCancel = True) Then
         PgShow = 0
         pnlSwap(pnlMain, pnlMain)
        Return
      End If

        s = ScrList.Get(i)
      s = s.ToUpperCase
      l = s.Length
      
tr(3).srt = DateTime.Now

      For j=0 To l2-1 ' for each let match check if it has remainder
          ts = s ' temp word
          p1 = ts.IndexOf(s2.CharAt(j))
        
          If (p1 >= 0) Then
          ts = RemLet(ts, p1) ' remove first occurrence of let form word
        Else
          Continue ' try next let. word must have let
        End If
         
          fmtch = True
          ts1 = s1 ' temp letters
         
 tr(4).srt = DateTime.Now

         For k=0 To ts.Length-1 ' check if each let of temp word is in letters
            p1 = ts1.IndexOf(ts.CharAt(k))
           
            If (p1 >= 0) Then
              RemLet(ts1, p1) ' letter used in match
            Else
                fmtch = False
                Exit ' word let not in letters
            End If
          Next
         
tr(4).end = DateTime.Now
tr(4).tot = tr(4).tot + (tr(4).end - tr(4).srt)

          If (fmtch = True) Then ' add word to results table
            v(0) = ScrList.Get(i)
            v(1) = ScoreWord(v(0))
            v(2) = v(0).Length
            SQL1.ExecNonQuery2("INSERT INTO restmp (mword, mscore, mlen) VALUES (?, ?, ?)", v)
          End If
        Next

tr(3).end = DateTime.Now
tr(3).tot = tr(3).tot + (tr(3).end - tr(3).srt)

     Next ' next word

tr(2).tot = DateTime.Now - tr(2).srt

Log("*** tr2 = " & tr(2).tot)
Log("*** tr3 = " & tr(3).tot)
Log("*** tr4 = " & tr(4).tot)

   End If

B4X:
Sub RemLet(str As String, pos As Int) As String
  Dim l As Int
  Dim s, t As String

  l = str.Length
  s = str
  
  If (l <= 1) Then
    t = ""
  Else
    If (pos >= l) Then
     t = s
   Else
      If (pos = 0) Then
       t = s.SubString(1)
     Else
        If (pos = l-1) Then
        t = s.SubString2(0, l-1)
      Else
          t = s.SubString2(0, pos) & s.SubString(pos+1)
      End If
     End If
   End If
  End If
  
  Return t  
End Sub
 
Upvote 0

Pfriemler

Member
Licensed User
Longtime User
Without really understanding what your code does ;-) ...

- Try to avoid String-to-String operations as much as possible to avoid Garbage collection

- Think about pushing the DoEvents to the progress bar update (only every 1000 steps).
 
Upvote 0

Pfriemler

Member
Licensed User
Longtime User
String has an IndexOf but no Remove.

No, it has in a sort of... Method Replace can remove a part from a string (whereever it is) by replacing it with nothing.

if s1 is your list of letters to remove and s2 is where it should be removed from,

for i=0 to s1.length-1
s2=s2.Replace(s1.CharAt(i),"")
next

Do you need to remove a letter at a certain position or only to remove all of them at all?
 
Upvote 0

kickaha

Well-Known Member
Licensed User
Longtime User
Try this for RemLet

B4X:
Sub RemLet(str As String, pos As Int) As String

Dim l As Int
l = str.Length

If (l = 0) Then Return ""
If (pos >= l) Then Return str
If (pos = 0) Then Return str.SubString(1)
If (pos = l-1) Then Return str.SubString2(0, l-1)
Return str.SubString2(0, pos) & str.SubString(pos+1)  

End Sub
 
Upvote 0

canalrun

Well-Known Member
Licensed User
Longtime User
Try this for RemLet

B4X:
Sub RemLet(str As String, pos As Int) As String

Dim l As Int
l = str.Length

If (l = 0) Then Return ""
If (pos >= l) Then Return str
If (pos = 0) Then Return str.SubString(1)
If (pos = l-1) Then Return str.SubString2(0, l-1)
Return str.SubString2(0, pos) & str.SubString(pos+1)  

End Sub

Thanks. I replaced my code with yours and re-ran. Yours saved 1 second or two. Your code is definitely neater.

I use local variables so that I can see interim results in the debugger. I had a single "Return" because at one time I tried to profile this routine's timing, but it needs something finer than 1 millisecond. I will learn that Google trace tool sometime.

Also ...

That is a good point about garbage collection. I had not considered that.

I only need to remove one character from a certain position at a time, but that idea of replacing characters with nulls and then using a final s.Replace to clean things up is intriguing.

For DoEvents, I have a button shown with the progress bar for "cancel". I assumed that pressing the cancel button event would not be recognized if another event subroutine was busy processing. The Cancel button event sets fSrchCancel to true. I never tested that assumption though.

Thanks,
Barry.
 
Last edited:
Upvote 0

canalrun

Well-Known Member
Licensed User
Longtime User
There is no ByRef in Basic4android.

As Pfriemler wrote above, I recommend you to call DoEvents every 1000 iterations.

Thanks. I moved my DoEvent to within my if (i mod 1000) statement. This slashed about 10 seconds off the time.

Is there a more efficient way to handle strings? For example, using arrays of chars or lists?

I saw in a Basic speed up tutorial on another site that said to stay away from variants.

can you treat strings as arrays in Basic? Such as:

Dim str As String
str = "Apfle"
str(2) = "p"

Barry.
 
Last edited:
Upvote 0

agraham

Expert
Licensed User
Longtime User
You won't get any faster than using Strings or a StringBuilder if you want to look at individual characters or character sequences. Any iteration over a character array at the Java level is much more expensive than doing it in native code as will be used for the string methods. This is especially true on Android versions before 2.2 as they do not JIT their bytecodes.

I don't understand the purpose of your algorithm but is there anything you can pre-calculate and store in the outer loop that can be moved out of the inner loops. Also avoiding calling Subs in the inner loop will increase performance slightly so inlining RemLet might help.
 
Upvote 0

canalrun

Well-Known Member
Licensed User
Longtime User
Also avoiding calling Subs in the inner loop will increase performance slightly so inlining RemLet might help.[/QUOTE said:
Inlining RemLet - very good idea.

I might be able to get away with
t - str.SubString2(0, pos) & str.SubString(pos+1)
as long as it does not choke if pos = 0 or pos >= str.Length-1

My algorithm can be thought of as O(NM^2)

I would like to make my algo O(NM). That would probably help alot.

Thanks,
Barry.


Added later ...

I tried inlining. It does not choke so it must be able to handle the two edge cases.
plus I found a small "boo boo"

These two changes saved almost 30 seconds. That is livable for now.

Barry.
 
Last edited:
Upvote 0
Top