B4J Question How fast is your lookup?

William Lancee

Well-Known Member
Licensed User
Longtime User
I am going to try it with various length strings (an English dictionary - 370,000 words)
1. I am going to create a list with 10,000 randomly chosen words (I'll set RndSeed to ensure repeatability)
2. Create 2 dataset:
a) list of all words
b) a B4XSet of all words
3. Time the process of looking up the 10,000 words in each dataset

I'll be right back...
 
Upvote 0

William Lancee

Well-Known Member
Licensed User
Longtime User
'Reading 370103 words from File ==> Time = 37 msecs
'Loading 370103 words into a List ==> Time = 9 msecs
'Loading 370103 words into a B4XSet ==> Time = 81 msecs
'Creating 10,000 sample lookUp words ==> Time = 2 msecs
'Search For 10,000 random words in a list ==> Time = 9390 msecs 'a linear search
'Search For 10,000 random words in a B4XSet ==> Time = 3 msecs 'a hash lookup

Conclusion: no contest for this task B4XSet wins easily - but takes 10x longer to load

B4X:
Sub Class_Globals
    Private Root As B4XView  'ignore
    Private xui As XUI
    
    Private words As List
    Private markTime As Long
End Sub

Public Sub Initialize
    markTime = DateTime.now
    words = File.ReadList(File.DirAssets, "words_alpha.txt")
    Log($"Reading ${words.Size} words from file  ==>  Time = ${DateTime.Now - markTime} msecs"$)
End Sub

Private Sub B4XPage_Created (Root1 As B4XView)
    Root = Root1
    markTime = DateTime.now

    Dim DSList As List
    DSList.Initialize
    For Each w As String In words
        DSList.Add(w)
    Next
    Log($"Loading ${words.Size} words into a List  ==>  Time = ${DateTime.Now - markTime} msecs"$)
    markTime = DateTime.now

    Dim DSSet As B4XSet
    DSSet.Initialize
    For Each w As String In words
        DSSet.Add(w)
    Next
    Log($"Loading ${words.Size} words into a B4XSet  ==>  Time = ${DateTime.Now - markTime} msecs"$)
    markTime = DateTime.now
    
    RndSeed(314159826)
    Dim lookupWords As List
    lookupWords.Initialize
    For i = 0 To 9999
        lookupWords.Add(words.Get(Rnd(0, words.Size)))
    Next
    Log($"Creating 10,000 sample lookUp words  ==>  Time = ${DateTime.Now - markTime} msecs"$)
    markTime = DateTime.now
    
    For i = 0 To 9999
        Dim w As String = lookupWords.Get(i)
        Dim found As Boolean
        For j = 0 To DSList.Size - 1
            Dim s As String = DSList.Get(j)
            If s = w Then
                found = True
                Exit
            End If
        Next
        If found Then
            'Do something
        End If
    Next
    Log($"Search for 10,000 random words in a list ==>  Time = ${DateTime.Now - markTime} msecs"$)

    markTime = DateTime.now
    For i = 0 To 9999
        Dim w As String = lookupWords.Get(i)
        Dim found As Boolean = DSSet.Contains(w)
        If found Then
            'Do something
        End If
    Next
    Log($"Search for 10,000 random words in a B4XSet ==>  Time = ${DateTime.Now - markTime} msecs"$)
    markTime = DateTime.now
End Sub
 
Upvote 0

emexes

Expert
Licensed User
Also: when the hit rate decreases, so will the list speed, whereas the set speed should stay constant.

If 100% hit rate ie all words found, then on average will have to search through half of list.

If 0% hit rate ie no words found, then on average will have to search through entire list.
 
Upvote 0

emexes

Expert
Licensed User
'Search For 10,000 random words in a list ==> Time = 9390 msecs 'a linear search
'Search For 10,000 random words in a B4XSet ==> Time = 3 msecs 'a hash lookup

back-of-envelope summary = a set does in a second what a list does in an hour

(for this n ie hundreds of thousands)
 
Last edited:
Upvote 0

William Lancee

Well-Known Member
Licensed User
Longtime User
Here's the difference!

All lookup words found:
'Creating 10,000 sample lookUp words ==> Time = 2 msecs
'Search For 10,000 random words in a list ==> Time = 9390 msecs
'Search For 10,000 random words in a B4XSet ==> Time = 3 msecs

No lookup words found (I append a ? at end of each of the 10,000 words) :
'Creating 10,000 sample lookUp words - not in Dataset ==> Time = 6 msecs
'Search For 10,000 random words not in a list ==> Time = 20698 msecs
'Search For 10,000 random words not in a B4XSet ==> Time = 3 msecs
 
Upvote 0

William Lancee

Well-Known Member
Licensed User
Longtime User
@emexes
Your back-of-the-envelope calcs are ballpark but close enough.

Hash tables were a major breakthru. The superfast one-pass WATFOR Fortran compiler was one of the first to use it.
WATFOR stands for Waterloo (University) Fortran, my Alma Mater.

B4X:
    Dim msecsPerHour As Int = 60 * 60 * 1000
    Dim N10000List As Float = msecsPerHour / (.5 * (9390 + 20698))  'per hour averaging best and worst times
    Log(10000 * N10000List) '2,392,981 lookups

    Dim N10000Set As Float = 1000 / (.5 * (3 + 3))  'per second
    Log(10000 * N10000Set) '3,333,334 lookups
 
Upvote 0

William Lancee

Well-Known Member
Licensed User
Longtime User
'Search For 10,000 random words in a list with 10 items ==> Time = 6 msecs
'Search For 10,000 random words in a B4XSet with 10 items ==> Time = 1 msecs

'Search For 10,000 random words in a list with 100 items ==> Time = 16 msecs
'Search For 10,000 random words in a B4XSet with 100 items ==> Time = 2 msecs

'Search For 10,000 random words in a list with 500 items ==> Time = 38 msecs
'Search For 10,000 random words in a B4XSet with 500 items ==> Time = 2 msecs

'Search For 10,000 random words in a list with 1000 items ==> Time = 44 msecs
'Search For 10,000 random words in a B4XSet with 1000 items ==> Time = 2 msecs

'Search For 10,000 random words in a list with 2000 items ==> Time = 76 msecs
'Search For 10,000 random words in a B4XSet with 2000 items ==> Time = 3 msecs

'Search For 10,000 random words in a list with 5000 items ==> Time = 165 msecs
'Search For 10,000 random words in a B4XSet with 5000 items ==> Time = 3 msecs

The B4XSet seems not too sensitive to source list size.
But certainly for anything below 100 items, a simpler (not sure about the meaning of simpler) approach using a list would not be a problem in most cases.

Note that the list approach gives you not just if it is there but also what it's index is. The B4XSet method does not give you an index.
 
Upvote 0

emexes

Expert
Licensed User
Good reason to use Lists, very often.

Can have your cake and eat it too by using Maps. Although then we have this possible future hurdle:

1734744357395.png
 
Upvote 0

William Lancee

Well-Known Member
Licensed User
Longtime User
Using a Map gives the same result as using B4XSet (3 msecs for the same task). if you store the index as the the value you can have the best of both worlds.

If you use 'GetDefault', you don't need 'ContainsKey'

B4X:
    RndSeed(314159826)
    Dim lookupWords As List
    lookupWords.Initialize
    For i = 0 To 9999
        Dim w As String = words.Get(Rnd(0, words.Size))
        lookupWords.Add(w)
    Next
    Log($"Creating 10,000 sample lookUp words  ==>  Time = ${DateTime.Now - markTime} msecs"$)
    'Creating 10,000 sample lookUp words  ==>  Time = 3 msecs

    markTime = DateTime.now
    Dim DSMap As Map
    DSMap.Initialize
    Dim n As Int
    For i = 0 To words.Size - 1
        Dim w As String = words.Get(i)
        DSMap.Put(w, i)
    Next
    Log($"Loading ${words.Size} words into a Map  ==>  Time = ${DateTime.Now - markTime} msecs"$)
    'Loading 370103 words into a Map  ==>  Time = 66 msecs
 
    markTime = DateTime.now
    For i = 0 To 9999
        Dim w As String = lookupWords.Get(i)
        Dim found As Int = DSMap.GetDefault(w, -1)
        If found > - 1 Then
            'Log(w, found)
            'Do something
        End If
    Next
    Log($"Search for 10,000 random words in a Map ==>  Time = ${DateTime.Now - markTime} msecs"$)
    'Search for 10,000 random words in a Map ==>  Time = 3 msecs
    markTime = DateTime.now
 
    Dim testWord As String = lookupWords.Get(5000)
    Log(testWord & TAB & DSMap.GetDefault(w, -1))
    'uniformized    189779
 
Upvote 0
Top