How to extract a list out of HTML code using a loop

netchicken

I have HTML returned by a web search query, passed as a string in the variable listextract. The names I want to put into a list box are enclosed in <h3> tags; the rest of the code is a lot of extraneous stuff I don't want.

I can extract the first <h3> to </h3> block fine, but I can't get the ones after it. I used a standard For...Next loop in the main Sub, but it only ever got me the first field and then repeated it for the length of listextract.

The program is attached; I was going to give it to the community once I finished it anyway :)

Anyone know an easier way?

B4X:
Try
    For i = 0 To listextract.Length - 1
        result = listextract.SubString(listextract.IndexOf("<h3>"))
        result = result.SubString2(4, result.IndexOf("</h3>"))

        'cut it out again to get to the text
        Dim resultsword As String
        resultsword = result.SubString(result.IndexOf(">"))
        resultsword = resultsword.SubString2(1, resultsword.IndexOf("<"))

        'put it in the list
        lvsearch.AddSingleLine(i & " " & resultsword)
    Next
Catch
    ToastMessageShow("end", False)
End Try
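The loop above keeps returning the first name because listextract.IndexOf("<h3>") always searches from the start of the string, and listextract never changes inside the loop; the bound listextract.Length - 1 also counts characters, not headings. A minimal sketch of one common fix, assuming the headings are written exactly as <h3>...</h3> (as the code above assumes): keep a search position and pass it to IndexOf2, stopping when no further <h3> is found. ExtractH3 and FillSearchList are hypothetical names; lvsearch is the ListView from the original code.

```b4x
'Hypothetical helper: returns the text of every <h3>...</h3> block.
'Assumes the tags are written exactly as "<h3>" and "</h3>".
Sub ExtractH3(html As String) As List
    Dim names As List
    names.Initialize
    Dim startTag As String = "<h3>"
    Dim endTag As String = "</h3>"
    Dim pos As Int = 0
    Do While True
        'search from pos, not from the start of the string
        Dim s As Int = html.IndexOf2(startTag, pos)
        If s = -1 Then Exit
        Dim e As Int = html.IndexOf2(endTag, s + startTag.Length)
        If e = -1 Then Exit
        Dim inner As String = html.SubString2(s + startTag.Length, e)
        'as in the original: keep the text between the first ">" and the next "<"
        Dim gt As Int = inner.IndexOf(">")
        Dim lt As Int = inner.IndexOf2("<", gt + 1)
        If gt > -1 And lt > -1 Then names.Add(inner.SubString2(gt + 1, lt))
        'continue searching after this </h3>
        pos = e + endTag.Length
    Loop
    Return names
End Sub

'Hypothetical usage with the lvsearch ListView from the original code
Sub FillSearchList(html As String)
    Dim names As List = ExtractH3(html)
    For i = 0 To names.Size - 1
        lvsearch.AddSingleLine(i & " " & names.Get(i))
    Next
End Sub
```

Because the search position moves forward on every pass, the loop ends by itself after the last heading, so no Try/Catch is needed to stop it and the original string is left unchanged.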

Edit: fixed the attachment to show how it works (or doesn't).

netchicken

Thanks Erel.

I've sort of solved it: after each pass I cut the extracted code out of the string and then run the loop again on the ever-shrinking string. It doesn't end cleanly, but it seems to work; now to tie it into the program :)

Here is the program for anyone who wants it. The string it uses is downloaded from the net when it runs; it's a search for movies starting with "Blade".

I'm sure regex would be faster! I just need to learn it one day :)
Edit: changed the loop to stop after 5 runs (For i = 0 To 4)... so easy... sigh

B4X:
Try
    For i = 0 To 4
        result = listextract '.SubString(listextract.IndexOf("<h3>"))
        result = result.SubString2(4, result.IndexOf("</h3>"))
        'example of the HTML being parsed:
        '<h3 class="nomargin">
        '<a href="/m/1083484-blade/"  class="" >Blade</a>
        '<span class="movie_year"> (1998)</span></h3>

        'cut it out again
        Dim resultsword As String
        resultsword = result.SubString(result.IndexOf(">"))
        resultsword = resultsword.SubString2(1, resultsword.IndexOf("</a>"))

        Dim resultsyear As String
        '<span class="movie_year"> (1998)</span></h3>
        resultsyear = result.SubString(result.IndexOf("("))
        resultsyear = resultsyear.SubString2(1, resultsyear.IndexOf(")"))

        'Dim resultslink As String
        ''<a href="/m/1083484-blade/"
        'resultslink = result.SubString(result.IndexOf("/m/"))
        'resultslink = resultslink.SubString2(3, resultslink.IndexOf("/"))

        'put it in the list
        lvsearch.AddSingleLine(i & " " & resultsword & " " & resultsyear)
        'lvsearch.AddSingleLine(resultsword & " " & resultsyear)
        lvsearch.SingleLineLayout.ItemHeight = 50
        lvsearch.SingleLineLayout.Label.TextSize = 20
        lvsearch.FastScrollEnabled = True

        'Dim replaceresult As String
        'replaceresult = "<h3>" & result & "</h3>"
        'result = result.SubString(result.IndexOf("</h3>"))

        'delete the text just used above from the string
        listextract = listextract.Replace(result, "")

        'delete the empty h3 tags left over in the string
        listextract = listextract.Replace("<h3></h3>", "")

        'delete everything up to the next h3 in the string
        listextract = listextract.SubString(listextract.IndexOf("<h3>"))
        lblsynopsis.Text = listextract
    Next
Catch
    ToastMessageShow("bugger", True)
    Return
End Try
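For the regex approach mentioned above, here is a minimal sketch using Regex.Matcher, Matcher.Find and Matcher.Group from the B4X core library. The pattern is an assumption built from the sample HTML in the comments of the code above (an <h3>, then an <a> holding the title, then a <span> holding the year in brackets); if the site's markup differs, it will simply not match. ExtractMovies and FillMovieList are hypothetical names.

```b4x
'Hypothetical helper: returns "Title Year" for every result block.
'The pattern assumes markup shaped like the sample in the comments above:
'<h3 ...> <a ...>Title</a> <span ...> (Year)</span></h3>
'B4X strings do not process backslash escapes, so \s and \( reach the regex engine as written.
Sub ExtractMovies(html As String) As List
    Dim results As List
    results.Initialize
    Dim pattern As String = "<h3[^>]*>\s*<a [^>]*>([^<]*)</a>\s*<span[^>]*>\s*\((\d{4})\)\s*</span>\s*</h3>"
    Dim m As Matcher = Regex.Matcher(pattern, html)
    Do While m.Find
        'Group(1) = title text, Group(2) = four-digit year
        results.Add(m.Group(1).Trim & " " & m.Group(2))
    Loop
    Return results
End Sub

'Hypothetical usage with the lvsearch ListView from the original code
Sub FillMovieList(html As String)
    Dim movies As List = ExtractMovies(html)
    For i = 0 To movies.Size - 1
        lvsearch.AddSingleLine(i & " " & movies.Get(i))
    Next
End Sub
```

Each Find continues from the end of the previous match, so the loop stops after the last result however many there are, rather than after a fixed five runs. The [^<]* and [^>]* classes keep a match from running on into the next <h3> block.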
 

Attachments

  • movie program.zip (27.1 KB)