Android Question [Help Needed] Regex to Parse Html fails in IDE

konradwalsh

Active Member
Licensed User
Longtime User
Hello
I have been reading up on other threads but can't seem to find a reliable method of dealing with this.

I use an HttpJob to download an HTML file.
I receive a string like this:
id,name,ssize,protein,carbs,fat,fiber,user,over,sce 25821235,"Muller Light Toffee Yogurt","175g pot",7.17,13.8,0.175,0,1,0,0

How can I parse this out so that it is usable..?
I tried putting it in a JSONArray but that failed..

I appreciate any help
 

konradwalsh

Active Member
Licensed User
Longtime User
Thanks very much for helping.

Yes, but that seemed very cumbersome to me. I was hoping to somehow put it in an array so that I can pull the values out by index.
So the first 10 items become the keys and the second 10 the values.
The reason is that I get different amounts of data depending on the query, so I wanted to avoid mistakes.
 

konradwalsh

Active Member
Licensed User
Longtime User
Again, thanks for trying to point me in the right direction.

But how would I do this? My ideas are all flat.
I am trying right now to use StringFunctions.
 

DonManfred

Expert
Licensed User
Longtime User
You will run into problems because the list above is not correct. A comma is missing after sce:
B4X:
id,name,ssize,protein,carbs,fat,fiber,user,over,sce 25821235,"Muller Light Toffee Yogurt"

Please note that I manually added the missing comma:

B4X:
    Dim text As String = $"id,name,ssize,protein,carbs,fat,fiber,user,over,sce,25821235,"Muller Light Toffee Yogurt","175g pot",7.17,13.8,0.175,0,1,0,0"$

    ' Split on every comma
    Dim strarr() As String = Regex.Split("\,", text)
    For i = 0 To strarr.Length - 1
        ' Strip the double quotes in place so the map below stores clean values
        strarr(i) = strarr(i).Replace($"""$, "")
        Log($"${i + 1}: ${strarr(i)}"$)
    Next
    ' The first half of the array holds the keys, the second half the values
    Dim len As Int = strarr.Length / 2
    Log(len)

    Dim m As Map
    m.Initialize
    For i = 0 To len - 1
        ' Use len as the offset instead of a hard-coded 10
        m.Put(strarr(i), strarr(i + len))
    Next
    Log(m)
 

konradwalsh

Active Member
Licensed User
Longtime User
Just so you know I am not trying to be overly lazy

This is where I am heading
B4X:
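    sf.Initialize
    Dim ProductMap As Map
    ProductMap.Initialize ' a Map must be initialized before Put is called
    Dim aList As List = sf.Split(htmlresult, ",")
    ' The first 10 items are the keys, the next 10 the values
    For i = 0 To 9
        ProductMap.Put(aList.Get(i), aList.Get(i + 10))
    Next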
 

konradwalsh

Active Member
Licensed User
Longtime User
Hey
You will be delighted to know that I took your working example and made it massively overcomplicated, and of course it doesn't work.

I realised that the space where you mentioned you manually placed the comma is significant. It will appear in all results.
So I took your code and tried to split by space, which also produces undesirable splits inside the values,
so I wanted to join the values back together.

Anyway, I am failing at the first hurdle:
my first split splits at the word and removes the word.

What is the correct syntax for splitting at a space?
I tried "\ " and of course " ".

B4X:
    Dim firstSplit() As String = Regex.Split(" ", res)
    Log(firstSplit)
    ' keep firstSplit(0), rejoin 1 and 2
    Dim joinSplit As String = firstSplit(1) & firstSplit(2)
    Log(joinSplit)

    Dim SplitIDS() As String = Regex.Split("\,", firstSplit(0))
    Dim SplitValues() As String = Regex.Split("\,", joinSplit)

    For i = 0 To SplitIDS.Length - 1
        Dim myIDstrings As String = SplitIDS(i)
        myIDstrings = myIDstrings.Replace($"""$, "")
        Log($"${i + 1}: ${myIDstrings}"$)
    Next

    For i = 0 To SplitValues.Length - 1
        Dim myValuestrings As String = SplitValues(i)
        myValuestrings = myValuestrings.Replace($"""$, "")
        Log($"${i + 1}: ${myValuestrings}"$)
    Next

    Dim len As Int = SplitIDS.Length
    Log(len)

    Dim m As Map
    m.Initialize
    For i = 0 To len - 1
        m.Put(SplitIDS(i), SplitValues(i))
    Next
    Log(m)
 

konradwalsh

Active Member
Licensed User
Longtime User
I don't know why this took so long to find, but I found the correct syntax for splitting at whitespace:
B4X:
"\s+"

So in case anyone needs this...
My final code is:
B4X:
    Dim firstSplit() As String = Regex.Split("\s+", res)
    Log(firstSplit)
    ' keep firstSplit(0), rejoin 1 and 2
    ' (only the first two value pieces are rejoined; values with more spaces lose data)
    Dim joinSplit As String = firstSplit(1) & " " & firstSplit(2)
    Log(joinSplit)

    ' put these in new arrays
    Dim SplitIDS() As String = Regex.Split("\,", firstSplit(0))
    Dim SplitValues() As String = Regex.Split("\,", joinSplit)

    ' remove unwanted "" in place so the map stores clean strings
    For i = 0 To SplitIDS.Length - 1
        SplitIDS(i) = SplitIDS(i).Replace($"""$, "")
        Log($"${i + 1}: ${SplitIDS(i)}"$)
    Next

    For i = 0 To SplitValues.Length - 1
        SplitValues(i) = SplitValues(i).Replace($"""$, "")
        Log($"${i + 1}: ${SplitValues(i)}"$)
    Next

    Dim len As Int = SplitIDS.Length
    Log(len)
    ' map out results
    Dim m As Map
    m.Initialize
    For i = 0 To len - 1
        m.Put(SplitIDS(i), SplitValues(i))
    Next
    Log(m)
 

b4auser1

Well-Known Member
Licensed User
Longtime User
If the strings contain "," as part of the data, not as a delimiter, then the code above will run into problems.
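
One way around this is to walk the line character by character and treat a comma as a delimiter only when it is outside double quotes. The following is a minimal sketch, not a full CSV parser: it assumes the quotes are balanced and never escaped inside a field, and the Sub name SplitCsvLine is hypothetical (it is not part of any library).

```b4x
' Hypothetical helper: splits one line on commas that are outside double quotes.
' Assumes quotes are balanced and never escaped inside a field.
Sub SplitCsvLine(line As String) As List
    Dim fields As List
    fields.Initialize
    Dim sb As StringBuilder
    sb.Initialize
    Dim inQuotes As Boolean = False
    For i = 0 To line.Length - 1
        Dim c As String = line.CharAt(i)
        If c = QUOTE Then
            inQuotes = Not(inQuotes)   ' toggle the state and drop the quote itself
        Else If c = "," And inQuotes = False Then
            fields.Add(sb.ToString)    ' delimiter: close the current field
            sb.Initialize              ' start a new, empty field
        Else
            sb.Append(c)               ' ordinary character, including commas inside quotes
        End If
    Next
    fields.Add(sb.ToString)            ' the last field has no trailing comma
    Return fields
End Sub
```

Called with a value line such as 25821235,"Muller, Light","175g pot",7.17 this should return four fields, with the comma inside the product name kept as data.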
 

konradwalsh

Active Member
Licensed User
Longtime User
b4auser1 said: If the strings contain "," as a part of data, not as a delimiter, then the code above will get into the problems.
You are right.

Can anyone suggest a better regex?
I need to split at the first space only.
I don't want to split between the quotes (" ").
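
Splitting at the first space only does not need a lookahead: Regex.Matcher can locate the first run of whitespace, and the string can be cut on either side of it. A minimal sketch, assuming the header part (everything before the first space) never contains whitespace, as in the sample above; the Sub name SplitAtFirstWhitespace is hypothetical.

```b4x
' Hypothetical helper: splits s into two parts at the first run of whitespace.
' Returns a one-element array when s contains no whitespace at all.
Sub SplitAtFirstWhitespace(s As String) As String()
    Dim m As Matcher = Regex.Matcher("\s+", s)
    If m.Find Then
        Dim head As String = s.SubString2(0, m.GetStart(0)) ' text before the whitespace
        Dim tail As String = s.SubString(m.GetEnd(0))       ' text after the whitespace
        Return Array As String(head, tail)
    End If
    Return Array As String(s)
End Sub
```

Because only the first match is used, spaces inside the quoted values are never touched.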
 

konradwalsh

Active Member
Licensed User
Longtime User
Again...
This is what I am working with; it seems to work in an online regex tester:
B4X:
\s+(?=(?:[^"]*"[^"]*")*[^"]*$)

But this causes errors in the IDE, even when I wrap the quotes in $$.
 

konradwalsh

Active Member
Licensed User
Longtime User
OK, so I fixed it by assigning the regex pattern to a string first:
B4X:
    Dim myRegexSolution As String = $"\s+(?=(?:[^"]*"[^"]*")*[^"]*$)"$
    Dim firstSplit() As String = Regex.Split(myRegexSolution, res)

But I am still not convinced this is reliable...
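
A quick way to check the pattern is to run it against a sample and log the pieces. The lookahead only accepts whitespace that is followed by an even number of double quotes, which means whitespace outside a quoted value. The sketch below uses a shortened version of the sample string (an assumption made for readability, not the real response).

```b4x
' Shortened sample: a header, one space, then values with spaces inside quotes
Dim res As String = $"id,name,sce 25821235,"Muller Light Toffee Yogurt","175g pot",7.17"$
Dim pattern As String = $"\s+(?=(?:[^"]*"[^"]*")*[^"]*$)"$
Dim parts() As String = Regex.Split(pattern, res)
Log(parts.Length) ' 2 for this sample: only the space after sce is outside quotes
For Each part As String In parts
    Log(part)
Next
```

The pattern depends on the quotes being balanced: a single stray double quote anywhere in the response changes the count and the split no longer happens where expected. It also rescans the rest of the string for every whitespace match, so it can become slow on long responses.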
 