Android Question Parsing HTML Span element

Almora

Well-Known Member
Licensed User
Longtime User
Hi,

How can I get the value of the "span" element?
I need two separate values, one for each checkbox (dataid="101" and dataid="102").
How can I get the values "615" and "715"?
Thanks.


B4X:
            Dim st As String = $"   
                       <div class="list-group">
                       <input type="checkbox" dataid="101" />
                       <a href="techno1">2023</a>
                       <span> 615 <a href="techno"><i class="fa fa-chevron-circle-right"></i></a></span>
                       </div>
                                          
                       <div class="list-group">
                       <input type="checkbox" dataid="102" />
                       <a href="techno2">2023</a>
                       <span> 715 <a href="techno"><i class="fa fa-chevron-circle-right"></i></a></span>
                       </div>
                       "$
    Dim parser As MiniHtmlParser
    parser.Initialize
    Dim root As HtmlNode = parser.Parse(st)
    Dim listgroup As HtmlNode = parser.FindNode(root, "div", parser.CreateHtmlAttribute("class", "list-group"))
    If listgroup.IsInitialized Then
        Dim span1 As HtmlNode = parser.FindNode(listgroup, "a", Null)
        Log(parser.GetAttributeValue(span1, "href", ""))
    End If

log(techno)
 
walterf25

Expert
Licensed User
Longtime User
You can use regular expressions to do that. Something like this will work:

B4X:
    Dim pattern As String = "<span\b[^>]*>(.*?)<\/span>"
    Dim text As String = $"
                       <div class="list-group">
                       <input type="checkbox" dataid="101" />
                       <a href="techno1">2023</a>
                       <span> 615 <a href="techno"><i class="fa fa-chevron-circle-right"></i></a></span>
                       </div>
                                         
                       <div class="list-group">
                       <input type="checkbox" dataid="102" />
                       <a href="techno2">2023</a>
                       <span> 715 <a href="techno"><i class="fa fa-chevron-circle-right"></i></a></span>
                       </div>
                       "$
                     
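    Dim matcher1 As Matcher
    matcher1 = Regex.Matcher(pattern, text)
    Do While matcher1.Find
        ' Group(1) is everything inside the span, starting with " 615 <a ...".
        ' The first 4 characters hold the space and a 3-digit value; Trim drops the space.
        Log("match: " & matcher1.Group(1).SubString2(0, 4).Trim)
    Loop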
 

drgottjr

Expert
Licensed User
Longtime User
Or this, reusing the text variable from the post above:
B4X:
    Dim pattern As String = $"n>\s(.+?)\s<a"$
    Dim mat As Matcher = Regex.Matcher2(pattern, Regex.MULTILINE, text)
    Do While mat.Find
        Log(mat.Group(1))
    Loop
 

walterf25

Expert
Licensed User
Longtime User
That is an even more elegant solution.
 

drgottjr

Expert
Licensed User
Longtime User
Even shorter, my mini regex :)
B4X:
    Dim pattern As String = $"> (\d+) <"$
 

drgottjr

Expert
Licensed User
Longtime User
And if you really want to help the OP:
B4X:
    ' Remove the line breaks so that .*? can reach from the input tag to the span.
    text = text.Replace(CRLF,"")
    ' Group 1: the dataid attribute. Group 2: the number at the start of the span.
    Dim pattern As String = $"(dataid="10[12]").*?n> (\d+) <"$
    Dim mat As Matcher = Regex.Matcher2(pattern, Regex.MULTILINE, text)
    Dim dataid101, dataid102 As String
    Do While mat.Find
        Log(mat.Group(1).Replace("=","").Replace(QUOTE,"") & " = " & mat.Group(2))
        If mat.Group(1).Contains("101") Then
            dataid101 = mat.Group(2)
        Else If mat.Group(1).Contains("102") Then
            dataid102 = mat.Group(2)
        End If
    Loop

    Log(dataid101)
    Log(dataid102)

This feeds the values into predefined variables for his convenience.
Of course, it is only meant to solve the particular problem as described by the OP.
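
If the page ever contains more than two checkboxes, the same idea can be sketched with a Map instead of one variable per dataid. This is only a sketch under assumptions: text is the HTML string from the earlier posts, every block has a dataid input followed by a span that starts with digits, and the names values and mat are made up for illustration. [\s\S]*? is used in place of .*? so the match can cross line breaks on its own, with or without the Replace(CRLF, "") step.

B4X:
    ' Sketch only: "values" is a hypothetical name, not part of any library.
    ' Group 1: the digits of the dataid. Group 2: the digits at the start of the span.
    Dim pattern As String = $"dataid="(\d+)"[\s\S]*?<span>\s*(\d+)"$
    Dim values As Map
    values.Initialize
    Dim mat As Matcher = Regex.Matcher(pattern, text)
    Do While mat.Find
        ' Key is the dataid ("101", "102", ...), value is the span number.
        values.Put(mat.Group(1), mat.Group(2))
    Loop
    Log(values.Get("101")) ' 615 for the HTML in this thread
    Log(values.Get("102")) ' 715 for the HTML in this thread

Like the other regex answers, this depends on the exact shape of the markup. If the HTML changes (attributes on the span, a different tag order), the pattern has to be adjusted.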
 