Android Question Regex help

Sergey_New

Well-Known Member
Licensed User
Longtime User
I need to remove the value <A HREF="UID50B5C7EEF7"> from the given string.
The highlighted characters can be any letters or numbers in any quantity.
Please help with the regular expression.
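For the removal itself, a minimal sketch using Regex.Replace could look like the following. The sample string and the variable names are only illustrative, and the character class `[0-9A-Za-z]+` assumes that "UID" is always followed by one or more letters or digits. Note that this pattern only matches the opening tag, so a closing </A> would stay in the string.

```b4x
' Illustrative input; the real string may differ.
Dim s As String = $"<A HREF="UID50B5C7EEF7">Origen 1 2</A>"$
' "UID" followed by one or more letters or digits (assumption).
Dim Pattern As String = $"<A HREF="UID[0-9A-Za-z]+">"$
' Replace every match with an empty string.
Dim cleaned As String = Regex.Replace(Pattern, s, "")
Log(cleaned)
```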
 
Solution
Here is a generic function that extracts the values, so you can do whatever you want with them:
B4X:
    Dim html As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Origen 1 2 </A>Tarea 1 2<A HREF="UIDD034051C4B77FA419DC48">Nombre Apellido</A>"$
    Dim sb As StringBuilder
    sb.Initialize
    For Each s As String In ExtractAllTagValues(html)
        Log(s)
        sb.Append(s).Append(" ")
    Next
    Log(sb.ToString.Trim)

B4X:
Public Sub ExtractAllTagValues(html As String) As List
    Dim Pattern As String = "(<[^>]+>([^<]+)</[^>]+>)|([^<]+)"
    Dim Matcher As Matcher = Regex.Matcher(Pattern, html)
    Dim Values As List
    Values.Initialize
    Do While Matcher.Find
        If Not(Matcher.Group(2) = Null) Then Values.Add(Matcher.Group(2))
        If...

Sergey_New

Do you want to sort the extraction of tag values in a defined order?
Yes: first the text that is not inside an "<A HREF=" tag, then the rest, in no particular order.
?
First the Outside Tag
Of the six possible word orders, a given string will contain only one, not all six.
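The requested order (outside-tag text first, then the tag values in whatever order they appear) could be sketched as below. This reuses the pattern from the solution; the Sub name and the two temporary lists are hypothetical and are not taken from the original answer.

```b4x
' Hypothetical helper: outside-tag text first, then tag values in source order.
Public Sub ExtractOutsideFirst(html As String) As List
    Dim Pattern As String = "(<[^>]+>([^<]+)</[^>]+>)|([^<]+)"
    Dim m As Matcher = Regex.Matcher(Pattern, html)
    Dim outside As List
    outside.Initialize
    Dim inside As List
    inside.Initialize
    Do While m.Find
        If Not(m.Group(2) = Null) Then
            ' Group 2: text between an opening and a closing tag.
            inside.Add(m.Group(2).Trim)
        Else If Not(m.Group(3) = Null) Then
            ' Group 3: text that is not inside any tag.
            outside.Add(m.Group(3).Trim)
        End If
    Loop
    ' Append the tag values after the outside text.
    outside.AddAll(inside)
    Return outside
End Sub
```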
 

RichardN

Well-Known Member
Licensed User
Longtime User
@Sergey_New

Take a look at RegexBuddy. It provides a visual interface to build and test Regex using several different input strings.

Not only can you check an intended result outside the B4X IDE environment, but it is also a great visual vehicle for learning Regex.
 

Sergey_New

First the Outside Tag
Checked, it is output in the order I asked for.
I would like to see group names used (instead of numbers), including for non-output string values (UIDD0, UID50).
TILogistic, thank you very much for your help!
 

Sergey_New

I would like to see group names used (instead of numbers)
I read the help for Matcher for B4X.
It turns out that the B4X Matcher cannot refer to groups by name, unlike regex implementations in some other languages.
Please help with a pattern:
B4X:
Dim Pattern As String = "(<[^>]+>([^<]+)</[^>]+>)|([^<]+)"
so that two more groups are added for the values UIDD0, UID50. Original line:
B4X:
html = $"<A HREF="UIDD0">Nombre Apellido</A><A HREF="UID50">Origen 1 2</A>Tarea 1 2"$
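One way to get the UID values with the same approach is to put a capturing group around the HREF value. Since Matcher only accepts a group index, the sketch below gives the indexes readable names through ordinary Int variables. The Sub name, the variable names, the Map keys and the `[0-9A-Za-z]+` character class are assumptions for illustration, not part of the original solution.

```b4x
' Hypothetical helper: returns one Map per match with the keys "uid" and "text".
Public Sub ExtractLinksWithUid(html As String) As List
    ' Stand-ins for group names; Matcher.Group only accepts an index.
    Dim GroupUid As Int = 1      ' value of HREF, e.g. UIDD0
    Dim GroupLinkText As Int = 2 ' text between <A ...> and </A>
    Dim GroupOutside As Int = 3  ' text outside any tag
    Dim Pattern As String = $"<A HREF="(UID[0-9A-Za-z]+)">([^<]+)</A>|([^<]+)"$
    Dim m As Matcher = Regex.Matcher(Pattern, html)
    Dim result As List
    result.Initialize
    Do While m.Find
        If Not(m.Group(GroupUid) = Null) Then
            ' A link: keep both the UID and the link text.
            result.Add(CreateMap("uid": m.Group(GroupUid), "text": m.Group(GroupLinkText)))
        Else
            ' Plain text outside a tag: no UID.
            result.Add(CreateMap("uid": "", "text": m.Group(GroupOutside)))
        End If
    Loop
    Return result
End Sub
```

Each item of the returned List is a Map, so the caller can read link.Get("uid") and link.Get("text") inside a For Each loop instead of remembering group numbers.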
 

Sergey_New

Since no one has suggested a solution, I will use aeric's advice:
B4X:
Sub Process_Globals
    Private HtmlParser As MiniHtmlParser
    Type Link (Name As String, Data As String)
End Sub

Sub AppStart (Args() As String)
    FindLink
End Sub

Sub FindLink
    HtmlParser.Initialize
    Dim txt As String = $"<A HREF="UID50B">Source</A>Task<A HREF="UIDD03">Name</A>"$
    Dim root As HtmlNode = HtmlParser.Parse(txt)
    Dim links As List
    links.Initialize
    Dim nodes As Int = root.Children.Size
    For i = 0 To nodes - 1
        Dim node As HtmlNode = root.Children.Get(i)
        Dim lnk As Link
        lnk.Initialize
        Select node.Name
            Case "A"
                lnk.Name = HtmlParser.GetTextFromNode(node, 0)
                lnk.Data = HtmlParser.GetAttributeValue(node, "HREF", "value")
            Case "text"
                lnk.Name = HtmlParser.GetAttributeValue(node, "value", "")
                lnk.Data = ""
        End Select
        links.Add(lnk)
    Next
    links.SortType("Data", True)
End Sub
aeric, thank you!
 