Android Question Regex help

Sergey_New

Well-Known Member
Licensed User
Longtime User
It is necessary to remove the value <A HREF="UID50B5C7EEF7"> from the given string.
The highlighted characters can be any letters or numbers in any quantity.
Please help with the regular expression.
 
drgottjr

Expert
Licensed User
Longtime User
I'm not sure I understand what you're trying to remove, but look at these:
B4X:
    Dim s As String = $"<A HREF="UID50B5C7EEF7">"$
    Dim matcher As Matcher = Regex.Matcher($"<A HREF="(.+?)">"$, s)
    If matcher.Find Then
        s = Regex.Replace(matcher.Group(1),s,"")
        Log("s now: " & s)
    End If

    Dim s2 As String = $"<A HREF="UID50B5C7EEF7">"$
    Dim matcher As Matcher = Regex.Matcher($"(<A HREF=".+?">)"$, s2)
    If matcher.Find Then
        s2 = Regex.Replace(matcher.Group(1),s2,"")
        Log("s2 now: " & s2)
    End If
 


Sergey_New

Well-Known Member
Licensed User
Longtime User
I apologize for poorly expressing the required actions.
The line in which the replacement needs to be made:
<A HREF="UID50B5C7EEF789AB44AECFF">Source </A>Task<A HREF="UIDD034051C4B77FA419DC48"> Name</A>
I need to get:
Source Task Name
 

drgottjr

Expert
Licensed User
Longtime User
there's this:
B4X:
    Dim s As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Source </A>Task<A HREF="UIDD034051C4B77FA419DC48"> Name</A>"$
    Dim matcher As Matcher = Regex.Matcher($"<A HREF="(.+?)">Source </A>Task<A HREF="(.+?)"> Name</A>"$,s)
    If matcher.Find Then
        Log(" source = " & matcher.Group(1))
        Log(" name = " & matcher.Group(2))
        
    End If
 


Sergey_New

Well-Known Member
Licensed User
Longtime User
Your code returns:
B4X:
 source = UID50B5C7EEF789AB44AECFF
 name = UIDD034051C4B77FA419DC48
But I need to get the string:
B4X:
Dim s As String = "Source Task Name"
 

aeric

Expert
Licensed User
Longtime User
B4X:
Sub Process_Globals
    Private HtmlParser As MiniHtmlParser
End Sub

B4X:
Dim SB As StringBuilder
SB.Initialize
Dim text As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Source </A>Task<A HREF="UIDD034051C4B77FA419DC48"> Name</A>"$
HtmlParser.Initialize
Dim root As HtmlNode = HtmlParser.Parse(text)
Dim nodes As Int = root.Children.Size
For i = 0 To nodes - 1
    Dim node As HtmlNode = root.Children.Get(i)
    Select node.Name
        Case "A"
            Dim value As String = HtmlParser.GetTextFromNode(node, 0)
        Case "text"
            Dim value As String = HtmlParser.GetAttributeValue(node, "value", "")
    End Select
    'Log(value)
    SB.Append(value)
Next
Log(SB.ToString)
 

alwaysbusy

Expert
Licensed User
Longtime User
If the requirement is to strip the tags and keep only the text (Source Task Name), then this regex should do it:

B4X:
Dim s As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Source </A>Task<A HREF="UIDD034051C4B77FA419DC48"> Name</A>"$
   
Dim result As String = Regex.Replace("<[^>]+>",s, "")
Log(result)

Result:
B4X:
Source Task Name

Alwaysbusy
 

Sergey_New

Well-Known Member
Licensed User
Longtime User
Alwaysbusy, thank you, that's what I wanted!
Could you also clarify how the pattern should be changed if each of Source, Task and Name can consist of several words with spaces between them?
 

TILogistic

Expert
Licensed User
Longtime User
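Another option. The pattern only matches the tags, so the text between them can contain any number of words and spaces: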
B4X:
Dim html As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Origen 1 2 </A>Tarea 1 2<A HREF="UIDD034051C4B77FA419DC48">Nombre Apellido</A>"$
Log(Regex.Replace("<([^>]+)>",html, ""))
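A note on spacing: replacing each tag with an empty string joins "Tarea 1 2" directly to "Nombre Apellido", because there is no space between them in the source. A minimal sketch of one way around that, assuming it is acceptable to normalize all whitespace; `StripTags` is a hypothetical helper name, not something from B4X or from this thread:

```b4x
'Hypothetical helper: removes every <...> tag and tidies the spacing.
Public Sub StripTags(Html As String) As String
    'Replace each tag with a space so neighbouring text pieces never run together.
    Dim text As String = Regex.Replace("<[^>]+>", Html, " ")
    'Collapse runs of whitespace into a single space, then trim both ends.
    Return Regex.Replace("\s+", text, " ").Trim
End Sub
```

Called as `Log(StripTags(html))` with the string above, this should log `Origen 1 2 Tarea 1 2 Nombre Apellido`.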
 

TILogistic

Expert
Licensed User
Longtime User
To extract the values both inside and outside the tags of an HTML string, use:
B4X:
    Dim html As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Origen 1 2 </A>Tarea 1 2<A HREF="UIDD034051C4B77FA419DC48">Nombre Apellido</A>"$
    Dim Pattern As String = "(<[^>]+>([^<]+)</[^>]+>)|([^<]+)"
    Dim Matcher As Matcher = Regex.Matcher(Pattern, html)

    Do While Matcher.Find
        Dim InsideTag As String = Matcher.Group(2)
        Dim OutsideTag As String = Matcher.Group(3)

        If Not(InsideTag = Null) Then Log(InsideTag)
        If Not(OutsideTag = Null) Then Log(OutsideTag)
    Loop

 

TILogistic

Expert
Licensed User
Longtime User
Thank you!
The result obtained must contain a space before the word "Nombre"
B4X:
    Dim html As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Origen 1 2 </A>Tarea 1 2<A HREF="UIDD034051C4B77FA419DC48">Nombre Apellido</A>"$
    Dim Pattern As String = "(<[^>]+>([^<]+)</[^>]+>)|([^<]+)"
    Dim Matcher As Matcher = Regex.Matcher(Pattern, html)

    Dim sb As StringBuilder
    sb.Initialize
    
    Do While Matcher.Find
        Dim InsideTag As String = Matcher.Group(2)
        Dim OutsideTag As String = Matcher.Group(3)

        If Not(InsideTag = Null) Then
            sb.Append(InsideTag).Append(" ")
            Log(InsideTag)
        End If
        If Not(OutsideTag = Null) Then
            sb.Append(OutsideTag).Append(" ")
            Log(OutsideTag)
        End If
    Loop
    Log(sb.ToString.Trim)
 

TILogistic

Expert
Licensed User
Longtime User
With a generic function that extracts the values, so you can do whatever you want with them:
B4X:
    Dim html As String = $"<A HREF="UID50B5C7EEF789AB44AECFF">Origen 1 2 </A>Tarea 1 2<A HREF="UIDD034051C4B77FA419DC48">Nombre Apellido</A>"$
    Dim sb As StringBuilder
    sb.Initialize
    For Each s As String In ExtractAllTagValues(html)
        Log(s)
        sb.Append(s).Append(" ")
    Next
    Log(sb.ToString.Trim)

B4X:
Public Sub ExtractAllTagValues(html As String) As List
    Dim Pattern As String = "(<[^>]+>([^<]+)</[^>]+>)|([^<]+)"
    Dim Matcher As Matcher = Regex.Matcher(Pattern, html)
    Dim Values As List
    Values.Initialize
    Do While Matcher.Find
        If Not(Matcher.Group(2) = Null) Then Values.Add(Matcher.Group(2))
        If Not(Matcher.Group(3) = Null) Then Values.Add(Matcher.Group(3))
    Loop
    Return Values
End Sub
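One detail when joining the values: "Origen 1 2 " already ends with a space, so appending another " " after it leaves a double space in the result. A minimal sketch that trims each value before joining, assuming leading and trailing whitespace inside a value is not significant; `JoinValues` is a hypothetical helper name:

```b4x
'Hypothetical helper: joins the values with single spaces, ignoring stray whitespace.
Public Sub JoinValues(Values As List) As String
    Dim sb As StringBuilder
    sb.Initialize
    For Each v As String In Values
        Dim piece As String = v.Trim
        'Skip values that are empty after trimming.
        If piece.Length > 0 Then
            'Add the separator only between values, never at the start.
            If sb.Length > 0 Then sb.Append(" ")
            sb.Append(piece)
        End If
    Next
    Return sb.ToString
End Sub
```

With the example string, `Log(JoinValues(ExtractAllTagValues(html)))` should log `Origen 1 2 Tarea 1 2 Nombre Apellido`.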
 
Solution

Sergey_New

Well-Known Member
Licensed User
Longtime User
TILogistic, thank you very much! You solved my problem. I have one more task: how can I get the value of the string "Tarea 1 2" first, and then the values of the remaining strings? This string can be at the beginning, in the middle or at the end of the entire string. I hope I am not being too annoying with my questions.
 

TILogistic

Expert
Licensed User
Longtime User
Post an example of what you want.
 

Sergey_New

Well-Known Member
Licensed User
Longtime User
The lines can be:
B4X:
html = $"Tarea 1 2<A HREF="UIDD0">Nombre Apellido</A><A HREF="UID50">Origen 1 2</A>"$
html = $"Tarea 1 2<A HREF="UID50">Origen 1 2</A><A HREF="UIDD0">Nombre Apellido</A>"$
html = $"<A HREF="UID50">Origen 1 2</A>Tarea 1 2<A HREF="UIDD0">Nombre Apellido</A>"$
html = $"<A HREF="UIDD0">Nombre Apellido</A>Tarea 1 2<A HREF="UID50">Origen 1 2</A>"$
html = $"<A HREF="UID50">Origen 1 2</A><A HREF="UIDD0">Nombre Apellido</A>Tarea 1 2"$
html = $"<A HREF="UIDD0">Nombre Apellido</A><A HREF="UID50">Origen 1 2</A>Tarea 1 2"$
 

TILogistic

Expert
Licensed User
Longtime User
Do you want to sort the extraction of tag values in a defined order?
 

TILogistic

Expert
Licensed User
Longtime User
Like this?
First the text outside the tags, then the tag values:
B4X:
Public Sub Test1
    Dim html(6) As String
    html(0) = $"Tarea 1 2<A HREF="UIDD0">Nombre Apellido</A><A HREF="UID50">Origen 1 2</A>"$
    html(1) = $"Tarea 1 2<A HREF="UID50">Origen 1 2</A><A HREF="UIDD0">Nombre Apellido</A>"$
    html(2) = $"<A HREF="UID50">Origen 1 2</A>Tarea 1 2<A HREF="UIDD0">Nombre Apellido</A>"$
    html(3) = $"<A HREF="UIDD0">Nombre Apellido</A>Tarea 1 2<A HREF="UID50">Origen 1 2</A>"$
    html(4) = $"<A HREF="UID50">Origen 1 2</A><A HREF="UIDD0">Nombre Apellido</A>Tarea 1 2"$
    html(5) = $"<A HREF="UIDD0">Nombre Apellido</A><A HREF="UID50">Origen 1 2</A>Tarea 1 2"$
    
    For i = 0 To html.Length - 1
        For Each s As String In ExtractAllTagHtmlValues(html(i))
            Log(s)
        Next
        Log("--------------------")
    Next
End Sub
B4X:
Public Sub ExtractAllTagHtmlValues(Html As String) As List
    Dim Pattern As String = "(<[^>]+>([^<]+)</[^>]+>)|([^<]+)"
    Dim Matcher As Matcher = Regex.Matcher(Pattern, Html)
    Dim Values As List
    Values.Initialize
    Do While Matcher.Find
        If Not(Matcher.Group(2) = Null) Then Values.Add(Matcher.Group(2))
        If Not(Matcher.Group(3) = Null) Then Values.InsertAt(0,Matcher.Group(3))
    Loop
    Return Values
End Sub
B4X:
Call B4XPages.GetManager.LogEvents = True to enable logging B4XPages events.
Tarea 1 2
Nombre Apellido
Origen 1 2
--------------------
Tarea 1 2
Origen 1 2
Nombre Apellido
--------------------
Tarea 1 2
Origen 1 2
Nombre Apellido
--------------------
Tarea 1 2
Nombre Apellido
Origen 1 2
--------------------
Tarea 1 2
Origen 1 2
Nombre Apellido
--------------------
Tarea 1 2
Nombre Apellido
Origen 1 2
--------------------
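One caveat with `InsertAt(0, ...)`: if a string ever contains two or more pieces of text outside the tags, each one is placed at the front, so they come out in reverse order. A minimal sketch that keeps both groups in their original left-to-right order, assuming the same pattern as above; `ExtractOutsideFirst` is a hypothetical name:

```b4x
'Hypothetical variant of ExtractAllTagHtmlValues: text outside the tags first,
'then text inside the tags, each group kept in its original order.
Public Sub ExtractOutsideFirst(Html As String) As List
    Dim m As Matcher = Regex.Matcher("(<[^>]+>([^<]+)</[^>]+>)|([^<]+)", Html)
    Dim Outside As List
    Dim Inside As List
    Outside.Initialize
    Inside.Initialize
    Do While m.Find
        'Group 2 is the text inside a tag pair, group 3 is text outside any tag.
        If Not(m.Group(2) = Null) Then Inside.Add(m.Group(2))
        If Not(m.Group(3) = Null) Then Outside.Add(m.Group(3))
    Loop
    'Append the inside values after the outside ones.
    Outside.AddAll(Inside)
    Return Outside
End Sub
```

For the six test strings above, which each have a single outside piece, this should give the same order as the `InsertAt` version.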
 