I need to strip some tags from an HTML file. I have the two lines of code below. One line works and the other doesn't. Any help would be greatly appreciated.
html = html.Replace(">", ">" & CRLF) <-- This line works
html = html.Replace("/<head>.*?<\/head>/is", "") <-- This line doesn't
The second line won't work because String.Replace is not a regex replace; it only does a literal string replacement.
To use a regex you need something like this sub:
B4X:
' Requires the Reflection library (for the Reflector type).
Sub RegexReplace(Pattern As String, Text As String, Replacement As String) As String
    Dim m As Matcher
    m = Regex.Matcher(Pattern, Text)
    Dim r As Reflector
    r.Target = m
    ' Call replaceAll(String) on the underlying Java matcher.
    Return r.RunMethod2("replaceAll", Replacement, "java.lang.String")
End Sub
' Example of how to use it (assumes RegexReplace lives in a code module named Utilities)
Sub Parser
    Dim s As String = Utilities.RegexReplace("<head>.*?<\/head>", "jlasdkj <head> yes </head>more!", "")
    Log("---" & s)
End Sub
The problem I am having now is that the <head> and </head> tags are not on the same line. So the regex code does not find a match and replace the tags. Do you have any other suggestions to fix the code?
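One possible fix, offered as a sketch rather than a tested solution: B4A's Regex is built on Java regular expressions, where `.` does not match line breaks by default. Prefixing the pattern with the inline flag `(?s)` (DOTALL) lets `.` span multiple lines, and adding `i` makes the match case-insensitive. The sub name `StripHead` is hypothetical, and the code assumes the RegexReplace sub from the earlier reply is in a code module named Utilities.

```b4x
' Sketch: remove a <head>...</head> block that spans several lines.
' Assumes Utilities.RegexReplace from the earlier reply; StripHead is a hypothetical name.
Sub StripHead(html As String) As String
    ' (?s) = DOTALL, so "." also matches line breaks
    ' (?i) = case-insensitive, so <HEAD> is matched as well
    Return Utilities.RegexReplace("(?is)<head>.*?</head>", html, "")
End Sub
```

Call it as `html = StripHead(html)`. This simple pattern only matches a bare `<head>` tag. If the opening tag can carry attributes, the pattern would need to allow for them.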