Android Question: RegexReplace problem

rosippc64a

Active Member
Licensed User
Longtime User
Hi all!
I use the RegexReplace function that I found here.
I have an expression that removes unwanted tags from an HTML page:
B4X:
                szov = RegexReplace("<(img|head|nav|form|footer|style|script|noscript|aside|button|label|i\040|input)[^<>]*>((.|\n|\r\n)*?)</\1>",szov," ")
I tried it with this URL: view-source:http://www.erdekesvilag.hu/a-patkanyok-temploma-indiaban/
After the replace, the page source still contains a lot of <source... and <style tags, for example.
If I run a second RegexReplace for them, the mentioned tags are replaced correctly (I checked):
B4X:
                szov = RegexReplace("<(img|head|nav|form|footer|style|script|noscript|aside|button|label|i\040|input)[^<>]*>((.|\n|\r\n)*?)</\1>",szov," ")
                If szov.Contains("<script") Then
                    'view-source:http://www.erdekesvilag.hu/a-patkanyok-temploma-indiaban/
                    szov = RegexReplace("<script[^<>]*>((.|\n|\r\n)*?)</script>",szov," ")
                End If
                If szov.Contains("<style") Then
                    szov = RegexReplace("<style[^<>]*>((.|\n|\r\n)*?)</style>",szov," ")
                End If
Did I make a mistake in the first expression? Why aren't these tags replaced by it?
Thanks in advance,
Steven
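
A few properties of the first expression can be checked in isolation. The sketch below is plain Java rather than B4X, on the assumption that RegexReplace hands the pattern unchanged to java.util.regex (the helper itself is not shown in this thread). The class name TagPatternDemo and the sample strings are made up for illustration. It shows that the back-reference \1 must repeat exactly what group 1 captured, so the alternative i\040 captures "i " including the space and then needs a closing tag spelled "</i >", and that img has no closing tag to find at all. It is not a diagnosis of every leftover tag on the real page.

```java
import java.util.regex.Pattern;

class TagPatternDemo {
    // The expression from the post, with backslashes doubled for a Java string literal.
    static final Pattern TAGS = Pattern.compile(
        "<(img|head|nav|form|footer|style|script|noscript|aside|button|label|i\\040|input)"
        + "[^<>]*>((.|\\n|\\r\\n)*?)</\\1>");

    // Replaces every match of the expression with a single space.
    static String strip(String html) {
        return TAGS.matcher(html).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Group 1 is "style" and "</style>" closes it, so the block is replaced.
        System.out.println(strip("a<style>b{}</style>c"));      // prints: a c
        // Group 1 is "i " (with the space from \040), so "</\1>" looks for "</i >".
        System.out.println(strip("x<i class=\"k\">t</i>y"));    // printed unchanged
        // img has no closing tag, so "</img>" is never found.
        System.out.println(strip("<img src=\"a.png\">text"));   // printed unchanged
    }
}
```

In Java, the group (.|\n|\r\n)*? can also be written as .*? with the (?s) flag in front of the pattern, which lets the dot match line breaks without an alternation.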
 

rosippc64a

I also tried wrapping the whole expression in an outer pair of parentheses, in case RegexReplace replaces group(0):
B4X:
szov = RegexReplace("(<(img|head|nav|form|footer|style|script|noscript|aside|button|label|i\040|input)[^<>]*>((.|\n|\r\n)*?)</\2>)",szov," ")
but that doesn't work either.
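
On the group(0) idea: Java's Matcher.replaceAll always replaces the whole match, whatever capturing groups the pattern contains, so an outer pair of parentheses only renumbers the back-reference from \1 to \2. The sketch below illustrates this in plain Java. It assumes that RegexReplace is a thin wrapper around replaceAll, which this thread does not confirm, and the names GroupZeroDemo and replaceWhole are hypothetical.

```java
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Hypothetical stand-in for the forum helper: replace every whole match.
    static String replaceWhole(String regex, String text, String replacement) {
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "x<b>bold</b>y<u>under</u>z";
        // Without an outer group: the tag name is group 1.
        String plain = replaceWhole("<(b|u)>(.*?)</\\1>", text, " ");
        // With an outer group: the tag name becomes group 2, the match is the same.
        String wrapped = replaceWhole("(<(b|u)>(.*?)</\\2>)", text, " ");
        System.out.println(plain);    // prints: x y z
        System.out.println(wrapped);  // prints: x y z
    }
}
```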
 

rosippc64a

The shortest expressions work well (except for the empty scripts), but there are a lot of very complex ones too.
I solved it with substrings instead.
Thank you, Erel!
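
The substring code itself is not shown in this thread. As a rough idea of what such an approach can look like, here is a minimal sketch in plain Java that cuts out every block between an opening tag and its closing tag using indexOf only. The names SubstringStripDemo and removeBlocks are hypothetical, and the sketch assumes lowercase tag names and non-nested blocks.

```java
class SubstringStripDemo {
    // Replaces every "<tag ...> ... </tag>" block with a single space, without regex.
    // Assumption: the match is case-sensitive and works on the prefix "<tag",
    // like the szov.Contains("<script") checks earlier in the thread.
    static String removeBlocks(String html, String tag) {
        String open = "<" + tag;
        String close = "</" + tag + ">";
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = html.indexOf(open, pos);
            if (start < 0) break;              // no further opening tag
            int end = html.indexOf(close, start);
            if (end < 0) break;                // unclosed block: keep the rest as it is
            out.append(html, pos, start).append(' ');
            pos = end + close.length();        // continue after the closing tag
        }
        out.append(html, pos, html.length());  // copy whatever is left
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "a<script src=\"x.js\"></script>b<script>\nvar i = 1 < 2;\n</script>c";
        System.out.println(removeBlocks(html, "script"));  // prints: a b c
    }
}
```

Empty blocks and multi-line script bodies are handled the same way here, because only the positions of the opening and closing tags matter.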
 