Android Question RegEx Matcher with HTML Code

hasexxl1988

Active Member
Licensed User
Longtime User
Hello,
I have the following problem:

I have HTML code from a website:

Example:
HTML:
<span itemprop='name'>
                Ferrari TestCar
            </span>

I need the name of the car.

I have tried the following code:

B4X:
If Job.JobName = "PageJob" Then
    Dim mAutoName As Matcher = Regex.Matcher("<span itemprop='name'>""([^""]+)""</span>", Job.GetString)
    Do While mAutoName.Find
        namelinks.Add(mAutoName.Group(1))
    Loop
    BuildItems
End If

The result is only: []

The download and the Job function work perfectly with my ImageDownloader.

Extracting the image URLs with this code works:
B4X:
Dim m As Matcher = Regex.Matcher("src=\""https://mywebsite/mmo([^""]+)""", Job.GetString)

I have found the RegEx pattern list: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Unfortunately, I do not know how to build the pattern so that the surrounding HTML code is removed from the value.
 

sorex

Expert
Licensed User
Longtime User
First replace the linefeeds, tabs and double spaces (it makes things a lot easier), then try:

B4X:
Regex.Matcher("<span itemprop='name'>(.*?)</span>", Job.GetString)
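A minimal sketch of that suggestion, assuming the downloaded page is passed in as a plain string. The Sub name ExtractNames and the parameter html are hypothetical and only used for illustration. Regex.Replace collapses every run of whitespace into a single space first, so the opening tag, the name and the closing tag end up on one line before the pattern is applied.

```b4x
' Sketch only: "html" stands in for Job.GetString; ExtractNames is a hypothetical helper.
Sub ExtractNames(html As String) As List
    Dim names As List
    names.Initialize
    ' Collapse every run of whitespace (linefeeds, tabs, repeated spaces) into one space.
    Dim flat As String = Regex.Replace("\s+", html, " ")
    ' With the line breaks gone, the lazy group can reach the closing tag.
    Dim m As Matcher = Regex.Matcher("<span itemprop='name'>(.*?)</span>", flat)
    Do While m.Find
        ' Trim removes the single space left on each side of the name.
        names.Add(m.Group(1).Trim)
    Loop
    Return names
End Sub
```

It could be called as namelinks = ExtractNames(Job.GetString) inside the JobDone handling shown in post #1.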
 

hasexxl1988

Active Member
Licensed User
Longtime User
Not working :/

I have tried:
B4X:
If Job.JobName = "PageJob" Then
    Dim xtemp As String
    xtemp = Job.GetString
    Log ("IndexOf: " & xtemp.IndexOf("<span itemprop='name'>"))
    Dim m As Matcher = Regex.Matcher("<span itemprop='name'>(.*?)</span>", Job.GetString)
    Do While m.Find
        Log (m.Group(1))
        namelinks.Add(m.Group(1))
    Loop
    BuildItems
End If

Log output: IndexOf: 112851

With IndexOf I can find <span itemprop='name'> in the string. With the Matcher nothing is found.
 

udg

Expert
Licensed User
Longtime User
Hi,
I tried the following in an online regex tool and it works, although I don't think it is an elegant solution; it simply works with the data from post #1.
B4X:
<span itemprop='name'>\s*(.*)\s*<\/span>
Group 1 then contains Ferrari TestCar.
Essentially, it matches any amount of whitespace after "'name'>", followed by the group containing the car model, followed again by any amount of whitespace, and finally </span>.
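The likely reason the (.*?) pattern finds nothing on the raw page is that . does not match line breaks by default in the Java regex engine, while \s does, and the name in post #1 sits on its own line between the tags. Below is a sketch of the pattern above used from B4X. The Sub name ExtractNamesRaw and the parameter html are hypothetical; the lazy (.*?) is a swap for the greedy (.*) so that two spans on the same line would not be merged into one match, and the slash is left unescaped because the Java engine does not require \/.

```b4x
' Sketch only: "html" stands in for Job.GetString; ExtractNamesRaw is a hypothetical helper.
Sub ExtractNamesRaw(html As String) As List
    Dim names As List
    names.Initialize
    ' \s* consumes the line break and indentation on both sides of the name.
    ' (.*?) stays on one line because . does not match line breaks by default.
    Dim m As Matcher = Regex.Matcher("<span itemprop='name'>\s*(.*?)\s*</span>", html)
    Do While m.Find
        names.Add(m.Group(1))
    Loop
    Return names
End Sub
```

If a name could itself run over several lines, prefixing the pattern with (?s) would let . match line breaks as well; this sketch assumes single-line names as in the example from post #1.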
 