Android Question [MiniHtmlParser] How can I get the <img> src inside a div?

Waldemar Lima

Well-Known Member
Licensed User
Longtime User
Hello everyone!
I am trying to get the src of the two <img> tags inside a div with the class "playerAvatarAutoSizeInner", but I ran into a problem: depending on the parameter, there may or may not be another div inside "playerAvatarAutoSizeInner", with the class "profile_avatar_frame", as in the examples below:

Example with the div "profile_avatar_frame" inside "playerAvatarAutoSizeInner":
HTML:
<div class="playerAvatarAutoSizeInner">
        <div class="profile_avatar_frame">
            <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png">
        </div>
        <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif">
</div>

Example of the div "playerAvatarAutoSizeInner" with only one <img>:
HTML:
<div class="playerAvatarAutoSizeInner">
         <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/avatars/0c/0c82230c58063b3a7f1beec38528afb0613855be_full.jpg">
</div>

I would like to get only the "src" of the img tags that are inside the div "playerAvatarAutoSizeInner", but I'm having a hard time getting it. Can anyone help me?
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
B4X:
Sub AppStart (Args() As String)
    Dim s As String = $"<div class="playerAvatarAutoSizeInner">
        <div class="profile_avatar_frame">
            <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png">
        </div>
        <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif">
</div>"$
    Dim parser As MiniHtmlParser
    parser.Initialize
    Dim root As HtmlNode = parser.Parse(s)
    Dim playerAvatarAutoSizeInner As HtmlNode = parser.FindNode(root, "div", parser.CreateHtmlAttribute("class", "playerAvatarAutoSizeInner"))
    If playerAvatarAutoSizeInner.IsInitialized Then
        'FindDirectNodes returns only the <img> tags that are direct children of the div.
        Dim imgList As List = parser.FindDirectNodes(playerAvatarAutoSizeInner, "img", Null)
        For Each img As HtmlNode In imgList
            Log(parser.GetAttributeValue(img, "src", ""))
        Next
    End If
End Sub
 

Waldemar Lima

Why is it not getting both links?

My code:
B4X:
    Dim parser As MiniHtmlParser
    parser.Initialize
    Dim root As HtmlNode = parser.Parse(SteamResponse)
    Dim playerAvatarAutoSizeInner As HtmlNode = parser.FindNode(root, "div", parser.CreateHtmlAttribute("class", "playerAvatarAutoSizeInner"))
    
    parser.PrintNode(playerAvatarAutoSizeInner) 'uses the parser instance declared above
    
    If playerAvatarAutoSizeInner.IsInitialized Then
        Dim imgList As List = parser.FindDirectNodes(playerAvatarAutoSizeInner, "img", Null)
        Log("# list size = "&imgList.Size)
        For Each img As HtmlNode In imgList
            Log("# link img inside list = "&parser.GetAttributeValue(img, "src", ""))
        Next
    End If

Logs:
B4X:
*** div ***
|class: playerAvatarAutoSizeInner|
 *** text ***
 |value:  |
 *** div ***
 |class: profile_avatar_frame|
  *** text ***
  |value:  |
  *** img ***
  |src: https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png|
 *** text ***
 |value:  |
 *** img ***
 |src: https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif|
# list size = 1
# link img inside list = https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif
 

Erel

I thought that you wanted to get the img tag that is a direct child of playerAvatarAutoSizeInner.

If you want to find all images inside this div, then you should implement a simple recursive search:
B4X:
Sub AppStart (Args() As String)
    Dim s As String = $"<div class="playerAvatarAutoSizeInner">
        <div class="profile_avatar_frame">
            <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png">
        </div>
        <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif">
</div>"$
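    'parser is assumed to be declared in Process_Globals (Private parser As MiniHtmlParser),
    'because LookForImgRecursive below uses it as well.
    parser.Initialize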
    Dim root As HtmlNode = parser.Parse(s)
    Dim playerAvatarAutoSizeInner As HtmlNode = parser.FindNode(root, "div", parser.CreateHtmlAttribute("class", "playerAvatarAutoSizeInner"))
    If playerAvatarAutoSizeInner.IsInitialized Then
        Dim images As List
        images.Initialize
        LookForImgRecursive(playerAvatarAutoSizeInner, images)
        Log(images)
    End If
End Sub

Private Sub LookForImgRecursive (parent As HtmlNode, result As List)
    If parent.Children.IsInitialized = False Then Return
    For Each child As HtmlNode In parent.Children
        If child.Name = "img" Then
            result.Add(parser.GetAttributeValue(child, "src", ""))
        Else
            LookForImgRecursive(child, result)
        End If
    Next
End Sub
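
The steps above could be packaged so that one call covers both page variants (with and without the inner "profile_avatar_frame" div). This is only a sketch: GetAvatarImages is a hypothetical helper name, it reuses LookForImgRecursive from the code above, and it assumes that parser is a MiniHtmlParser declared in Process_Globals and already initialized.

```b4x
'Hypothetical helper, not part of MiniHtmlParser.
'Assumes: parser is a global MiniHtmlParser that was already initialized,
'and LookForImgRecursive is the sub shown above.
Private Sub GetAvatarImages (Html As String) As List
    Dim images As List
    images.Initialize
    Dim root As HtmlNode = parser.Parse(Html)
    Dim container As HtmlNode = parser.FindNode(root, "div", parser.CreateHtmlAttribute("class", "playerAvatarAutoSizeInner"))
    'Same IsInitialized check as in the code above: skip the search when the div was not found.
    If container.IsInitialized Then LookForImgRecursive(container, images)
    Return images
End Sub
```

It could then be called with the downloaded page, for example Log(GetAvatarImages(SteamResponse)). The list should hold one or two URLs depending on whether the inner div exists, and it stays empty when the container div is missing.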
 

TILogistic

Expert
Licensed User
Longtime User
Test:
This captures only the image URLs (src).
B4X:
    Dim s As String = $"<div class="playerAvatarAutoSizeInner">
        <div class="profile_avatar_frame">
            <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png">
        </div>
        <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif">
    </div>"$

    Dim pattern As String = "(http[^\s]+(jpg|svg|gif|jpeg|png|tiff)\b)"
    Dim Matcher1 As Matcher = Regex.Matcher(pattern, s)
    Do While Matcher1.Find
        Log(Matcher1.Match)
    Loop
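
The pattern above matches any URL that ends in one of the listed extensions, wherever it appears in the text. A possible variant, shown only as a sketch, captures the value of the src attribute of <img> tags instead, so the file extension does not matter. GetImgSrcList is a hypothetical helper name, and the pattern assumes that src values are wrapped in double quotes, as in the snippet.

```b4x
'Sketch only: collects the value of every src="..." found inside an <img> tag.
'GetImgSrcList is a hypothetical helper name.
Private Sub GetImgSrcList (Html As String) As List
    Dim result As List
    result.Initialize
    '\x22 is the double-quote character. Group 1 is the text between the quotes.
    Dim pattern As String = "<img[^>]*?\ssrc=\x22([^\x22]+)\x22"
    Dim m As Matcher = Regex.Matcher(pattern, Html)
    Do While m.Find
        result.Add(m.Group(1))
    Loop
    Return result
End Sub
```

Like the pattern above, this does not restrict the search to a particular div. On a full page it would return the src of every <img> tag.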
 

Waldemar Lima


The problem is that I'm requesting the whole steamcommunity page, which loads many other links and images. What I shared above is just the snippet that I need to get the images from xD
 

Waldemar Lima


Works fine! Thanks @Erel and @oparra for the attention and support :D
 

Waldemar Lima


I was doing some tests. If I replace "FindDirectNodes" with "FindNode" in the code from Erel's first answer, parser.GetAttributeValue(img, "src", "") returns only the link of the <img> that is inside the child div "profile_avatar_frame". By logic, shouldn't it return only the link that is directly inside "playerAvatarAutoSizeInner" and ignore what is inside "profile_avatar_frame"? Am I wrong?

Code:
B4X:
Dim s As String = $"<div class="playerAvatarAutoSizeInner">
        <div class="profile_avatar_frame">
            <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png">
        </div>
        <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif">
</div>"$

    parser.Initialize
    Dim root As HtmlNode = parser.Parse(s)
    Dim playerAvatarAutoSizeInner As HtmlNode = parser.FindNode(root, "div", parser.CreateHtmlAttribute("class", "playerAvatarAutoSizeInner"))
    If playerAvatarAutoSizeInner.IsInitialized Then
        Dim img As HtmlNode = parser.FindNode(playerAvatarAutoSizeInner, "img", Null)
        Log("returned = " & parser.GetAttributeValue(img, "src", ""))
    End If

Logs:
B4X:
returned = https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png
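
The log above suggests that FindNode searches all descendants and returns the first match in document order, which here is the <img> nested inside "profile_avatar_frame". If only the <img> that is a direct child of the container is wanted, one option is to take the first item returned by FindDirectNodes, which returned only the direct child in the earlier log. A minimal sketch, where FirstDirectImgSrc is a hypothetical helper name and parser is assumed to be an initialized global MiniHtmlParser:

```b4x
'Hypothetical helper, not part of MiniHtmlParser.
'Returns the src of the first <img> that is a direct child of container, or "" if there is none.
Private Sub FirstDirectImgSrc (container As HtmlNode) As String
    Dim direct As List = parser.FindDirectNodes(container, "img", Null)
    If direct.Size = 0 Then Return ""
    Dim img As HtmlNode = direct.Get(0)
    Return parser.GetAttributeValue(img, "src", "")
End Sub
```

Going by the earlier log, calling it with the "playerAvatarAutoSizeInner" node of the sample HTML would be expected to give the .gif link rather than the .png one.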
 

TILogistic

?
B4X:
Private Sub Button1_Click
    Dim s As String = $"<div class="playerAvatarAutoSizeInner">
        <div class="profile_avatar_frame">
            <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/465200/e89b3a70625c980c3d68869f5cdb1da9baa447f8.png">
        </div>
        <img src="https://cdn.cloudflare.steamstatic.com/steamcommunity/public/images/items/1504020/397b0a7e2d1355bca92d3e803270f7947ba973aa.gif">
    </div>"$

    Log(GetTagImage(GetTagClassDivHtml(s, "profile_avatar_frame")))
    
End Sub

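Public Sub GetTagClassDivHtml(Html As String, Class As String) As String
    'Lazy match: stops at the first </div> after the opening tag.
    Dim pattern As String = $"<div class=\"${Class}\">([^`]*?)<\/div>"$
    Dim Matcher1 As Matcher = Regex.Matcher(pattern, Html)
    'Return an empty string when nothing matches instead of reading Match after a failed Find.
    If Matcher1.Find Then Return Matcher1.Match
    Return ""
End Sub

Public Sub GetTagImage(Html As String) As String
    'Returns the first URL that ends in one of the listed image extensions.
    Dim pattern As String = "(http[^\s]+(jpg|svg|gif|jpeg|png|tiff)\b)"
    Dim Matcher1 As Matcher = Regex.Matcher(pattern, Html)
    If Matcher1.Find Then Return Matcher1.Match
    Return ""
End Sub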

 

Waldemar Lima


How would I get both links?
 