Android Question Need help to pull data from a flextable on a webpage

Past11

Member
Hello all!
I need help, or better yet an example, of how to pull data from a "flextable" on a website. I need the gas prices shown in the image below; the webpage is https://www.ok.dk/privat/produkter/benzinkort/benzinpriser and I have absolutely no idea where to start. I did this in VB.NET many years back, when I was pulling data out of a label, but I don't know how to pull data out of a table. A small example of how to do it, if one exists, would help me most, since I learn best by examples. I hope you can help :) Thank you guys!
Fuel.PNG
 

drgottjr

Expert
Licensed User
Longtime User
this is one way to do it:
B4X:
    ' needs the OkHttpUtils2 library
    Dim job As HttpJob
    job.Initialize("", Me)
    job.Download("https://www.ok.dk/privat/produkter/benzinkort/benzinpriser")
    Wait For (job) JobDone(job As HttpJob)
    Dim text As String
    If job.Success Then
        text = job.GetString
    Else
        Log("download failed: " & job.ErrorMessage)
    End If
    job.Release
    ' flatten 'er out: strip line breaks so a cell can't span several lines
    text = Regex.Replace2("[\r\n]", Regex.MULTILINE, text, "")
    ' look for cells: capture whatever sits inside each gridcell div
    Dim cellmatcher As Matcher
    cellmatcher = Regex.Matcher($"<div .*?role="gridcell">(.+?)<\/div>"$, text)
    Do While cellmatcher.Find
        ' skip cells that hold a button rather than data
        If cellmatcher.Group(1).Contains("button") Then Continue
        Log("got a cell: " & cellmatcher.Group(1))
    Loop
    Log("done")
 

[Attachment: capture6.png]

Past11

Member
Hello drgottjr.
Oh my god, you are a lifesaver, thank you so much! This is something I have been struggling with for a long time, but I could not figure out where to start. Thank you sooooo much :)
 

Past11

Member
Hello Togo.
Thank you for your suggestion. Unfortunately I don't have the skills yet, since I don't even know what to search for. I just started using B4A about 4 months ago as a personal hobby, coming from VB.NET, so I have a lot to learn :) But thank you very much for your time and suggestion :)
 

emexes

Expert
Licensed User
i have absolutely no idea where to start
I'm late to this party and you've already got a solution anyway, but:

a good starting point for data scraping ideas is to load the web page up in a browser and use view source to see if the data you want is in the HTML and which surrounding tags can be used to locate and extract it.
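
As a rough sketch of that check done from code rather than the browser: download the page and test whether some text you can see in the browser is present in the raw HTML. The sub name CheckPageContains and the marker "gridcell" are just illustrations, and this assumes the OkHttpUtils2 library is referenced.
B4X:
    ' Sketch: returns True if marker appears in the HTML the server sends.
    ' If it returns False although the browser shows the text, the page is
    ' probably filled in later by javascript.
    Sub CheckPageContains (url As String, marker As String) As ResumableSub
        Dim found As Boolean = False
        Dim job As HttpJob
        job.Initialize("", Me)
        job.Download(url)
        Wait For (job) JobDone(job As HttpJob)
        If job.Success Then
            found = job.GetString.Contains(marker)
        Else
            Log("download failed: " & job.ErrorMessage)
        End If
        job.Release
        Return found
    End Sub

    ' usage, from another sub:
    Wait For (CheckPageContains("https://www.ok.dk/privat/produkter/benzinkort/benzinpriser", "gridcell")) Complete (found As Boolean)
    Log("marker in raw html: " & found)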
 

Past11

Member
Hi Emexes.
Yes, I already have a working solution, although there are certain points of it which I really don't understand, so any help on "data scraping" would be nice. My idea was to make a widget that compares gas prices from local dealers around me. The code I got from drgottjr does indeed get what I needed. I thought that by having an example of pulling this data I could adapt it to almost any data scraping, like in the old days when prices like this were sent to labels or had a "name tag" I could pull, but it ain't so easy anymore :)
 

drgottjr

Expert
Licensed User
Longtime User
all page scraping starts with "view source". then a target and, possibly, a pattern.

a big problem nowadays involves dynamically created pages (actually, it's always been an issue). a case in point would be where the petrol prices were not originally part of the html source but were loaded into an html template via an ajax call. viewing the source, or even downloading it with okhttputils, would have you scratching your head since you wouldn't see the prices.

in such a case the prices are generated at runtime with javascript, so you need to use "inspect element" along with view source (i assume we're talking about chrome's developer tools). you would probably also have to look at the network activity tab: the prices might have been loaded from a .json file with ajax. that would have made things even easier, as you could skip all the page scraping and simply download the .json directly.
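
a minimal sketch of that "download the .json directly" idea. everything specific here is made up for illustration: the url, and the assumed shape {"prices":[{"name":..., "price":...}]}. the real endpoint and keys would come from whatever the network tab shows. it assumes the OkHttpUtils2 and JSON libraries are referenced.
B4X:
    Dim job As HttpJob
    job.Initialize("", Me)
    ' hypothetical endpoint, not a real one
    job.Download("https://example.com/api/prices.json")
    Wait For (job) JobDone(job As HttpJob)
    If job.Success Then
        Dim parser As JSONParser
        parser.Initialize(job.GetString)
        ' assumed shape: an object holding a "prices" array of objects
        Dim root As Map = parser.NextObject
        Dim prices As List = root.Get("prices")
        For Each item As Map In prices
            Log(item.Get("name") & ": " & item.Get("price"))
        Next
    Else
        Log("download failed: " & job.ErrorMessage)
    End If
    job.Release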

in the case at hand, i could see the prices with view source, so page scraping would point to the target.

in this case, the pattern was flex-table - a glorified <table></table> construct. instead of a standard table's <tr> and <td> tags, flex-table uses <div> tags with a "role=" attribute. for example, a flex-table's "cell" has - guess what? - a role="gridcell" attribute (that's what the regex in my code looks for). all i had to do was match what was inside "<div></div>" blocks where the div's role was a cell. regex to the rescue. matching the rows would have been a lot harder. thankfully, it didn't occur to the op to ask for that.
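
if rows were ever needed, one cheap workaround (not real row matching) is to count cells. the sketch below reuses the same cell regex and assumes every row has the same number of data cells; the sub name LogCellsAsRows and the columnsPerRow parameter are hypothetical, and the right column count has to be read off the page.
B4X:
    ' Sketch: number the matched cells as (row, col), assuming a fixed column count.
    Sub LogCellsAsRows (html As String, columnsPerRow As Int)
        Dim cells As List
        cells.Initialize
        Dim m As Matcher = Regex.Matcher($"<div .*?role="gridcell">(.+?)<\/div>"$, html)
        Do While m.Find
            ' button cells are skipped, so they must not be counted in columnsPerRow
            If m.Group(1).Contains("button") Then Continue
            cells.Add(m.Group(1).Trim)
        Loop
        For i = 0 To cells.Size - 1
            Dim row As Int = i / columnsPerRow   ' truncates to the row index
            Dim col As Int = i Mod columnsPerRow
            Log("row " & row & ", col " & col & ": " & cells.Get(i))
        Next
    End Sub

this breaks as soon as a row has a missing or extra cell, which is why matching the actual row divs would be the more robust (and harder) route.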
 

emexes

Expert
Licensed User
I have sometimes had to resort to programmatically driving a browser and even doing OCR on the screen. Fun stuff.
 

Past11

Member
Drgottjr.
It makes very good sense. I have A LOT of reading up to do, I can see. I'm getting more and more into getting my data from the web, so I hoped there was an "easy" way of doing things, but I think I need to learn a lot more than I initially thought. Thank you very much for your assistance; it helped me see what direction I need to go in. I was hoping it had something to do with .json, but I will look into that in the future :) Thank you so much :)
 

Past11

Member
Hi Again drgottjr.
Can I ask a hopefully small favour? If it's too big, please decline. I want to try and get some "Corona data" from this website https://nyheder.tv2.dk/samfund/2020...erden-saa-mange-er-smittede-doede-og-indlagte and, when I look at the source code, the data is available right there, so I hope there is an "easy way" of extracting the data in the image below. I just need to get an understanding of how it works. Sorry if it is a "dumb" question; I don't really fully understand it. Thank you :)
 

[Attachment: 1610578105561.png]

drgottjr

Expert
Licensed User
Longtime User
and who doesn't get a warm feeling all over by getting a browser to cough up something it didn't know it knew with some executejavascript?
 

Past11

Member
I luckily managed to do it. It took a bit of going back to the old days, but I managed to use a "search for text", go to the position, extract x amount of text and then simply trim it down in size. It's very rough and could easily give some problems later on as the numbers start to increase, but for now it will do the trick :) Thank you so much for your help :) I'm on the path to learn :)
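
A rough sketch of that kind of "search, go to the position, extract" approach, written so it reads up to an end marker instead of a fixed number of characters (which should cope better when the numbers grow). The sub name TextBetween and the sample string are made up for illustration; they are not taken from the actual page.
B4X:
    ' Sketch: returns the trimmed text between startMarker and the next endMarker,
    ' or "" if either marker is missing.
    Sub TextBetween (text As String, startMarker As String, endMarker As String) As String
        Dim i As Int = text.IndexOf(startMarker)
        If i = -1 Then Return ""
        i = i + startMarker.Length
        Dim j As Int = text.IndexOf2(endMarker, i)
        If j = -1 Then Return ""
        Return text.SubString2(i, j).Trim
    End Sub

    ' usage with an invented snippet of html:
    Log(TextBetween("<span>Smittede: 12345 </span>", "Smittede:", "<"))   ' logs 12345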
 

afields

Member
Licensed User
hi past11
i've read about your problems and i think you should consider that:
- from time to time the html behind a page can change (that is, where things sit inside the page)
- thanks to ajax (an internet technology) the whole page is not downloaded to the browser. when you choose something on such a page, the page talks to the server, which then injects that piece of information into the page.
so i would recommend that:
if they are your own html pages (that is, ones you build), then an approach with b4x is a very good one, but
if you want something else, then you should first get some kind of file (csv, xls, xml ...) and then work on it with b4x.
to get those files there is, for example, an automation program that builds such files for you (if you tell the program where to find the information you want).
hope this also helps you.
 

Past11

Member
Hi afields.
Thank you very much for your answer. Unfortunately they are not my own html pages. It could be anything; I have a lot of different ideas in mind, but almost every one of them needs to extract some sort of information from the web, and from what drgottjr showed me, I am very much behind. I am only a hobby programmer trying to learn. I did a lot of "local PC" coding in VB.NET for about 10 years, but this is a whole new world for me, and I am trying to learn as much as I can as fast as I can, so by seeing a lot of examples I can learn some of the techniques used to accomplish the tasks :) Thank you all for your help guys, you are awesome :)
 

emexes

Expert
Licensed User
i wanna try and get some " Corona Data" from this website https://nyheder.tv2.dk/samfund/2020...erden-saa-mange-er-smittede-doede-og-indlagte when i look at the source code, the data is available right there, so i hope it would be easy finding an "easy way" of extracting the data in the image below,
Try using same code as before but with this regex:

https://regex101.com/r/5ikcCc/2

for the first datum you highlighted and this regex:

https://regex101.com/r/50UfMS/2

for the second. You need to scroll the TEST STRING box to the bottom to see where it's pulling out the data.

I've assumed you're only interested in the number, not the trailing space. If you *really* need the trailing space(s) too, that's no problem, just move the closing bracket ")" to after the white-space matcher "\s*".
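
To show what moving that closing bracket does, here is a small sketch. The sample text and the "Indlagte:" label are invented for illustration; the real patterns are the ones in the regex101 links above.
B4X:
    ' made-up sample text, not copied from the page
    Dim sample As String = "Indlagte: 815   patienter"
    ' group closes before \s*: group 1 is only the number
    Dim m1 As Matcher = Regex.Matcher("Indlagte:\s*(\d+)\s*", sample)
    If m1.Find Then Log("[" & m1.Group(1) & "]")   ' logs [815]
    ' group closes after \s*: group 1 also holds the trailing spaces
    Dim m2 As Matcher = Regex.Matcher("Indlagte:\s*(\d+\s*)", sample)
    If m2.Find Then Log("[" & m2.Group(1) & "]")   ' logs [815   ]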

sorry if it is a "dumb" question
There are no dumb questions in this forum, but it always warms my heart when I can see somebody's had a go at their problem before posting. ✌

www.regex101.com is an excellent site for playing around with regexes, seeing how they match sample data, and constructing regexes such as the two above.
 

emexes

Expert
Licensed User
www.regex101.com is an excellent site for playing around with regexes, seeing how they match sample data, and constructing regexes such as the two above.
For that coronavirus data page you gave a link for, it looks like you can also grab the graph frames directly eg:

https://datawrapper.dwcdn.net/9fvu1/384/

and then grab the daily data from it eg using regex like:

https://regex101.com/r/mSqR12/1

cycling through dates from say 2020-01-01 to tomorrow (to be sure, to be sure) eg 2021-01-15. Yeah that's hugely inefficient but regex matching is pretty fast, and it's way simpler to search one day at a time than decipher a massive collection of matches covering all dates.
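
A sketch of that one-day-at-a-time loop. It assumes the downloaded frame text contains entries like 2021-01-13,1234 (date, comma, number), which is a guess at the format and not checked against the real data; the sub name LogDailyValues is hypothetical, and the regex would need adjusting to whatever the frame really holds.
B4X:
    ' Sketch: look up one value per calendar day between firstDay and lastDay (yyyy-MM-dd).
    Sub LogDailyValues (text As String, firstDay As String, lastDay As String)
        Dim oldFormat As String = DateTime.DateFormat
        DateTime.DateFormat = "yyyy-MM-dd"
        Dim day As Long = DateTime.DateParse(firstDay)
        Dim last As Long = DateTime.DateParse(lastDay)
        Do While day <= last
            Dim d As String = DateTime.Date(day)
            ' assumed entry format: 2021-01-13,1234
            Dim m As Matcher = Regex.Matcher(d & ",(\d+)", text)
            If m.Find Then Log(d & " -> " & m.Group(1))
            day = DateTime.Add(day, 0, 0, 1)   ' step to the next calendar day
        Loop
        DateTime.DateFormat = oldFormat   ' restore the app-wide date format
    End Sub

    ' usage, with text holding the downloaded frame:
    ' LogDailyValues(text, "2020-01-01", "2021-01-15")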
 

Past11

Member
Hi Emexes.
Thank you very much for your answer, I really appreciate it, because now I have something to study. I was wondering where you would get those regex strings from. I will have a good look at it this weekend. Thank you so much for your time, greatly appreciated :)
 