Can someone help me with a B4PPC project (web page scraping)?

Minimalist

New Member
I am aware that Windows Mobile and B4PPC have been 'dead' for many years.
Yet I still have a wonderful and fully operational Pocket PC Loox from Fujitsu that I would like to revive for a small project (instead of a Raspberry Pi based one).
In a separate thread ("Weather web service using HTTP") someone offered an interesting project that I could use as an example to work on.
Unfortunately the source code was removed and is no longer available in that thread.
I have tinkered with other examples and have almost all the sample libs in my archive; I can already access a URL and see the page content on a form (web object).
But I need to 'scrape' or 'search' its content for strings I'd like to extract and work with. I do not see any method or property of the web object that would serve this purpose.
The aforementioned weather app probably did something similar to what I intend, and could be useful for my small project.

Does anyone have this weather app source code?
Or can someone provide some clue how to access the web page content (or the raw data of any webpage) in B4PPC?
Thanks a lot!
 

Minimalist

New Member
Thanks for your reply and willingness to help!
In a nutshell:
- I want to read out and collect the data of a WLAN device in my home network
- the data is provided in an HTML page served at an IP address
- the page is apparently in JAVA format and throws 'ReList' and 'ReTip' errors while loading in the old B4PPC web control; it is displayed without readable values, but the page's source code contains the desired value as "webdata_now_p".
- the goal is to collect this data programmatically in a timed way, accumulate the values in a CSV file and upload it to my local NAS via FTP
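The last step of this list (accumulate values in a CSV, then push the file to the NAS via FTP) can be sketched as follows. This is Python rather than B4PPC, so it would run on a PC or similar rather than on the Pocket PC itself. The file name, column names and FTP credentials are placeholders that are not part of the original project.

```python
import csv
import ftplib
import os
from datetime import datetime


def append_reading(csv_path, value, when=None):
    """Append one timestamped reading; write a header row if the file is new."""
    when = when or datetime.now()
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # Column names are an assumption for illustration.
            writer.writerow(["timestamp", "webdata_now_p"])
        writer.writerow([when.isoformat(timespec="seconds"), value])


def upload_csv(csv_path, host, user, password, remote_name="readings.csv"):
    """Upload the CSV via FTP. Host, credentials and remote name are placeholders."""
    with ftplib.FTP(host, timeout=10) as ftp:
        ftp.login(user, password)
        with open(csv_path, "rb") as f:
            ftp.storbinary("STOR " + remote_name, f)
```

For timed collection, append_reading could be called from a simple loop with a sleep between readings, with upload_csv called less often (for example once per hour).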

I am using the sample library WebBrowser1.2, which I found in my old Basic4PPC archive, as a starter; everything is attached here in a ZIP for reference. The path to the provided HTML file is currently hardcoded.
It does connect and load the page after throwing the aforementioned errors, but the real values are only visible when inspecting the page's source code. The script error apparently cannot be caught or ignored; does it come straight out of the DLL?
How can I access/search the page's raw content for the desired string and extract it for further handling? If I get a handle on this part I can continue on my own.
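For illustration, outside of B4PPC the "read the raw page and search it" step could look like the minimal Python sketch below. It fetches the page over HTTP instead of going through a browser control, so no script runs and no script errors appear. The pattern assumes the value shows up in the page source as webdata_now_p = "..."; the thread does not show the exact form, so the pattern would need adjusting to the real source. The function names are made up for this sketch.

```python
import re
import urllib.request

# Assumed form in the page source: webdata_now_p = "123"
PATTERN = re.compile(r'webdata_now_p\s*=\s*"([^"]*)"')


def fetch_page(url, timeout=10):
    """Return the raw page source as text, without executing any script."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_value(html):
    """Return the quoted value following webdata_now_p, or None if not found."""
    match = PATTERN.search(html)
    return match.group(1) if match else None
```

Usage would be extract_value(fetch_page(url)), with url being the device's address in the home network; a result of None means the pattern did not match the page source.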

My B4PPC knowledge has become rusty after all these years. Yet I am STILL using (in DT mode) a great statistical expense control app I wrote for Windows Mobile some 17 years ago, with an SQLite database, great charting capabilities, parametric query design and all. I wish I could port THAT to my Android devices, but that is a different story.

Thanks for any useful idea or hint!
 

Attachments

  • WebBrowser1.2_mod.zip
    16.6 KB · Views: 93

agraham

Expert
Licensed User
Longtime User
I'm afraid you are probably out of luck. Looking in the help I see that I wrote that library back in 2008 and included a comment, mainly to remind myself, as to why the DocumentText property is write only. If I remember correctly, the WebBrowser in Windows CE does not have the capability to read the document text. This is probably because Windows Forms on WinCE had to be cut down dramatically in size to fit on devices.
 

Minimalist

New Member
Thanks!
I checked several of the old B4P lib samples; some no longer run, and none seem to lead in the right direction.
The aforementioned Weather App with HTML may have been my best baseline, although it probably would have stumbled over the JAVA structure in the HTML page anyway.
So I dropped my little project and will use a basic VB6 app instead (it works fine, but needs bigger hardware and more wattage to run).
Modern times require newer technology like Raspberry, Python scripts etc.
Too much hassle for an old geezer like me.
 