I'm back. I now have a working application, sort of, using JTidy as advised, and am retrieving the specific details I want from the website.
I have a problem, however, with some entries, which I think I have tracked down to the way JTidy works. If you look at the attached .txt file, you will see that there are some very long lines. These are being broken up, with a linefeed inserted in inappropriate (for me) places to create a new line. For example, line 361 in the text file has become lines 173 and 174 in the XML version. I can't feed the airline name
"British
Airways"
into my database without stripping the linefeed.
I thought I could easily remove the linefeeds from the retrieved strings with this code:
GoodString = BadString.Replace(CHR(10), " ")
but it doesn't make any difference, much to my surprise.
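To make clear what I am after, here is a minimal sketch in Java (JTidy is a Java library, although my own code is not Java). It assumes the stray breaks may be CR, LF, or a CR+LF pair, which I have not confirmed, and it simply collapses every run of whitespace into a single space. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class LineBreakCleaner {
    // Matches any run of whitespace: spaces, tabs, CR, LF, and CR+LF pairs.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Hypothetical helper: replaces each whitespace run with one space
    // and trims leading and trailing whitespace.
    static String collapseWhitespace(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Assumed shape of the wrapped value; the real break characters are unverified.
        String wrapped = "British\r\nAirways";
        System.out.println(collapseWhitespace(wrapped)); // prints: British Airways
    }
}
```

If the break turns out to be something other than a lone character 10, that might explain why my one-liner seems to do nothing, but that is only a guess.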
So, can someone please advise whether this is the expected behaviour of JTidy, or a feature that can be modified, perhaps to allow for longer lines? And if I have to live with it, how do I remove the unnecessary linefeeds?
Thanks for any help.
Caravelle