Hello,
I need to read and scrape information from a website that has 1000 pages. The following code works, but it requests all the pages at the same time and fails with a "time out" error. Could someone please show me how to read the pages one at a time? Thanks.
Note: the pages are numbered from 4000 to 5000.
The error is:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
…
B4X:
Sub AppStart (Args() As String)
	WEB
	StartMessageLoop
End Sub

Sub WEB
	' Starts every download immediately; each result arrives later in JobDone
	For i = 4000 To 5000
		Dim url As String = "https://www.web.com/web/" & i & "-web-" & i & "-.html"
		Dim job As HttpJob
		job.Initialize("Main page", Me)
		job.Download(url)
	Next
End Sub
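
Sub JobDone(Job As HttpJob)
	Select Job.JobName
		Case "Main page"
			If Job.Success = True Then
				Dim html As String = Job.GetString
				' Page title: up to 100 characters from <title>, cut at </title>
				Dim title As String = Mid(html, html.IndexOf("<title>"), 100)
				title = Mid(title, 0, title.IndexOf("</title>")).Replace(QUOTE, " Pulg.").Replace("-", "").Replace("<title>", "").Trim
				Log(title)
				' The "var idDefaultImage = ..." declaration, cut at the semicolon
				Dim imageVar As String = Mid(html, html.IndexOf("var idDefaultImage ="), 30)
				imageVar = Mid(imageVar, 0, imageVar.IndexOf(";")).Trim
				Log(imageVar)
			Else
				Log("Error: " & Job.ErrorMessage)
			End If
		Case "Main page1"
			Log("other")
	End Select
	' Free the job's resources once its result has been handled
	Job.Release
End Sub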

Sub Instr(Text As String, TextToFind As String, Start As Int) As Int
	Return Text.IndexOf2(TextToFind, Start)
End Sub

Sub Left(Text As String, Length As Int) As String
	If Length > Text.Length Then Length = Text.Length
	Return Text.SubString2(0, Length)
End Sub

Sub Right(Text As String, Length As Int) As String
	If Length > Text.Length Then Length = Text.Length
	Return Text.SubString(Text.Length - Length)
End Sub

' Returns up to Length characters starting at Start, or "" if Start is out of range
Sub Mid(Text As String, Start As Int, Length As Int) As String
	If Length <= 0 Or Start < 0 Or Start >= Text.Length Then Return ""
	Return Text.SubString2(Start, Min(Start + Length, Text.Length))
End Sub
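
One way to read the pages one at a time is a resumable sub: `Wait For (job) JobDone` pauses the loop until the current download has finished, so only one request is in flight at any moment. Below is a minimal sketch, assuming a B4J non-UI app with the jOkHttpUtils2 library (`HttpJob`). The URL pattern and the 4000 to 5000 range come from the question; `DownloadPagesSequentially`, the hypothetical `TextBetween` helper and the 60-second timeout are illustrative choices, not part of the original code.

```b4x
' Sketch: B4J non-UI app, jOkHttpUtils2 library.
' DownloadPagesSequentially and TextBetween are hypothetical names chosen for this example.
Sub AppStart (Args() As String)
	DownloadPagesSequentially
	StartMessageLoop
End Sub

Sub DownloadPagesSequentially
	For i = 4000 To 5000
		Dim url As String = "https://www.web.com/web/" & i & "-web-" & i & "-.html"
		Dim job As HttpJob
		job.Initialize("", Me)
		job.Download(url)
		' Assumed value: give slow pages up to 60 seconds (milliseconds)
		job.GetRequest.Timeout = 60000
		' Execution pauses here until this job completes; the next download starts only afterwards
		Wait For (job) JobDone(job As HttpJob)
		If job.Success Then
			Dim html As String = job.GetString
			Log(i & ": " & TextBetween(html, "<title>", "</title>").Trim)
			Log(TextBetween(html, "var idDefaultImage =", ";").Trim)
		Else
			Log(i & " error: " & job.ErrorMessage)
		End If
		' Free this job's resources before the next iteration
		job.Release
	Next
	Log("All pages processed")
	StopMessageLoop
End Sub

' Returns the text between the first StartMarker and the next EndMarker, or "" if either is missing
Sub TextBetween(Text As String, StartMarker As String, EndMarker As String) As String
	Dim s As Int = Text.IndexOf(StartMarker)
	If s = -1 Then Return ""
	s = s + StartMarker.Length
	Dim e As Int = Text.IndexOf2(EndMarker, s)
	If e = -1 Then Return ""
	Return Text.SubString2(s, e)
End Sub
```

Because each request starts only after the previous one has completed, the whole run takes longer, but the server no longer receives a thousand simultaneous requests, which is a likely cause of the read timeouts. A failed page is logged and skipped instead of stopping the loop.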