B4J Question jOkHttpUtils2 Read timed out

PABLO2013

Well-Known Member
Licensed User
Longtime User
Regards,
I need to read and scrape information from a website that has about 1000 pages. The following code works, but it requests all the pages at once, which causes a "Read timed out" error. Could someone kindly show how to read the pages one at a time? Thanks.

Note: the pages are named from 4000 to 5000
B4X:
Sub AppStart (Args() As String)
    WEB
    StartMessageLoop
End Sub

Sub WEB
    For i = 4000 To 5000
        DIR = "https://www.web.com/web/" & i & "-web-" & i & "-.html"
        Dim job As HttpJob
        job.Initialize("Main page", Me)
        job.Download(DIR)
    Next
End Sub

Sub JobDone(Job As HttpJob)
    Select Job.JobName
        Case "Main page"
            If Job.Success = True Then
                DIR = Job.GetString

                T1 = Mid(DIR, DIR.IndexOf("<title>"), 100)
                T1 = Mid(T1, 0, T1.IndexOf("</title>")).Replace("&quot;", " Pulg.").Replace("-", "").Replace("<title>", "").Trim
                Log(T1)

                T1 = Mid(DIR, DIR.IndexOf("var idDefaultImage ="), 30)
                T1 = Mid(T1, 0, T1.IndexOf(";")).Trim
                Log(T1)
            Else
                Log("Error: " & Job.ErrorMessage)
            End If
            '   Job.Release
        Case "Main page1"
            Log("other")
    End Select
End Sub



Sub Instr(Text As String, TextToFind As String, Start As Int) As Int
   Return Text.IndexOf2(TextToFind,Start)
End Sub
Sub Left(Text As String, Length As Long)As String
   If Length>Text.Length Then Length=Text.Length
   Return Text.SubString2(0, Length)
End Sub
Sub Right(Text As String, Length As Long) As String
   If Length>Text.Length Then Length=Text.Length
   Return Text.SubString(Text.Length-Length)
End Sub
Sub Mid(Text As String, Start As Int, Length As Int) As String
   If Length>0 And Start>-1 And Start< Text.Length Then Return Text.SubString2(Start,Start+Length)
End Sub

The error is:
java.net.SocketTimeoutException: Read timed out

at java.net.SocketInputStream.socketRead0(Native Method)

 

PABLO2013

Thanks, yes, that is what I want, but how can I do this automatically, one page at a time, rather than manually? Thank you.
 

KMatle

Expert
Licensed User
Longtime User
- Start job 1
- In JobDone, start the next one
- You need some simple logic using a variable, like:

B4X:
If PageNumber < 5000 Then
    PageNumber = PageNumber + 1
    DIR = "https://www.web.com/web/" & PageNumber & "-web-" & PageNumber & "-.html" 'PageNumber replaces the loop variable i
    Dim job As HttpJob
    job.Initialize("Main page", Me)
    job.Download(DIR)
End If
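
For illustration, the steps above could be put together roughly as follows. This is only a sketch, not tested against the real site: the sub name DownloadNextPage and the LastPage variable are made up for the example, the URL pattern and the 4000 to 5000 range come from the first post, and PageNumber is assumed to be a global variable. The point is that exactly one request is in flight at a time, because the next download is started only from JobDone.

B4X:
Sub Process_Globals
    Private PageNumber As Int = 4000 'first page, taken from the question
    Private LastPage As Int = 5000   'last page, taken from the question
End Sub

Sub AppStart (Args() As String)
    DownloadNextPage 'start job 1 only
    StartMessageLoop
End Sub

'Hypothetical helper: starts exactly one download for the current PageNumber.
Sub DownloadNextPage
    Dim url As String = "https://www.web.com/web/" & PageNumber & "-web-" & PageNumber & "-.html"
    Dim job As HttpJob
    job.Initialize("Main page", Me)
    job.Download(url)
End Sub

Sub JobDone (Job As HttpJob)
    If Job.Success Then
        Log(Job.GetString.Length) 'parse the page here instead
    Else
        Log("Error: " & Job.ErrorMessage)
    End If
    Job.Release
    'Only now, after the previous page has finished, start the next one.
    If PageNumber < LastPage Then
        PageNumber = PageNumber + 1
        DownloadNextPage
    Else
        StopMessageLoop 'all pages done; ends the non-UI program
    End If
End Sub

Note that a failed page does not stop the chain here; the next page is requested either way.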
 

PABLO2013

Regards, and thanks.

I may be mistaken, but the sub I posted does the same thing as what you recommend with (If PageNumber < 5000 ...).

The problem is that in both cases all the queries (about 5000) are sent at one time, and this causes "java.net.SocketTimeoutException: Read timed out". What I am looking for is a way to run the "5000" queries one by one, not in a block or all at the same time, or some other way to avoid the timeout.

Also, how is job.GetRequest.Timeout = ..... used, and what is it for?

thank you
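
For reference, a minimal sketch of how the timeout property is usually combined with sequential downloads. It assumes a B4J and jOkHttpUtils2 version that supports Wait For; the sub name DownloadAllPages and the 60000 value are made up for the example. As far as I know, the timeout is given in milliseconds and is set after calling Download, and Wait For pauses the loop until the current job has finished, so only one request is open at a time.

B4X:
'Hypothetical sub: call it from AppStart before StartMessageLoop.
Sub DownloadAllPages
    For i = 4000 To 5000
        Dim url As String = "https://www.web.com/web/" & i & "-web-" & i & "-.html"
        Dim job As HttpJob
        job.Initialize("", Me)
        job.Download(url)
        job.GetRequest.Timeout = 60000 'milliseconds; 60 seconds is an arbitrary choice
        'Execution resumes here only when this job has completed.
        Wait For (job) JobDone (job As HttpJob)
        If job.Success Then
            Log(job.GetString.Length) 'parse the page here instead
        Else
            Log("Error on page " & i & ": " & job.ErrorMessage)
        End If
        job.Release
    Next
    StopMessageLoop 'all pages done; ends the non-UI program
End Sub

A longer timeout only helps if the server is slow to answer a single request; the original error most likely comes from sending all the requests at once, which the sequential loop avoids.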
 