Android Question Access denied when downloading a website as text.

Filippo

Expert
Licensed User
Longtime User
Hi,

I am trying to download a website with my app; it always worked until recently.
Now the website refuses access and I get this error message:
ResponseError. Reason: , Response: <HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;c6011002&#46;1702559562&#46;75b01d0
</BODY>
</HTML>

Here is my code:
B4X:
    Dim strUrl As String
    strUrl = "https://www.finanzen.net/suchergebnis.asp?frmAktienSucheTextfeld=DE0005313704"

    getWebSeiteAlsString(strUrl)


Sub getWebSeiteAlsString(sURL As String)
    Dim job As HttpJob
    job.Initialize("WebSeiteAlsString", Me)
    job.Download(sURL)
    ProgressDialogShow2("Bitte warten...", True)
End Sub

Sub JobDone(Job As HttpJob)
    Dim parser As JSONParser
    Dim res As String
    
'    Log("JobName = " & Job.JobName & ", Success = " & Job.Success)
    
    If Job.Success Then
        res = Job.GetString
        parser.Initialize(res)
        
        Select Job.JobName
            Case "WebSeiteAlsString"
                ...               
        End Select
    Else
        MsgboxAsync("Die Charts können nicht angezeigt werden.","Kein Internet verbindung!")
    End If
    Job.Release
    
    ProgressDialogHide
End Sub

Can this block be lifted? If yes, how?
 

aeric

Maybe you need to allow cookies or provide an API key?

By the way, why don't you use Wait For with OkHttpUtils2?
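For reference, here is a minimal sketch of the Wait For pattern with OkHttpUtils2, adapted from the sub in the original post (the log message is just a placeholder):

B4X:
Sub getWebSeiteAlsString(sURL As String)
    Dim job As HttpJob
    job.Initialize("", Me)
    job.Download(sURL)
    Wait For (job) JobDone(job As HttpJob)
    If job.Success Then
        Dim res As String = job.GetString
        'res now holds the page html
    Else
        Log("Download failed: " & job.ErrorMessage)
    End If
    job.Release
End Sub

With Wait For, the separate JobDone sub and the Select on JobName are no longer needed.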
 
Upvote 0

Sandman

It's a simple case of blocking based on some data from the client. Probably user agent, or something like that.

Doesn't work, just as you posted:
Bash:
sandman@mothership:~ curl "https://www.finanzen.net/suchergebnis.asp?frmAktienSucheTextfeld=DE0005313704"
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
 
You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;9f034917&#46;1702562512&#46;ad377c1
</BODY>
</HTML>
sandman@mothership:~

If I get the page in Firefox and copy the actual curl request from within the browser instead, it works just fine:
Bash:
sandman@mothership:~ curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' --compressed -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'DNT: 1' -H 'Sec-GPC: 1' -H 'Connection: keep-alive' -H 'Cookie: at_check=true; mbox=session#30e9195ff95d42549383dfdd023a471c#1702564179; _sp_v1_ss=1:H4sIAAAAAAAAAItWqo5RKimOUbKKxs_IAzEMamN1YpRSQcy80pwcILsErKC6lpoSSrEA-EAOLpYAAAA%3D; _sp_v1_p=505; _sp_v1_data=686534; _sp_su=false; googleanalytics_consent=active=true; fintargeting_consent=active=true; jwplayer_consent=active=true; euconsent-v2=CP2xrcAP2xrcAAGABCENAdEgAP_gAEAAACQgJFBR5DrFDGFBMHBaYJEAKYgWVFgAQEQgAAAAAQABAAGAcAQCw2AiIASABCAAAQAAgAABAAAECAEEAAAAAAAEAAAAAAAAgAAIIABAABEAAgIQAAoAAAAAEAAAAAABAAAAmAAQAALAAAQAQAAQAAAAACAAAAAAAAAAAAAAAAIAAAAAAAAAAAAAAAIAAAAAAQAAAAABBDmA_AAoACwAKgAcABAACKAE4AUAAyABoAEQAJgATwA3gBzAEQAJwAfoBKQC5gGKANwAlYBLQCdgFDgLzAX8AxkBjgDIQG6gQ5ARBAAQF_BIBYAVQA_ACGAEcAPwAigBGgCSgJEAYMBIoKAIAAUACKAE4AUABzAS0Av4BjIDHAgAUADYAPgBCAEcAJ2KAAgEcGAAQCODoDgACwAKgAcABAAEQAJgAVQAxABvAD9AIYAiABOAD8AIoAR0AkoBKQCxAFzAMUAbgBF4CRAE7AKHAXmBDkCRQ4AiABcAGQANAAngCEAEcAP0AhABEQCLAEZAI4ATsBKwDBgGQgN1LQAQBHFgAIBHAwAQAEQBsgENgJaIQCgAFgBMACqAGIAN4AjgCKAEpAMUBIogAFAIyARwAsQBcwGeEoB4ACwAOABEACYAFUAMUAhgCIAEcAPwAuYBigEXgJEAXmBIokAGAAuAIQAjIBHAErAM8KQFwAFgAVAA4ACAAIgATAAqgBiAD9AIYAiAB-AEdAJKASkAuYBuAEXgJEATsAocBeYEOQJFFAB4ACgALgAyABoAE8AQgAjgBOAD9AIsARwAsQBigGeAN1AA.YAAAAAAAAAAA; consentUUID=5af7ba19-e98a-4421-8f38-b8fd3e6fe64a_26; gpt_ppid50=eM3MpclsV8LrAB4NVhn1NJcQTtUICaaHjLMCEp7p0nUeXR53bs' -H 'Upgrade-Insecure-Requests: 1' -H 'Sec-Fetch-Dest: document' -H 'Sec-Fetch-Mode: navigate' -H 'Sec-Fetch-Site: none' -H 'Sec-Fetch-User: ?1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'
...the page html removed here...
sandman@mothership:~

The next step for you would be to start stripping down the curl command to see how much you can remove before getting an error. When you've reached the bare minimum, you'll know what to impersonate in your B4X code.
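For example, the trimming could look like this (each step is re-run to check whether the page still comes back; the minimal command shown here is the one that turned out to be sufficient):

Bash:
# drop one header at a time and re-run, keeping only what is required:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' --compressed \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
# ...until the bare minimum still returns the page:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'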
 
Upvote 1

DonManfred

Can this block be lifted?
Contact the website author/admin and ask them to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser:

B4X:
Dim j As HttpJob
j.Initialize("job name", Me)
j.Download(<link>) 'it can also be PostString or any of the other methods
j.GetRequest.SetHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0")
 
Upvote 2

Sandman

Quick follow-up. Just as I expected, you only need to set a user agent that they accept. Here's the one from my example above; that's all that's needed to get the HTML.
Bash:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'
 
Upvote 0

Filippo

Contact the website author/admin and ask them to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser.
Thank you! It works perfectly.

@Sandman
Many thanks!
 
Upvote 0

Filippo

Contact the website author/admin and ask them to unblock you.

Maybe try adding a customized header to "simulate" a request from a Firefox browser.
Too bad, it worked until a few days ago.

Is there perhaps another possibility or a trick?
 
Upvote 0

DonManfred

Too bad, it worked until a few days ago.
They probably changed the system behind it, adding a new security layer or something.
How many requests are you making per minute, hour, or day? Maybe you are doing it too often, and they are blocking you for it.

Contact finanzen.net and ask for an API which you can then use.
 
Upvote 0

peacemaker

They have probably tracked your requests: frequent ones, or long-running ones from a fixed IP address. It's a standard defence of websites against scraper apps.
Try changing the user agent after each batch of requests.
Proxy servers exist for such tasks.
 
Upvote 0

Sandman

Is there perhaps another possibility or a trick?
Shouldn't be needed, this continues to work just fine:
Quick follow-up. Just as I expected, you just need to set a user-agent that they can accept. Here's the one from my example above. That's all that's needed to get the html.
Bash:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'

Either you're not mimicking the curl request well enough, or you're hammering their server so much that they blocked you.
 
Upvote 0

Filippo

curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'
Perfect! Now it works again, thank you!

Either you're not mimicking the curl request well enough, or you're hammering their server so much that they blocked you.
The requests are only sent by a single app (my private app), maybe 1-2 times per week. That should not be the problem.

Contact finanzen.net and ask for an Api which you then can use.
I know the site has an API, but it's not free, and for the small number of requests I send, it's not worth it.
 
Upvote 0

Filippo

Hi guys,

It worked for exactly one year; now I get the same error message again:

ResponseError. Reason: , Response: <HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;c6011002&#46;1702559562&#46;75b01d0
</BODY>
</HTML>

What should I change now to make it work again?
Does anyone have any more tips?

Many thanks in advance
 
Upvote 0

peacemaker

Don't you change the user agent dynamically?
 
Upvote 0

Sandman

I had a look. They've simply blocked that specific user agent; nothing much to worry about. You had all the instructions you needed in #4 to solve this, but I might as well post one solution for you.

This works fine.
Bash:
curl 'https://www.finanzen.net/aktien/carl_zeiss_meditec-aktie' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0'

Meaning, for your code, try this as the user agent:
B4X:
Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0

One way to make your code more future-proof is to take a number of user agents from a site like this and randomize between them for each call.
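That randomization could be sketched in B4X like this (the agent strings are examples to be replaced with current ones; RandomUserAgent is a hypothetical helper name):

B4X:
'Hypothetical helper: pick a random User-Agent string per request
Sub RandomUserAgent As String
    Dim agents As List
    agents.Initialize2(Array As String( _
        "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0", _
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0", _
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:120.0) Gecko/20100101 Firefox/120.0"))
    Return agents.Get(Rnd(0, agents.Size))
End Sub

Then set it before each download: job.GetRequest.SetHeader("User-Agent", RandomUserAgent)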
 
Upvote 0

Filippo

Hi @Sandman ,

Thank you very much for your answer.

After trying all possible user agents, I think the website is blocking everything; I always get the same answer:
*** Receiver (httputils2service) Receive (first time) ***
ResponseError. Reason: , Response: <HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;finanzen&#46;net&#47;suchergebnis&#46;asp&#63;" on this server.<P>
Reference&#32;&#35;18&#46;9c41402&#46;1747645328&#46;f6fea6a
<P>https&#58;&#47;&#47;errors&#46;edgesuite&#46;net&#47;18&#46;9c41402&#46;1747645328&#46;f6fea6a</P>
</BODY>
</HTML>
 
Upvote 0

Sandman

In that case I would suggest this:

1. Try to run the code from another public IP. Just to make sure they haven't blocked you based on your IP.

...and if that works...

2. Figure out a way that's simple for you to inspect your network traffic. (The curl command in #16 still works perfectly for me.)


In any case, it wouldn't be a bad idea for you to install curl on your machine so you can try it out yourself.
 
Upvote 0

Sandman

Have you tried this with my code from post #1?
Sure, I do it all the time when I use the emulator and communicate with my own server API. It's a great way to see the requests and responses, and it's really simple, too: just install mitmproxy, use the proxy settings in the emulator, and you can easily follow all the chatter.


However, if this is something you're not used to, I'd recommend first installing curl and also trying another IP before doing the proxy thing.
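For the record, a rough sketch of that setup on the host machine (the port and AVD name are placeholders):

Bash:
# start mitmproxy on the host (it listens on port 8080 by default)
mitmproxy --listen-port 8080
# in another terminal, launch the emulator routed through the proxy
emulator -avd MyAvd -http-proxy http://localhost:8080

(To read HTTPS traffic you also need to install the mitmproxy CA certificate inside the emulator.)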
 
Upvote 0