For a home automation project I am trying to pull train delay data for our majestically unreliable commuter trains. An API wrapper exists and comes with cURL examples. These work fine, but Python's requests.get and httpx.get are both slow to pull data (up to a minute for requests and ca. 4 seconds for httpx), whereas curl, or pasting the link into a browser, returns almost immediately. Why?
The internet suggests that some sites have anti-scraping protections and may throttle or block clients that use an older HTTP version, which would affect requests, as it speaks HTTP/1.1 and not HTTP/2. On this API httpx with HTTP/2 enabled does seem to be much faster, but it is still nowhere near as fast as curl or the browser.
Some examples - this Python snippet takes ca. 4 seconds:
import httpx
client = httpx.Client(http2=True)
response = client.get('https://v6.db.transport.rest/stations?query=berlin')
print(response.text)
This takes up to a minute:
import requests
response = requests.get('https://v6.db.transport.rest/stations?query=berlin')
print(response.text)
This returns almost immediately:
import subprocess
command = "curl 'https://v6.db.transport.rest/stations?query=berlin' -s"
result = subprocess.run(command, capture_output=True, shell=True, text=True)
print(result.stdout)
print(result.stderr)
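To put numbers on the comparison rather than rely on impressions, a rough timing sketch along these lines could be used. The helper names `time_call` and `compare_clients`, the repeat count, and the 60-second timeouts are illustrative assumptions, not part of the API wrapper or of either library.

```python
import time


def time_call(label, fn, repeats=3):
    """Call fn() `repeats` times; print and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    print(f"{label}: " + ", ".join(f"{d:.2f}s" for d in durations))
    return durations


def compare_clients(url="https://v6.db.transport.rest/stations?query=berlin"):
    """Time the three approaches from the question against the same URL."""
    # Imported here so the timing helper above works without these packages.
    import subprocess

    import httpx
    import requests

    # Each requests.get call opens a fresh connection.
    time_call("requests", lambda: requests.get(url, timeout=60))

    # The httpx client keeps its connection open between calls.
    # http2=True needs the optional HTTP/2 extra (pip install "httpx[http2]").
    with httpx.Client(http2=True, timeout=60) as client:
        time_call("httpx", lambda: client.get(url))

    # Each curl invocation is a new process and a new connection.
    time_call("curl", lambda: subprocess.run(["curl", "-s", url], capture_output=True))


# compare_clients()  # uncomment to run against the live API
```

If only the first httpx call is slow and the repeats are fast, the delay most likely sits in connection setup (name resolution, TCP, or TLS) rather than in the request itself, because the client reuses its connection for the later calls.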
What's the magic here? Thanks in advance for any hints.