Introduction
The essential features to learn when picking up Python quickly for immediate use:
- Using Python virtual environments (Published : January, 2020)
- Reading, writing JSON files, handling JSON data with Python (Published : April, 2020)
- Handling Python program arguments with the packages argparse and getopt (Published : April, 2020)
- Application configuration : environment variables, ini and YAML files (Published : April, 2020)
- Managing HTTP requests using the packages requests and httplib2 (Published : April, 2020)
This chapter covers how to handle HTTP requests in a Python program.
The sample PHP program rpc-articles-indexing.php returns, in JSON format, the last 10 articles to be indexed (rows where the column date_ixgoo is null):
rpc-articles-indexing.php
<?php
  $conn = mysqli_connect('localhost','sqlpac_ro','********','sqlpac',40000);
  // Check the connection before using it
  if (!$conn) {
    die('Connection error : ' . mysqli_connect_error());
  }
  mysqli_set_charset($conn, "utf8");
  $sql = "select filename, id_lang from articles where date_ixgoo is null ";
  $sql .= " order by date_ol desc limit 10 ";
  $data = array();
  $get_articles = mysqli_query($conn, $sql);
  if ($get_articles) {
    foreach ($get_articles as $row) {
      $data[] = $row;
    }
  }
  print json_encode($data);
?>
Querying https://www.sqlpac.com/rpc/rpc-articles-indexing.php returns data encoded by json_encode in the following format:
[
{"filename":"mariadb-columnstore-1.2.3-installation-standalone-ubuntu-premiers-pas.html","id_lang":"fr"},
{"filename":"mariadb-columnstore-1.2.3-standalone-installation-ubuntu-getting-started.html","id_lang":"en"},
{"filename":"influxdb-v2-prise-en-main-installation-preparation-migration-version-1.7.html","id_lang":"fr"},
{"filename":"influxdb-v2-getting-started-setup-preparing-migration-from-version-1.7.html","id_lang":"en"}
]
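To try the client examples without access to sqlpac.com, a local stand-in for rpc-articles-indexing.php can be handy. The following is a minimal sketch, not the original PHP program: it uses only the standard library (http.server running in a background thread), serves hypothetical sample rows with the same JSON shape, and mimics the section criterion (filename like 'section%') with a prefix match. The names SAMPLE_ROWS, start_local_server() and demo() are illustrative.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical rows with the same shape as the PHP json_encode output
SAMPLE_ROWS = [
    {"filename": "oracle-resultats-procedure-stockee-vers-ms-sql.html", "id_lang": "fr"},
    {"filename": "oracle-trigger-systeme-after-logon.html", "id_lang": "fr"},
    {"filename": "influxdb-v2-getting-started-setup-preparing-migration-from-version-1.7.html", "id_lang": "en"},
]


class ArticlesHandler(BaseHTTPRequestHandler):
    """Answers any GET request with a JSON array of at most 10 rows."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        # Mimics "filename like 'section%'" with a prefix match
        section = query.get("section", [""])[0]
        rows = [row for row in SAMPLE_ROWS if row["filename"].startswith(section)][:10]
        body = json.dumps(rows).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request logging on stderr


def start_local_server():
    """Starts the stand-in on a free port of 127.0.0.1, in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ArticlesHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo(section=""):
    """Starts the stand-in, queries it once, stops it and returns the decoded list."""
    server = start_local_server()
    try:
        host, port = server.server_address
        url = "http://%s:%d/rpc-articles-indexing.php?%s" % (host, port, urlencode({"section": section}))
        # Empty ProxyHandler: ignore proxies set in the environment for this local call
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()
```

In the same Python session, the URLs of the examples below can be pointed at http://127.0.0.1:<port>/rpc-articles-indexing.php, where <port> is server.server_address[1] of a server returned by start_local_server().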
Let’s see how to perform HTTP requests in a Python program.
Two useful packages are available: requests and httplib2.
Another option is the standard library module urllib.request (urllib2 in Python 2), but it requires more code.
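For comparison, a minimal sketch of a GET request with query parameters and JSON decoding using only urllib.request and urllib.parse. The helper names build_url() and fetch_json() are hypothetical; the point is to show the extra plumbing (query string building, response reading, decoding) that requests handles for you.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_url(base_url, params=None):
    """Appends URL-encoded query parameters to base_url (similar to params= in requests)."""
    if not params:
        return base_url
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def fetch_json(base_url, params=None, timeout=10):
    """Sends a GET request and decodes the JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    json.loads raises ValueError if the body is not JSON.
    """
    with urllib.request.urlopen(build_url(base_url, params), timeout=timeout) as response:
        return json.loads(response.read())


# Usage (needs network access):
# rows = fetch_json("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php",
#                   {"section": "oracle", "year": 2006})
```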
Package requests
Installation
If it is not already installed in your Python virtual environment, install the package requests with pip:
pip3 search requests
requests (2.23.0) - Python HTTP for Humans.
pip3 install requests
Installing collected packages: urllib3, chardet, certifi, idna, requests
Successfully installed certifi-2020.4.5.1 chardet-3.0.4 idna-2.9 requests-2.23.0 urllib3-1.25.9
A simple GET request with requests
In the Python program, just import the package requests and call the method get:
import requests
r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php')
print(r.status_code)
print(r.headers)
print(r.text)
200
{'Date': 'Thu, 16 Apr 2020 14:59:04 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Server': 'Apache', 'X-Powered-By': 'PHP/7.3', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'X-IPLB-Instance': '30837', 'Set-Cookie': 'SERVERID108286=102098|Xpic6|Xpic6; path=/' }
[ {"filename":"influxdb-v2-prise-en-main-installation-preparation-migration-version-1.7.html","id_lang":"fr"}, {"filename":"influxdb-v2-getting-started-setup-preparing-migration-from-version-1.7.html","id_lang":"en"}, {"filename":"linux-ubuntu-fail2ban-installation-configuration-iptables.html","id_lang":"fr"} ]
So easy that the code does not need any comments.
requests - Adding parameters in the GET query
Let’s enhance the query in the PHP program so that it accepts criteria in the URL, for example: https://www.sqlpac.com/rpc/rpc-articles-indexing.php?section=oracle&year=2006
$sql = "select filename, id_lang from articles where date_ixgoo is null ";
// Escape (section) or cast (year) the GET values to prevent SQL injection
if (isset($_GET["section"])) { $sql .= " and filename like '".mysqli_real_escape_string($conn, $_GET["section"])."%'"; }
if (isset($_GET["year"])) { $year = (int) $_GET["year"]; $sql .= " and date_ol between '".$year."-01-01' and '".$year."-12-31'"; }
$sql .= " order by date_ol desc limit 10";
To send the criteria, pass them as a dictionary in the argument params:
import requests
q = {'section':'oracle', 'year':2006}
r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php', params=q)
print(r.status_code)
print(r.text)
200
[ {"filename":"oracle-resultats-procedure-stockee-vers-ms-sql.html","id_lang":"fr"}, {"filename":"oracle-trigger-systeme-after-logon.html","id_lang":"fr"} ]
There is no need to import the json package: requests provides a built-in JSON decoder with the method json:
import requests
q = {'section':'oracle', 'year':2006}
r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php', params=q)
jresult = r.json()
print(type(jresult))
print(jresult[0]["filename"])
<class 'list'>
oracle-resultats-procedure-stockee-vers-ms-sql.html
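r.json() essentially decodes r.text as JSON, so the same result can be reproduced with the json module, which also shows what happens when the PHP script answers with plain text (for example its die() message) instead of JSON: the decoder raises an exception deriving from ValueError, with r.json() as with json.loads. A minimal sketch; parse_articles() is a hypothetical helper and the sample body is the r.text of the params example above.

```python
import json

# Body as returned in the params example above (r.text)
SAMPLE_BODY = ('[{"filename":"oracle-resultats-procedure-stockee-vers-ms-sql.html","id_lang":"fr"},'
               '{"filename":"oracle-trigger-systeme-after-logon.html","id_lang":"fr"}]')


def parse_articles(body):
    """Decodes the JSON array sent by rpc-articles-indexing.php.

    Returns a list of (filename, id_lang) tuples; raises ValueError when the
    body is not JSON, e.g. when the PHP script stopped with die().
    """
    try:
        rows = json.loads(body)
    except json.JSONDecodeError as exc:  # JSONDecodeError is a ValueError subclass
        raise ValueError("not a JSON response: %r" % body[:80]) from exc
    return [(row["filename"], row["id_lang"]) for row in rows]


for filename, lang in parse_articles(SAMPLE_BODY):
    print(filename, lang)
```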
requests - The method POST
To send data with the POST method, use the method post with the argument data, as easily as the method get with its argument params.
The PHP program rpc-update-article.php updates a table using the POST variables sent by the Python program and returns the result in JSON format (number of affected rows or error code):
rpc-update-article.php
<?php
  $resp = array();
  if (! isset($_POST["filename"]) || ! isset($_POST["datets"])) {
    $resp[0]["returncode"] = -1;
    $resp[0]["reason"] = "Missing parameter, filename or timestamp";
  } else {
    $conn = mysqli_connect('localhost','sqlpac_ro','********','sqlpac',40000);
    if ( ! $conn ) {
      $resp[0]["returncode"] = -2;
      $resp[0]["reason"] = "Database connection issue";
    }
    else {
      mysqli_set_charset($conn, "utf8");
      // Escape the POST values to prevent SQL injection
      $sql = "update articles set date_ixgoo='".mysqli_real_escape_string($conn, $_POST["datets"])."'"
           . " where filename='".mysqli_real_escape_string($conn, $_POST["filename"])."'";
      if ( ! mysqli_query($conn,$sql) ) {
        $resp[0]["returncode"] = -2;
        $resp[0]["errorcode"] = mysqli_errno($conn);
        $resp[0]["reason"] = mysqli_error($conn);
      } else {
        $resp[0]["returncode"] = mysqli_affected_rows($conn);
        $resp[0]["filename"] = $_POST["filename"];
        $resp[0]["datets"] = $_POST["datets"];
      }
      mysqli_close($conn);
    }
  }
  print json_encode($resp);
?>
The data are sent as follows:
import requests
formdata = {'filename':'python-http-queries-with-packages-requests-httplib2.html', 'datets':'2020-04-16'}
p = requests.post('https://www.sqlpac.com/sqlpac/rpc-update-article.php', data=formdata)
print(p.status_code)
print(p.json())
200
[{'returncode': 1, 'filename': 'python-http-queries-with-packages-requests-httplib2.html', 'datets': '2020-04-16'}]
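The decoded list can then be checked against the return codes defined in rpc-update-article.php: a negative returncode means an error (-1 for a missing parameter, -2 for a database issue, with errorcode and reason), otherwise it is the number of affected rows. A minimal sketch; affected_rows() is a hypothetical helper that takes the result of p.json().

```python
def affected_rows(resp):
    """Interprets the JSON list returned by rpc-update-article.php.

    resp is the decoded list, e.g. p.json(). Raises RuntimeError on a
    negative returncode, otherwise returns the number of affected rows
    (0 when no row was changed).
    """
    if not resp:
        raise ValueError("empty response")
    result = resp[0]
    code = int(result["returncode"])
    if code < 0:
        detail = result.get("reason", "unknown error")
        if "errorcode" in result:
            detail = "MySQL error %s: %s" % (result["errorcode"], detail)
        raise RuntimeError("update failed (%d): %s" % (code, detail))
    return code


print(affected_rows([{"returncode": 1, "filename": "python-http-queries-with-packages-requests-httplib2.html",
                      "datets": "2020-04-16"}]))
```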
The package requests makes file uploads with the POST method easy: just use the argument files:
import requests
formdata = {'filename':'python-http-queries-with-packages-requests-httplib2.html', 'datets':'2020-04-16'}
# A list of (field name, file) tuples: a dict with two 'file' keys would keep only the last one
with open('file1.txt', 'rb') as f1, open('file2.txt', 'rb') as f2:
    uploadfiles = [('file', f1), ('file', f2)]
    p = requests.post('https://www.sqlpac.com/sqlpac/rpc-update-article.php', data=formdata, files=uploadfiles)
requests - Disabling SSL certificate verification
Add the option verify=False to the get or post method to disable SSL certificate verification:
import requests
r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php', verify=False)
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.sqlpac.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
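For comparison with the urllib.request alternative, a minimal sketch of the same relaxation in the standard library: an ssl context with hostname checking and certificate verification switched off, passed to urlopen(). The helper names are hypothetical. As with verify=False, this is only for testing, for example against a self-signed certificate; requests also accepts the path of a CA bundle in verify, which keeps verification on.

```python
import ssl
import urllib.request


def insecure_context():
    """SSL context that skips certificate checks, like verify=False in requests."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # do not validate the server certificate
    return ctx


def get_unverified(url, timeout=10):
    """GET request over HTTPS without certificate validation (network access required)."""
    with urllib.request.urlopen(url, context=insecure_context(), timeout=timeout) as resp:
        return resp.status, resp.read()
```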
requests and the basic HTTP authentication
When a web folder is protected with basic HTTP authentication (.htaccess and .htpasswd files), import HTTPBasicAuth and use the argument auth=HTTPBasicAuth('user','password'):
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('https://www.sqlpac.com/rpc/send-data.php', auth=HTTPBasicAuth('sqlpac', '*********'))
print(r.status_code)
200
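What basic authentication actually puts on the wire is an Authorization header made of "Basic " followed by the Base64 encoding of user:password. A minimal sketch computing it with the base64 module; basic_auth_header() is a hypothetical helper, and it assumes ASCII credentials (requests encodes str credentials as latin-1, so non-ASCII values may differ).

```python
import base64


def basic_auth_header(user, password):
    """Value of the Authorization header for HTTP basic authentication (ASCII credentials)."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    return "Basic " + token


print(basic_auth_header("sqlpac", "secret"))  # Basic c3FscGFjOnNlY3JldA==
```

Base64 is an encoding, not encryption: anyone reading the request can recover the password, which is why basic authentication should only be used over HTTPS.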
Other authentication methods can be used with requests: Digest, OAuth…
Package httplib2
Let’s investigate another package: httplib2. The package requests is so powerful that we may conclude it covers all our needs, but the package httplib2 is also worth studying: Google code samples use it, and it natively provides a caching mechanism, an aspect rarely addressed in the documentation and tutorials dealing with the package requests.
Installation
If it is not already installed in your Python virtual environment, install the package httplib2 with pip:
pip3 search httplib2
httplib2 (0.17.2) - A comprehensive HTTP client library.
pip3 install httplib2
Installing collected packages: httplib2
Successfully installed httplib2-0.17.2
Compared to the package requests, httplib2 is standalone: as the installation output shows, it installs without any dependencies, whereas requests depends on chardet, urllib3 and other packages.
The GET request with httplib2
Using the same sample PHP programs as with the package requests, a GET request is run as follows:
import httplib2
http = httplib2.Http()
r = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", method="GET")
print(r)
({'date': 'Thu, 16 Apr 2020 16:20:09 GMT', 'content-type': 'text/html; charset=UTF-8', 'transfer-encoding': 'chunked', 'connection': 'keep-alive', 'server': 'Apache', 'x-powered-by': 'PHP/7.3', 'vary': 'Accept-Encoding', 'x-iplb-instance': '30846', 'set-cookie': 'SERVERID108286=102098|XpjaH|XpjaH; path=/', 'status': '200', 'content-length': '904', '-content-encoding': 'gzip', 'content-location': 'https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php'}, b'[{"filename":"influxdb-v2-prise-en-main-installation-preparation-migration-version-1.7.html","id_lang":"fr"}, … ]')
The result is less convenient to work with than with the package requests.
A tuple of 2 objects is returned:
- The response headers: class 'httplib2.Response'
- The response content: class 'bytes'
The tuple can be unpacked into two variables:
import httplib2
http = httplib2.Http()
(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", method="GET")
print(headers.status)
200
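Since content is bytes, getting text roughly as r.text does with requests means decoding it with the charset announced in the content-type header (text/html; charset=UTF-8 in the output above). A minimal sketch using email.message.Message to parse the header parameter; decode_body() is a hypothetical helper, and it falls back to UTF-8 when no charset is given (requests uses other fallback rules).

```python
from email.message import Message


def decode_body(headers, content, default="utf-8"):
    """Decodes the bytes returned by httplib2 using the charset of the content-type header.

    headers: mapping with a 'content-type' key (httplib2 lowercases header names).
    """
    msg = Message()
    msg["content-type"] = headers.get("content-type", "text/plain")
    charset = msg.get_content_charset(default)  # lowercased charset, or default if absent
    return content.decode(charset, errors="replace")


print(decode_body({"content-type": "text/html; charset=UTF-8"}, b'[{"id_lang":"fr"}]'))
```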
httplib2 does not provide native JSON decoding as the package requests does with the method json, so the package json must be imported and used:
import json
import httplib2
http = httplib2.Http()
(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", method="GET")
if headers.status == 200:
    jdata = json.loads(content)
    for elt in jdata:
        print("%s %s" % (elt["filename"], elt["id_lang"]))
ms-sql-server-2016-dbcc-clonedatabase-usage.html fr
ms-sql-server-2016-using-dbcc-clonedatabase.html en
httplib2 - GET query with parameters
The parameters must be given in the URL, so don’t forget to encode the query string with urlencode:
import httplib2
import json
from urllib.parse import urlencode
http = httplib2.Http()
params = { "section": "oracle", "year": 2006 }
(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php?" + urlencode(params), method="GET")
if headers.status == 200:
    …
oracle-resultats-procedure-stockee-vers-ms-sql.html fr
oracle-trigger-systeme-after-logon.html fr
…
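urlencode (from the standard library module urllib.parse) does more than joining key=value pairs: it escapes reserved characters, turns spaces into "+", and, with doseq=True, sends a list as repeated parameters. A short illustration; the values "ms sql&mysql" and ["fr", "en"] are hypothetical.

```python
from urllib.parse import urlencode

base = "https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php"

params = {"section": "oracle", "year": 2006}
print(base + "?" + urlencode(params))
# https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php?section=oracle&year=2006

# Reserved characters are escaped and spaces become "+"
print(urlencode({"section": "ms sql&mysql"}))
# section=ms+sql%26mysql

# A list value needs doseq=True to be sent as repeated parameters
print(urlencode({"lang": ["fr", "en"]}, doseq=True))
# lang=fr&lang=en
```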
httplib2 - POST method
With the POST method, the argument method is obviously set to POST, and 2 other arguments are provided:
- headers: the content type, set to application/x-www-form-urlencoded for a form.
- body: the data values to be sent, encoded with urlencode.
import httplib2
from urllib.parse import urlencode
http = httplib2.Http()
formdata = {'filename':'python-http-queries-with-packages-requests-httplib2.html', 'datets':'2020-04-16'}
(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-update-article.php",
                                  method="POST",
                                  headers={'Content-type': 'application/x-www-form-urlencoded'},
                                  body=urlencode(formdata))
print(content)
b'[{"returncode":1,"filename":"python-http-queries-with-packages-requests-httplib2.html","datets":"2020-04-16"}]'
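The method, headers and body arguments can be grouped in a small helper and passed with **, which keeps form POST calls short. A minimal sketch; form_post() is a hypothetical name. The body it builds is the same application/x-www-form-urlencoded string that requests produces from a dictionary passed in data.

```python
from urllib.parse import urlencode


def form_post(data):
    """Keyword arguments for httplib2's http.request() to POST an HTML-style form."""
    return {
        "method": "POST",
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "body": urlencode(data),
    }


formdata = {"filename": "python-http-queries-with-packages-requests-httplib2.html",
            "datets": "2020-04-16"}
args = form_post(formdata)
print(args["body"])
# filename=python-http-queries-with-packages-requests-httplib2.html&datets=2020-04-16

# Usage with httplib2 (network access required):
# (headers, content) = httplib2.Http().request(url, **form_post(formdata))
```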
Compared to the package requests, more code is needed to upload files with the POST method.
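With httplib2, the multipart/form-data body has to be built by hand and passed as the body argument, with a Content-Type header carrying the boundary. A minimal sketch with the standard library; encode_multipart() is a hypothetical helper, and it does not escape quotes or non-ASCII characters in field names and filenames.

```python
import mimetypes
import uuid


def encode_multipart(fields, files):
    """Builds a multipart/form-data body.

    fields: dict of field name -> str value
    files:  list of (field name, filename, content bytes); a field name may repeat
    Returns (content_type, body) for the headers and body of http.request().
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(('--%s\r\nContent-Disposition: form-data; name="%s"\r\n\r\n%s\r\n'
                      % (boundary, name, value)).encode("utf-8"))
    for name, filename, data in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = ('--%s\r\nContent-Disposition: form-data; name="%s"; filename="%s"\r\n'
                  'Content-Type: %s\r\n\r\n' % (boundary, name, filename, ctype))
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(("--%s--\r\n" % boundary).encode("ascii"))  # closing boundary
    return "multipart/form-data; boundary=%s" % boundary, b"".join(parts)


content_type, body = encode_multipart({"datets": "2020-04-16"}, [("file", "file1.txt", b"hello")])

# Usage with httplib2 (network access required):
# (headers, content) = http.request(url, method="POST",
#                                   headers={"Content-Type": content_type}, body=body)
```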
httplib2 - Disabling SSL certificate validation
Set the property disable_ssl_certificate_validation to True before issuing a request if SSL certificate validation needs to be disabled for any reason. Unlike with the package requests, no warning is raised:
import httplib2
http = httplib2.Http()
http.disable_ssl_certificate_validation=True
(headers, content) = http.request("url", method="GET")
…
httplib2 and the basic HTTP authentication
When using HTTP basic authentication, call the method add_credentials(user, password) before calling the method request:
import httplib2
http = httplib2.Http()
http.add_credentials('sqlpac','*********')
(headers, content) = http.request("https://www.sqlpac.com/rpc/send-data.php", method="POST")
print(headers.status)
200
httplib2 advantages : cache usage
The package httplib2 is less easy to use than the package requests, but httplib2 has a major advantage in some circumstances: caching.
Query results can be cached in a directory:
import httplib2
http = httplib2.Http("/tmp/.cache")
(headers, content) = http.request("https://www.sqlpac.com/rpc/send-data.php", method="POST")
print(headers.status)
200
In the example above, data are cached in the directory /tmp/.cache; if the directory does not exist, the program tries to create it.
The expiration can be governed by the Expires header sent by the web server. With Apache, to define an expiration in a .htaccess file:
.htaccess
<IfModule mod_expires.c>
ExpiresActive on
ExpiresDefault "access plus 4 hours"
</IfModule>
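With ExpiresDefault "access plus 4 hours", Apache's mod_expires sets the Expires header (and the max-age directive of Cache-Control) to 4 hours after the request time. A minimal sketch computing that freshness lifetime from the Date and Expires headers with email.utils.parsedate_to_datetime; freshness_lifetime() is a hypothetical helper, the sample Date value is hypothetical, and the function deliberately ignores max-age, which takes precedence over Expires in HTTP caching.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a response stays fresh according to its Date and Expires headers (max-age ignored)."""
    date = parsedate_to_datetime(headers["date"])
    expires = parsedate_to_datetime(headers["expires"])
    return max(0, int((expires - date).total_seconds()))


# Hypothetical headers of a response served with ExpiresDefault "access plus 4 hours"
sample = {"date": "Fri, 17 Apr 2020 10:50:13 GMT",
          "expires": "Fri, 17 Apr 2020 14:50:13 GMT"}
print(freshness_lifetime(sample))  # 14400 seconds, i.e. 4 hours
```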
The property headers.fromcache (True or False) indicates whether the response was read from the cache.
import httplib2
http = httplib2.Http("/tmp/.cache")
(headers, content) = http.request("https://www.sqlpac.com/rpc/1.html")
print("Expires : %s" % (headers["expires"]))
print(headers.fromcache)
(headers, content) = http.request("https://www.sqlpac.com/rpc/1.html")
print(headers.fromcache)
Expires : Fri, 17 Apr 2020 14:50:13 GMT
False
True
All subsequent calls are read from the cache until the expiration date and time, including those made by later runs of the program.
This can be useful, for example, to avoid network access costs when data are relatively static. When the program is run again before the expiration date, both calls are read from the cache:
Expires : Fri, 17 Apr 2020 14:50:13 GMT
True
True
To bypass and refresh the cache for a call, send the header cache-control with the value no-cache:
import httplib2
http = httplib2.Http("/tmp/.cache")
(headers, content) = http.request("https://www.sqlpac.com/tmp/rpc/1.html")
print("Expires : %s" % (headers["expires"]))
print(headers.fromcache)
(headers, content) = http.request("https://www.sqlpac.com/tmp/rpc/1.html", headers={'cache-control':'no-cache'})
print(headers.fromcache)
Expires : Fri, 17 Apr 2020 14:50:13 GMT
True
False
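The effect of no-cache can be pictured with a simplified model of the decision: serve the cached copy only if the request does not ask for no-cache and the stored Expires date is still in the future. This sketch is an illustration, not httplib2's internal logic, which also handles max-age, ETag and Last-Modified revalidation among other rules; can_use_cached() is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def can_use_cached(cached_headers, request_headers, now):
    """Simplified cache decision: honour a request no-cache, then compare now with Expires."""
    request_headers = {k.lower(): v for k, v in request_headers.items()}
    directives = request_headers.get("cache-control", "").lower()
    if "no-cache" in [d.strip() for d in directives.split(",")]:
        return False  # the client asks for a fresh copy from the server
    expires = cached_headers.get("expires")
    if expires is None:
        return False  # no expiration information: do not trust the cached copy
    return now < parsedate_to_datetime(expires)


cached = {"expires": "Fri, 17 Apr 2020 14:50:13 GMT"}
noon = datetime(2020, 4, 17, 12, 0, tzinfo=timezone.utc)
print(can_use_cached(cached, {}, noon))                             # True
print(can_use_cached(cached, {"cache-control": "no-cache"}, noon))  # False
```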
The package requests does not support caching natively, but a derivative package is available: requests-cache.
Conclusion
Depending on your needs: the package requests is the best choice for handling HTTP requests when JSON is used intensively, as its syntax is the simplest. For a native caching mechanism, httplib2 seems more suitable. Caching with the package requests requires an optional package (requests-cache), not discussed in this article.