Python, Managing HTTP requests with the packages requests and httplib2


Introduction

The essential features for learning Python quickly for immediate use.

This chapter covers how to handle HTTP requests in a Python program.

A sample PHP program, rpc-articles-indexing.php, returns in JSON format the last 10 articles to be indexed (column date_ixgoo is null):

rpc-articles-indexing.php
<?php

  $conn=mysqli_connect('localhost','sqlpac_ro','********','sqlpac',40000);
  
  if(!$conn) {
    die('Connection error : ' . mysqli_connect_error());
  }
  
  mysqli_set_charset($conn,"utf8");
  
  $sql  = "select filename, id_lang from articles where date_ixgoo is null ";
  $sql .= " order by date_ol desc limit 10 ";
  
  $data = array();
  
  $get_articles = mysqli_query($conn, $sql);
  
  if($get_articles)
  {
     foreach ($get_articles as $row) {
        $data[] = $row;
     }
  }
  
  print json_encode($data);

?>

When querying https://www.sqlpac.com/rpc/rpc-articles-indexing.php, the data produced by json_encode have the following format:

[
  {"filename":"mariadb-columnstore-1.2.3-installation-standalone-ubuntu-premiers-pas.html","id_lang":"fr"},
  {"filename":"mariadb-columnstore-1.2.3-standalone-installation-ubuntu-getting-started.html","id_lang":"en"},
  {"filename":"influxdb-v2-prise-en-main-installation-preparation-migration-version-1.7.html","id_lang":"fr"},
  {"filename":"influxdb-v2-getting-started-setup-preparing-migration-from-version-1.7.html","id_lang":"en"}
]

Let’s see how to perform HTTP requests in a Python program.

Two useful packages are available: requests and httplib2. The standard library also provides urllib (urllib2 in Python 2, split into urllib.request and urllib.error in Python 3), but it requires more code.
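For comparison, here is a minimal sketch of a GET query with the standard library only (urllib.request). The helper name get_json is hypothetical; it shows the extra steps that requests handles for you: encoding the query string by hand, decoding the bytes body with the server charset, then parsing the JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def get_json(url, params=None, timeout=10):
    """GET url and decode its JSON body using only the standard library."""
    if params:
        # The query string must be encoded and appended by hand
        url = url + "?" + urlencode(params)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx status codes
    with urlopen(url, timeout=timeout) as resp:
        # The body is bytes: decode it with the charset sent by the server
        charset = resp.headers.get_content_charset("utf-8")
        return json.loads(resp.read().decode(charset))
```

For example, get_json("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", {"section": "oracle", "year": 2006}) would return the decoded list of articles.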

Package requests

Installation

If it is not already installed in your Python virtual environment, install the requests package with pip:

pip3 show requests
pip3 install requests
Installing collected packages: urllib3, chardet, certifi, idna, requests
Successfully installed certifi-2020.4.5.1 chardet-3.0.4 idna-2.9 requests-2.23.0 urllib3-1.25.9

A simple GET request with requests

In the Python program, just import the requests package and call its get method:

import requests

r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php')

print(r.status_code)
print(r.headers)
print(r.text)
200

{'Date': 'Thu, 16 Apr 2020 14:59:04 GMT', 'Content-Type': 'text/html; charset=UTF-8',
  'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive',
  'Server': 'Apache', 'X-Powered-By': 'PHP/7.3',
  'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'X-IPLB-Instance': '30837',
  'Set-Cookie': 'SERVERID108286=102098|Xpic6|Xpic6; path=/'
}

[
  {"filename":"influxdb-v2-prise-en-main-installation-preparation-migration-version-1.7.html","id_lang":"fr"},
  {"filename":"influxdb-v2-getting-started-setup-preparing-migration-from-version-1.7.html","id_lang":"en"},
  {"filename":"linux-ubuntu-fail2ban-installation-configuration-iptables.html","id_lang":"fr"}
]

So easy that the code does not need any comments.

requests - Adding parameters in the GET query

Let’s enhance the query in the PHP program to accept criteria passed in the query string (https://www.sqlpac.com/rpc/rpc-articles-indexing.php?section=oracle&year=2006):

  $sql  = "select filename, id_lang from articles where date_ixgoo is null ";
  // Escape or cast the GET values before concatenation to prevent SQL injection
  if (isset($_GET["section"])) { $sql .= " and filename like '".mysqli_real_escape_string($conn, $_GET["section"])."%'"; }
  if (isset($_GET["year"]))    { $year = intval($_GET["year"]); $sql .= " and date_ol between '".$year."-01-01' and '".$year."-12-31'"; }
  $sql .= " order by date_ol desc limit 10";

To send the criteria, pass a dictionary with the params argument:

import requests

q = {'section':'oracle', 'year':2006}
r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php', params=q )
          
print(r.status_code)
print(r.text)
200
[
  {"filename":"oracle-resultats-procedure-stockee-vers-ms-sql.html","id_lang":"fr"},
  {"filename":"oracle-trigger-systeme-after-logon.html","id_lang":"fr"}
]
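To check the query string that requests builds from the params dictionary, the request can be prepared without being sent. This is a sketch; build_query_url is a hypothetical helper name. After a real call, the same URL is available as r.url.

```python
import requests

BASE = 'https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php'


def build_query_url(base, params):
    """Return the URL a GET with params= would request, without sending it."""
    # prepare() runs the same URL encoding step as requests.get()
    return requests.Request('GET', base, params=params).prepare().url


print(build_query_url(BASE, {'section': 'oracle', 'year': 2006}))
# https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php?section=oracle&year=2006

# Spaces are encoded as +, list values become repeated keys
print(build_query_url(BASE, {'section': 'ms sql', 'year': [2006, 2007]}))
# https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php?section=ms+sql&year=2006&year=2007
```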

There is no need to import the json package: requests provides a built-in JSON decoder, the json method of the response:

import requests

q = {'section':'oracle', 'year':2006}
r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php', params=q )

jresult = r.json()

print(type(jresult))
print(jresult[0]["filename"])
<class 'list'>
oracle-resultats-procedure-stockee-vers-ms-sql.html
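The examples above assume that the call succeeds. A slightly more defensive sketch, with a hypothetical helper fetch_articles, adds a timeout (requests waits indefinitely by default), turns HTTP error codes into exceptions with raise_for_status, and reports a body that is not valid JSON, such as a PHP warning printed before the json_encode output.

```python
import requests


def fetch_articles(url, params=None, timeout=10):
    """GET a JSON document; raise on HTTP errors or non-JSON bodies (hypothetical helper)."""
    # requests waits indefinitely unless a timeout (in seconds) is given
    r = requests.get(url, params=params, timeout=timeout)
    # Turn 4xx/5xx status codes into requests.exceptions.HTTPError
    r.raise_for_status()
    try:
        return r.json()
    except ValueError as exc:
        # r.json() raises a ValueError subclass when the body is not valid JSON,
        # e.g. a PHP warning printed before the json_encode output
        raise RuntimeError("Non-JSON response: %r" % r.text[:80]) from exc
```

For example, fetch_articles("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", {"section": "oracle", "year": 2006}) would return the same list as r.json() above.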

requests - The method POST

To send data with the POST method, use the post method with the data argument, as easily as the get method with its params argument.

The PHP program rpc-update-article.php updates a table using the POST variables sent by the Python program and returns the result in JSON format (number of affected rows or error code):

rpc-update-article.php
<?php

  $resp = array();
  
  if (! isset($_POST["filename"]) || ! isset($_POST["datets"])) {
     $resp[0]["returncode"] = -1;
     $resp[0]["reason"] = "Missing parameter, filename or timestamp";
  } else {
     $conn=mysqli_connect('localhost','sqlpac_ro','********','sqlpac',40000);
     
     if ( ! $conn ) {
        $resp[0]["returncode"] = -2;
        $resp[0]["reason"] = "Database connection issue";
     }
     else {
        mysqli_set_charset($conn,"utf8");
        // Escape the POST values before concatenation to prevent SQL injection
        $datets   = mysqli_real_escape_string($conn, $_POST["datets"]);
        $filename = mysqli_real_escape_string($conn, $_POST["filename"]);
        $sql = "update articles set date_ixgoo='".$datets."' where filename='".$filename."'";
        if ( ! mysqli_query($conn,$sql) ) {
           $resp[0]["returncode"] = -2;
           $resp[0]["errorcode"] = mysqli_errno($conn);
           $resp[0]["reason"] = mysqli_error($conn);
        } else {
           $resp[0]["returncode"] = mysqli_affected_rows($conn);
           $resp[0]["filename"] = $_POST["filename"];
           $resp[0]["datets"] = $_POST["datets"];
        }
        mysqli_close($conn);
     }
     
  }
  print json_encode($resp);

?>

The data are sent as follows:

import requests

formdata = {'filename':'python-http-queries-with-packages-requests-httplib2.html', 'datets':'2020-04-16'}
p = requests.post('https://www.sqlpac.com/sqlpac/rpc-update-article.php', data=formdata)

print(p.status_code)
print(p.json())
200
[{'returncode': 1, 'filename': 'python-http-queries-with-packages-requests-httplib2.html', 'datets': '2020-04-16'}]
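With data=, requests encodes the dictionary as application/x-www-form-urlencoded, the encoding PHP parses into $_POST. Preparing the request without sending it, as in this sketch, shows the exact body. By contrast, the json= argument would send an application/json body, which PHP does not expose through $_POST.

```python
import requests

formdata = {'filename': 'python-http-queries-with-packages-requests-httplib2.html',
            'datets': '2020-04-16'}

# Build the POST request without sending it, to inspect what data= produces
prep = requests.Request('POST', 'https://www.sqlpac.com/sqlpac/rpc-update-article.php',
                        data=formdata).prepare()

print(prep.headers['Content-Type'])
# application/x-www-form-urlencoded
print(prep.body)
# filename=python-http-queries-with-packages-requests-httplib2.html&datets=2020-04-16
```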

The requests package also handles file uploads in a POST request easily, with the files argument:

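import requests

formdata = {'filename':'python-http-queries-with-packages-requests-httplib2.html', 'datets':'2020-04-16'}

# Each file needs its own key: a dict cannot hold the same key twice.
# The with statement closes both files once the request has been sent.
with open('file1.txt', 'rb') as f1, open('file2.txt', 'rb') as f2:
    uploadfiles = {'file1': f1, 'file2': f2}
    p = requests.post('https://www.sqlpac.com/sqlpac/rpc-update-article.php', data=formdata, files=uploadfiles)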
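Files do not have to be opened from disk: requests also accepts a (filename, content, content_type) tuple per field. This sketch, with hypothetical field names file1 and file2, prepares the multipart request without sending it to show the resulting encoding.

```python
import requests

# Hypothetical field names; each value is (filename, content, content_type),
# so the content can come from memory instead of an opened file
files = {
    'file1': ('file1.txt', b'first file\n', 'text/plain'),
    'file2': ('file2.txt', b'second file\n', 'text/plain'),
}
formdata = {'datets': '2020-04-16'}

# Prepare without sending: files= switches the body to multipart/form-data
prep = requests.Request('POST', 'https://www.sqlpac.com/sqlpac/rpc-update-article.php',
                        data=formdata, files=files).prepare()

print(prep.headers['Content-Type'])  # multipart/form-data; boundary=<random token>
```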

requests - Disabling SSL certificate verification

Add the option verify=False to the get or post call to disable SSL certificate validation. The package then emits an InsecureRequestWarning:

import requests
          
r = requests.get('https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php', verify=False )
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.sqlpac.com'.
Adding certificate verification is strongly advised.
See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

requests and the basic HTTP authentication

When a web folder is protected with basic HTTP authentication (.htaccess and .htpasswd files), import HTTPBasicAuth from requests.auth and pass the argument auth=HTTPBasicAuth('user','password'):

import requests
from requests.auth import HTTPBasicAuth

r = requests.get('https://www.sqlpac.com/rpc/send-data.php', auth=HTTPBasicAuth('sqlpac', '*********'))

print(r.status_code)
200

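Other authentication schemes can be used with requests: Digest natively (requests.auth.HTTPDigestAuth), OAuth through companion packages such as requests-oauthlib, and others.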
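When several requests hit the same protected folder, a requests.Session can hold the credentials and the certificate verification setting once. In this sketch, make_session is a hypothetical helper and 'secret' a placeholder password; a plain (user, password) tuple is accepted as shorthand for HTTPBasicAuth.

```python
import requests


def make_session(user, password, verify=True):
    """Hypothetical helper: a Session that reuses credentials and TLS settings."""
    s = requests.Session()
    # A (user, password) tuple is shorthand for HTTPBasicAuth(user, password)
    s.auth = (user, password)
    # verify can be True, False, or a path to a CA bundle file
    s.verify = verify
    return s


s = make_session('sqlpac', 'secret')
# prepare_request merges the session settings into the request; nothing is sent
prep = s.prepare_request(requests.Request('GET', 'https://www.sqlpac.com/rpc/send-data.php'))
print(prep.headers['Authorization'])  # Basic <base64 of "sqlpac:secret">
```

In real use, s.get(url) and s.post(url, data=...) then send these credentials on every call.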

Package httplib2

Let’s investigate another package: httplib2. The requests package is so powerful that we might conclude it covers everything we need, but httplib2 also deserves a look: Google code samples use it, and it natively provides a caching mechanism, an aspect rarely addressed in documentation and tutorials about requests.

Installation

If it is not already installed in your Python virtual environment, install the httplib2 package with pip:

pip3 show httplib2
pip3 install httplib2
Installing collected packages: httplib2
Successfully installed httplib2-0.17.2

Unlike requests, httplib2 0.17.2 is self-contained and installs without any dependency, whereas requests pulls in urllib3, chardet, certifi and idna.

The GET request with httplib2

With the same sample PHP programs used to explore requests, a GET request is run as follows:

import httplib2

http = httplib2.Http()

r = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", method="GET")
print(r)
({'date': 'Thu, 16 Apr 2020 16:20:09 GMT', 'content-type': 'text/html; charset=UTF-8', 'transfer-encoding': 'chunked',
'connection': 'keep-alive', 'server': 'Apache', 'x-powered-by': 'PHP/7.3',
'vary': 'Accept-Encoding', 'x-iplb-instance': '30846', 'set-cookie': 'SERVERID108286=102098|XpjaH|XpjaH; path=/',
'status': '200', 'content-length': '904', '-content-encoding': 'gzip',
'content-location': 'https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php'},

b'[{"filename":"influxdb-v2-prise-en-main-installation-preparation-migration-version-1.7.html","id_lang":"fr"}, … ]')

The results are less convenient to work with than those returned by requests.

The request method returns a tuple of 2 objects:

  • The response headers and status: class 'httplib2.Response'
  • The response content: class 'bytes'

The tuple can be unpacked into 2 variables:

import httplib2

http = httplib2.Http()

(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", method="GET")

print(headers.status)
200
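The content part of the tuple is raw bytes. json.loads accepts bytes directly, but to print or process the body as text it must be decoded. A minimal stdlib-only sketch, with a hypothetical helper decode_content, reads the charset from the content-type header (httplib2 stores header names in lowercase) and falls back to UTF-8.

```python
from email.message import Message


def decode_content(headers, content, default="utf-8"):
    """Decode an httplib2 response body (bytes) using the Content-Type charset."""
    # Let the email package parse the header parameters, e.g. "text/html; charset=UTF-8"
    msg = Message()
    msg["content-type"] = headers.get("content-type", "text/plain")
    # get_content_charset() returns the charset in lowercase, or the default
    charset = msg.get_content_charset(default)
    return content.decode(charset, errors="replace")
```

With the (headers, content) tuple above, text = decode_content(headers, content) gives a str.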

httplib2 has no equivalent of the json method of requests, so the json package must be imported and used:

import json
import httplib2

http = httplib2.Http()

(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php", method="GET")

if (headers.status==200) :
	jdata = json.loads(content)
	
	for elt in jdata:
		print("%s %s" % (elt["filename"], elt["id_lang"]))
ms-sql-server-2016-dbcc-clonedatabase-usage.html fr
ms-sql-server-2016-using-dbcc-clonedatabase.html en

httplib2 - GET query with parameters

httplib2 has no params argument: the parameters must be appended to the URL, so don’t forget to encode the query string with urlencode:

import httplib2
import json
from urllib.parse import urlencode

http = httplib2.Http()

params = { "section": "oracle", "year": 2006 }
(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php?" + urlencode(params),
                                    method="GET")

if (headers.status==200) :
  …
oracle-resultats-procedure-stockee-vers-ms-sql.html fr
oracle-trigger-systeme-after-logon.html fr
…
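A few urlencode details matter when building the URL by hand: spaces become +, and list values need doseq=True to be sent as repeated keys. A small sketch with a hypothetical helper with_query, which also handles a URL that already carries a query string:

```python
from urllib.parse import urlencode


def with_query(url, params):
    """Append params to url as an encoded query string (hypothetical helper)."""
    # Use & if the URL already carries a query string, ? otherwise
    sep = "&" if "?" in url else "?"
    # doseq=True sends list values as repeated keys: year=2006&year=2007
    return url + sep + urlencode(params, doseq=True)


print(with_query("https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php",
                 {"section": "oracle", "year": 2006}))
# https://www.sqlpac.com/sqlpac/rpc-articles-indexing.php?section=oracle&year=2006

print(urlencode({"section": "ms sql", "year": [2006, 2007]}, doseq=True))
# section=ms+sql&year=2006&year=2007

# Without doseq, the list itself is stringified and quoted
print(urlencode({"year": [2006, 2007]}))
# year=%5B2006%2C+2007%5D
```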

httplib2 - POST method

For a POST request, the method argument is set to POST and 2 other arguments are provided:

  • headers: the content type, set to application/x-www-form-urlencoded for a form.
  • body: the data values to send, encoded with urlencode.

import httplib2
from urllib.parse import urlencode

http = httplib2.Http()

formdata = {'filename':'python-http-queries-with-packages-requests-httplib2.html', 'datets':'2020-04-16'}

(headers, content) = http.request("https://www.sqlpac.com/sqlpac/rpc-update-article.php",
								   method="POST",
								   headers={'Content-type': 'application/x-www-form-urlencoded'},
								   body=urlencode(formdata)
								  )
print(content)
b'[{"returncode":1,"filename":"python-http-queries-with-packages-requests-httplib2.html","datets":"2020-04-16"}]'

Compared to requests, more code is needed to upload files with the POST method: httplib2 has no equivalent of the files argument, so the multipart/form-data body must be built by hand.
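As a sketch of that extra code, the hypothetical helper encode_multipart below builds a multipart/form-data body with the standard library only. It assumes small in-memory files given as (filename, bytes, content_type) tuples and does not escape quotes in field or file names.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body for httplib2 (hypothetical helper).

    fields: dict name -> str value
    files:  dict name -> (filename, bytes content, content_type)
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex  # random token unlikely to appear in the data
    parts = []
    for name, value in fields.items():
        parts.append(b"--" + boundary.encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(str(value).encode("utf-8"))
    for name, (filename, data, ctype) in files.items():
        parts.append(b"--" + boundary.encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"; '
                     f'filename="{filename}"'.encode())
        parts.append(f"Content-Type: {ctype}".encode())
        parts.append(b"")
        parts.append(data)
    # Closing boundary, followed by a final CRLF
    parts.append(b"--" + boundary.encode() + b"--")
    parts.append(b"")
    body = b"\r\n".join(parts)
    return "multipart/form-data; boundary=" + boundary, body
```

The result would then be passed as http.request(url, method="POST", headers={"Content-Type": content_type}, body=body).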

httplib2 - Disabling SSL certificate validation

If SSL certificate validation must be disabled for any reason, set the property disable_ssl_certificate_validation to True before issuing a request. Unlike requests, httplib2 raises no warning:

import httplib2

http = httplib2.Http()
http.disable_ssl_certificate_validation=True
          
(headers, content) = http.request("url", method="GET")
…

httplib2 and the basic HTTP authentication

With HTTP basic authentication, call the method add_credentials(user, password) before calling the method request:

import httplib2

http = httplib2.Http()

http.add_credentials('sqlpac','*********')
(headers, content) = http.request("https://www.sqlpac.com/rpc/send-data.php",
								   method="POST")
print(headers.status)
200
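httplib2 typically sends the credentials registered with add_credentials only after the server answers 401 with an authentication challenge, so the first protected call costs 2 round trips. A hedged alternative is to send the Basic Authorization header on the first request yourself; basic_auth_header below is a hypothetical stdlib-only helper. Basic credentials are only base64-encoded, not encrypted, so use HTTPS.

```python
import base64


def basic_auth_header(user, password):
    """Return a headers dict carrying HTTP Basic credentials (hypothetical helper)."""
    # Basic auth is base64("user:password"): encoded, not encrypted
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}
```

Usage sketch: http.request("https://www.sqlpac.com/rpc/send-data.php", method="POST", headers=basic_auth_header("sqlpac", password)), where password is a placeholder.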

httplib2 advantages : cache usage

The httplib2 package is less convenient than requests, but in some circumstances it has a big advantage: the cache.

Query results can be cached in a directory, given as the first argument of httplib2.Http:

import httplib2

http = httplib2.Http("/tmp/.cache")

# httplib2 only caches responses to GET and HEAD requests
(headers, content) = http.request("https://www.sqlpac.com/rpc/send-data.php",
								   method="GET")
print(headers.status)
200

In the example above, data are cached in the directory /tmp/.cache; if the directory does not exist, httplib2 tries to create it.

The expiration can be governed by the Expires header sent by the web server. With Apache (module mod_expires), an expiration is defined in a .htaccess file as follows:

.htaccess
<IfModule mod_expires.c>
	ExpiresActive on
	ExpiresDefault     "access plus 4 hours"
</IfModule>

The property headers.fromcache (True or False) indicates whether the response was read from the cache.

import httplib2

http = httplib2.Http("/tmp/.cache")


(headers, content) = http.request("https://www.sqlpac.com/rpc/1.html")
print("Expires : %s" % (headers["expires"]))
print(headers.fromcache)

(headers, content) = http.request("https://www.sqlpac.com/rpc/1.html")
print(headers.fromcache)
Expires : Fri, 17 Apr 2020 14:50:13 GMT
False
True

All subsequent calls are read from the cache until the expiration date and time, including in later runs of the program.

This can be useful, for example to avoid network access costs when data are relatively static. On a later run of the program, both calls are read from the cache:

Expires : Fri, 17 Apr 2020 14:50:13 GMT
True
True
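How long a cached response stays fresh can be estimated from its Expires header. A minimal sketch with the hypothetical helper seconds_until_expiry, using only the standard library; it ignores Cache-Control max-age, which takes precedence over Expires when both are sent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_until_expiry(expires, now=None):
    """Seconds left before a response with this Expires header is stale (hypothetical helper).

    A zero or negative result means the cached copy has expired.
    """
    # Expires uses the HTTP date format, e.g. 'Fri, 17 Apr 2020 14:50:13 GMT'
    expiry = parsedate_to_datetime(expires)
    if now is None:
        now = datetime.now(timezone.utc)
    return (expiry - now).total_seconds()
```

For instance, seconds_until_expiry(headers["expires"]) after an http.request call; with ExpiresDefault "access plus 4 hours", a fresh response gives about 14400 seconds.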

To bypass the cache and refresh it for a given call, send the header cache-control with the value no-cache:

import httplib2

http = httplib2.Http("/tmp/.cache")


(headers, content) = http.request("https://www.sqlpac.com/tmp/rpc/1.html")
print("Expires : %s" % (headers["expires"]))
print(headers.fromcache)

(headers, content) = http.request("https://www.sqlpac.com/tmp/rpc/1.html",
                                   headers={'cache-control':'no-cache'})
print(headers.fromcache)
Expires : Fri, 17 Apr 2020 14:50:13 GMT
True
False

The requests package does not support caching natively, but a companion package is available: requests-cache.

Conclusion

Depending on your needs: requests is the best choice for handling HTTP requests when JSON is used intensively, as its syntax is the simplest.

For a native caching mechanism, httplib2 seems more suitable. Caching with requests needs an additional package (requests-cache), not discussed in this article.