Python impala flask integration using kerberos authentication — Part 1

Selvaganesh
May 30, 2021

This post demonstrates how to fetch Impala records from a Flask application.

To connect to Impala from Python, a few open source clients are available:

  1. impyla
  2. pyhive
  3. impyla + SQLAlchemy

In this part we will discuss how to integrate with the impyla module.

The main challenge is that we have to connect to Impala using the Kerberos authentication protocol.

Your machine must be configured as a Kerberos client. On Ubuntu, this involves installing the krb5-user package and configuring the krb5.conf file.

sudo apt-get install krb5-user

To create a ticket, use the command below:

kinit -k -t <keytabpath> <username>@xyx.domain.com

We need to run this command before the program connects to the Impala daemon. Let’s create a utility module kerberos.py and place it in the utils folder.

kerberos.py
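A minimal sketch of what utils/kerberos.py could contain, assuming kinit is available on the PATH. It passes the same -k and -t flags as the kinit command shown earlier. The function names `build_kinit_command` and `renew_ticket` are illustrative choices, not taken from the original helper.

```python
import subprocess


def build_kinit_command(principal, keytab_path):
    """Return the kinit argument list for a keytab-based login."""
    # -k: authenticate with a keytab instead of prompting for a password
    # -t: path to the keytab file
    return ["kinit", "-k", "-t", keytab_path, principal]


def renew_ticket(principal, keytab_path):
    """Obtain a fresh ticket; raises CalledProcessError if kinit fails."""
    subprocess.run(build_kinit_command(principal, keytab_path), check=True)
```

Calling `renew_ticket("<username>@xyx.domain.com", "<keytabpath>")` once at startup, before opening the Impala connection, has the same effect as running kinit by hand.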

Connect using impyla

Install the packages below to connect to Impala. If any one of them is missing, you may get SASL authentication errors, so make sure all of them are installed.

pip3 install impyla==0.17.0
pip3 install sasl==0.2.1
pip3 install thrift==0.11.0
pip3 install thrift-sasl==0.4.3
pip3 install kerberos==1.3.1
pip3 install pure-sasl==0.6.2
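Since a single missing package can surface later as a confusing SASL error, it can help to check the environment up front. The following is a small sketch using only the standard library (Python 3.8+); the helper name `missing_packages` and the `REQUIRED` list are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip commands above
REQUIRED = ["impyla", "sasl", "thrift", "thrift-sasl", "kerberos", "pure-sasl"]


def missing_packages(names):
    """Return the subset of distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_packages(REQUIRED)` before the application starts returns an empty list when everything is in place.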

Create a file main.py, declare a conn variable, and provide the Impala host name and port number. By default, Impala accepts this kind of client connection on port 21050; for other setups, refer to the Impala documentation on ports.

from impala.dbapi import connect

conn = connect(
    host="abcdata.domain.com",
    port=21050,
    database="default",
    kerberos_service_name="impala",
    auth_mechanism="GSSAPI",
)

Now write a query to fetch records:
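One possible shape for the query code, as a sketch rather than the original main.py. `fetch_records` relies only on standard DB-API cursor methods, which impyla connections provide. The helper names, the `/records` route, and the table name `my_table` are all placeholders.

```python
def fetch_records(conn, query, limit=100):
    """Run a query on a DB-API connection and return rows as dicts."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # cursor.description holds one tuple per column; the name is first
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchmany(limit)
        return [dict(zip(columns, row)) for row in rows]
    finally:
        cursor.close()


def create_app(conn):
    """Build a Flask app with one route that returns query results as JSON."""
    # Imported here so fetch_records stays usable without Flask installed
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/records")
    def records():
        # "my_table" is a placeholder table name
        return jsonify(fetch_records(conn, "SELECT * FROM my_table LIMIT 10"))

    return app
```

With the conn object from above, `create_app(conn).run()` would serve the rows at /records. Sharing one connection across requests keeps the sketch short; a real service would more likely open a connection per request or use a pool.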

Renewing Kerberos credentials

Follow the steps below if your system requires Kerberos auth.

When a Kerberos credential expires, the connection gets terminated. To avoid this, we can run a scheduler in the background that renews the ticket at a fixed interval.

Install APScheduler

pip install APScheduler

Create a scheduler to renew the ticket
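A minimal sketch of such a scheduler using APScheduler's BackgroundScheduler. The function name `register_kerberos_scheduler` follows the article. The parameters are assumptions for illustration: `renew_ticket` is any zero-argument callable that runs kinit, and the optional `scheduler` argument exists mainly to make testing easier.

```python
def register_kerberos_scheduler(renew_ticket, hours=5, scheduler=None):
    """Run renew_ticket every `hours` hours in a background thread."""
    if scheduler is None:
        # Imported lazily so the module loads even without APScheduler
        from apscheduler.schedulers.background import BackgroundScheduler

        scheduler = BackgroundScheduler()
    # The "interval" trigger fires repeatedly at the given spacing
    scheduler.add_job(
        renew_ticket,
        "interval",
        hours=hours,
        id="kerberos_renew",
        replace_existing=True,
    )
    scheduler.start()
    return scheduler
```

The interval should be shorter than the ticket lifetime configured for your realm, so that a fresh ticket is always in place before the old one expires.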

To fetch results from Flask, call the above function from main.py. register_kerberos_scheduler renews the ticket every 5 hours. This interval can be altered based on the needs of your microservice.

The next part covers integrating Impala using the pyhive module ✌️
