Python Impala Flask integration using Kerberos authentication (Part 1)
This blog post demonstrates how to fetch Impala records using Flask.
There are two popular open source Python clients for connecting to Impala: impyla and PyHive.
In this part we discuss how to integrate using the impyla module.
The main challenge is that we have to connect to Impala using the Kerberos authentication protocol.
Your machine must be configured for Kerberos. On Ubuntu, this involves installing the krb5-user package and configuring the krb5.conf file (typically /etc/krb5.conf).
sudo apt-get install krb5-user
To obtain a ticket from a keytab, run the following command:
kinit <username>@xyx.domain.com -k -t <keytabpath>
This command must run before the program connects to the Impala daemon. Let's create a util, kerberos.py, and move it to the utils folder.
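The contents of utils/kerberos.py are not shown in this post, so here is a minimal sketch of what it could contain. It assumes the MIT kinit binary installed by krb5-user is on the PATH. The names build_kinit_command and obtain_kerberos_ticket are hypothetical.

```python
"""utils/kerberos.py: obtain a Kerberos ticket from a keytab (illustrative sketch).

Function names here are hypothetical; they are not from the original post.
"""
import logging
import subprocess

logger = logging.getLogger(__name__)


def build_kinit_command(principal, keytab_path):
    """Return the argv for `kinit -k -t <keytab> <principal>`.

    -k makes kinit read the key from a keytab instead of prompting for a
    password; -t gives the keytab location.
    """
    return ["kinit", "-k", "-t", keytab_path, principal]


def obtain_kerberos_ticket(principal, keytab_path):
    """Run kinit; raise subprocess.CalledProcessError if it fails.

    The ticket is stored in the default credentials cache, which is where
    the GSSAPI/SASL layer used by impyla looks for it.
    """
    command = build_kinit_command(principal, keytab_path)
    logger.info("Requesting Kerberos ticket for %s", principal)
    # check=True turns a non-zero exit status into an exception;
    # capture_output keeps kinit's stderr available on the exception.
    subprocess.run(command, check=True, capture_output=True, text=True)
```

For example, obtain_kerberos_ticket("<username>@xyx.domain.com", "<keytabpath>") runs the same kinit command shown above.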
Connect using impyla
Install the packages below. If any of them is missing, you may get SASL authentication errors, so make sure all of them are installed:
pip3 install impyla==0.17.0
pip3 install sasl==0.2.1
pip3 install thrift==0.11.0
pip3 install thrift-sasl==0.4.3
pip3 install kerberos==1.3.1
pip3 install pure-sasl==0.6.2
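A missing package often shows up only later as a confusing SASL error. To catch that earlier, a small check can compare the installed versions against the pins above. This is a sketch that assumes Python 3.8+ (for importlib.metadata). The helper name check_requirements is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip3 commands above.
REQUIRED = {
    "impyla": "0.17.0",
    "sasl": "0.2.1",
    "thrift": "0.11.0",
    "thrift-sasl": "0.4.3",
    "kerberos": "1.3.1",
    "pure-sasl": "0.6.2",
}


def check_requirements(required=None):
    """Return a list of problems; an empty list means every pin matches.

    check_requirements is a hypothetical helper, not part of impyla.
    """
    required = REQUIRED if required is None else required
    problems = []
    for name, wanted in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name} {installed} is installed (expected {wanted})")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```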
Create a file main.py, declare a conn variable, and provide the Impala host name and port number. By default, impyla connects to Impala's HiveServer2 port, 21050. For other setups, please refer to the Impala documentation.
from impala.dbapi import connect
conn = connect(host="abcdata.domain.com", port=21050, database="default", kerberos_service_name="impala", auth_mechanism="GSSAPI")
Now write a query to fetch records.
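The query code is not shown in this post. The following is a minimal sketch that uses only the DB-API 2.0 cursor interface impyla implements (cursor(), execute(), description, fetchall(), close()). The helper name fetch_records is hypothetical. Because it relies only on DB-API methods, you can try it locally against sqlite3 without a cluster.

```python
def fetch_records(conn, query):
    """Execute `query` on a DB-API 2.0 connection and return rows as dicts.

    Hypothetical helper: works with the `conn` returned by
    impala.dbapi.connect, or with any other DB-API connection for testing.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # description holds one sequence per column; item 0 is the column name.
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        # Always close the cursor, even if the query fails.
        cursor.close()
```

With the Impala connection above, usage would look like rows = fetch_records(conn, "SELECT * FROM my_table LIMIT 10"), where my_table is a placeholder table name.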
Renewing Kerberos credentials
Follow the steps below if your system requires Kerberos authentication.
When a Kerberos ticket expires, the connection is terminated. To avoid this, we can run a scheduler in the background that obtains a new ticket at a fixed interval.
Install APScheduler
pip install APScheduler
Create a scheduler to renew the ticket
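The scheduler code is not reproduced in this post. Below is one way register_kerberos_scheduler could look, assuming APScheduler 3.x (BackgroundScheduler with an "interval" trigger; the 4.x API is different). Its parameters (renew_func, hours, run_now, scheduler) are assumptions. renew_func stands for whatever function in utils/kerberos.py runs kinit. The 5-hour default follows the interval mentioned in this post.

```python
import logging

logger = logging.getLogger(__name__)


def register_kerberos_scheduler(renew_func, hours=5, run_now=True, scheduler=None):
    """Renew the Kerberos ticket every `hours` hours on a background thread.

    Signature is hypothetical (the post names the function but not its parameters).
    renew_func: zero-argument callable that obtains a fresh ticket (runs kinit).
    scheduler:  optional pre-built scheduler, mainly for tests; by default an
                APScheduler 3.x BackgroundScheduler is created.
    """
    if run_now:
        # Obtain a ticket before the app opens its first Impala connection.
        renew_func()
    if scheduler is None:
        # Imported here so the module can be loaded without APScheduler installed.
        from apscheduler.schedulers.background import BackgroundScheduler
        scheduler = BackgroundScheduler()
    # "interval" selects APScheduler's IntervalTrigger; hours=... is its argument.
    scheduler.add_job(renew_func, "interval", hours=hours, id="kerberos-renewal")
    scheduler.start()
    logger.info("Kerberos renewal scheduled every %s hours", hours)
    return scheduler
```

For example, register_kerberos_scheduler(lambda: subprocess.run(["kinit", "-k", "-t", keytab_path, principal], check=True)) would reuse the kinit command from earlier. Keep the interval shorter than the ticket lifetime configured for your realm.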
To fetch results from Flask, call the above function from main.py. register_kerberos_scheduler renews the ticket every 5 hours. You can change this interval to suit the needs of your microservice.
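main.py is not shown in full either. As a sketch under stated assumptions, the Flask side could look like the following. The names create_app, the /records route and the get_connection callable are hypothetical. Opening one connection per request keeps the example simple, but it is not the only option.

```python
from flask import Flask, jsonify


def create_app(get_connection, query):
    """Build a Flask app with one endpoint that returns query results as JSON.

    Hypothetical factory, not from the original post.
    get_connection: zero-argument callable returning a DB-API connection,
                    e.g. a wrapper around impala.dbapi.connect(...).
    query:          SQL text run on every request.
    """
    app = Flask(__name__)

    @app.route("/records")
    def records():
        conn = get_connection()
        try:
            cursor = conn.cursor()
            try:
                cursor.execute(query)
                columns = [col[0] for col in cursor.description]
                rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
            finally:
                cursor.close()
        finally:
            # Close the connection so each request starts from a clean session.
            conn.close()
        return jsonify(rows)

    return app
```

In main.py, you would call register_kerberos_scheduler once at startup. You would then pass something like lambda: connect(host="abcdata.domain.com", port=21050, database="default", kerberos_service_name="impala", auth_mechanism="GSSAPI") as get_connection.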
The next part covers integrating with Impala using the PyHive module ✌️