11 April, 2021

Python with Oracle using sqlalchemy and cx_oracle

 Here is a simple demo of using Python and the sqlalchemy and cx_oracle libraries


This is the code :

#import required libraries
import pandas as pd
import cx_Oracle
import sqlalchemy
from sqlalchemy.exc import SQLAlchemyError

#setup connection
try:
    oracle_pdb = sqlalchemy.create_engine("oracle+cx_oracle://hemant:hemant@localhost/?service_name=orclpdb1", arraysize=100)
except SQLAlchemyError as e:
   print(e)
   
#setup query and pandas dataframe for results
try:
    employees_query = """SELECT * FROM hr.employees order by employee_id""";   
    df_employees = pd.read_sql(employees_query, oracle_pdb)
except SQLAlchemyError as e:
   print(e)

#Info on the dataframe
print(df_employees.info())
#the first five rows
print(df_employees.head())

#create a new dataframe with a subset of columns
df_emp_selected_cols=df_employees
df_emp_selected_cols.drop(['email','phone_number','salary','commission_pct','manager_id','department_id'],axis=1, inplace=True)
print(df_emp_selected_cols.head())


And here is the output from my database :

Info on the dataframe

RangeIndex: 108 entries, 0 to 107
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   employee_id     108 non-null    int64         
 1   first_name      108 non-null    object        
 2   last_name       108 non-null    object        
 3   email           108 non-null    object        
 4   phone_number    107 non-null    object        
 5   hire_date       108 non-null    datetime64[ns]
 6   job_id          108 non-null    object        
 7   salary          107 non-null    float64       
 8   commission_pct  35 non-null     float64       
 9   manager_id      106 non-null    float64       
 10  department_id   107 non-null    float64       
dtypes: datetime64[ns](1), float64(4), int64(1), object(5)
memory usage: 9.4+ KB
None
The first 5 rows
   employee_id first_name last_name  ... commission_pct manager_id department_id
0          100     Steven      King  ...            NaN        NaN          90.0
1          101      Neena   Kochhar  ...            NaN      100.0          90.0
2          102        Lex   De Haan  ...            NaN      100.0          90.0
3          103  Alexander    Hunold  ...            NaN      102.0          60.0
4          104      Bruce     Ernst  ...            NaN      103.0          60.0

[5 rows x 11 columns]
With selected columns only
   employee_id first_name last_name  hire_date   job_id
0          100     Steven      King 2003-06-17  AD_PRES
1          101      Neena   Kochhar 2005-09-21    AD_VP
2          102        Lex   De Haan 2001-01-13    AD_VP
3          103  Alexander    Hunold 2006-01-03  IT_PROG
4          104      Bruce     Ernst 2007-05-21  IT_PROG


Once you are familiar with this method, you can use numpy, matplotlib and a host of other python libraries with the dataset.

This article by Yuli Vasiliev is a good starter.


No comments: