Windowing function for SQLite3 using Python on VSCode
13:23
We(not necessarily old techies) love and understand SQL Queries and often finds it very handy tool for data manipulation.
Which is equally true with the power of Pandas. Let’s mix the Capability of both create some interesting query processing in an easy way.
Use Case: Often working with search capability and windowing function, we find it little difficult.
Ex: Let’s say we have powerful search engine based on Elasticsearch and we have indexed all the searchable, its working great. But now we need to do some processing to find the Lead or Lag in Dataset.
Windowing Function on Elasticsearch.
Elasticsearch query DSL supports some of the window based aggregations but not all and pretty complex. Example, if we need to do the simple Lead/Lag Operation which is row-by-row processing on a window of dataset
Solution: Get the dump from Elasticsearch in CSV File, use pandas to read CSV and put it in the SQLite DB
Create a cursor, fetch all results and index it back into the ElasticSearch
This is possible workaround, but Idea of doing this sometimes helps us in difficult situation wherein we probably doing the Batch mode processing.
There can be similar multiple situations, wherein you have data in CSV and quick SQL query processing is needed to validate your results.
For these situation, we can rely on the power of pandas and SQL query capability from Light weight SQLite3.
Steps:
Requirement: Python, VSCode and install plugin vscode-sqlite3(to visualize tables in db)
Plugins in VSCode: Pylance, SQLite
Create a virtual environment in VSCode
Install related libraries — Pandas, db-sqlite3
Read CSV File using Pandas into Dataframe
Dump the dataframe into the SQLite Table in Append mode
Create curser on SQLite Connection Object
Execute query and fetch results.
Persist the result as CSV or index it into the ElasticSearch or to any connections.
Run the Following Command:
pip install pipenv
pipenv shell
pipenv install pandas
pipenv install db-sqlite3
Please refer to the Github link to execute and try on your VSCode,
SQLQueries.py file ==> Explains the Like Clause Execution and multiple keyword search in SQLite Dataset. After fetching results, row by row manipulation in Python For Loop
WindowSQL.py ==> Explain the Lead function to calculate the difference in value of a column between 2 consecutive rows and then taking the difference.
Something like this
I hope this small snippet helps you solving complex situation.
Happy Reading :)