Windowing function for SQLite3 using Python on VSCode

Keshav Thakur
3 min readOct 20, 2020

--

== Simplicity vs Complexity ==

13:23

We(not necessarily old techies) love and understand SQL Queries and often finds it very handy tool for data manipulation.

Which is equally true with the power of Pandas. Let’s mix the Capability of both create some interesting query processing in an easy way.

Use Case: Often working with search capability and windowing function, we find it little difficult.

Ex: Let’s say we have powerful search engine based on Elasticsearch and we have indexed all the searchable, its working great. But now we need to do some processing to find the Lead or Lag in Dataset.

Windowing Function on Elasticsearch.

Elasticsearch query DSL supports some of the window based aggregations but not all and pretty complex. Example, if we need to do the simple Lead/Lag Operation which is row-by-row processing on a window of dataset

Solution: Get the dump from Elasticsearch in CSV File, use pandas to read CSV and put it in the SQLite DB

Create a cursor, fetch all results and index it back into the ElasticSearch

This is possible workaround, but Idea of doing this sometimes helps us in difficult situation wherein we probably doing the Batch mode processing.

There can be similar multiple situations, wherein you have data in CSV and quick SQL query processing is needed to validate your results.

For these situation, we can rely on the power of pandas and SQL query capability from Light weight SQLite3.

Steps:

Requirement: Python, VSCode and install plugin vscode-sqlite3(to visualize tables in db)

Plugins in VSCode: Pylance, SQLite

Create a virtual environment in VSCode

Install related libraries — Pandas, db-sqlite3

Read CSV File using Pandas into Dataframe

Dump the dataframe into the SQLite Table in Append mode

Create curser on SQLite Connection Object

Execute query and fetch results.

Persist the result as CSV or index it into the ElasticSearch or to any connections.

Virtual Env setup in VSCode for Python

Run the Following Command:

pip install pipenv

pipenv shell

pipenv install pandas

pipenv install db-sqlite3

Windowing Function on SQLite

Please refer to the Github link to execute and try on your VSCode,

SQLQueries.py file ==> Explains the Like Clause Execution and multiple keyword search in SQLite Dataset. After fetching results, row by row manipulation in Python For Loop

WindowSQL.py ==> Explain the Lead function to calculate the difference in value of a column between 2 consecutive rows and then taking the difference.

Something like this

Window Partition by Appname and Sorted By Time

I hope this small snippet helps you solving complex situation.

Happy Reading :)

--

--

Keshav Thakur
Keshav Thakur

Written by Keshav Thakur

Lead Software Developer → JAVA, Scala, Python → Data Visualization(Power BI) → Azure AI Engineer

No responses yet