zeenfaizpy/mufasa

mufasa

Mufasa is a simple query processing engine (i.e., a dataframe library).

WIP: Mufasa is still in development.

Installation

pip install git+https://github.com/zeenfaizpy/mufasa.git@main

Usage

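from mufasa.core import ExecutionContext
from mufasa.functions import col, eq, lit

ctx = ExecutionContext()

# select: keep every column that the later steps reference
df = (
    ctx.csv("employee.csv")
    .select(
        col("state"), col("first_name"), col("last_name"),
        col("dept"), col("salary"),
    )
)

# where
filtered_df = df.filter(col("salary").gt(lit(12000)))

# group by and aggregations
# NOTE: `sum` is assumed to be mufasa's aggregate function (its import is not
# shown here); Python's builtin `sum` would not work on a column expression.
agg_df = (
    filtered_df.group_by(col("dept"))
    .agg(sum(col("salary")))
)

# save the filtered rows to a temp table, then query it using SQL
filtered_df.create_or_replace_table("employees")
new_df = ctx.sql("select first_name, salary from employees where salary > 10000")

# print the logical plan
agg_df.show_plan()

# print the final data
agg_df.collect()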
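
To make the result of the filter and group-by steps above concrete, here is a plain-Python sketch that computes the same kind of answer using only the standard library. It is not how mufasa is implemented. The sample rows are made up, `salary_by_dept` is a hypothetical helper, and it assumes `gt` means strictly greater than.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for employee.csv.
SAMPLE_CSV = """first_name,last_name,state,dept,salary
Asha,Rao,CA,eng,15000
Ben,Lee,NY,eng,11000
Cara,Diaz,TX,ops,13000
Dev,Shah,CA,ops,12500
"""


def salary_by_dept(csv_text, min_salary):
    """Sum salaries per dept for rows whose salary exceeds min_salary."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        salary = int(row["salary"])
        if salary > min_salary:            # the "where" step
            totals[row["dept"]] += salary  # the "group by" + sum step
    return dict(totals)


result = salary_by_dept(SAMPLE_CSV, 12000)
print(result)  # {'eng': 15000, 'ops': 25500}
```

Ben's row is dropped by the filter, so only one salary contributes to the `eng` total.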

Features

  • DataFrame API
  • SQL support with a catalog
  • PySpark-compatible API

SQL Operations

  • FROM
  • WHERE
  • SELECT
  • GROUP BY
  • HAVING
  • JOIN
  • Subqueries
  • CTEs
  • Window Functions
  • CASE
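
As an illustration of how several of these operations combine in one query, the sketch below runs a CTE, WHERE, JOIN, GROUP BY, HAVING, and CASE together. It uses Python's built-in `sqlite3` module rather than mufasa, since mufasa's SQL dialect is not specified here; the tables and rows are made up, and the same query may need adjusting to run through `ctx.sql`.

```python
import sqlite3

# In-memory database with made-up tables, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, dept_id INTEGER, salary INTEGER);
    CREATE TABLE departments (id INTEGER, name TEXT);
    INSERT INTO departments VALUES (1, 'eng'), (2, 'ops'), (3, 'hr');
    INSERT INTO employees VALUES
        ('Asha', 1, 15000), ('Ben', 1, 11000),
        ('Cara', 2, 13000), ('Dev', 2, 12500), ('Eli', 3, 9000);
""")

QUERY = """
WITH well_paid AS (                        -- CTE
    SELECT dept_id, salary
    FROM employees
    WHERE salary > 10000                   -- WHERE
)
SELECT d.name,
       SUM(w.salary) AS total,
       CASE WHEN SUM(w.salary) >= 26000    -- CASE
            THEN 'high' ELSE 'normal' END AS band
FROM well_paid AS w
JOIN departments AS d ON d.id = w.dept_id  -- JOIN
GROUP BY d.name                            -- GROUP BY
HAVING COUNT(*) >= 2                       -- HAVING
ORDER BY d.name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('eng', 26000, 'high'), ('ops', 25500, 'normal')]
```

The `hr` department does not appear in the output because its only employee is filtered out inside the CTE.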

License

Mufasa is released under a GNU license. See the LICENSE file for details.
