Project Repo: https://github.com/athletedecoded/rusty-ds
CI/CD Data Science with Rust
CLI and Notebook EDA using polars/plotters/evcxr + CI/CD distroless deployment
"Futureproofs" by testing the build across Rust release channels: stable, beta, nightly
Setup
# Install Rust
make install
# Install evcxr_jupyter
make evcxr
# Check versions
make toolchain
Rust x Jupyter
- Launch ./notebook.ipynb >> Select Kernel >> Jupyter Kernel >> Rust
- Run All Cells
CLI EDA Tool
Supported data formats: .csv, .json files
Summary
# If file includes headers
cargo run summary --path </path/to/data> --headers
# ex. cargo run summary --path ./data/sample.csv --headers
# If file doesn't have headers
cargo run summary --path </path/to/data>
# ex. cargo run summary --path ./data/sample.json
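To illustrate what the --headers flag changes, here is a minimal std-only sketch. It is not the project's implementation (which uses polars); `parse_csv`, `summarize`, `Summary`, and the `col_N` placeholder names are hypothetical and exist only for this example. With headers, the first row names the columns; without, placeholder names are generated and every row is treated as data.

```rust
/// Summary statistics for one numeric column (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Summary {
    count: usize,
    mean: f64,
    min: f64,
    max: f64,
}

/// Split comma-separated text into column names and rows of string cells.
/// With `has_headers`, the first non-empty line names the columns;
/// otherwise placeholder names (col_1, col_2, ...) are generated.
/// Quoted fields are not handled; this is a naive split on commas.
fn parse_csv(text: &str, has_headers: bool) -> (Vec<String>, Vec<Vec<String>>) {
    fn split(line: &str) -> Vec<String> {
        line.split(',').map(|cell| cell.trim().to_string()).collect()
    }
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header = if has_headers { lines.next().map(split) } else { None };
    let rows: Vec<Vec<String>> = lines.map(split).collect();
    let names = header.unwrap_or_else(|| {
        let width = rows.first().map_or(0, |r| r.len());
        (1..=width).map(|i| format!("col_{}", i)).collect()
    });
    (names, rows)
}

/// Count, mean, min, and max over the cells in column `col` that parse as f64.
/// Returns None when no cell in that column is numeric.
fn summarize(rows: &[Vec<String>], col: usize) -> Option<Summary> {
    let values: Vec<f64> = rows
        .iter()
        .filter_map(|r| r.get(col)?.parse::<f64>().ok())
        .collect();
    if values.is_empty() {
        return None;
    }
    let sum: f64 = values.iter().sum();
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some(Summary {
        count: values.len(),
        mean: sum / values.len() as f64,
        min,
        max,
    })
}
```

Calling `parse_csv(text, true)` on a file whose first row is `fats_g,calories` yields those two names; calling it with `false` on a headerless file yields `col_1` and `col_2`.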
Plot
cargo run plot --path </path/to/data> <--headers> --x <col_name> --y <col_name>
# ex. cargo run plot --path ./data/sample.csv --headers --x fats_g --y calories
# ex. cargo run plot --path ./data/sample.json --x fats_g --y calories
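Since --x and --y take column names, a lookup step has to map each name to a column. The sketch below shows one way that step could report an unknown name instead of panicking. It uses only the standard library; `ColumnNotFound` and `column_index` are hypothetical names and are not taken from the rusty-ds source.

```rust
use std::fmt;

/// Error for a requested column that is not in the data (hypothetical type).
#[derive(Debug, PartialEq)]
struct ColumnNotFound {
    name: String,
    available: Vec<String>,
}

impl fmt::Display for ColumnNotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "column '{}' not found; available columns: {}",
            self.name,
            self.available.join(", ")
        )
    }
}

impl std::error::Error for ColumnNotFound {}

/// Return the position of `name` in `columns`, or an error listing
/// the columns that do exist so the user can correct the flag.
fn column_index(columns: &[String], name: &str) -> Result<usize, ColumnNotFound> {
    columns
        .iter()
        .position(|c| c.as_str() == name)
        .ok_or_else(|| ColumnNotFound {
            name: name.to_string(),
            available: columns.to_vec(),
        })
}
```

Returning a `Result` lets the CLI print the message and exit with a non-zero status rather than unwinding from deep inside the plotting code.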
Unit Tests
make test
Files
- .devcontainer/ -- configures local development container environment
- .github/workflows/CICD.yml -- triggers CI/CD on git push and pull request
- data/ -- sample data files for unit testing
- src/lib.rs -- shared library for main.rs and notebook.ipynb
- src/main.rs -- rusty-ds CLI script
- Cargo.toml -- cargo dependencies
- notebook.ipynb -- Rust x Jupyter using EvCxR kernel
- Makefile -- build commands and utilities
CI/CD
On git push or pull request, the CI/CD flow is triggered using GitHub Actions:
- Install and validate the Rust toolchain for each of the stable, beta, and nightly releases
- Format and lint code
- Run unit tests
- Build binary release
- Lint Dockerfile
- Build distroless rusty-ds image
- Push image to GitHub Container Registry
NB: To build and push to GHCR, uncomment the relevant section in .github/workflows/CICD.yml
ToDos
- Add error handling for column names that do not exist
- Add dynamic plot bounds
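As a starting point for the dynamic plot bounds item, one possible approach is to derive each axis range from the data and pad it slightly so points do not sit on the frame. This is a std-only sketch under that assumption; `padded_bounds` and `pad_frac` are hypothetical names, and wiring the result into the plotters chart is left out.

```rust
/// Axis range covering every finite value, widened on both sides by
/// `pad_frac` of the data span. Returns None if there are no finite values.
fn padded_bounds(values: &[f64], pad_frac: f64) -> Option<(f64, f64)> {
    // Skip NaN and infinite values so they cannot distort the range.
    let mut finite = values.iter().copied().filter(|v| v.is_finite());
    let first = finite.next()?;
    let (min, max) = finite.fold((first, first), |(lo, hi), v| (lo.min(v), hi.max(v)));
    let span = max - min;
    // A constant column has zero span; fall back to a fixed pad so the
    // axis still has non-zero width.
    let pad = if span > 0.0 { span * pad_frac } else { 0.5 };
    Some((min - pad, max + pad))
}
```

For example, values spanning 0.0 to 8.0 with `pad_frac = 0.25` give a range of -2.0 to 10.0, and a column containing only 3.0 gives 2.5 to 3.5.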