Project Snapshot

Run conversational AI experiments entirely on your laptop. This prototype couples a Hugging Face pipeline with GPU-aware guardrails so teams can iterate without depending on hosted endpoints.

Business Context

  • Targets labs and compliance-sensitive teams that need to validate chatbots without sending data to third-party clouds.
  • Provides a repeatable launchpad for onboarding collaborators who work across Linux, macOS, and Windows.

Core Capabilities

  • Local BlenderBot inference driven by the facebook/blenderbot-400M-distill checkpoint via the transformers text-to-text pipeline (a minimal sketch follows this list).
  • Notebook/script parity maintained with a Jupytext pair (basic_chat.ipynb ↔ basic_chat.py) so edits stay synchronized across IDEs and browsers.
  • GPU-friendly bootstrap that sets PYTORCH_CUDA_ALLOC_CONF and clears CUDA caches to squeeze models onto 2 GB cards while still offering CPU fallback (see the bootstrap sketch below).
  • Environment diagnostics through gpu_ts.py and pt-cuda-ts, confirming PyTorch/CUDA availability before allocating large tensors (see the health-check sketch below).
  • Conda-lock reproducibility with environment.yml, multi-platform lock files, and Makefile targets (make expenv, make updenv) that keep dependencies in sync.
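
A minimal sketch of the local inference path, assuming only what the list above states: the checkpoint name and the transformers text-to-text pipeline. The prompt and generation settings are illustrative.

```python
# Minimal local-inference sketch: the checkpoint name comes from the
# capability list above; the prompt and max_new_tokens are illustrative.
from transformers import pipeline

chatbot = pipeline(
    "text2text-generation",
    model="facebook/blenderbot-400M-distill",
    device_map="auto",  # uses the GPU when present, otherwise the CPU
)

reply = chatbot("Hello! What's a good weekend project?", max_new_tokens=60)
print(reply[0]["generated_text"])
```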
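
The GPU-friendly bootstrap might look roughly like this. The environment variable matches the one named above, but the split-size value is an assumption, not the project's actual setting.

```python
# Bootstrap sketch: the allocator env var must be set before CUDA is
# initialized; max_split_size_mb:128 is an assumed value, not the project's.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # drop cached blocks to free VRAM on 2 GB cards
    device = "cuda"
else:
    device = "cpu"  # CPU fallback keeps the workflow usable without a GPU
```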
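
A gpu_ts.py-style health check could report something like the following; the real script's contents and output format aren't shown here, so this is an approximation.

```python
# Approximation of a gpu_ts.py-style environment check; the actual script's
# report format may differ.
import torch

print(f"PyTorch {torch.__version__} | CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM (CUDA {torch.version.cuda})")
```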

Implementation Notes

  • Ships with concise setup steps for installing the PyTorch build that matches your hardware (CUDA or CPU), plus transformers, accelerate, and sentencepiece.
  • Encourages tight VRAM management by demonstrating how to toggle precision or device maps inside the pipeline, as in the sketch after this list.
  • MIT-licensed and structured so larger checkpoints or UI wrappers (Gradio, Streamlit) can be swapped in later.
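
A hedged example of that precision/device toggle, using standard transformers pipeline arguments; the float16 choice is illustrative rather than the project's fixed setting.

```python
# Precision/device toggle sketch: half precision on GPU to fit small cards,
# full precision on CPU. float16 is an illustrative choice.
import torch
from transformers import pipeline

use_gpu = torch.cuda.is_available()
chatbot = pipeline(
    "text2text-generation",
    model="facebook/blenderbot-400M-distill",
    torch_dtype=torch.float16 if use_gpu else torch.float32,
    device=0 if use_gpu else -1,  # 0 = first CUDA device, -1 = CPU
)
```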

My Role

I packaged the notebook workflow, scripted the GPU health checks, and automated environment locking to make local LLM prototyping dependable for teammates.

Tech Stack

Python 3.9 · PyTorch · Transformers · Jupyter/Jupytext · Conda · CUDA

Explore the Code