Project Snapshot
Run conversational AI experiments entirely on your laptop. This prototype couples a Hugging Face pipeline with GPU-aware guardrails so teams can iterate without depending on hosted endpoints.
Business Context
- Targets labs and compliance-sensitive teams that need to validate chatbots without sending data to third-party clouds.
- Provides a repeatable launchpad for onboarding collaborators who work across Linux, macOS, and Windows.
Core Capabilities
- Local BlenderBot inference driven by the `facebook/blenderbot-400M-distill` checkpoint via the `transformers` text-to-text pipeline (see the pipeline sketch after this list).
- Notebook/script parity maintained with a Jupytext pair (`basic_chat.ipynb` ⇄ `basic_chat.py`) so edits stay synchronized across IDEs and browsers.
- GPU-friendly bootstrap that sets `PYTORCH_CUDA_ALLOC_CONF` and clears CUDA caches to squeeze models onto 2 GB cards while still offering CPU fallback (bootstrap sketch below).
- Environment diagnostics through `gpu_ts.py` and `pt-cuda-ts`, confirming PyTorch/CUDA availability before allocating large tensors (diagnostic sketch below).
- Conda-lock reproducibility with `environment.yml`, multi-platform lock files, and Makefile targets (`make expenv`, `make updenv`) that keep dependencies in sync.
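A minimal sketch of the local inference loop, assuming the checkpoint and pipeline task named above; the prompt and generation settings are illustrative, not the repo's exact code:

```python
from transformers import pipeline

# Load the distilled BlenderBot checkpoint; device_map="auto" (via accelerate)
# places weights on the GPU when one is available and falls back to CPU otherwise.
chatbot = pipeline(
    "text2text-generation",
    model="facebook/blenderbot-400M-distill",
    device_map="auto",
)

# Illustrative single-turn exchange.
reply = chatbot("Hello! What can you do when running locally?", max_new_tokens=60)
print(reply[0]["generated_text"])
```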
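The GPU-friendly bootstrap could look roughly like this; the allocator value is an assumption, not necessarily the repo's setting:

```python
import os

# Must be set before torch makes its first CUDA allocation; value is illustrative.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached blocks so a 2 GB card starts clean
    device = "cuda"
else:
    device = "cpu"  # CPU fallback keeps the notebook usable without a GPU
print(f"Running on: {device}")
```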
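And a diagnostic script in the spirit of `gpu_ts.py` might confirm availability before touching large tensors (the repo's exact checks may differ):

```python
import torch

print(f"PyTorch {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name} ({props.total_memory / 2**30:.1f} GiB VRAM)")
    # Small smoke-test allocation before committing to model-sized tensors.
    x = torch.ones(256, 256, device="cuda")
    print(f"Smoke test on {x.device}: OK")
```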
Implementation Notes
- Ships with concise setup steps for installing the right PyTorch build, `transformers`, `accelerate`, and `sentencepiece`.
- Encourages tight VRAM management by demonstrating how to toggle precision or device maps inside the pipeline (see the sketch after this list).
- MIT-licensed and structured so larger checkpoints or UI wrappers (Gradio, Streamlit) can be swapped in later.
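One way to toggle precision inside the pipeline, as the notes above suggest; `torch.float16` is an illustrative choice for 2 GB cards, not necessarily the repo's default:

```python
import torch
from transformers import pipeline

# Half-precision weights roughly halve VRAM use on CUDA devices.
chatbot = pipeline(
    "text2text-generation",
    model="facebook/blenderbot-400M-distill",
    torch_dtype=torch.float16,
    device_map="auto",
)
```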
My Role
I packaged the notebook workflow, scripted the GPU health checks, and automated environment locking to make local LLM prototyping dependable for teammates.
Tech Stack
Python 3.9 · PyTorch · Transformers · Jupyter/Jupytext · Conda · CUDA
Explore the Code
- GitHub Repository: rommel-rodriguez/basic-locally-hosted-chat