Project Snapshot
Run conversational AI experiments entirely on your laptop. This prototype couples a Hugging Face pipeline with GPU-aware guardrails so teams can iterate without depending on hosted endpoints.
Business Context
- Targets labs and compliance-sensitive teams that need to validate chatbots without sending data to third-party clouds.
- Provides a repeatable launchpad for onboarding collaborators who work across Linux, macOS, and Windows.
Core Capabilities
- Local BlenderBot inference driven by the `facebook/blenderbot-400M-distill` checkpoint via the `transformers` text-to-text pipeline (see the pipeline sketch after this list).
- Notebook/script parity maintained with a Jupytext pair (`basic_chat.ipynb` ⇄ `basic_chat.py`) so edits stay synchronized across IDEs and browsers.
- GPU-friendly bootstrap that sets `PYTORCH_CUDA_ALLOC_CONF` and clears CUDA caches to squeeze models onto 2 GB cards while still offering CPU fallback (bootstrap sketch below).
- Environment diagnostics through `gpu_ts.py` and `pt-cuda-ts`, confirming PyTorch/CUDA availability before allocating large tensors (diagnostic sketch below).
- Conda-lock reproducibility with `environment.yml`, multi-platform lock files, and Makefile targets (`make expenv`, `make updenv`) that keep dependencies in sync.
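A minimal sketch of the local inference loop, assuming the checkpoint and pipeline task named above; the prompt and generation settings are illustrative, not the repo's exact code:

```python
from transformers import pipeline

# Load the distilled BlenderBot checkpoint; device_map="auto" (via accelerate)
# places weights on the GPU when one is available and falls back to CPU otherwise.
chatbot = pipeline(
    "text2text-generation",
    model="facebook/blenderbot-400M-distill",
    device_map="auto",
)

# Illustrative single-turn exchange.
reply = chatbot("Hello! What can you do when running locally?", max_new_tokens=60)
print(reply[0]["generated_text"])
```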
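The GPU-friendly bootstrap could look roughly like this; the allocator value is an assumption, not necessarily the repo's setting:

```python
import os

# Must be set before torch makes its first CUDA allocation; value is illustrative.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached blocks so a 2 GB card starts clean
    device = "cuda"
else:
    device = "cpu"  # CPU fallback keeps the notebook usable without a GPU
print(f"Running on: {device}")
```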
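And a diagnostic script in the spirit of `gpu_ts.py` might confirm availability before touching large tensors (the repo's exact checks may differ):

```python
import torch

print(f"PyTorch {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name} ({props.total_memory / 2**30:.1f} GiB VRAM)")
    # Small smoke-test allocation before committing to model-sized tensors.
    x = torch.ones(256, 256, device="cuda")
    print(f"Smoke test on {x.device}: OK")
```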
Implementation Notes
- Ships with concise setup steps for installing the right PyTorch build, `transformers`, `accelerate`, and `sentencepiece`.
- Encourages tight VRAM management by demonstrating how to toggle precision or device maps inside the pipeline (see the sketch after this list).
- MIT-licensed and structured so larger checkpoints or UI wrappers (Gradio, Streamlit) can be swapped in later.
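One way to toggle precision inside the pipeline, as the notes above suggest; `torch.float16` is an illustrative choice for 2 GB cards, not necessarily the repo's default:

```python
import torch
from transformers import pipeline

# Half-precision weights roughly halve VRAM use on CUDA devices.
chatbot = pipeline(
    "text2text-generation",
    model="facebook/blenderbot-400M-distill",
    torch_dtype=torch.float16,
    device_map="auto",
)
```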
My Role
I packaged the notebook workflow, scripted the GPU health checks, and automated environment locking to make local LLM prototyping dependable for teammates.
Tech Stack
Python 3.9 · PyTorch · Transformers · Jupyter/Jupytext · Conda · CUDA
Explore the Code
- GitHub Repository: rommel-rodriguez/basic-locally-hosted-chat