/u/AbheekG's posts in /r/LocalLLaMA
The software pain of running local LLMs finally got to me, so I made my own inferencing server that you don't need to compile or update any time a new model or tokenizer drops; you don't need to quantize or even download your LLMs. Just give it a name and run LLMs the moment they're posted on HuggingFace.
219 upvotes
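For context on the "just give it a name" idea, here is a minimal sketch of loading and running a model directly by its Hugging Face Hub id, with no manual download or quantization step. It uses the standard transformers API, not the server described in the post; the model id is only an illustrative example.

```python
# Sketch: run any Hub-hosted causal LM by name using the transformers AutoClasses.
# Requires `transformers` and `accelerate` (for device_map="auto") to be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example id; any Hub model id works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tokenize a prompt, generate a short completion, and decode it back to text.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The AutoClass pattern is what makes "no recompile when a new model drops" plausible: the weights and tokenizer are fetched and cached on first use, so newly published models work as soon as the library recognizes their architecture.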