The software pain of running local LLMs finally got to me - so I made my own inference server that you don't need to compile or update every time a new model/tokenizer drops; you don't need to quantize or even download your LLMs - just give it a name and run LLMs the moment they're posted on Hugging Face
by /u/AbheekG in /r/LocalLLaMA
Upvotes: 219
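As a rough illustration of the idea in the title (not the author's server, whose code isn't shown in this post), loading and running a model by repo id alone is the same pattern the Hugging Face transformers library exposes: weights and tokenizer are fetched on first use, no manual download or quantization step. The model id below is only a hypothetical example.

```python
# Minimal sketch, assuming the transformers + accelerate packages are installed.
# This is not the author's inference server, just the "give it a name and run it" idea.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-2"  # hypothetical example; any causal-LM repo id works

# Tokenizer and weights are pulled from the Hugging Face Hub on first call.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Hello, local LLM!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```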