Using a cached 20k context with a cheap used 4th Gen EPYC for CPU and 4x 3090 GPU inference? Please review my build plans and alternative API costs.
by /u/jakub37 in /r/LocalLLaMA
Upvotes: 1