StoryNote
/u/jakub37's posts
Context caching for different LLM user sessions, performance of CPU inference, blending CPU & GPU inference. What results can I expect? Or just use an API?
1 upvote • r/LocalLLaMA
Using a cached 20k context with a cheap used 4th Gen EPYC for CPU and 4x 3090 GPU inference? Please review my build plans and alternative API costs.
1 upvote • r/LocalLLaMA