StoryNote
/u/jakub37's posts
Context caching for different LLM user sessions, performance of CPU inference, blending CPU & GPU inference. What results can I expect? Or just use an API?
1 upvote • r/LocalLLaMA
Using a cached 20k context with a cheap used 4th Gen EPYC for CPU and 4x 3090 GPU inference? Please review my build plans and alternative API costs.
1 upvote • r/LocalLLaMA