StoryNote
/u/__JockY__'s posts
With Llama-3.1 70B at long contexts (8000+ tokens), llama.cpp server is taking 26 seconds to process the context before responding with the first token. TabbyAPI/exllamav2 is instant. Is it my fault, llama.cpp's fault, neither, a bit of both, or something else entirely?
49 upvotes • r/LocalLLaMA
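The gap described in the post is prompt-processing latency, i.e. time to first token on a long prompt rather than generation speed. As a rough way to put numbers on it, here is a minimal sketch that streams the same long prompt to both servers through their OpenAI-compatible chat endpoints and times the first content chunk. The hostnames, ports, model name, and prompt are placeholders for illustration, not the poster's actual configuration.

```python
import json
import time

import requests

# Hypothetical local endpoints; adjust host, port, and model to your own setup.
BACKENDS = {
    "llama.cpp": "http://localhost:8080/v1/chat/completions",
    "TabbyAPI": "http://localhost:5000/v1/chat/completions",
}

# A long prompt (several thousand tokens) so the measurement is dominated by
# prompt processing rather than token generation.
LONG_PROMPT = "Summarize the following text.\n" + ("lorem ipsum " * 4000)


def time_to_first_token(url: str, prompt: str) -> float:
    """Stream a completion and return seconds until the first content chunk."""
    payload = {
        "model": "llama-3.1-70b",  # placeholder name; some servers ignore or remap it
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,
        "stream": True,
    }
    start = time.monotonic()
    with requests.post(url, json=payload, stream=True, timeout=600) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            if not raw:
                continue
            line = raw.decode("utf-8")
            # OpenAI-style streaming sends server-sent events: "data: {...}"
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            chunk = json.loads(line[len("data: "):])
            delta = chunk["choices"][0].get("delta", {})
            if delta.get("content"):
                return time.monotonic() - start
    raise RuntimeError("stream ended before any content was produced")


if __name__ == "__main__":
    for name, url in BACKENDS.items():
        print(f"{name}: first token after {time_to_first_token(url, LONG_PROMPT):.1f}s")
```

Timing both backends with an identical prompt like this makes it easier to tell whether the 26-second figure is really a llama.cpp prompt-processing issue or something in the client or configuration.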