PUSH_AX on Sept 12, 2023 | on: Fine-tune your own Llama 2 to replace GPT-3.5/4
You think they are caching? Even though one of the parameters is temperature? That's a can of worms, and it should be reflected in the pricing if true; don't get me started if they are charging per token for cached responses.
I just don't see it.
why_only_15 on Sept 12, 2023
You can keep the KV cache around from previous generations, which lowers the cost of repeated prompts significantly.
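For concreteness, here is a minimal sketch of that reuse with the Hugging Face transformers API; the checkpoint name and prompt strings are illustrative assumptions, but the mechanism is the real past_key_values object the model returns when called with use_cache=True:

    # Sketch: reuse the KV cache for a shared prompt prefix.
    # Checkpoint and prompts are illustrative, not from the thread.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "meta-llama/Llama-2-7b-hf"  # assumed model; any causal LM works
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # 1) Run the shared prefix once; keep the per-layer key/value tensors.
    prefix = tok("You are a helpful assistant.", return_tensors="pt")
    with torch.no_grad():
        out = model(**prefix, use_cache=True)
    cache = out.past_key_values  # KV cache covering every prefix token

    # 2) A later request sharing that prefix pays only for its new tokens:
    #    feed just the suffix ids and hand the cache back to the model.
    suffix = tok(" Explain KV caching briefly.",
                 return_tensors="pt", add_special_tokens=False)
    mask = torch.ones(1, prefix.input_ids.shape[1] + suffix.input_ids.shape[1],
                      dtype=torch.long)  # attention over past + new tokens
    with torch.no_grad():
        out = model(input_ids=suffix.input_ids,
                    attention_mask=mask,
                    past_key_values=cache,
                    use_cache=True)
    # out.logits covers only the suffix positions; the prefix is not recomputed.

Note this sidesteps the temperature objection upthread: temperature only changes how you sample from the logits, so the cached keys and values stay valid no matter what it is set to.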