Using cached 20k context with cheap used 4th Gen Epyc for CPU and 4x3090 GPU inference? Please review my build plans and alternative API costs.