@sntx

sntx@lemm.ee · 19 days ago

Thanks for the writeup! So far I’ve been using ollama, but I’m always open to trying out alternatives. To be honest, it seems I was oblivious to the existence of alternatives.

Is your post suggesting that the same models with the same parameters generate different results when run on different backends?

I can see how the backend would influence the handling of concurrent API calls, RAM/VRAM efficiency, supported hardware/drivers, and general speed.

But going as far as having different context windows and quality-degrading issues is news to me.
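For what it's worth, one way I could check this on my side is to pin the relevant settings explicitly instead of relying on backend defaults. Below is a minimal sketch, assuming ollama's `/api/generate` endpoint on its default port 11434; the model name `llama3`, the helper names, and the chosen values are placeholders for illustration, not recommendations from the post.

```python
import json
import urllib.request


def build_payload(model, prompt, num_ctx=4096, seed=42, temperature=0.0):
    """Build an ollama /api/generate request with explicitly pinned options.

    Setting num_ctx, seed and temperature per request avoids silently
    inheriting whatever defaults the backend ships with.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {
            "num_ctx": num_ctx,
            "seed": seed,
            "temperature": temperature,
        },
    }


def generate(payload, base_url="http://localhost:11434"):
    """Send the payload to a running ollama instance and return the text."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (needs a running ollama instance with the model pulled):
# print(generate(build_payload("llama3", "Why is the sky blue?")))
```

Running the same prompt with the same pinned values against another backend would at least rule out differing defaults as the cause of different outputs.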

sntx@lemm.ee · 19 days ago

Is there an inherent benefit to using NVLink? Should I specifically try out Aphrodite over the other recommendations, given that I have 2x 3090 with NVLink available?

sntx@lemm.ee · 1 month ago

Yes: sntx.space. Check out the source button in the bottom right corner.

I’m building and running it via the homebrewed, unconventional route. That is, I have just a bit of HTML/CSS and other files I want to serve; I use Nix to build them into a usable website and serve it from one of my homelab machines via nginx. The site is made available through a VPS running HAProxy and its public IP. The Nebula overlay network (VPN) connects the two machines.

sntx@lemm.ee · 4 months ago

Please tell ^^

sntx@lemm.ee · 6 months ago

This, or slackhq/nebula

sntx@lemm.ee · 1 year ago

I’m surprised nobody mentioned Nebula: a scalable overlay networking tool with a focus on performance, simplicity and security.

I’ve been running it for about two years on multiple machines and it has worked flawlessly so far. It even connects two hosts that are both behind mullvad-vpn tunnels.

The only downside is that you have to host your own discovery servers (called “lighthouses”). One is fine, but running at least two removes the single point of failure from the network.
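To illustrate the two-lighthouse setup, here is a small sketch that renders the lighthouse-related part of a regular (non-lighthouse) host's Nebula config. The keys `static_host_map`, `lighthouse.am_lighthouse` and `lighthouse.hosts` are Nebula's; the overlay IPs, hostnames, port and the helper name are made up for the example.

```python
def render_lighthouse_config(lighthouses: dict[str, str]) -> str:
    """Render the lighthouse section of a non-lighthouse host's config.

    `lighthouses` maps each lighthouse's overlay IP to its publicly
    reachable "host:port" address.
    """
    lines = ["static_host_map:"]
    for overlay_ip, public_addr in lighthouses.items():
        # Tell the host where each lighthouse can be reached publicly.
        lines.append(f'  "{overlay_ip}": ["{public_addr}"]')
    lines += ["lighthouse:", "  am_lighthouse: false", "  hosts:"]
    for overlay_ip in lighthouses:
        # Lighthouses are referenced by their overlay IP here.
        lines.append(f'    - "{overlay_ip}"')
    return "\n".join(lines) + "\n"


# Hypothetical addresses: two lighthouses, so losing one does not
# take discovery down for the whole network.
print(render_lighthouse_config({
    "192.168.100.1": "lighthouse1.example.com:4242",
    "192.168.100.2": "lighthouse2.example.com:4242",
}))
```

With both entries listed, hosts can still find each other while one of the lighthouses is offline.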