2x 3080 Ti + 3090, or 2x 3090 for ML?

Hi

I have 2x 3080 Ti + a 3090. The 3080 Tis are connected to the motherboard through risers, so they run at x1. The 3090 is in an x16 slot.

My question is: if I get an additional 3090 (i.e., 2x 3090), would that be better than 2x 3080 Ti + 3090?

That would draw less power, the additional card would fit into my cabinet, and both cards would run at x8.

But if the current setup is better, I'll keep it.

Thanks
 
I don't know if I'm wrong or right.
Let's see: as of now, with 3 GPUs you get 48 GB of VRAM. With 2x 3090 it would be the same 48 GB, but the power draw would be lower AFAIK, you could use NVLink (if both are the same brand of GPU), and the setup would be much easier to manage. This is what I think and would do, and it would also save some money, since a used 3090 would cost less than 2x 3080 Ti.
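As a sanity check on those numbers, here's a small sketch (assuming PyTorch with CUDA is installed) that prints what each card reports and the combined pool:

```python
import torch

# List every visible GPU with its VRAM, then the combined total.
total_gb = 0.0
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    gb = props.total_memory / 1024**3
    total_gb += gb
    print(f"GPU {i}: {props.name}, {gb:.1f} GB")
print(f"Combined: {total_gb:.1f} GB")
# 2x 3080 Ti (12 GB each) + 3090 (24 GB) = 48 GB, same as 2x 3090.
```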
 
Can you tell me a little more about NVLink? Is it a physical device?

And the 48 GB of VRAM is combined, right? How would a model fit into memory? Does that mean the model would treat the cards as a single card and use all the VRAM, or would the workload be distributed to each card, with each card performing its own computation and returning the results?
 
I'm keen to hear the answer too, because I am contemplating a similar (albeit weaker) setup with 2x 3060 12 GB cards, as opposed to one 4xxx-series 16 GB card.
 
From what I know, with NVLink one card can have direct access to the other's full memory. But it may not be useful in all scenarios; from what I could scrape off the web, it boosts performance for some workloads and does nothing for others.
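A quick way to check whether peer-to-peer access (the thing NVLink provides) is actually available between two cards, sketched in PyTorch; `nvidia-smi nvlink --status` gives the driver's view of the same thing:

```python
import torch

# Peer access means one GPU can read/write the other's memory directly,
# without bouncing data through host RAM.
if torch.cuda.device_count() >= 2:
    print("GPU 0 -> GPU 1:", torch.cuda.can_device_access_peer(0, 1))
    print("GPU 1 -> GPU 0:", torch.cuda.can_device_access_peer(1, 0))
```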
 
Speaking from limited experience, so take this with a grain of salt.

NVLink is not important; it does not contribute to the performance of a dual-card setup.

And within the 30xx series, NVLink is only available on the 3090, so you're good.

In my opinion you should go for a single 4000-series card. It supports FP8 on top of the FP16 available in the 30xx series, and FP8 is faster. Apart from that, VRAM is usually touted to be more important than tensor-core speed, but unless you're working with large LLMs you should be fine with a 16 GB card.

Also, while training across multiple cards, the data is split into batches and each card processes its own share against its own full copy of the model, so the VRAM does not add up. The training speed of the whole setup is limited by the slowest card, and all the cards sync up when the slowest one is done. Which means a single 16 GB card would serve you well, and a single card also means less power consumption, heat, and noise.
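To make the "VRAM does not add up" point concrete, here is a minimal data-parallel sketch in PyTorch (the toy model is a hypothetical placeholder): every GPU receives a full replica of the model, and only the batch is split between them.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, standing in for whatever you actually train.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# DataParallel puts a FULL copy of the model on every visible GPU and
# splits each input batch across them, so per-GPU memory is the whole
# model plus a slice of the batch; capacities do not simply add up.
model = nn.DataParallel(model).cuda()

x = torch.randn(64, 512).cuda()  # a batch of 64 gets split across GPUs
out = model(x)                   # outputs are gathered on the default GPU
out.sum().backward()             # gradients are reduced across the replicas
```

For real training, DistributedDataParallel is usually recommended over DataParallel, but the memory behaviour is the same: one full model replica per card.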

Plus, you can always buy another 4xxx card later.

Source: https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
 
Working with two similar GPUs would be easier, but either config, be it 2x 3080 Ti + 3090 or dual 3090s, is a luxury if you aim to learn ML or implement some models.
The majority of people learning ML, whether for academics, a job transition, or passion, have been working with hardware far less powerful than any of the GPUs mentioned here. Just a few months back, the first stage of the Vesuvius Challenge was achieved with a mere GTX 1070.

ML is not gaming; a better setup won't yield better performance or results. What matters is your proficiency with statistics and programming.
Aside from that, to answer another question asked above: an RTX 3060 is more than enough to implement the majority of ML models. When Meta's LLaMA model was leaked and later unveiled to the wider masses, people were racing to figure out new quantization methods to run the models on their Macs, PCs, and even smartphones XD.
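For illustration, this is roughly what that quantization trick looks like with the Hugging Face transformers + bitsandbytes stack (a sketch, assuming those libraries plus accelerate are installed; the model id is just a placeholder for whatever checkpoint you have access to):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

# 4-bit quantization shrinks a 7B model to roughly 4 GB of weights,
# which is how people squeeze these models onto a single 3060-class card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across whatever GPUs are present
)
```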

As for the other question regarding NVLink: it is useful because it reduces round trips through the CPU, allows more granular GPU memory management, and of course provides more bandwidth. These factors can be taken advantage of while prototyping; if you don't see a benefit in any of that, then it isn't useful in a meaningful way for your work.
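If you want to see what bandwidth you are actually getting between two cards (NVLink or plain PCIe, whichever path the driver picks), a rough PyTorch probe could look like this; the tensor size and iteration count are arbitrary choices:

```python
import torch

# ~1 GiB of float32 on GPU 0, copied to GPU 1 ten times.
src = torch.empty(1024**3 // 4, dtype=torch.float32, device="cuda:0")
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
for _ in range(10):
    dst = src.to("cuda:1", non_blocking=True)
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1000  # elapsed_time() reports ms
moved = 10 * src.numel() * src.element_size()
print(f"~{moved / seconds / 1e9:.1f} GB/s")
```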

If you purely want to work and play with LLMs and plan to buy a new GPU or upgrade your current setup, then going for the newer 40xx series is a good bet. Other than that, the idea that the RTX 40xx has a major benefit over the 30xx series just because of FP8 is moot; you can always modify the model to play nicely with FP16 and still produce stable results.
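For example, "playing nicely with FP16" usually just means automatic mixed precision; a minimal PyTorch training loop (model and data are hypothetical placeholders) looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(512, 10).cuda()  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # keeps tiny fp16 grads from underflowing

for step in range(100):
    x = torch.randn(32, 512, device="cuda")         # fake inputs
    y = torch.randint(0, 10, (32,), device="cuda")  # fake labels

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():  # fp16 math on the tensor cores
        loss = F.cross_entropy(model(x), y)

    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # unscales grads, skips step on inf/nan
    scaler.update()
```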
On the point above that training data is split into batches per card, so the VRAM does not add up: having similar GPUs helps in ascertaining memory allocation with different batch sizes and circumventing OOM errors early on. One thing I have done recently is use a spare RX 550 to drive my monitor, keeping the 3090s solely for running projects. The Radeon card also helps with out-of-the-box display support on Linux.
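One way to find that ceiling early is a throwaway probe that doubles the batch size until the card runs out of memory (a sketch with a hypothetical stand-in model):

```python
import torch
import torch.nn as nn

# Stand-in model; swap in the real one to get meaningful numbers.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()

batch = 32
while True:
    try:
        x = torch.randn(batch, 4096, device="cuda")
        model(x).sum().backward()
        torch.cuda.synchronize()
        print(f"batch {batch}: OK")
        batch *= 2
    except RuntimeError as e:  # CUDA OOM surfaces as a RuntimeError
        if "out of memory" in str(e):
            print(f"batch {batch}: OOM, so the ceiling is below this")
            break
        raise
    finally:
        torch.cuda.empty_cache()  # release whatever the failed step allocated
```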
 