Hi there! While working on some deep learning projects, I discovered two useful nvidia-smi tips that I want to share with you. They work on Linux, and they are simple but extraordinarily effective.

How to monitor the GPUs using nvidia-smi

When doing some deep learning stuff, open a new Terminal window and type the following:

watch nvidia-smi

This command lets you monitor your GPUs in near real time (by default, watch re-runs nvidia-smi every 2 seconds). It shows their temperature, memory usage, power draw, which processes are using each GPU, and so on. Isn't it amazing?
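
If the full table is more than you need, nvidia-smi can also print just a few selected fields. Here is a minimal sketch of that idea. The helper name gpu_snapshot is my own invention, and the field list is only one reasonable choice (run nvidia-smi --help-query-gpu to see what your driver supports).

```shell
# For a faster refresh of the full table than watch's default 2 seconds:
#   watch -n 1 nvidia-smi

# Hypothetical helper: print one compact CSV line per GPU with its index,
# temperature, utilization, memory in use and current power draw.
gpu_snapshot() {
    nvidia-smi \
        --query-gpu=index,temperature.gpu,utilization.gpu,memory.used,power.draw \
        --format=csv,noheader
}

# Take a snapshot only if the NVIDIA driver tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_snapshot
fi
```

To keep it running, you can wrap the call in watch, or add -l 5 to the nvidia-smi call so it repeats every 5 seconds.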

How to set the power limit

If your GPU is too hot, your fans are too noisy, or you simply want to save some money on electricity, a possible solution is to lower your GPU's power limit (yes, performance will go down, but maybe it is something you want to try).

Super warning: setting the GPU wattage could be dangerous. Be extra sure to select the right GPU and to use the right power limit for your GPU. Although I have tested the proposed solution personally, I will not be held responsible for any loss, incidental or consequential damage arising from it.

Here is how to do that:

  • Detect the target GPU. Open a new terminal and run nvidia-smi. Each GPU has an associated ID (usually the IDs follow a simple incremental order starting from 0). However, to be sure you are focusing on the right GPU, select it with the following command:
    nvidia-smi -i <your id>
    For example, if you want to select the first GPU:
    nvidia-smi -i 0
    If the information for the right GPU is shown, then that is the ID you were looking for!
  • Check the current power limit of the selected GPU (the Pwr:Usage/Cap column shows the current draw and the cap). Here we are reducing the power limit, so we will use a value lower than the current cap.
  • To be extra sure, check online what the rated power consumption of your GPU is. For the 1080 Ti I have checked here
  • Once you are sure you are focusing on the right GPU and you know its current limit, reduce the power (wattage) limit with the following command:
    sudo nvidia-smi -i <your id> -pl <new limit>
    For example, with my GTX 1080 Ti (250W) I often use:
    sudo nvidia-smi -i 0 -pl 180
  • That is it! Now the GPU temperature should go down. The best thing is that by reducing the wattage by a small amount, the performance will remain almost the same!
  • To restore the original limit, run the same command with the old value:
    sudo nvidia-smi -i <your id> -pl <old limit>
  • Remember: the limit is reset when you shut down or reboot the PC.
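
Since picking a wrong value is the risky part, the steps above can be wrapped in a small safety check. This is only a sketch under a few assumptions: the helper names in_range and set_power_limit are hypothetical (they are not part of nvidia-smi), and I am assuming your driver reports the power.min_limit and power.max_limit query fields, which you can verify with nvidia-smi --help-query-gpu.

```shell
# Succeeds when $1 lies within [$2, $3]. awk is used because nvidia-smi
# reports decimal values such as "125.00".
in_range() {
    awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}

# Hypothetical helper: set_power_limit <gpu id> <watts>
# Applies the new limit only if it is inside the range the GPU reports.
set_power_limit() {
    gpu="$1"
    watts="$2"

    # Ask the selected GPU for its supported limits, e.g. "125.00, 300.00".
    limits="$(nvidia-smi -i "$gpu" \
        --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits)"
    min="$(echo "$limits" | cut -d, -f1 | tr -d ' ')"
    max="$(echo "$limits" | cut -d, -f2 | tr -d ' ')"

    if in_range "$watts" "$min" "$max"; then
        # Changing the limit requires root, hence sudo.
        sudo nvidia-smi -i "$gpu" -pl "$watts"
    else
        echo "Refusing: ${watts} W is outside the supported range ${min}-${max} W" >&2
        return 1
    fi
}
```

With these functions loaded in your shell, set_power_limit 0 180 would do the same as the example above, but only after checking that 180 W is a value the first GPU accepts. You can list the IDs of all GPUs beforehand with nvidia-smi -L.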