Sensors, CPU & Fans Control

From Elvanör's Technical Wiki

Sensors

CPU temperature

  • To get CPU temperature, kernel modules are needed. I2C support in particular is generally required (compile I2C_CHARDEV).
  • In addition, the modules corresponding to your chip must be enabled in the Hardware Monitoring support and I2C sections. One good technique is to compile everything available as modules, then run sensors-detect, which will find the correct one. You can then remove all the other modules and rebuild the kernel.
  • For a recent Intel Core CPU, compile in the SENSORS_CORETEMP module (Device Drivers -> Hardware Monitoring support).
  • In user space, you need to emerge lm-sensors.
  • For an Intel i7-11700, CPU temperatures of 40-60 degrees Celsius are typical during ordinary use (Firefox with 30 tabs open, Dolphin, Konsole, a few other programs). You should see 35-45 degrees when completely idle (with a summer outside temperature). Higher values are not normal and can indicate a CPU cooler or ventilation issue.
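
Once the coretemp module is loaded, the readings that sensors prints are also exposed under /sys/class/hwmon. The sketch below is one possible way to read them directly from a POSIX shell. The function name coretemp_temps is made up for illustration, and the code assumes the driver registers under the hwmon name coretemp and reports values in millidegrees Celsius (the usual hwmon convention).

```shell
# Print every temperature reported by the hwmon device named "coretemp".
# $1 (optional) is the hwmon directory; it defaults to /sys/class/hwmon.
# The function body runs in a subshell, so its variables stay local.
coretemp_temps() (
    root=${1:-/sys/class/hwmon}
    for dev in "$root"/hwmon*; do
        [ -r "$dev/name" ] || continue
        [ "$(cat "$dev/name")" = "coretemp" ] || continue
        for input in "$dev"/temp*_input; do
            [ -r "$input" ] || continue
            # Use the matching tempN_label if present, else the file name.
            label=$(cat "${input%_input}_label" 2>/dev/null || echo "${input##*/}")
            # Assumption: values are millidegrees Celsius; integer division
            # is precise enough for a quick reading.
            echo "$label: $(( $(cat "$input") / 1000 )) C"
        done
    done
)
```

Calling coretemp_temps with no argument reads the live system; passing another directory makes it easy to try the function against a copy of the sysfs tree.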

NVMe M.2 SSD temperature

  • It's very important to monitor the temperature of an M.2 SSD. The drive can get very hot since it sits close to the CPU and VRM areas of the motherboard.
  • If the temperature crosses certain thresholds, the drive stops working correctly and starts behaving erratically. Reads and writes may fail or be delayed. This usually shows up as a very slow or unresponsive system.
  • Note that the drive itself is not damaged and should be perfectly fine once it cools down. However, if the temperature exceeds the critical level, SMART will report the drive as about to fail imminently. This is not true, but it can prevent the system from booting: the BIOS outputs an error message about the drive and blocks the boot.
  • On my Corsair MP400 2 TB M.2 drive, basically any temperature over 70 degrees was dangerous. The levels defined by Corsair (which can be obtained by running smartctl /dev/nvme0 -a) are 75 degrees (warning) and 80 degrees (critical). I am not sure exactly when the drive starts to throttle itself or to have serious issues, but it definitely happens.
  • It's crucial to avoid this situation. While you're troubleshooting how to keep the drive temperature low, it's extremely useful to pin the M.2 drive temperature to the bottom panel so you can read it at a glance.
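
For a quick check from a terminal, the temperature can be pulled out of the smartctl report. This is only a sketch: the function names nvme_temp and nvme_status are hypothetical, the default limit of 70 degrees is simply the value that proved dangerous on the drive described above, and the parsing assumes smartctl prints a line starting with "Temperature:" followed by the value in Celsius, which may differ between smartmontools versions.

```shell
# Read `smartctl -a` output on stdin and print the first "Temperature:" value.
# Assumption: the line looks like "Temperature:    45 Celsius".
nvme_temp() {
    awk '/^Temperature:/ { print $2; exit }'
}

# Compare a temperature (whole degrees) against a limit.
# $1 is the temperature, $2 the limit (defaults to 70, an assumed value).
nvme_status() (
    temp=$1
    limit=${2:-70}
    if [ "$temp" -ge "$limit" ]; then
        echo "HOT ($temp C >= $limit C)"
    else
        echo "ok ($temp C)"
    fi
)

# Typical use on a real system (smartctl usually needs root):
#   nvme_status "$(smartctl /dev/nvme0 -a | nvme_temp)"
```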

Fan metrics

  • lm-sensors is also needed, along with some kernel modules (same procedure as for CPU temperature above).
  • You may need to add the acpi_enforce_resources=lax command-line kernel boot parameter for some ASUS motherboards (like the ROG Strix Z590-I).
  • You can then get the current fan RPM by running the sensors program.
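
If you boot through GRUB, the acpi_enforce_resources=lax parameter mentioned above can be made permanent in the GRUB defaults file. The fragment below is a sketch that assumes the common /etc/default/grub location and the GRUB_CMDLINE_LINUX variable; your distribution may lay things out differently.

```shell
# /etc/default/grub (assumed location; adjust for your distribution)
# Keep any options already present on this line and append the parameter.
GRUB_CMDLINE_LINUX="acpi_enforce_resources=lax"

# Then regenerate the GRUB configuration, for example:
#   grub-mkconfig -o /boot/grub/grub.cfg
```

After rebooting, cat /proc/cmdline should show the parameter if it was picked up.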

Monitoring metrics

  • To monitor the CPU temperature, load, and fan metrics (in RPM), use System Monitor Sensor, which is well integrated into KDE. You can pin it to the bottom panel and configure different metrics and views (charts, graphs, plain numbers...).
  • The fan sensors are usually found in the Hardware Sensors category. On an ASUS ROG Strix Z590-I motherboard, the adapter/controller is called nct6798-isa-0290. You can run sensors to get the correct name.
  • Knowing which fan is actually the CPU fan (or a case fan) is difficult, as sensors gives you names like Fan 1, Fan 2, etc. The easiest way is to boot with all the fans unplugged except one, and write down which name corresponds to which fan.

Monitoring on Windows

  • On Windows, the best monitoring tool seems to be HWInfo. It has access to many sensors / metrics, and it lets you add those metrics to the bottom panel (system tray). The only downside compared to System Monitor Sensor is that the appearance on the bottom panel is less configurable (you get raw numbers in a given color, but you cannot add a label, for instance).

Asus

  • On some ASUS motherboards you may need to build the ASUS atk0110 ACPI module. No other module is needed, but this one is not detected by sensors-detect; you must load it manually through the configuration file, as shown below.
  • If you use the Asus atk0110 module, enter the following in /etc/conf.d/lm_sensors:
MODULE_0=asus_atk0110

Fan Control

  • Fan control is best done in the BIOS. Normally it lets you set a fan profile, which maps a given CPU temperature to a given rotation speed. Usually you will want to keep the rotation speed low to minimize noise, even if the CPU gets hot. In particular, running the fan at full speed produces a lot of noise, at least with Noctua fans.
  • CPU temperature should be less than 45 degrees while idling, 75 degrees under normal load, and 95 degrees under heavy load (gaming).

Pulse-Width Modulation

  • Modern fans implement this interface, and in theory this makes it possible to control the fan speed from the OS rather than the BIOS. In practice, I think it's hard to do so and I was not able to even obtain metrics via PWM.
  • Kernel modules that might be needed for this include Device Drivers -> Pulse-Width Modulation and Device Drivers -> Hardware Monitoring support -> PWM fan.
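
When a hwmon driver does expose PWM control, it normally shows up as pwmN and pwmN_enable files next to the fan readings. The read-only sketch below lists whatever is present; pwm_status is a made-up name, and the 0-255 duty-cycle range is the usual hwmon convention rather than something guaranteed for every chip. It deliberately writes nothing, since changing these values can stop a fan.

```shell
# List PWM outputs found under the hwmon tree, without modifying anything.
# $1 (optional) is the hwmon directory; it defaults to /sys/class/hwmon.
pwm_status() (
    root=${1:-/sys/class/hwmon}
    for pwm in "$root"/hwmon*/pwm[0-9]; do
        [ -r "$pwm" ] || continue
        raw=$(cat "$pwm")
        # pwmN_enable tells whether control is manual or automatic; the
        # exact meaning of each value depends on the driver.
        mode=$(cat "${pwm}_enable" 2>/dev/null || echo "?")
        # Assumption: raw duty cycle ranges from 0 to 255.
        echo "${pwm#"$root"/}: $(( raw * 100 / 255 ))% (enable=$mode)"
    done
)
```

If pwm_status prints nothing, no loaded driver exposes PWM files, which would match the experience described above.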

CPU Control

  • Modern CPUs implement frequency scaling, which means they run at higher clocks under load and run slower when idle or under light load.
  • In the Linux kernel, software support is implemented in the Power management and ACPI options -> CPU frequency scaling sections.

Intel processors

  • There are two layers of CPU frequency scaling / power usage for Intel processors. The first one is given at boot by the BIOS and represents the absolute limits that the OS will be able to require from the CPU.
  • Typically, Intel's own limits / recommendations are fairly safe. Alternatively, you can use the motherboard vendor's more aggressive limits, but this typically pushes up CPU temperature and power consumption.
  • The BIOS option that selects between the two is called MultiCore Enhancement (MCE).
  • The second layer happens at the OS level, and defines how the kernel scales the frequency of the CPU's cores.

OS Frequency Scaling

  • Modern Intel CPUs (recent Core models) use the P-state driver. Documentation is available here.
  • This driver only supports two governors: performance and powersave. Building other governors into the kernel is useless, as they won't be used.
  • The performance governor produces high CPU temperatures that generally send the fans to high speeds, while powersave produces lower temperatures.
  • You can switch the governor currently in use with the cpupower program:
cpupower frequency-set -g powersave
  • To switch the governor in a permanent way (that survives reboots), choose the default governor in the kernel configuration (Power management and ACPI options -> CPU Frequency scaling) or use the kernel command line parameter in GRUB:
cpufreq.default_governor=powersave
  • You can obtain the current frequency and the active governor by running:
cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
cat /sys/devices/system/cpu/cpufreq/policy*/scaling_governor
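
The two cat commands print bare values with no indication of which policy they belong to. As a convenience, the following sketch prints one line per policy with the governor and the frequency converted to MHz. The function name cpufreq_summary is hypothetical, and the conversion assumes scaling_cur_freq is reported in kHz.

```shell
# Print "policyN governor frequency" for every cpufreq policy.
# $1 (optional) is the cpufreq directory; it defaults to the path used above.
cpufreq_summary() (
    root=${1:-/sys/devices/system/cpu/cpufreq}
    for policy in "$root"/policy*; do
        [ -r "$policy/scaling_cur_freq" ] || continue
        khz=$(cat "$policy/scaling_cur_freq")
        gov=$(cat "$policy/scaling_governor")
        # Assumption: the kernel reports the frequency in kHz.
        echo "${policy##*/} $gov $(( $khz / 1000 )) MHz"
    done
)
```

Running it before and after cpupower frequency-set is a simple way to confirm that the governor change was applied to every policy.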

Updating CPU microcode

  • It is recommended to update the CPU microcode at boot. With initramfs it can be done easily; add the initramfs USE flag and emerge sys-firmware/intel-microcode. It will automatically copy an intel-uc.img file to /boot. GRUB can then load this file via an initrd line normally.
  • It's also possible to update the CPU microcode without initramfs support. In that case the correct firmware must be identified and built into the kernel. See this page for details.
  • In any case, you can check whether the microcode was correctly loaded by looking at dmesg (it should be the first log line).
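
Besides dmesg, the revision currently in use can be read from /proc/cpuinfo on x86 systems. The helper below is a small sketch (microcode_revisions is a made-up name) that prints each distinct revision once; seeing more than one line would suggest that not all cores run the same microcode.

```shell
# Print the distinct microcode revisions listed in a cpuinfo-style file.
# $1 (optional) is the file to read; it defaults to /proc/cpuinfo.
microcode_revisions() {
    awk -F': ' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

# The kernel log can be filtered in a similar way (may require root):
#   dmesg | grep -i microcode
```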

Potentially important kernel modules

  • Some modules in the ACPI section (mostly for CPU frequency scaling).
  • Some modules in Device Drivers -> Multifunction device drivers (in particular the Intel LPSS modules?).
  • Some modules in Device Drivers -> X86 Platform specific drivers (Intel ones?).