Looking for tools and tips to diagnose issues with an Nvidia card

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/linux_gaming by /u/Ima_Wreckyou on 2023-06-27 20:56:18+00:00.


I currently have a ~9 year old GTX Titan X, which worked perfectly fine for years, but around 1.5-2 years ago I started to get driver crashes during certain games.

Over time, various driver updates changed the exact behavior of the crash slightly. It used to be a sudden mono-colored screen (XID 16 in the logs) with no way to recover; now it is just a freeze (XID 8 in the logs), and in 50% of cases I can simply kill the application and everything is fine again. In the other 50% the screen and mouse just hang. In all cases, the system is still running and I can log in over the network via ssh to read logs and do a clean shutdown.
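
To keep track of which XID codes show up and how often, something like the following rough sketch could be run over the kernel log text pulled via ssh. It assumes the driver writes lines containing `NVRM: Xid (...): <code>`, which is the shape I see in my logs; the function name `count_xids` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape (may differ between driver versions):
#   "... kernel: NVRM: Xid (PCI:0000:01:00): 8, pid=1234, ..."
XID_PATTERN = re.compile(r"NVRM: Xid \([^)]*\): (\d+)")


def count_xids(log_text):
    """Return a Counter mapping each Xid code to how often it appears."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines, only to show the expected input shape.
sample_log = "\n".join([
    "Jun 27 20:00:01 host kernel: NVRM: Xid (PCI:0000:01:00): 8, pid=1234, Channel 00000010",
    "Jun 27 20:05:42 host kernel: NVRM: Xid (PCI:0000:01:00): 16, pid=1234, Head 00000000",
    "Jun 27 20:30:10 host kernel: NVRM: Xid (PCI:0000:01:00): 8, pid=5678, Channel 00000010",
    "Jun 27 20:31:00 host kernel: usb 1-1: new high-speed USB device",
])

xid_counts = count_xids(sample_log)
```

The input could be whatever the kernel log command prints (for example the output of `journalctl -k` or `dmesg` saved to a file), passed in as a single string.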

A lot of games work perfectly fine, even at high resolution and details. Examples are:

  • Satisfactory
  • Subnautica

Some immediately crash:

  • Old School RuneScape
  • Path of Exile (used to work perfectly fine in the past)

Some only crash when I go to certain places but can otherwise be played for hours without any issue:

  • Genshin Impact (Grand Narukami Shrine, certain cut-scenes)

With FFXIV I had a lot of random crashes, but playing with the settings showed that a higher resolution and more load on the GPU actually made crashes less likely.

Weirdly enough, the number of crashes greatly decreased after getting a new 4k display with a higher refresh rate. The difference was so big that I first thought the display had been the issue all along (I actually bought it in anticipation of getting a new card later; the old display only had DVI and newer cards obviously don't support that anymore). The new display is now connected over DP. Crashes are still happening, but less frequently.

I don't see how it can be a thermal issue. I would really like to know if this is a hardware issue and if it can be fixed. I mailed the error reports to Nvidia multiple times, but their support seems to be a black hole: no answer ever comes back out.

Is there any tooling available today, or any good guides, on how I can diagnose such an issue? My guess at the moment would be the RAM of the card, but I can't even find a RAM testing tool for Nvidia, nor can I find out whether there is actually something I could do about it once the issue is identified.

Sure, I could just get a new card, but it seems such a waste to throw it away when it would otherwise completely deliver the performance I need and mostly works.

Regards
