A2115 / 2020 / Processors from 3.1 GHz 6-core i5 up to 3.8 GHz 8-core i7. Released August 4, 2020.

Kernel panic when CPU is hot for long time

Hi everyone!

TLDR: T2 / PCH related kernel panic if the CPU is hot for a long time; it boots only after I let it cool down. No kernel panic during normal/light use. Possibly a faulty component on the motherboard that has a bad connection? Is there a possible fix if I open it up?

EDIT: More tests in the comments

Long version:

I obtained a faulty 2020 iMac 5K with an i7-10700K and 5500XT to be used as a DIY 5K project base. The ad said that the GPU is faulty: it randomly restarts, but during normal use there is no problem.

The screen is the most important part for me (only a slight pink hue around the edge, no problem), so I did not really care about the fault, but what I discovered made me more interested in fixing it.

I benchmarked the GPU using Heaven Benchmark for 1-2 hours running at max fan speed, the GPU was at 80-90 degrees and it did not restart.

Then I benchmarked the CPU using Cinebench. It survived a 10-minute single-core run but crashed 2-3 seconds after starting multi-core. Later, when it had cooled down, I tested multi-core again and it lasted a lot longer, but not 10 minutes.

When it restarts, it sometimes crashes during boot, but mostly it gets to the login screen and can stay there for hours. After I enter my password and it starts loading everything, it crashes again, and keeps doing so until I let it cool down enough to have some thermal headroom. Macs Fan Control turns up the fan speed immediately after login, but that is still not early enough. I also turned off Intel Turbo Boost to reduce heat generation.

The kernel panic logs (when present) show T2 / PCH / SEP related crashes:

BAD MAGIC, x86 global reset detected - CORE 0 is the one that panicked

void AppleEmbeddedPCIeUpLinkMgmt::_linkInterruptAction(IOInterruptEventSource *, int): A link timeout has been seen after 650000 microseconds and 49999 iterations - CORE 0 is the one that panicked
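For anyone who wants to check their own panic reports for the same signatures, here is a minimal Python sketch. The search strings are copied from the messages quoted in this post. The report directory and the `.panic` extension are assumptions about where macOS keeps these reports, and `classify_panic` / `scan_reports` are hypothetical helper names, not part of any Apple tool.

```python
from pathlib import Path

# Substrings copied from the panic messages quoted in this post.
SIGNATURES = {
    "bad_magic": "BAD MAGIC",
    "x86_global_reset": "x86 global reset detected",
    "pcie_link_timeout": "AppleEmbeddedPCIeUpLinkMgmt",
}

# Assumption: panic reports live here with a .panic extension.
REPORT_DIR = Path("/Library/Logs/DiagnosticReports")


def classify_panic(text: str) -> list[str]:
    """Return the labels of all known signatures found in a panic report."""
    return [label for label, needle in SIGNATURES.items() if needle in text]


def scan_reports(report_dir: Path = REPORT_DIR) -> dict[str, list[str]]:
    """Map each panic report file name to the signatures it contains."""
    results = {}
    for path in sorted(report_dir.glob("*.panic")):
        text = path.read_text(errors="replace")
        results[path.name] = classify_panic(text)
    return results


if __name__ == "__main__":
    for name, labels in scan_reports().items():
        print(name, labels or ["no known signature"])
```

Counting how often each signature shows up across reports may help tell whether the crashes share one cause, but it does not by itself say which component is at fault.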

But the weird thing is that I have been using this thing every day for basic tasks: logging in, sleeping, passwd auth, everything seems to be working as usual. I guess normal tasks use the T2 as well, but maybe it does not heat up that much?

What I'm planning to do in the coming weeks is to open it up, check for visual defects on the motherboard, and get an LGA1200 PC motherboard to test whether the CPU is okay or not.

This whole issue seems to happen only when the CPU is over 75-80 degrees for a longer period of time, when the nearby components are also heated up. I suspect a faulty connection somewhere that stops making proper contact when hot. Maybe the T2 chip's connection is bad or something?

What do you think, what would be the best steps to troubleshoot this issue? Is there a tool that only stress tests the T2 chip and not the CPU? Maybe a feature in macOS that really stresses that?

Thank you in advance!

Score: 0

2 Answers

Most Helpful Answer

I would try replacing the thermal pads or paste... seems like thermal throttling.

Score: 1

19 comments:

Will definitely try but the previous owner said it has been replaced already on the CPU/GPU. During a thermal throttle, the CPU/GPU would decrease the performance, not kernel panic, no?

Maybe the T2 has a thermal pad/paste as well? I suspect that the T2 overheats or something there, and that is the reason for the crash. That is why I want to try stressing only the T2, not the CPU, to validate this theory.

Could also be a GPU failure

You can also try Apple Diagnostics. Just turn off your Mac, turn it back on and immediately press and hold the D key until you see a language selection or progress bar.

I tested the GPU under full load for 1-2 hours with GPU temps reaching above 80 degrees and it did not restart. I once tried to use Apple Diagnostics after many reboots and it also crashed during the test, so I did not get any result code. I will try to take the computer outside and run the test in 10 degrees ambient temp.

Just ran a diagnostic test outside, no issues were found

Have you installed a good thermal monitoring app which also allows you to boost the fans' RPM? I personally like TG-Pro. It will allow you to see what's getting too hot, and you can boost the fans' RPM so you don't cook things. I also like it because it can create a log (CSV file) tracking the temps, so you can see what was happening when the error pops up.
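As a rough illustration of how such a temperature log could be lined up with a crash, here is a minimal Python sketch that reports the hottest reading per sensor in the minutes before a given crash time. The column names (`timestamp`, `sensor`, `temp_c`) are hypothetical and will likely differ from what TG-Pro or any other tool actually exports, so they would need adjusting to the real file. The sample rows are made up purely to show the expected shape.

```python
import csv
import io
from datetime import datetime, timedelta


def readings_before_crash(csv_file, crash_time, window_minutes=5):
    """Return the hottest reading per sensor in the window before crash_time.

    Assumes hypothetical columns: timestamp (ISO 8601), sensor, temp_c.
    """
    start = crash_time - timedelta(minutes=window_minutes)
    hottest = {}
    for row in csv.DictReader(csv_file):
        when = datetime.fromisoformat(row["timestamp"])
        if not (start <= when <= crash_time):
            continue  # outside the window of interest
        temp = float(row["temp_c"])
        sensor = row["sensor"]
        if temp > hottest.get(sensor, float("-inf")):
            hottest[sensor] = temp
    # Hottest sensors first, so the most suspicious one is at the top.
    return sorted(hottest.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, not real measurements from this iMac.
    sample = io.StringIO(
        "timestamp,sensor,temp_c\n"
        "2024-01-01T20:00:00,CPU,62.0\n"
        "2024-01-01T20:09:00,CPU,91.0\n"
        "2024-01-01T20:09:30,GPU,97.5\n"
    )
    crash = datetime(2024, 1, 1, 20, 10, 0)
    for sensor, temp in readings_before_crash(sample, crash):
        print(f"{sensor}: {temp} C")
```

The crash time itself would have to come from somewhere else, for example the timestamp of the panic report or simply a note of when the machine restarted.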

I would also make sure the fan blades and the heatsink fin area are fully clean of dust and debris.

Score: 1

14 comments:

I performed many tests this evening, the results are documented under Amazing FiXeR’s answer as comments, I use Macs Fan Control to check the temps and set the fan speed.

The previous owner said that it has been cleaned in a tech shop, but I will open it up when I have time, perform a visual inspection and maybe replace the thermal paste but according to my tests, the issue is not with the CPU or GPU or RAM.

The temps are normal, or at least the ones visible in the app are. Under heavy load the CPU can reach 100 degrees, but it throttles down to 90-95 as usual, and the test keeps going for 20 or more minutes (outside, with ambient temps below 10 degrees Celsius) as long as the GPU test is not running. The GPU test can go for hours even inside.

During normal use (Safari, code editing, document editing, chatting) it does not heat up. I also set the fan speed manually to speed up when the CPU temp is at 55 degrees, but I suspect there is another component, either on the motherboard or the PSU itself, that heats up and causes the crash.

@scania471 - Yes, I think you're right, this is a deeper logic board fault. If I remember correctly, there are six VRM modules in this series which regulate the power to the CPU, and they can overheat as they sit quite close to the CPU. It could be as simple as a cold solder joint on one of these and their support components.

@danj I just opened the iFixit teardown of this iMac this afternoon and saw a comment about these VRM modules. I have been thinking about them for hours, and about the fact that the i5 version has fewer of these modules than the i7/i9 versions, but this machine came with an i7 from the factory, so no problem there.

If one of these modules is bad, that could explain the crashes in a warm environment and the runs making it past 20 minutes in cold weather. What I can't explain is why it crashes within 2-3 minutes when I start both the CPU and GPU tests, even when they are not fully loaded (like 70 degrees max). I will definitely check those modules visually when I get to open this thing.

@danj I took it apart, checked the VRM modules and this is the only weird thing I could find: https://imgur.com/a/HrKm5N0

It does not seem to be cracked, just a scratch or similar. I tried to poke every component on the motherboard that is this size but none of them moved. If one side is not connecting perfectly, it should move at least a little bit, right?

@scania471 - Sorry, I don't see the crack. Are you speaking about the darker mold seam line on the inductor? They look OK from what I can see. The view of the VRM chips is blocked; can you take a picture straight down, nice and tight, like this one?

As far as a cold solder joint goes, that doesn't mean it's physically loose. I was thinking of the VRM chip itself, or maybe the capacitors or resistors around it.

Thank you, Martin Terhes!