Posting this because, maybe, somewhere down the line this will be useful to some forumite.
Synopsis 1: My PC turned off one day and wouldn't turn back on. Cue around 8 to 10 hours of troubleshooting to figure out why, and resolve the problem. Did I get off scot-free? No. But I managed to isolate the problem and get back to internet-capable functionality.
Maybe I'm just forum-deprived for the last two days and need to type some shit, or maybe I'm just elated that I feel like I've navigated this web of fuckery and want to talk about it.
This post will cover several main points related to sudden PC calamity and what you can do/might do and what to look for. In my case it will cover:
1. WTF just happened.
2. How did I work through the issues?
3. How did I make it harder/more stressful on myself?
4. What are the long term consequences?
Chapter I: Trouble In Paradise
Be me, playing 7 Days To Die with friends, late on a school night. Blood Moon is coming. Gun be gud.
And then *poof*, darkness. My whole PC shuts down with barely a whisper.
I've been building my own PCs for near 20 years now. I work Tech Support as a job. My brain goes to work.
Assumption #1: Power surge. Even though none of the other lights in my room did anything, it was my gut reaction.
Solution #1: Wait a moment, power on PC.
Result #1: PC doesn't turn back on. Something in my case clicks when I hit the button, but nothing starts spinning.
Assumption #2: Short/Power supply has died. Very feasible explanation for why I get zero reaction to powering on.
Observation #1: Strangely though, an LED power switch directly on my motherboard is still lit up after a minute or two of looking at the situation and thinking. Couldn't be residual juice left in the board or battery because it drains off pretty quickly.....
Solution #2: Power off Power Supply, unplug it from the wall, wait 30 seconds. Let whatever electrical/magical wizardry dissipate. Reconnect, flip power switch back on, try to power on.
Result #2: Nada. Zero. Zilch.
Conclusion #1: Power supply could be having real problems.
Double-checking #1: Flip everything off again. Check Surge Protector. Yep, it's on, so the socket is good as well. Swap plugs on the surge protector. Yep, still not turning on.
Conclusion #2: PSU definitely fucked.
Consideration #1: Sudden loss of voltage under load can have severe consequences for any component of your PC. Sure, my power supply might be dead, but who knows what else might be toast....
But it's way too late to start digging into the real guts of troubleshooting. So I go to bed and dream of a terrible world without internet and video games.
Chapter II: Panic, Mistakes, Discoveries and Troubleshooting 101
The next day arrives and I do several things: I ask a co-worker if I can borrow his Antec 750W power supply to test in my rig. And I start checking PSU prices on Newegg. There are some nice deals on some good PSUs and they're going to be running out soon....
I order a PSU and take my loaner PSU home.
Troubleshooting Step #1: I disconnect my PC from the wall and peripherals and take it into the living room to start disconnecting the PSU from everything.
Discovery #1: As I'm looking in and getting ready to disconnect stuff I notice....the supplementary connector that is part of the 24-pin ATX connector is dangling loose while the rest of the connector is still plugged in......ruh-roh. That should not be like that!
Realization #1: It might not be the PSU. And the damage could be way, way more severe than I thought. The mobo losing power before the other components could fuck up all sorts of things.
Troubleshooting Step #1A: Still, I can't confirm it didn't uh...fall out while I was moving it. So I proceed with replacing my current PSU. I get it all hooked up, hold my breath and......
Result #3: Fuck all, beyond *click* again. Well, now I'm in deep water. Could the motherboard be shot? Luckily I still have a phone and Google so I ask "computer won't turn on, what do?"
Lesson #1: Motherboards have this thing called the CMOS. It's like the core settings for the motherboard (*not an expert*). And the thing that helps it remember these settings is a battery about the size of a watch battery that sits on the motherboard itself. If you move a certain jumper on the motherboard (which clears the stored settings) and pop that battery out for a bit, you reset the CMOS. After a severe, unexpected shutdown due to something like, say, a random chunk of power suddenly disappearing, weird shit can happen to how computers store information. It gets corrupted, unreadable. It can't be processed, and in the big chain of things that need to work for your PC to turn on, it runs into a showstopper.
Troubleshooting Step #2: Still using the replacement power supply, I move the jumper over on the motherboard, move it back after 10 to 15 seconds, pop the CMOS battery out, wait 30 seconds, put it back in, hook everything back up, flip the power on and.....
Result #4: POOOOOOWWWWWWAAAAAHHHHHH! Things are spinning. Fans are turning in the case and on the CPU and that's good! But uh, wait a second.....
I've got power but nothing is happening. The screen doesn't show anything.
Panic #1: PPPPPPPPPPPPFFFFFFFFFFFFFFFFTTTTTTTTTTTTTTTT.
Chapter III: The Plot Sickens
Troubleshooting Step #3: Seek help. Not from the internet this time.
Lesson #2: There are a million paper manuals you get for the shit you own these days, and I keep almost none of them. But the manuals I always hold on to are my motherboard manuals, because if you know what to look for there's a lot of diagnostic information in them. And if you buy a fancy enough motherboard with good diagnostic features.....My motherboard (Asus Z97 Pro) has several LEDs to indicate problem areas of the board, and another LED display to tell you (by way of codes you look up in the manual) what the motherboard is currently "doing." There are lots of small subroutines and stuff your motherboard has to do before Windows even starts up. 99% of that is invisible to you and happens in milliseconds. But when just one of those steps is fucking up....you got a seemingly dead PC on your hands.
Troubleshooting Step #4: Look up the codes.
Chapter IV: Chaos
My recollection here gets a little hazy because, well, to be honest, I did a lot of stuff without a ton of rigor.
Essentially, between the codes being shown on the motherboard display and the LEDs that point to specific areas and components of the motherboard as having troubles, it was telling me that it couldn't read my "IDE devices" (basically my hard drives and/or CD/DVD drive) or that there was a problem with my VGA slot (i.e. my video card).
Troubleshooting Step #???: These were dark times. I spent a lot of it alternately unplugging and replugging hard drives, trying to resolve why the motherboard couldn't see them.
Result: Every time I'd get the IDE detection problem to go away, the motherboard would complain about the VGA slot. It was frustratingly inconsistent.
Hindsight #1: Getting confused about which drive is plugged into which SATA port, and what the drive label names are, on top of all the other inconsistent weirdness going on, added a lot of time and confusion to the troubleshooting process. I'll expound on this more in a bit.
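One way to avoid that confusion is to write the drive layout down before touching anything. Below is a minimal sketch of such a record in Python. The port numbers, labels and notes are made-up placeholders, not the actual layout of the machine in this post, and the helper names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    sata_port: int     # port number as printed on the board (placeholder values)
    label: str         # whatever name you know the drive by
    has_windows: bool  # does it hold a bootable Windows install?
    note: str = ""


# Hypothetical inventory: fill this in BEFORE you start unplugging things.
DRIVES = [
    Drive(1, "main", True, "current Windows install"),
    Drive(2, "backup", True, "older Windows install"),
    Drive(3, "old", False, "no bootable OS"),
]


def boot_candidates(drives):
    """Return the drives worth trying to boot from, lowest SATA port first."""
    return sorted((d for d in drives if d.has_windows), key=lambda d: d.sata_port)


def describe(port, drives):
    """Say what is supposed to be plugged into a given SATA port."""
    for d in drives:
        if d.sata_port == port:
            return f"SATA{port}: {d.label} ({d.note})"
    return f"SATA{port}: nothing recorded"


if __name__ == "__main__":
    for d in boot_candidates(DRIVES):
        print(describe(d.sata_port, DRIVES))
```

A sticky note inside the case does the same job. The point is only that the mapping exists somewhere other than your memory when the BIOS shows you a single unlabeled device.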
Troubleshooting Step #5: Since things are so fucking weird and inconsistent between reboots and device swapping, I decide to reset the CMOS again (after switching the jumper) and let it drain for a good 30 minutes while I collect myself. I also remove the video card because I can't seem to resolve any of the problems going on around it, and plan to just hook my monitor to the default display port on the motherboard itself.
Consideration #2: Electricity is a strange beast. It doesn't always operate cleanly the way we'd like it to with our devices. Charges get held on to by pieces of hardware, and this charge represents what it thinks it's doing. Maybe that's what's going on with all this inconsistent weirdness, or maybe my motherboard has been damaged by all this shit.....
Troubleshooting Step #6: I replace the CMOS battery, pray to the Dark Gods and....
Result #5: WE HAVE POST. Immediately after POST it goes to the BIOS. After everything that's been going on that's not unexpected I'd guess but it's indicative of other weird shit going on, that it didn't just immediately boot to the primary drive. The system clock has been reset (because the CMOS battery was reset) and that's expected.
Hindsight #2: Your device boot priority also gets reset when you pop the CMOS battery. If your drives were arranged to boot in a specific order, well, that order is now lost.
Observation #2: In the BIOS it does not recognize any of my drives. Wat. That....that's not right. Nothing on them has changed.
Troubleshooting Step #4: Set the right clock time, reboot, see what happens.
Result #6: System reboots, goes back to the BIOS immediately, and the system time is not what I set it to. Ruh-roh raggy. Devices still aren't visible in the BIOS boot priority list.
Troubleshooting Step #5: Breathe deeply, and calmly reset the system one more time.
Result #7: System goes immediately to the BIOS on boot. The system time I set previously is now being displayed. I have two devices showing in the boot list: my CD/DVD drive and one hard drive. (I have three total in the system.)
Observation #3: As I ponder all this, I happen to notice, between various restarts and whatnot, that my GPU fans aren't turning. Not especially abnormal when the system isn't even really doing anything, but it's weird that they kick a little bit every 45 seconds or so.
Troubleshooting Step #6: Since I've got at least one drive visible, I should try to boot to that.
Result #8: The system POSTs and....I see a black screen and text saying BOOTMGR NOT FOUND.
Chapter V: The Tech Priest goes to work
If we were into it before, we're knuckle deep now.
Assumption #3: BOOTMGR is basically a core file on the hard drive that needs to be there before Windows will start. If it's not there, my hard drive is probably corrupted or unreadable.
Hindsight #3: Remember what I was saying about 3 drives and not remembering what they were and how they're physically plugged in? Two of them (my main and my backup) both have Windows installs. The third is like 12 years old and is only there because it still runs, pretty much.
Stupidity #1: Guess which drive I was booting to, in my failure to read which one the BIOS was saying it saw. The shitty 12 year old drive with an utterly broken set of data on it. No wonder it didn't boot. But I don't realize this at the time, so I....
Troubleshooting Step #7: Try to repair the drive via the Windows disk. Worth a shot right?
Observation #4: Most troubling, during some of my reboots the BIOS screen locks up....uuuuuuhhhhhhh............
Result #9: Windows can't fix my shit ass old drive (because there's nothing to fix.) Moving on I....
Troubleshooting Step #8: Work my way up the chain. I unplug all the drives and plug them in one at a time as the primary and only drive next to my CD/DVD drive, to see if I can get one to boot.
Result #10: After trying to boot my shittiest, oldest drive, I boot my second oldest drive, which has a Windows install on it. It boots! I see the usual "Windows didn't shut down right" message and am like, yeah, ok, let's try to fix Windows so I can get ANYTHING going. I'm concerned my primary drive might be fucked too and want to see if I can read, and maybe copy, data off it.
Troubleshooting Step #9: Let Windows do its thing.
Result #11: After a few minutes of Windows trying to repair the installation on disk, it reports back that it can't. What a shock.
Troubleshooting Step #10: I hook up the last drive in the stack as the primary drive.
Realization #2: I am a dumbass.
Result #12: I'm greeted with another "Windows did not shut down properly" message. But this time it's for the right Windows install. I tell Windows, you know what, go ahead and boot and voila....I'm looking at a glorious 800x600 rendition of my desktop.
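The one-drive-at-a-time approach above boils down to a simple loop: attach exactly one thing, see whether the system boots, write the answer down, move on. Here is a rough sketch of that idea in Python. The `works_alone` check is a stand-in for a human plugging in a drive and watching the screen, and the results in `observed` are simplified, made-up values that only loosely mirror what happened here.

```python
def isolate(components, works_alone):
    """Test each component on its own and sort the names into good and bad.

    components:  names of the things to test (drives, in this story).
    works_alone: callable reporting whether the system boots cleanly with
                 only that component attached. In real life this is you,
                 swapping cables and watching the monitor.
    """
    good, bad = [], []
    for name in components:
        if works_alone(name):
            good.append(name)
        else:
            bad.append(name)
    return good, bad


# Made-up observations, one entry per drive tested alone.
observed = {
    "oldest drive": False,
    "backup drive": False,
    "primary drive": True,
}

good, bad = isolate(observed, lambda name: observed[name])
print("boots cleanly:", good)
print("does not:", bad)
```

Testing each piece alone costs more reboots than guessing, but every result is unambiguous, which was exactly what the earlier flailing was missing.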
Synopsis 2:
PSU: Probably not dead.
Mobo: Works....sorta? Some quirky shit going on.
GPU: The jury is out still.
CPU: Good, otherwise none of this shit would have worked.
Memory: Memory problems are often a show stopper. To be on the safe side I unplugged and reseated all the memory anyways. Even blew on them too.
Hard drives: My two older drives are intact but need reformatting. But my primary drive seems ok!
Smart Things #1: First off, I power it all down. I replace my loaner PSU with my own (just to make sure it really is OK) and hook up my older drives again as secondary drives. I do not yet hook up the GPU.
Result #13: System powers on and boots normally. (Sorry Rosewill, I maligned you for nothing.) And in Windows I can see my other two drives. Things are starting to look up!
Smart Things #2: I format my other two drives and do fresh backups of documents, music and pictures. At least I can breathe a little easier knowing that's done.
Troubleshooting Step #11: All that's left now is to hook the GPU back up. Hell I might even get some gaming done tonight!
Result #14: The system boots, I see the POST messages....then it goes to black screen as soon as Windows tries to boot. FML.
Chapter VI: Or Is It Merely A Trick Of The Light?
Observation #5: The GPU fans are still not spinning correctly. Which is not encouraging. The system is clearly booting, Windows is clearly loading, why do I have no display...hrm.....
Troubleshooting Step #12: I have two monitors plugged in to the GPU. (When I was plugged directly in to the board I had just one monitor.)
Supposition #1: Perhaps Windows, after all this, and the program I run to use dual monitors non-natively in Windows, are very confused by having two inputs.
Troubleshooting Step #12a: Simplify the problem. Go down to one monitor.
Result #15: After POST, I see the Windows logo loading!...and then it goes immediately into Windows repair. Oops. I guess all those midstream reboots, when I didn't actually know what the system was doing, are really pissing Windows off. I decide it's best to just let Windows do Windows and so I wait.....
Observation #6: After about 10 minutes of waiting for the Windows repair to run, my display suddenly shuts off.
Assumption #4: My graphics card crashed.
I'm now in a state where Windows is doing repairs, after too many sudden restarts causing problems, and I can't see anything it's doing. Like when it's done or when it says it can't fix the problem. Which means I'm going to have to do another blind restart.
Panic #2:AAAAAAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!
Troubleshooting Step #13: Do not make the problem worse. Luckily almost all computer cases have a little light to let you know there's hard drive activity. Resetting your PC while there's enough hard drive activity to keep that light constantly on is generally a bad thing. So I use it as my guide. I wait about 20 minutes until I stop hearing the hard drives run and that light is no longer constantly flashing. Still staring at a black screen wondering what's going to happen next, I hold the power button and my breath and.....
Result #16: It reboots and goes straight into a normal Windows start up. PHEW. Ok.
Smart Things #3: After THAT little scare I decide to back up some more shit, because at this point I'm taking nothing for granted.
Observation #7: I'm about to start copying files over to my backup drives when my screen just goes white, and I can't see my mouse cursor.
Assumption #5: My graphics card died again. It's the only real explanation for why my display is turning on and off seemingly at random and not coming back on until a restart.
Troubleshooting Step #14: All I can really do is a) reboot the system or b) pull the graphics card. I decide to give it one more shot. Again I'm told Windows failed to shut down properly, so I decide to boot it in safe mode, so it doesn't have to load the graphics card drivers and I can just get some more files copied before something else goes wrong.
Result #17: Windows begins loading core files for booting up safe mode. It hits CLASSPNP.SYS I believe when......the screen goes solid lime green and freezes.
Conclusion #3: My graphics card, if not completely fucked, is more or less fucked because it's unstable and is faulting while not even under load. The color is kind of a dead giveaway. If the graphics card is crashing just trying to be initialized sometimes, there's no way it's stable anymore.
Troubleshooting Step #15: Unplug and remove graphics card, plug single monitor directly into the board's onboard display adapter.
Result #18: So far I've been running for about 2 hours with no restarts, hangs or colored screens of any kind. I've got internet, my data is backed up, my primary drive has been error checked and come back clean, and things are looking ok.
Epilogue: What Price Victory?
So here's where I stand:
-PSU: OK. So easy to assume it's a PSU problem sometimes.
-CPU: OK. Never gave me a problem.
-HDD: OK. After all I put them through....
-MEMORY: OK. Like the processor, there was never anything to indicate a problem here, either in start up issues or LED codes and such.
-MOBO: ....It's hard to say. The BIOS locking up could be due to a lot of things. The failure to save BIOS changes on occasion could be due to a lot of things. If my display adapter on the board started giving me trouble, I'd say the board was damaged at the very least. It's also possible that my video card isn't the thing that's damaged, it's the PCI-E port that's messed up. But right now it's doing its job so I'm inclined to say it's ok.
-GPU: The real casualty of this event (other than my weekend and peace of mind). It works for a little bit but it's damaged and so I'm going to have to get a new one. And now is a particularly pricey time to buy a video card. This card is a GeForce 970 and I hear people rave about their 1070s....but I'm not super enthusiastic about spending that kind of money when I was happy with my 970. As I said above, it's possible the video card isn't damaged and it's the PCI-E slot it's going in that is unstable.... and that's something I could test using one of my other PCI-E slots. It wouldn't be the first time I've heard of the expansion slots themselves getting damaged. I might as well try it before I order a new video card entirely.
So that's muh story. Let it be a cautionary tale about thinking you know what's going on without testing, re-testing and verifying. Computers are legos with voltage, and voltage can make some weird shit happen to hardware. And much like software, there's a stack of processes that need to happen for things to work correctly, and the problem you're seeing might actually just be a symptom of something else.
I've already cancelled my order from Newegg for a new PSU and will be placing one for a new GPU soon. Hopefully when I install it everything goes smoothly and when the system goes under load I don't find out that, like, my memory is shot or some capacitor on the board is actually fried.
THE FINAL LESSON: So basically the random chaos of a connector becoming unplugged at the worst possible moment cost me a weekend and probably a couple hundred bucks. But maybe it wasn't so random. See, I just moved a couple months ago to this new place. And while my PC booted up fine and everything when I moved in....I never did stop and double check all the connections. Hell, I've gone to LAN parties over the years and never bothered to check the internals of my systems after I moved the computer around. I still don't have any evidence the loose power connector to the mobo actually popped out when my system died; I only saw it out after I'd been doing things. But it seems the likely culprit.
So the next time you move your PC, double check your critical connections like your CPU power, your CPU heat sink, your HDD power and SATA connections and of course your MOBO power. Any one of these things coming loose while your system is under load can set off a chain reaction of tech-fuckery that, obviously, if you don't know what you're doing can result in "welp, there goes my computer" or "better go pay a computer tech to figure it out." And if you're a builder yourself, just remember that when shit goes wrong, simplify the problem you're looking at. Take it one piece at a time, isolate the problem, fix it and go on to the next problem.
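For anyone who likes their checklists executable, the "double check your critical connections" advice can be written down as a tiny script. This is only an illustrative sketch: the list mirrors the connections named above, and the function names are made up for the example.

```python
# The connections called out in the post; add your own as needed.
CRITICAL_CONNECTIONS = [
    "CPU power",
    "CPU heat sink",
    "HDD power",
    "SATA connections",
    "MOBO power",
]


def unchecked(checked):
    """Return the connections that have not been confirmed seated yet."""
    done = set(checked)
    return [c for c in CRITICAL_CONNECTIONS if c not in done]


def safe_to_power_on(checked):
    """True only once every critical connection has been looked at."""
    return not unchecked(checked)


if __name__ == "__main__":
    looked_at = ["CPU power", "HDD power"]
    print("still to check:", unchecked(looked_at))
    print("safe to power on:", safe_to_power_on(looked_at))
```

Run it (or just read the list) after every move, LAN party or case cleaning, before the system ever goes under load.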