So what you're saying is that unrolled loops are only unconditionally good on systems that lack any form of cache. Once you're on anything even remotely recent, it stops making sense to unroll everything, since you end up stalling the system on memory accesses because your code won't fit in cache. You ironically lose performance by doing something that "should" give you more.
Or is the real solution just giving everything 512 megs of L3 cache, because that's clearly not wasteful at all?
No.
The reason to use unrolled loops has to do with branch prediction.
https://en.wikipedia.org/wiki/Branch_predictor

Basically, with a very tight loop, you want the most-likely-to-be-executed path sitting in the cache, with the data the loop will need prefetched. This avoids that unsightly necessity of fetching data over the memory bus, which is many times slower than operating out of the cache and introduces whole-cycle waiting times. (Which is why unrolling does not make sense, from a performance standpoint, unless your loop uses an absurd number of conditional checks. If your loop is both tight AND lean, it will execute faster than an unrolled blob of code too big for the cache to hold, because even though there is conditional code being executed, the time to evaluate the condition is smaller than the penalty for hitting the memory bus. The exception is when there is a significant chance your conditional logic will need to pull in data other than what was prefetched, in which case you eat the loop-control conditional hit AND the memory bus hit. That latter problem can usually be solved with more clever design of the loop, but that runs into the "My time is VAAAAAASTLY more important than YOURS, you FILTHY USER." attitude of many programmers.)
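To make that concrete, here is a toy sketch in C (my own illustration, not anyone's production code): a tight, lean summing loop. The whole body fits in a handful of cache lines, the loop branch is taken on nearly every iteration so the predictor nails it, and the data streams in sequentially so the prefetcher stays ahead of it.

    /* Tight loop: tiny code footprint, trivially predictable branch,
     * sequential data access that the hardware prefetcher loves. */
    #include <stddef.h>

    long sum_tight(const long *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i++)
            total += data[i];
        return total;
    }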
When your loop is unrolled, there is no branch. The code is one long string of spaghetti. HOWEVER, as the wiki article on unrolling points out, this just HIDES the performance hit. Instead of sailing along at fucking warpspeed, suddenly going "Deerrrrrrrp" for a moment (when branch prediction fails and a cache miss happens, or when memory must be accessed for some other reason), then sailing along at warpspeed again-- it stays at Derrrrrrrp-type speeds the whole time, because it is constantly hitting the memory bus.
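Here is that same toy loop, manually unrolled by four. You pay one loop-control branch per four elements instead of one per element-- but the body is now four times the code bytes, and if you apply that mentality to every loop in a program, the instruction cache goes out the window.

    /* Same computation, unrolled by 4: fewer branches, more code bytes.
     * (Toy sketch again; the second loop handles n not divisible by 4.) */
    #include <stddef.h>

    long sum_unrolled(const long *data, size_t n)
    {
        long total = 0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {   /* one branch per 4 elements */
            total += data[i];
            total += data[i + 1];
            total += data[i + 2];
            total += data[i + 3];
        }
        for (; i < n; i++)             /* leftovers */
            total += data[i];
        return total;
    }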
It is a useful technique for timing-sensitive code, such as code that needs to run in a multiprocessor environment and needs to keep caches and data coherent.
What I am saying is that, for most purposes where you want actual speed, you want a tight loop, and you want to lean on the branch predictor inside the processor and on your compiler's optimization flags-- NOT glom down a huge memory array with code that chipper-shreds it, using unrolled code to make sure the data stays consistent should another processor need it.
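Concretely (and again, just a sketch): keep the source as a tight loop, hand the unrolling decision to the optimizer (GCC's -O2, plus -funroll-loops if you insist), and hint genuinely lopsided branches with __builtin_expect, a real GCC/Clang builtin. The rare_case() function below is a made-up stand-in for whatever your slow path is.

    /* Compile with: gcc -O2 hot_loop.c   (add -funroll-loops to let
     * the compiler unroll where IT judges it profitable). */
    #include <stddef.h>

    void rare_case(void);   /* hypothetical slow-path handler */

    long sum_checked(const long *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            /* Tell the compiler this branch is "almost never" taken. */
            if (__builtin_expect(data[i] < 0, 0))
                rare_case();
            total += data[i];
        }
        return total;
    }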
Don't have the gall to call it a performance enhancement when it really isn't, and when real-world measurement shows it is quite a lot slower than code that is tightly looped.
Don't pretend it is always OK to do it, either, because when *EVERY* bit of software in the computer tries to do it, you make a high end bit of hardware run like a fucking Idaho potato. (If two or more processors exist in the system, both are trying to use the same data, and you are using unrolled code, then access to the data bus has to wait for one processor to be done with it before the other can work, and that first processor then has to wait for the second to finish with the memory before it can continue. The wait for the memory bus gets EEEEVEENN LOONNNGER-- and when that happens *ALL THE FUCKING TIME*, because *EVERYONE* is doing it in their code, the whole system acts constipated, all the time.)
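If you want to watch that "two processors fighting over the same data" stall yourself, here is a toy pthreads sketch (mine, purely illustrative): two threads bump counters that sit on the same cache line, so the line ping-pongs between cores on every write even though neither thread ever touches the other's counter. Pad the counters onto separate cache lines and most of the stall disappears.

    /* False-sharing demo: shared.a and shared.b almost certainly land on
     * one cache line, so the two cores serialize on it write after write.
     * (volatile just keeps the compiler from hoisting the increments; the
     * increments need not be atomic for the timing effect to show.)
     * Build with: gcc -O2 demo.c -pthread */
    #include <pthread.h>
    #include <stdio.h>

    static struct { volatile long a; volatile long b; } shared;

    static void *bump_a(void *arg)
    {
        (void)arg;
        for (long i = 0; i < 100000000L; i++) shared.a++;
        return NULL;
    }

    static void *bump_b(void *arg)
    {
        (void)arg;
        for (long i = 0; i < 100000000L; i++) shared.b++;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld %ld\n", shared.a, shared.b);
        return 0;
    }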
It gives the programmer rockstars on slashdot severe butthurt to have to re-evaluate their architectural decisions in a "bigger picture" framework. Ideally, only routines that must share data across processors should be unrolled; memory structures should be such that program data and application data have some degree of separation; and the program code should be tolerant of going "Derrrrrrrrrp" every so often when data has to be fetched over the memory bus.
Doing that requires having significantly more "give a damn", though, and significantly less "my time is super duper important, and shit, and your time as an end user is not of concern to me"-- so they get really shitty when you suggest that user experience and actual performance require more give a shit than they are willing to put down.