Topic: Random thoughts - On the Origins of "I Could Eat A Horse" (Read 218987 times)

Frumple · « **Reply #1440 on:** April 08, 2021, 10:14:41 pm »

... why isn't five million output images feasible, again? Those words read to me like saying it's not feasible for several million word works of fiction to exist, tbh. Maybe it isn't, but that doesn't stop people :V

methylatedspirit · « **Reply #1441 on:** April 08, 2021, 10:35:59 pm »

Consider that I'm working with raw image data, in 24-bit RGB. Uncompressed raw image data. The image dimensions are 2000 x 3000. This means:
Size per raw image = 2000 px * 3000 px * 3 bytes per pixel = 18 MB per image.

So, uh, 18 megs times 5 million would be 90 terabytes, if I've understood my SI prefixes correctly. That's feasible for a large server, completely inaccessible for the average Joe. Imagine how much you'd pay in hard drive space!

Hell, even with the lower bound of [128, 500], you'd still need 2 and a bit terabytes. Maybe you have the budget for a 4 TB hard drive. I don't.
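For anyone who wants to check the arithmetic, here's a rough sketch. It assumes the sweep covers widths 128 to 1999 and heights 128 to 2999 (the ranges in the script in this post), that the "[128, 500]" lower bound applies to both dimensions, and that every output stays about as big as the 18 MB input. The names are just illustrative.

```python
# Back-of-the-envelope storage estimate for the resolution sweep.
# Assumption: every output is roughly as large as the raw input.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 2000, 3000, 3

bytes_per_image = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 18,000,000 bytes = 18 MB


def sweep_size_tb(width_range, height_range):
    """Image count and total size in SI terabytes, one image per (width, height) pair."""
    count = len(width_range) * len(height_range)
    return count, count * bytes_per_image / 1e12


full_count, full_tb = sweep_size_tb(range(128, 2000), range(128, 3000))
small_count, small_tb = sweep_size_tb(range(128, 500), range(128, 500))

print(f"full sweep:  {full_count:,} images, {full_tb:.1f} TB")
print(f"small sweep: {small_count:,} images, {small_tb:.2f} TB")
```

The full sweep comes out at 5,376,384 images (about 96.8 TB), so "5 million" and "90 terabytes" are fair round numbers; the small sweep is 138,384 images, about 2.49 TB.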

But, hey, if you wanna try, I've patched my Python script to iterate over resolutions instead of bitrates! You'll need a copy of FFmpeg in PATH and a 2000x3000 raw image as input. Oh, and you'll still need an image editor that supports opening raw image data to view the output files. I suggest IrfanView.

Code:

import os

filename = "foo-bar"
extension = ".raw"
intermediate_extension = ".mkv"
encoder = "libx264"
pix_fmt = "rgb24"
framerate = "30"
b = 64000  # video bitrate in bits per second; constant, since resolution is what varies

# Reinterpret the same raw file at every width (a) x height (c) combination.
for a in range(128, 2000):
    for c in range(128, 3000):
        inSize = str(a) + "x" + str(c)

        # Encode: read the raw input as a x c frames and push it through the lossy encoder.
        input_side_enc = " -f rawvideo" + " -s " + inSize + " -r " + framerate + " -pix_fmt " + pix_fmt + " -i " + filename + extension
        # Put the resolution in the name so each pass gets its own output files.
        output_enc_filename = filename + "-" + encoder + "-" + str(b) + "-" + inSize
        output_side_enc = " -c:v " + encoder + " -b:v " + str(b) + " " + output_enc_filename + intermediate_extension
        cmd_enc = "ffmpeg" + " -y" + input_side_enc + output_side_enc

        # Decode: turn the encoded video back into raw image data.
        input_side_dec = " -i " + output_enc_filename + intermediate_extension
        output_side_dec = " -f rawvideo" + " -pix_fmt " + pix_fmt + " " + output_enc_filename + extension
        cmd_dec = "ffmpeg" + " -y" + input_side_dec + output_side_dec

        print(cmd_enc)
        print(cmd_dec)

        ### I hereby waive any responsibility for the shitshow that will occur on your system if you are to uncomment the following lines.
        #os.system(cmd_enc)
        #os.system(cmd_dec)

Code: (Copyright notice)

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

King Zultan · « **Reply #1442 on:** April 09, 2021, 02:35:20 am »

So what exactly is the endgame for this whole project?

methylatedspirit · « **Reply #1443 on:** April 09, 2021, 04:42:51 am »

There is none! Just like before. Not every project has to have a satisfying ending; do most things have satisfying endings? Does every movie have a good ending? Do most people die in a blaze of glory? Is every discovery as useful as the Discrete Cosine Transform? Good endings are statistically very, very rare. It's nice if anything I do ends up being significant in the future, but it's not likely.

The metapoint in me documenting these things is that I'll be able to recall elements from these explorations ("project" implies a definite goal) that'll be used later on. Whether it'll bear fruit now is irrelevant. I'm sure Einstein had a couple of duds before coming up with Relativity, but to say that those duds were entirely worthless because they weren't themselves revolutionary is to deny the fact that knowledge is incremental. Rome wasn't built in a day, just like knowledge can't be.

I've basically been guessing at things to do with raw data, and the ones that produce viable results are just lucky. This one... I'll let it incubate in the Idea Machine, freely mixed in with a hundred different ideas. Something will happen in there, and I have no way of knowing what.

taat · « **Reply #1444 on:** April 09, 2021, 07:55:35 am »

You don't actually need to save the files to disk, you can just discard every image that doesn't set a new "noisiness" record (unless you want to compare every image to every other image or something horrible like that)

Granted iterating all 5 million images would probably take several weeks still, and you need an actual objective way to determine noisiness
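A minimal sketch of the "only keep the record holder" idea, assuming the images can be produced one at a time (say, by a generator that decodes each output in turn) and that some scoring function exists. Both `noisiest` and the toy scoring function are made up for illustration.

```python
def noisiest(candidates, score):
    """Return (best_score, best_item) while holding only one item at a time.

    `candidates` is any iterable (ideally a generator that yields one image
    per step); `score` is whatever noisiness metric ends up being used.
    """
    best_score, best_item = None, None
    for item in candidates:
        s = score(item)
        if best_score is None or s > best_score:
            best_score, best_item = s, item  # new record: keep it
        # anything else is simply dropped and never touches the disk
    return best_score, best_item


# Toy usage: "images" are byte strings, "noisiness" is the number of distinct byte values.
toy_images = (bytes(i % k for i in range(64)) for k in (1, 4, 16, 2))
best, _ = noisiest(toy_images, lambda img: len(set(img)))
print(best)
```

Memory and disk use stay constant no matter how many images go through; only the run time scales with the 5 million.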

methylatedspirit · « **Reply #1445 on:** April 09, 2021, 08:12:51 am »

Quote from: taat on April 09, 2021, 07:55:35 am

You don't actually need to save the files to disk, you can just discard every image that doesn't set a new "noisiness" record (unless you want to compare every image to every other image or something horrible like that)

Ayyy! You've moved the problem from "infeasible" to "doable given enough time"!

Quote from: taat on April 09, 2021, 07:55:35 am

Granted iterating all 5 million images would probably take several weeks still

I do have a little Intel NUC sitting at home that I can control remotely. I suppose the computational requirement is basically done; that thing's just sitting around doing mostly nothing for months on end. I've been itching to find dumb ways to use all 2.8 GHz of its power. It's "cloud computing". I think I (or someone else; I put it in public domain) could repatch that script to automatically create PNG files and do nice image comparison stuffs; FFmpeg's very versatile, and I've made that script platform-agnostic without even realizing it.

Quote from: taat on April 09, 2021, 07:55:35 am

and you need an actual objective way to determine noisiness

I dunno, my first idea was "use SSIM to compare between a random noise image and the current image", but I really don't think that was the intended use for such a metric. The theory there would be "find the image that most closely resembles noise", but I doubt it works at all like that. Not when you're literally pitting noise against slightly-less noisy noise to see if slightly-less noisy noise is noisier than the noisiest noise found thus far.

Frumple · « **Reply #1446 on:** April 09, 2021, 09:14:06 am »

Who cares about intent if it works, though? Might as well try!

taat · « **Reply #1447 on:** April 09, 2021, 09:41:01 am »

Now that I think of it, the best way to figure out how noisy the image is would probably be to use compression, since images with higher entropy/noise are less compressible. Either see how large a losslessly compressed result is (though the fact that the dimensions of the image change with the parameter may skew the result) or see how large the difference between a lossy compressed image and the original is.

There's still the problem that different compression methods give different results. And the absolute best compression methods are extremely slow.
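Here's a rough sketch of the lossless variant, using zlib only because it ships with Python (an assumption on my part; any lossless compressor would do, and each would give different numbers). Dividing by the input length is one way to soften the changing-dimensions problem. The test "images" are synthetic.

```python
import random
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size; closer to 1.0 means noisier."""
    if not data:
        raise ValueError("empty input")
    return len(zlib.compress(data, level)) / len(data)


rng = random.Random(0)  # fixed seed so the demo is repeatable
n = 64 * 64 * 3  # a hypothetical 64x64 image in 24-bit RGB
noise = bytes(rng.getrandbits(8) for _ in range(n))
gradient = bytes((i // 3) % 256 for i in range(n))
flat = bytes(n)  # all-black image

for name, img in [("noise", noise), ("gradient", gradient), ("flat", flat)]:
    print(name, round(compression_ratio(img), 3))
```

Random bytes barely compress at all, so their ratio sits near 1.0, while the flat and gradient images shrink to a small fraction of their size.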

McTraveller · « **Reply #1448 on:** April 09, 2021, 07:17:58 pm »

I think you can use an FFT or other spectral analysis, which can be very fast (much faster than compression) and get a deterministic metric describing the noise.

For an FFT, if the levels of all frequencies are the same, that basically means it's random and as noisy as you can get. If you have peaks at one or more frequencies, that means it's not random.

You could probably do the FFT, do a standard deviation on the levels with respect to frequency, and the lower it is, the more noisy it is.
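A small sketch of that metric, under a few assumptions: it works on a single row of samples rather than a whole image, it uses a naive O(n^2) DFT from the standard library in place of a real FFT (same bin values, far slower), and it divides the standard deviation by the mean level so overall brightness doesn't dominate. That last normalization is my addition, not part of the suggestion above.

```python
import cmath
import math
import random
import statistics


def dft_magnitudes(samples):
    """Magnitudes of the positive-frequency DFT bins, DC excluded. O(n^2)."""
    n = len(samples)
    mags = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def spectral_spread(samples):
    """Standard deviation of the bin levels divided by their mean.

    Lower means a flatter spectrum, i.e. closer to white noise.
    Undefined (division by zero) for an all-constant signal.
    """
    mags = dft_magnitudes(samples)
    return statistics.pstdev(mags) / statistics.fmean(mags)


rng = random.Random(0)
n = 256
noise = [rng.uniform(-1, 1) for _ in range(n)]
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # one peak at bin 8

print("noise:", round(spectral_spread(noise), 2))
print("tone: ", round(spectral_spread(tone), 2))
```

The pure tone piles everything into one bin and scores high; the noise row spreads its energy out and scores much lower. A real run over images would want a 2-D transform and a proper FFT library.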

methylatedspirit · « **Reply #1449 on:** April 11, 2021, 04:58:23 am »

I've been thinking of this "uncompressed data conversion" matrix. I've sorta wanted to review it. Here:

Code:

                            Increasing bandwidth
                       ----------------------------->

                       +-----+-------+-------+-------+
                     | |\    |       |       |       |
                     | | \ In|       |       |       |
                     | |  \  | Audio | Image | Video |
                     | |Out\ |       |       |       |
                     | |    \|       |       |       |
                     | +-----+-------+-------+-------+
                     | |     | Audio | Image | Video |
                     | |Audio|  to   |  to   |  to   |
                     | |     | audio | audio | audio |
Increasing bandwidth | +-----+-------+-------+-------+
                     | |     | Audio | Image | Video |
                     | |Image|  to   |  to   |  to   |
                     | |     | image | image | image |
                     | +-----+-------+-------+-------+
                     | |     | Audio | Image | Video |
                     | |Video|  to   |  to   |  to   |
                     | |     | video | video | video |
                     v +-----+-------+-------+-------+

If we were all to go back to dial-up over POTS lines (once digitized, a single voice channel is called a DS0, at 64 kbit/s), we'd end up converting all our data to audio at some point. All data ends up as audio. Hell, even a T1 is literally 24 of those voice channels bundled together, so that too is shaped like audio at some point. (DSL is the odd one out: it shares the copper pair with the phone line but sits above the voice band.) Interesting point, but not the one I'm focusing on.

See, I'm thinking of the idea that any uncompressed data can be converted to any of these formats (text is a valid format, but too-low bandwidth to consider here) with enough prodding. With a lossless encoder, it must always yield a result that's identical to the original when decoded back to its original form, though you may lose a couple bytes converting to video. With a lossy encoder... you'll get interesting results. Even more prodding is required in practice, but I know it can yield viable results. Something that's implied here is that the third and final step is always "> (Original format)". I am not touching that rabbit hole of doing stuff like "Audio to video to image to audio to image to...".
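To make the lossless case concrete, here's a minimal sketch of "image to audio to image" where the "encoder" is just a WAV container, using Python's built-in `wave` module. The 8-bit mono layout and the 8000 Hz sample rate are arbitrary assumptions; the point is only that the bytes come back identical.

```python
import io
import wave


def bytes_to_wav(raw: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap raw bytes as 8-bit mono PCM in a WAV container (the data is untouched)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)  # one byte per sample
        w.setframerate(sample_rate)
        w.writeframes(raw)
    return buf.getvalue()


def wav_to_bytes(wav_data: bytes) -> bytes:
    """Pull the PCM payload back out of the WAV container."""
    with wave.open(io.BytesIO(wav_data), "rb") as r:
        return r.readframes(r.getnframes())


# A hypothetical 4x2 image in 24-bit RGB, i.e. 24 raw bytes.
image = bytes(range(24))
roundtrip = wav_to_bytes(bytes_to_wav(image))
print(roundtrip == image)
```

Swap the WAV step for a lossy audio codec and the round trip stops being exact, which is where the interesting results come from.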

The trivial ones are the X to X ones. These do have some use, but it's mostly what is called "digital generation loss". If you've ever seen those "deep-fried" memes, that's an example of going "Original image > JPEG > JPEG > JPEG > (Superimpose smaller image on top) > JPEG..." until you end up with a suitably-degraded image. 'Video to image' and 'Image to video' are also trivial. That's under "no shit, Sherlock": all videos are series of images; series of images can be compiled into videos.

So that's 5 excluded for being too boring. This leaves 4.

Image to audio is pretty good. Depending on your choice of lossy codec, you could end up with some good effects, though it's a lot of blurring and line-by-line artifacting at lower bitrates.

Video to audio is rather limited, as far as lossy codecs go. Using Bad Apple as a test video (as you do), there really aren't many codecs that run at less than 32 kbps to yield suitable amounts of distortion. Off the top of my head, they are Opus, AAC, Speex, and G.7-something (it's one of the GSM codecs). Then again, I'm using black-and-white footage; it's not really showing off the full power of this technique. God, I need good, freely-licensed test footage. I'm aware of Xiph's collection of lossless footage, so I really should look at it in depth.

Audio to image... I tried it once, ghastly results. I should really revisit it; maybe screwing with royalty-free music might be seen as "ironic", even if I haven't a clue what I'm doing.

Audio to video is also pretty good. All my ears can hear is quantization noise; it's great! Seems to need bitrates below 32 kbps, and there aren't many video codecs able to operate at this low a bitrate. Then again, I could always increase the internal resolution to evade this issue entirely. It's good as long as you intend it to be audio at the end. Just 'audio to video' yields something like:

methylatedspirit · « **Reply #1450 on:** April 13, 2021, 08:02:30 pm »

The thing about trying to degrade raw image/video data using lossy audio-based methods (recursive re-recording, lossy audio codecs) is that color is the first thing that dies out. Always. Doesn't matter which color space you use.

In RGB-land (and other color-component-separated color spaces), the R, G and B components in an image or video are almost always strongly correlated, at least when working with footage of things in real life. If you were to define the average RGB component, Av = (R + G + B) / 3, then my strong suspicion is that the values Av - R, Av - G, and Av - B are very "weak" relative to the magnitude of Av. As such, the first things to go are the colors, since they're the weakest part of the "signal". Those are the parts that tend to get "smoothed-off" first, as it is. Even if each component is processed individually (and they are), color is still the first thing to go.
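A tiny sketch of that suspicion, taking Av as the plain mean of the three components. The pixel values are made up for illustration, not measured from any footage, and `chroma_strength` is a hypothetical helper.

```python
def chroma_strength(pixel):
    """Return (Av, deviations, ratio) for one (R, G, B) pixel.

    ratio = largest |Av - component| divided by Av; small values mean the
    color information is weak next to the brightness.
    """
    r, g, b = pixel
    av = (r + g + b) / 3
    deviations = (av - r, av - g, av - b)
    ratio = max(abs(d) for d in deviations) / av if av else 0.0
    return av, deviations, ratio


# Made-up pixels: a muted, skin-like tone and a fully saturated red.
for px in [(200, 180, 160), (255, 0, 0)]:
    av, devs, ratio = chroma_strength(px)
    print(px, "Av =", av, "deviations =", devs, "ratio =", round(ratio, 3))
```

For the muted pixel the deviations are about a tenth of Av; only the saturated one carries a color signal comparable to its brightness. Averaging the ratio over a real frame would be the actual test.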

In YUV-land (or any color space that defines a grayscale component and a set of chroma components), a different problem arises. The grayscale component is the "sharpest" signal. That's where your detail is. The chroma parts are very "blurry", as it is; the color is painted on top of a grayscale image. When you process this, you end up losing the already-blurry chroma signal first. This is why VHS tapes lose their color long before they lose their grayscale if you induce generation loss by recursively re-recording tapes. There's simply very little detail to preserve in the color components. Don't blame them for this, blame evolution. Our eyes have good grayscale perception, but weak color perception. As such, blurry color is always harder to spot than blurry grayscale.
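For reference, a sketch of how the grayscale and chroma parts get separated, using the classic analog-style BT.601 weights. This is an assumption about which "YUV" is meant; the Y'CbCr variants FFmpeg actually writes add offsets and different scaling, but the split into one luma and two color-difference signals is the same.

```python
def rgb_to_yuv(r, g, b):
    """Analog-style BT.601 conversion; inputs are 0..255, U and V are signed."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: the "grayscale image"
    u = 0.492 * (b - y)  # blue-difference chroma
    v = 0.877 * (r - y)  # red-difference chroma
    return y, u, v


# Made-up pixels: mid gray, then a muted warm tone.
for px in [(128, 128, 128), (200, 180, 160)]:
    y, u, v = rgb_to_yuv(*px)
    print(px, "Y =", round(y, 1), "U =", round(u, 1), "V =", round(v, 1))
```

Any gray pixel lands on U = V = 0, so all of its information sits in Y; the chroma channels only carry how far a pixel strays from gray.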

So at this point, does there exist a color space that doesn't suffer from either issue? If there is, there wouldn't be much incentive to support it; any such 'space simply doesn't follow human visual perception, nor has the convenience that RGB offers for our tech. Still, I'd consider a color space in which the grayscale and color parts degrade at roughly the same rate an interesting thought experiment, if such a thing can exist. Send me the encoder and decoder for that if you make one. I wanna see that shit.

bloop_bleep · « **Reply #1451 on:** April 14, 2021, 01:35:22 am »

The deviations from average brightness are high frequency and low amplitude, so I can see why they could get shaved off. You could just try encoding the color channels separately and then interleaving.

methylatedspirit · « **Reply #1452 on:** April 14, 2021, 04:54:46 am »

Quote from: methylatedspirit on April 13, 2021, 08:02:30 pm

So at this point, does there exist a color space that doesn't suffer from either issue?

I'll respond to myself. I have tentative evidence to say that there likely does exist such a color space, and I sure as hell didn't expect it to be widely available. This is a frame from Big Buck Bunny, encoded in YUV 4:2:0, and pushed through the reference Opus audio codec (libopus) at 8 kbps. The wraparound isn't intentional, but I think it adds to the VHS-like aesthetic. The ones next to it are the same thing using YUV 4:4:4, again with 24-bit RGB, and the original.

For the YUV color spaces:
Extreme loss of detail on all channels, check.
Colors retained despite that, check.
Unintentional artificial VHS look, check.

Oh, and the attribution? "The movie frames shown above are (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org". Check. (Hey, between this and paying someone to use their stuff, I'd much prefer this)

Edit: I tentatively take it back. YUV spaces tend to do quite well with color preservation, at least when working with video and using lossy audio codecs. RGB 24-bit suffers from sudden color death. For Big Buck Bunny over there, the point of sudden color death is between 20 kbps and 20.5 kbps with libopus.

I wonder if 'abuse of lossy audio codecs' and 'generation loss by recursive re-recording' actually behave differently with respect to color spaces and color preservation under their respective forms of degradation.

Quote from: bloop_bleep on April 14, 2021, 01:35:22 am

The deviations from average brightness are high frequency and low amplitude, so I can see why they could get shaved off. You could just try encoding the color channels separately and then interleaving.

Wait, isn't that what interleaved RGB is? I'm having trouble trying to interpret what you mean. If I just induce exponential generation loss on this here innocent picture...

And those are, left-to-right, the original, pass 1, pass 2, pass 4, pass 8, and pass 16. If you can see the hue shifting a bit, your eyes aren't fooling you. Interleaved images are pains-in-the-ass to deal with, precisely for this reason; they can sometimes shift hue by 30 degrees (rather than the 60 you'd expect from byte misalignment), which I didn't believe until I put some calibration bars on an earlier image.

It's doing pretty good color-wise given that's 16 iterations. If that's what you meant, then I'd agree.

Here's one I "baked" earlier: a planar RGB image. These are the original, pass 1, 2, and 3.

It's not even on an exponential scale, and the color's still fading so damn hard. It's almost disheartening to see the color fade so quickly. A shame, too. I love working with planar images because they don't suffer from the hue-shifting issue that interleaved images do.

bloop_bleep · « **Reply #1453 on:** April 16, 2021, 01:45:29 am »

I mean separate the channels first (uninterleave), encode each, then interleave again.
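In plain Python on packed 24-bit RGB bytes, the idea looks roughly like this. It's a sketch: `degrade` is a stand-in for whatever lossy step gets applied to each channel, and the bit-crushing lambda is only there so the example runs.

```python
def deinterleave(rgb: bytes):
    """Packed RGBRGB... -> three planes (R, G, B)."""
    if len(rgb) % 3:
        raise ValueError("length must be a multiple of 3")
    return rgb[0::3], rgb[1::3], rgb[2::3]


def interleave(r: bytes, g: bytes, b: bytes) -> bytes:
    """Three equal-length planes -> packed RGBRGB..."""
    if not (len(r) == len(g) == len(b)):
        raise ValueError("planes must be the same length")
    out = bytearray(len(r) * 3)
    out[0::3], out[1::3], out[2::3] = r, g, b
    return bytes(out)


def process_per_channel(rgb: bytes, degrade) -> bytes:
    """Run `degrade` (a stand-in for the lossy codec) on each plane separately."""
    return interleave(*(degrade(plane) for plane in deinterleave(rgb)))


# Stand-in "codec": drop the low 4 bits of every sample.
crush = lambda plane: bytes(v & 0xF0 for v in plane)
packed = bytes([10, 200, 30, 40, 50, 255])  # two made-up pixels
print(list(process_per_channel(packed, crush)))
```

Each channel only ever sees its own samples, so whatever the codec smooths away, it can't trade one color off against another.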

methylatedspirit · « **Reply #1454 on:** April 16, 2021, 02:52:46 am »

My first thought was:

1. Start with RGB,
2. Split into planes of R, G and B with FFmpeg's extractplanes filter.
3. Use mergeplanes to generate colorized versions of the initially-grayscale planes by using one plane each as input, since extractplanes yields grayscale versions.
4. Encode those components as YUV.
5. Pipe those individually through an audio system, save them. (Or perform some other databending activity)
6. Re-encode those back to RGB.
7. Use extractplanes to extract only the individual components.
8. Use mergeplanes to merge together the individual components back to RGB.
9. Done.

Incredibly convoluted. That's Rube Goldberg-levels of complexity. That's, like, 15 individual FFmpeg operations, probably fewer if you used filtergraphs (which are black magic to me).

That, or you mean:

1. Start with RGB.
2. Split into RGB planes with extractplanes (remember, the output planes are grayscale).
3. Re-encode them back to RGB.
4. Pipe them individually through an audio system.
5. Use extractplanes to extract only the individual components in the RGB-ized components (which themselves represent RGB planes)
6. Use mergeplanes to merge them back to RGB.
7. Done.

I think that's 8 individual FFmpeg operations without filtergraphs.
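With filtergraphs, the split and the merge might each collapse into one call. The sketch below only builds and prints the commands, in the spirit of the script earlier in the thread. The filter strings are my reading of the FFmpeg filter documentation for extractplanes and mergeplanes and are untested, the file names are placeholders, and it assumes the processed planes come back as single-plane grayscale images (if they come back "RGB-ized", a `format=gray` step would be needed on each input first).

```python
import shlex


def split_cmd(src, r="r.png", g="g.png", b="b.png"):
    """One ffmpeg call that writes the R, G and B planes as grayscale images."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-filter_complex", "format=gbrp,extractplanes=r+g+b[r][g][b]",
        "-map", "[r]", r, "-map", "[g]", g, "-map", "[b]", b,
    ]


def merge_cmd(dst, r="r.png", g="g.png", b="b.png"):
    """One ffmpeg call that rebuilds RGB from three grayscale images.

    gbrp stores its planes in G, B, R order, hence the input order.
    """
    return [
        "ffmpeg", "-y", "-i", g, "-i", b, "-i", r,
        "-filter_complex", "[0:v][1:v][2:v]mergeplanes=0x001020:gbrp",
        dst,
    ]


print(shlex.join(split_cmd("in.png")))
# ... pipe r.png, g.png and b.png through the audio system here ...
print(shlex.join(merge_cmd("out.png")))
```

That would be two FFmpeg operations plus however many the audio step needs per plane, if the filter strings hold up in practice.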

I'm not sure of any other interpretations. Is this what you meant? "Interleave" is a 'reserved word' to me within this context; to me, it can only ever mean the way in which some pixel formats are stored RGB RGB RGB (...) instead of RRR (...) GGG (...) BBB as planar formats are stored (replacing R, G and B with the appropriate components in the general case). "Packed" is also reserved for this reason. That's why I thought you meant interleaved RGB.

Bay 12 Games Forum
