(Skipping past the diversion into "CAPTCHA clearly has the wrong idea of what a tractor/motorbike/chimney is, but I need to tell it what it thinks or it'll think *I'm* wrong" or "which extended bits of the traffic light (light, frame, pole?) it expects me to select" issues, both of which I've definitely mentioned before, here or elsewhere, as I started on the following overlong post before the last few messages appeared.)
That's assuming the AI can be made genre-blind[1], when we really can't expect it to be so within what passes for its inner thoughts. I see no reason why it cannot be fully aware that what-it-understands-as-a-CAPTCHA-pattern is present there. The electronic 'id' is used to identifying all kinds of things that it hasn't seen before in the currently presented circumstances, in order to let the electronic 'ego' think it knows what to say about what is presented. (The mediation of a 'superego' may be involved.)
You could train it to respond positively only in certain (most) circumstances, but exclude specific situations with negative reinforcement (I assume this is what they're trying to do). You could develop (and they have done) image-processing algorithms to latch onto a QR code regardless of how it is presented, decode it and present the resulting data, to which you could add the stipulation not to reveal anything if it was an "http://..."-starting datastring and there was a green background (but let through anything else 'within green', or whatever had no(t enough) green regardless). Without too much human prodding to get it to work, adding the green-http 'block' rule as a modifier to its original best effort is more likely than separately constructing all combinations except the green-http combo, as effectively three highly specific ranges of detection separately optimised and worked together, without any element of the unwanted detection-range.[2]
That goes more so if it's an add-on filter, carving away from the 'answerspace' separately for every such desired carve-away. The 'grandma' framing device (and imagery setting) just indicates a missing 'negative space' necessary to hobble the process. Clearly there was no refusal to even train the software to cover such cases, and it needs actually sufficient negative reinforcement to prevent 'logical leakage' from unambiguously unproscribed input-to-answer space over into an oddly presented banned area.
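The asymmetry above can be sketched in code. This is a purely hypothetical illustration in Python (every function name here is invented for the sketch, standing in for whatever image-analysis and decoding machinery would actually be involved): a permissive base behaviour with one negative rule carved out, rather than separately constructing every allowed combination.

```python
# Hypothetical sketch: a permissive base behaviour with a single negative
# "block" rule overlaid. None of these names correspond to any real API.

def decode_qr(payload):
    """Stand-in for a full QR decoder; here it just passes the payload through."""
    return payload

def background_is_green(image_meta):
    """Stand-in for an image-analysis step deciding the dominant hue."""
    return image_meta.get("dominant_hue") == "green"

def respond(payload, image_meta):
    decoded = decode_qr(payload)
    # The single negative rule, carved out of an otherwise permissive answer-space:
    if decoded.startswith("http://") and background_is_green(image_meta):
        return None  # refuse this one combination
    return decoded  # everything else passes through

# Green background but non-URL content: allowed.
assert respond("hello", {"dominant_hue": "green"}) == "hello"
# URL on a non-green background: allowed.
assert respond("http://example.com", {"dominant_hue": "red"}) == "http://example.com"
# The proscribed green-http combination: refused.
assert respond("http://example.com", {"dominant_hue": "green"}) is None
```

The alternative (three separately optimised positive detectors that happen to exclude the combo) would need far more structure for the same observable behaviour, which is the point.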
[1] Well, it can, but in a highly evolved way, similar to nature being exploited into falling for a supernormal stimulus that we are internally built too differently to fall for at all.
[2] Not that it's going to be so clear-cut, anyway, but thinking back to my being in '80s-'90s-era lectures where the issue was (in one example) training an algorithm to decide if a person was eligible for a pension, given the simplest boiled-down data of age and gender (when pension ages were disjoint in that latter respect). The method of assembling 'convex' regions into an additive union of viable subsets (themselves being setwise intersections) is more effort to encourage than (if you have the facility) using a small negating complement to remove a slice from a comparatively simple superset, to produce the convex(/'stepped') intended match.
Obviously a human programmer could do either, fairly simply (for the pension-age example), at will, but when leaving the mechanism used to the unseen internals of a trainable analyser, it is going to effectively say either "above this age, yes, except if not above this higher age and male" or "below this age, no, unless not below this other age and female", rather than automatically knowing that the m/f cutoffs are different and just making two (different) 'up to, no's or 'down to, yes'es, as we might.
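The two shapes of the pension rule can be written out explicitly. This is a minimal sketch of my own, assuming historic-UK-style cutoffs of 60 for women and 65 for men purely as illustration:

```python
# Two equivalent encodings of the same stepped eligibility rule
# (illustrative ages: 60 for women, 65 for men).

def eligible_union(age, male):
    # Additive union of 'convex' regions: two separate yes-ranges,
    # one per gender, assembled by intersection-then-union.
    return (age >= 65) or (age >= 60 and not male)

def eligible_carved(age, male):
    # A comparatively simple superset ("60 and above, yes") with a small
    # negating complement removed: the 60-64 male slice.
    return age >= 60 and not (age < 65 and male)

# The two formulations agree at every point of the input space.
for age in range(50, 80):
    for male in (True, False):
        assert eligible_union(age, male) == eligible_carved(age, male)
```

Same observable behaviour, but the second form is one broad rule plus one small carve-away, which is the easier shape for a trained mechanism to stumble into.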
For such a simple example, perhaps optimisations like the latter are now possible under more flexible 'educatable algorithms', hands-free of human prompting (data-pushed only, through training on yay/nay examples to guide it), but it gets more difficult with "hue-background plus context-specific content" as per the QR example.
And "answer an image query (including reading text of all kinds in all kinds of situations) unless it looks like these particular prohibited examples (none of which have been framed as a badly photoshopped insert into another, larger image)" is going to be a 'coverall' best-guess overlaid with a lesser 'negative mask' that cannot be expected to cover every imaginative variation that can (or will) be thought of.
If modern AI isn't, at its core, so tightly bound to these atomic setwise concepts (as I'm sure it isn't, but for the sake of argument I'll accept fuzzy-blob logic leading to weighted membership-chances as equivalent, and so on), then it's even more impenetrable. Like my dabbled-with "month-number" evolved algorithm[3], of times past.
[3] Just a bit of fun, that, but I had decided to make a converter from month-names to month-numbers, without it being explicit and list-led (although obviously list-trained).
It could take "February" or "Feb" (or "february" or "FEB") and munge the bits in the supplied text to consistently return "2". "March"/"Mar"/"MARCH", etc returns 3. Which obviously I could do manually by looking for the bits that form "FEB" (minus the unnecessary bits of the ASCII that define "f", "e" and "b" as lowercase). It might also have identified "febrile" as the second month (maybe, maybe not "feather", or even "friend", depending on how 'tight' I made the expectations - after all, there are no other months that start with "f"/"F", although "m"/"M" can't distinguish between third and fifth months, even with a second batch of "A"-like bits checked).
Instead, I set it to decide what (from being given a corpus of month terms) it needed to know, what to mask (case-bits) and how to combine (e.g. whatever number it thought "ma" gave, having an "r" next added two less than having a "y" next, to give respectively 3 and 5 by the point of output). Really, I just gave it all the tools I thought it might need to build up layers of bitmasking, bit-extracting and bit-arithmetic, and let it work out ways of doing it and narrow down to a (possibly undesignable) least costly way from every method it thought worked.
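The case-bit masking mentioned above is the classic ASCII trick: lower-case letters differ from their upper-case forms only in bit 0x20, so "minus the unnecessary bits that define lowercase" can be sketched like this (my own hand-written illustration, not the evolved algorithm's actual recipe):

```python
# ASCII case folding by masking: 'f' (0x66) & ~0x20 == 'F' (0x46), etc.

def fold_case(s):
    # Clear bit 0x20 on letters only, leaving digits/punctuation alone.
    return "".join(chr(ord(c) & ~0x20) if c.isalpha() else c for c in s)

assert fold_case("Feb") == fold_case("FEB") == fold_case("feb") == "FEB"
```

After this fold, "February", "FEBRUARY" and "february" all present the same bit patterns to whatever matching comes next.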
Having successfully worked out how to identify "February" as '2', of course I knew I could then test whether it could identify "Februar" (German) or "febrero" (Spanish). Maybe also "Février" (depending on whether it had a check-state that 'fell over' the e-acute, instead of plain-e). And by that point it covered a number of others (Italian "febbraio") by extension, even untrained.
But it needed different strategies per month. "March"/"Mars" (Fr.) were not a problem, where "Fév" (Fr. abbrev.) might be, but "März" (De.) might trip it up if not trained-in. "April"/"Avril"/"abril"/"aprile" (considering it needs separation from "August"/"Août"/"agosto") needs to be second-letter flexible, as with other months ("D[eéi][cz]ember"), or needs to go beyond three characters ("Jun" then "Jul" in English, but at least distinguishing "Juin" from "Juil(let)" in French). And away from "Jan" (which is "enero" in Spanish and "gennaio" in Italian, the latter language having "giugno" and "luglio" as 6th and 7th months, so requiring additional paths to '6' and '7' as output beyond just "J first, not-A second, then look for N-or-L afterwards and alter the result accordingly", in human terms).
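To make the per-month strategies concrete, here is a conventional hand-written sketch of the same matching job; the regex patterns are entirely my own invention for illustration and emphatically not the evolved bitmask recipe, just a human-readable equivalent for a few of the trickier months:

```python
import re

# Hand-crafted per-month strategies, tried in order. Each pattern is a
# guess at what distinguishes one month across a few European languages;
# the gaps (English "Jun"/"Jul" vs French "Juin"/"Juillet") show how
# quickly this gets messy by hand.
MONTH_PATTERNS = [
    (2, r"^f[eé]"),               # February / Februar / febrero / février / febbraio
    (3, r"^m[aä]r"),              # March / Mars / März / marzo
    (4, r"^a[vpb]r"),             # April / Avril / abril / aprile
    (8, r"^a[uog]"),              # August / Août / agosto (checked after April)
    (6, r"^ju?in|^gi?u[gi]?n"),   # Juin / giugno
    (7, r"^ju?il|^lug?l"),        # Juillet / luglio
]

def month_number(name):
    folded = name.lower()
    for num, pattern in MONTH_PATTERNS:
        if re.search(pattern, folded):
            return num
    return "NaM"  # Not a Month

assert month_number("Février") == 2
assert month_number("März") == 3
assert month_number("agosto") == 8
assert month_number("giugno") == 6
assert month_number("juillet") == 7
assert month_number("quiet") == "NaM"
```

Note how the August pattern only works because April is tested first, and how English "Jun" slips through the '6' pattern entirely: exactly the kind of special-case whack-a-mole the evolved approach was meant to subsume.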
I actually trained it with about half the European languages/dialects (plus a smattering of others, that I could get easy ASCII versions of, outside the "latin-inspired" family; but also including roman numerals of I..XII for fun!) to get it to the stage of having boiled down every correct input from the training set through the almost impenetrable (but surprisingly compact) 'recipe' of turning month-names into month-numbers. Then I tried it on the non-training languages I'd held in reserve (the 'other half' of European languages, plus other external oddities) to see if it was good enough to continue the language-agnostic 'comprehension' of case-insensitive and abbreviation-tolerant conversion to numbers in the right ballpark. (Of course, I also fed it non-month words. I'd not forced it to reject "byzanteum" or "Hawai'i" or "Canol y dref", nor even constrain them to either within or without the 1..12 range of output. I figured that adding "infallible rejection of invalid inputs" was going to add a lot more iterations to the training period, so let it slide... But the post-training tests might have given interesting differences between a "quiet" month and "quite" a month, for example.)
I can't recall whether I did this before or after I learnt of the similar attempt to 'evolve' a colour-name-to-RGB-triple converter (as an alternative to a hand-crafted, and fuzzy-match-enhanced, direct look-up table) that could (apparently) do wonders with distinguishing/converging "reddish brown", "brownish-red", "red-brown-yellow", "mucky red", etc., or that sort of thing. And it would also have a good go at "tired", "tractor", "liquid" or "Leicester" (obviously just the cast-off 'remnants' of whatever the more positive hue-matching targets left, once you started using/allowing Garbage In/Garbage Out). You might still find published descriptions of this attempt (or one or more of them), but my own month-converter was never actually so publicly described (this might be the most thorough summary of it, outside of the close circles I thought might be interested, back when I originally did it).
Anyway, back (a bit) on subject, I knew I could get it to return "NaM" (i.e. an output for Not a Month), or even get it to refuse to say anything if Month=2 and (separately, but adjacent in the input) day number was greater than 29, but I would have had it work that out itself (at increased training time with excessive negative feedback on known bad examples) rather than starve it of a subset of training/working data by having to imagine all possible wrongness-scenarios and applying hard filters.
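The hand-applied 'hard filter' route (as opposed to training the rejection in) would look something like this toy sketch of mine, with an invented two-month lookup purely for illustration:

```python
# Toy sketch of hard-filtered rejection, hand-written rather than trained:
# month 2 with a day above 29 is refused outright.

def convert(month_name, day):
    # Invented minimal lookup standing in for the full converter.
    month = {"feb": 2, "mar": 3}.get(month_name[:3].lower(), "NaM")
    if month == "NaM":
        return "NaM"  # Not a Month
    if month == 2 and day > 29:
        return None  # hard filter: refuse to answer an impossible date
    return month

assert convert("February", 28) == 2
assert convert("February", 30) is None
assert convert("Quiet", 15) == "NaM"
```

Every such rule has to be imagined and written in advance, which is precisely the wrongness-scenario enumeration problem described above.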
Not that I claim to be au fait with modern GPT-like methodology, but it really can't be that much more complicated, just far higher volume, with a vastly larger phase-space and a much more massive 'working memory', etc.