Organizing my thoughts anyway, might as well share:
So, to find a rock's name and its hard-to-predict properties (weathering resistances, density, melting point), you look it up in small lookup tables according to the following flow chart. Table cells that fall between the entries defined in the raws are filled in at startup with appropriate blends of nearby neighbors in that subspace.
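The post doesn't say exactly how the in-between cells are blended. One plausible reading is inverse-distance weighting over the k nearest cells defined in the raws: numeric properties are averaged, and the name comes from the single closest raw. This is a minimal sketch under that assumption; `RockProps`, `fill_cell`, `fill_table`, `k`, and the 0-1 weathering score are all hypothetical.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class RockProps:
    name: str
    density: float        # g/cm^3
    melting_point: float  # degrees C
    weathering: float     # hypothetical 0-1 resistance score


def fill_cell(cell, raws, k=3):
    """Properties for one table cell; blends the k nearest raws if the cell is empty.

    cell: tuple of integer grid coordinates in the subspace.
    raws: dict mapping coordinates -> RockProps for cells defined in the raws.
    """
    if cell in raws:
        return raws[cell]
    # The k defined entries closest to this cell (Euclidean distance).
    nearest = sorted(raws.items(), key=lambda item: math.dist(cell, item[0]))[:k]
    weights = [1.0 / math.dist(cell, coord) for coord, _ in nearest]
    total = sum(weights)

    def blend(attr):
        # Inverse-distance weighted average of one numeric property.
        return sum(w * getattr(p, attr) for w, (_, p) in zip(weights, nearest)) / total

    return RockProps(
        name=nearest[0][1].name,  # names can't be averaged: take the closest raw's
        density=blend("density"),
        melting_point=blend("melting_point"),
        weathering=blend("weathering"),
    )


def fill_table(shape, raws, k=3):
    """Dense table covering every cell of a grid with the given shape."""
    cells = itertools.product(*(range(n) for n in shape))
    return {cell: fill_cell(cell, raws, k) for cell in cells}
```

With two illustrative raws at cells 0 and 4 of a 1-D strip, cell 1 is three times closer to the first, so it gets 75% of its density and melting point and takes its name.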
Sort by major class first:
a) Igneous Extrusive -- Look up the rock in a dynamically generated table of "deep chemical space"* (16-bit)
b) Igneous Intrusive -- Same, but in a separate table from the extrusive one.
c) Sedimentary -- classify by flags into biological, chemical, or clastic first:
Biological: By default there is only one kind, lignite (higher-rank coals are metamorphic products of lignite, and limestone is treated as chemical). Lignite occupies part of an abstract "biological space" of which it is the only member by default. But if modders wanted three separate kinds of special coal series, they could use that bio-space to separate them and define them as they please (e.g. coming from different soils, and eventually, probably, coming from customizable types of lifeforms themselves).
Chemical: Look up in a dynamic table of deep chemical space only (limestone, halite, gypsum, etc.)
Clastic: Look up in a 2-dimensional table of grain size by deep chemical space (this table is bigger than all the others combined at ~15 megabytes, which is probably still reasonable).
d) Metamorphic -- Begin by using special flags that record which major type (of the other 3) the parent rock was. Each of the three has its own 3-D lookup table: highest temperature on record as of the last metamorphism (4-bit) x highest pressure as of the last metamorphism (4-bit) x shallow chemical space** (8-bit), still < 1 megabyte each. Rocks can be double- or triple-listed across the tables. For example, high-grade gneiss doesn't really care what the original rock was, since all that texture has been smashed out, so it just gets repeat-listed in the raws to represent that.
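The flow chart above can be sketched as a single dispatch function. This assumes each rock carries its classification flags and coordinates in a hypothetical `RockState` record, and flattens each table's coordinates into one integer index. The bit layout (grain size above the 16 deep-chemistry bits; temperature, then pressure, then shallow chemistry for metamorphics) is an assumption, not something the post specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Major(Enum):
    IGNEOUS_EXTRUSIVE = "extrusive"
    IGNEOUS_INTRUSIVE = "intrusive"
    SEDIMENTARY = "sedimentary"
    METAMORPHIC = "metamorphic"


class SedKind(Enum):
    BIOLOGICAL = "biological"
    CHEMICAL = "chemical"
    CLASTIC = "clastic"


# Parent classes that each get their own metamorphic table.
METAMORPHIC_PARENTS = (Major.IGNEOUS_EXTRUSIVE, Major.IGNEOUS_INTRUSIVE, Major.SEDIMENTARY)


@dataclass
class RockState:
    """Hypothetical per-rock record holding the flags and coordinates used for lookup."""
    major: Major
    deep_chem: int = 0                  # 16-bit deep chemical space code
    sed_kind: Optional[SedKind] = None  # sedimentary only
    bio_index: int = 0                  # position in "biological space"
    grain_size: int = 0                 # clastic grain-size step
    parent: Optional[Major] = None      # metamorphic only: class of the parent rock
    max_temp: int = 0                   # 4-bit peak temperature step
    max_pressure: int = 0               # 4-bit peak pressure step
    shallow_chem: int = 0               # 8-bit shallow chemical space code


def table_key(rock: RockState) -> Tuple[str, int]:
    """Return (table name, flat index) by walking the major-class flow chart."""
    if rock.major is Major.IGNEOUS_EXTRUSIVE:
        return "extrusive", rock.deep_chem
    if rock.major is Major.IGNEOUS_INTRUSIVE:
        return "intrusive", rock.deep_chem
    if rock.major is Major.SEDIMENTARY:
        if rock.sed_kind is SedKind.BIOLOGICAL:
            return "biological", rock.bio_index
        if rock.sed_kind is SedKind.CHEMICAL:
            return "chemical", rock.deep_chem
        if rock.sed_kind is SedKind.CLASTIC:
            # 2-D table: grain size (outer) x deep chemical space (inner, 16 bits).
            return "clastic", (rock.grain_size << 16) | rock.deep_chem
        raise ValueError("sedimentary rock needs a sed_kind flag")
    if rock.parent not in METAMORPHIC_PARENTS:
        raise ValueError("metamorphic rock needs an igneous or sedimentary parent flag")
    # 3-D table per parent class: 4-bit temp x 4-bit pressure x 8-bit shallow chem.
    index = (rock.max_temp << 12) | (rock.max_pressure << 8) | rock.shallow_chem
    return "metamorphic_" + rock.parent.value, index
```

With this layout each metamorphic table has 2^(4+4+8) = 65,536 cells; at a hypothetical 2 bytes per rock ID that is 128 KiB per table, comfortably under the 1 MB figure.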
*16-bit "Deep chemical space:" (each element is stored as whichever step is closest to its actual weight %)
Oxygen: 3 bits, 0-7, each step represents step*10% by weight (so 0-70%) -- helps classify everything
Silicon: 3 bits, 0-7, non-linear spacing, by step: 0%, 15%, 20%, 24%, 26%, 28%, 30%, 32%, 40% (the narrow, densely sampled range is crucial for buoyancy and many classifications, and pure quartz only gets up to ~47%) -- mafic versus felsic rocks sit in the middle, and 0% / 40% are generally crystalline minerals of different kinds
Aluminum: 2 bits, 0-3, 0%, 10%, 20%, 30% -- peraluminous rocks and feldspars
Iron, Magnesium, Calcium: 2 bits each, same steps as aluminum -- all sorts of mafic rocks, ores, limestones, etc.
Alkali metals (sum of sodium and potassium): 2 bits, 0-3, 0%, 3%, 6%, 9% -- mostly indicates feldspar type, and is also a giveaway for halite (9% with everything else at 0)
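Packing the deep chemical space into 16 bits could look like this, assuming the fields are laid out in the order listed (oxygen in the most significant bits, alkalis in the least). Nine silicon levels are listed for a 3-bit field, which can only hold eight; this sketch drops the 15% level purely so it runs, and that choice is arbitrary. Function and constant names are hypothetical.

```python
OXYGEN_LEVELS = [10 * i for i in range(8)]      # 0%, 10%, ..., 70%
# ASSUMPTION: 8 of the 9 listed silicon levels (15% dropped arbitrarily).
SILICON_LEVELS = [0, 20, 24, 26, 28, 30, 32, 40]
MAJOR_LEVELS = [0, 10, 20, 30]                  # Al, Fe, Mg, Ca
ALKALI_LEVELS = [0, 3, 6, 9]                    # Na + K

# (element, levels, bits) in packing order, most significant field first.
# The field order is a hypothetical choice; only the bit widths come from the list.
FIELDS = [
    ("O", OXYGEN_LEVELS, 3),
    ("Si", SILICON_LEVELS, 3),
    ("Al", MAJOR_LEVELS, 2),
    ("Fe", MAJOR_LEVELS, 2),
    ("Mg", MAJOR_LEVELS, 2),
    ("Ca", MAJOR_LEVELS, 2),
    ("alkali", ALKALI_LEVELS, 2),
]


def nearest_step(value, levels):
    """Index of the level closest to value (ties go to the lower step)."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))


def encode_deep(wt_percent):
    """Pack weight percentages into a 16-bit deep chemical space code.

    wt_percent: dict of element -> weight %, with "alkali" = Na + K.
    Missing elements count as 0%.
    """
    code = 0
    for element, levels, bits in FIELDS:
        step = nearest_step(wt_percent.get(element, 0.0), levels)
        code = (code << bits) | step
    return code
```

For example, halite (roughly 39% Na) encodes as 3: only the alkali field is nonzero, pinned at its top 9% step. Pure quartz (about 53% O, 47% Si) lands on oxygen step 5 and silicon step 7.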
**8-bit "Shallow chemical space:" At startup, take all of the rocks defined in the raws for that search space and procedurally determine the most diagnostic single cutoff for each individual element (the level that splits the raws closest to half and half). For this run of the application, define the two sides of that cutoff as 0 and 1 for each element. For silicon, allow 4 steps, try to divide the raws equally among them, and give it 2 bits. In total this adds up to 8 bits (2 for silicon plus 1 each for oxygen, aluminum, iron, magnesium, calcium, and alkalis), keeping the metamorphic search spaces reasonable (otherwise they would take up half a gig).
Toy example with only silicon and oxygen, being converted into the shallow version:
Note that the split doesn't have to divide perfectly like this. It is still just one of the three metamorphic classifying dimensions.
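One way to derive the shallow chemical space at startup, as a sketch under stated assumptions: each element's cutoff is chosen among midpoints between the distinct values seen in the raws, picking the one whose split is closest to even, and silicon gets three cutoffs aimed at quarters of the raws. Raws are plain dicts of weight percentages, and all names here are hypothetical.

```python
import bisect

# (element, bits): silicon gets 2 bits (4 steps), every other field 1 bit -> 8 total.
SHALLOW_FIELDS = [("O", 1), ("Si", 2), ("Al", 1), ("Fe", 1), ("Mg", 1), ("Ca", 1), ("alkali", 1)]


def balanced_cutoffs(values, bins):
    """Pick bins-1 cutoffs so each bin holds roughly len(values)/bins raws.

    values must be non-empty. Cutoffs may repeat when there are few distinct
    values, leaving some bins empty (the split doesn't have to be perfect).
    """
    vals = sorted(values)
    distinct = sorted(set(vals))
    # Candidate cutoffs sit halfway between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])] or [distinct[0]]
    cutoffs = []
    for j in range(1, bins):
        target = len(vals) * j / bins
        # bisect_left counts how many raws fall strictly below the candidate.
        best = min(candidates, key=lambda c: abs(bisect.bisect_left(vals, c) - target))
        cutoffs.append(best)
    return cutoffs


def build_shallow_space(raws):
    """raws: list of dicts element -> weight %. Returns element -> list of cutoffs."""
    return {
        element: balanced_cutoffs([r.get(element, 0.0) for r in raws], 1 << bits)
        for element, bits in SHALLOW_FIELDS
    }


def encode_shallow(wt_percent, cutoffs):
    """Pack a composition into the 8-bit shallow code for this run's cutoffs."""
    code = 0
    for element, bits in SHALLOW_FIELDS:
        # Step = number of cutoffs strictly below this element's value.
        step = bisect.bisect_left(cutoffs[element], wt_percent.get(element, 0.0))
        code = (code << bits) | step
    return code
```

Fed four hypothetical raws with silicon at 0, 20, 24, and 30%, silicon's cutoffs come out at 10, 22, and 27%, one raw per step; elements absent from every raw collapse to a single cutoff at 0 and always encode as 0.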