Post History
Answer
#1: Initial revision
The question alludes to at least three correlated, but quite distinct dimensions.

* Objectivity/subjectivity
* Room for model's creativity (information theoretical)
* Crispness of the boundary between "correct" and "incorrect" productions.

To define them, introduce an additional agent, perhaps a human, acting as a **referee**. The referee observes the interaction between the prompt and the model's production and eventually marks the model's performance with a percentage of "correctness": 0 for an incorrect production, 100 for a correct production.

**Crispness** of correctness - Crisp (black-and-white) prompts will mostly solicit productions scored **0 or 100**. Fuzzy (gray area) prompts will mostly solicit productions scored **somewhere in between**. There's no single most popular measure of fuzziness, but you could pick one from the literature or invent your own.

**Room for creativity** - For a crisp prompt, define this as the logarithm of the **number** of 100% correct productions for a given prompt. For a fuzzy prompt, you might need something like weighted entropy and/or a "minimum correctness cut-off threshold".

**Objectivity/subjectivity** seems to relate to a **population** of referees. An objective prompt will solicit correlated marks from different referees, whereas for a subjective prompt, it's conceivable that different referees will prefer different productions. Ultimately, you can measure that correlation. But the concept is population-dependent.

It's not unusual to see one of those dimensions used as a proxy for another. If the competitors are people and not language models, and you need a very high degree of objectivity, it often helps if all the prompts are crisp and the room for creativity is zero - that is, if each prompt has exactly one correct production. Such limitations don't deliver any objectivity in themselves, but they make it easier to evaluate objectivity using a population of assessors.
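To make the three dimensions concrete, here is a minimal Python sketch of one way they could be computed from referee marks. Everything in it is an assumption made for illustration: the function names, the particular crispness formula (normalised distance from the nearest extreme), the base-2 logarithm, the `threshold` parameter standing in for the "minimum correctness cut-off", and the use of mean pairwise Pearson correlation as the agreement measure. Other choices would be just as defensible.

```python
import math
from itertools import combinations
from statistics import mean


def crispness(scores):
    """Illustrative measure in [0, 1]: 1.0 if every score is 0 or 100,
    0.0 if every score is exactly 50. (One of many possible definitions.)"""
    # Distance of each score from its nearest extreme, normalised by 50.
    return 1.0 - mean(min(s, 100 - s) for s in scores) / 50.0


def room_for_creativity(scores, threshold=100):
    """log2 of the number of productions scoring at least `threshold`.
    With the default threshold this counts only 100% correct productions.
    Returns -inf when no production qualifies."""
    n = sum(1 for s in scores if s >= threshold)
    return math.log2(n) if n > 0 else float("-inf")


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of marks."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant marks")
    return sxy / math.sqrt(sxx * syy)


def objectivity(marks_by_referee):
    """Mean pairwise correlation between referees who marked the same
    productions in the same order. Depends on the chosen referee population."""
    pairs = combinations(marks_by_referee, 2)
    return mean(pearson(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical marks: one referee, four productions for one prompt.
    marks = [0, 100, 100, 0]
    print(crispness(marks), room_for_creativity(marks))
    # Hypothetical marks: two referees, the same three productions.
    print(objectivity([[0, 100, 50], [10, 90, 50]]))
```

Under these definitions, a prompt with exactly one fully correct production has a room for creativity of zero, which matches the special case discussed above.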
I'm afraid that none of those three dimensions is primarily linguistic in nature, or at least I cannot quite see the connection (a better answer might be able to point one out).