Post History

Q&A What underlying principle is at play for how objective or subjective a natural language instruction is?

posted 1y ago by Jirka Hanika‭

Answer

#1: Initial revision by

Jirka Hanika‭ · 2024-02-27T15:33:17Z (over 1 year ago)

The question alludes to at least three correlated, but quite distinct dimensions.

* Objectivity/subjectivity
* Room for the model's creativity (information-theoretic)
* Crispness of the boundary between "correct" and "incorrect" productions.

To define them, introduce an additional agent, perhaps a human, acting as a **referee**. The referee observes the interaction between the prompt and the model's production and eventually marks the model's performance with a percentage of "correctness": 0 for an incorrect production, 100 for a correct production.

**Crispness** of correctness - Crisp (black-and-white) prompts will mostly solicit productions scored **0 or 100**. Fuzzy (gray area) prompts will mostly solicit productions scored **somewhere in between**. There's no single most popular measure of fuzziness, but you could pick one from the literature or invent your own.
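As an illustration of "inventing your own" measure, here is a minimal sketch. The function name `crispness` and the formula `1 - 4p(1-p)` are my own assumptions, not a standard measure: it simply averages how close each referee score sits to either extreme.

```python
def crispness(scores):
    """Average closeness of referee scores (0..100) to the extremes.

    Hypothetical measure: each score s is mapped to p = s / 100 and
    contributes 1 - 4*p*(1-p), which is 1 at p = 0 or p = 1 and 0 at p = 0.5.
    """
    if not scores:
        raise ValueError("need at least one score")
    total = 0.0
    for s in scores:
        p = s / 100.0
        total += 1.0 - 4.0 * p * (1.0 - p)
    return total / len(scores)
```

A prompt whose productions are all marked 0 or 100 comes out at 1, and one whose productions are all marked 50 comes out at 0. Any other function with the same end points would serve equally well.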

**Room for creativity** - For a crisp prompt, define this as the logarithm of the **number** of 100% correct productions for a given prompt. For a fuzzy prompt, you might need something like weighted entropy and/or a "minimum correctness cut-off threshold".
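A rough sketch of the crisp case with the cut-off variant folded in. It assumes that you can enumerate a finite list of candidate productions and have a referee score for each; the name `room_for_creativity`, the base-2 logarithm (so the result is in bits), and the default cut-off are my choices for illustration.

```python
import math

def room_for_creativity(scores, cutoff=100):
    """log2 of the number of productions scored at or above `cutoff`.

    `scores` holds one referee mark (0..100) per candidate production.
    With cutoff=100 this is the crisp definition; a lower cutoff is one
    way to apply a "minimum correctness" threshold to a fuzzy prompt.
    """
    accepted = sum(1 for s in scores if s >= cutoff)
    if accepted == 0:
        raise ValueError("no production reaches the cutoff")
    return math.log2(accepted)
```

With exactly one correct production the result is 0 bits; with four it is 2 bits. The weighted-entropy variant is not shown here.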

**Objectivity/subjectivity** seems to relate to a **population** of referees. An objective prompt will solicit correlated marks from different referees, whereas for a subjective prompt, it's conceivable that different referees will prefer different productions. Ultimately, you can measure that correlation. But the concept is population-dependent.
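One possible way to measure that correlation, as a sketch only: take the mean pairwise Pearson correlation between referees' marks over the same list of productions. The choice of Pearson, the averaging over pairs, and the names `pearson` and `referee_agreement` are assumptions on my part; a rank correlation or an inter-rater statistic could be substituted.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long lists of marks."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a referee gave identical marks to every production")
    return cov / math.sqrt(var_x * var_y)

def referee_agreement(marks_by_referee):
    """Mean pairwise correlation; one list of marks per referee."""
    pairs = list(combinations(marks_by_referee, 2))
    if not pairs:
        raise ValueError("need at least two referees")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

Values near 1 suggest an objective prompt for that particular population of referees; values near 0 or below suggest a subjective one. Swapping in a different population can change the number, which is the population dependence mentioned above.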

It's not unusual to see one of those dimensions used as a proxy for another. If the competitors are people and not language models, and you need a very high degree of objectivity, it often helps if all the prompts are crisp and the room for creativity is zero - that is, if each prompt has exactly one correct production. Such limitations don't deliver any objectivity in themselves, but they make it easier to evaluate objectivity using a population of assessors.

I'm afraid that none of those three dimensions is primarily linguistic in nature, or at least I cannot quite see the connection (and a better answer might be able to point one out).
