
prog


Everything is Unicode, until the exploits started rolling in

32 2021-01-24 05:04

>>28

Words can be considered eternal.
There's the threat of a dictionary containing degenerations, such as ebonics, but that's a wider social issue.

Are you trolling? I would be hard pressed to think of anything as elusive and transient as natural language. Words are but symbols that describe the human experience in fuzzy ways that are hard to pin down. Synonyms within a language differ slightly in meaning, with somewhat fuzzy boundaries and idiosyncratic uses that vary from one community to another. Natural language is continuously changing, and words are constantly moving in a cloud of meaning, interacting in complex ways with similar words, commonly paired terms, cultural influences... Words are coined every year; some survive, some don't. Many words gradually fall out of use, some become obsolete with the advent of new technologies, and some undergo radical shifts in meaning within a short timespan. There's absolutely nothing eternal about a word. Not even concepts are eternal: they are loose attempts at signifying a set of ideas in our experience, and more often than not, people just can't agree on their precise definitions.
Then there is the matter of lexicons: mathematicians use their own set of words, and so do programmers, biologists, chemists, physicists, musicians and linguists. Some words are shared across disciplines (and indeed the disciplines influence each other), while others have totally different meanings in different fields.
What I'm trying to get at is: by what criteria would you delimit a word in an encoding? Clearly not by meaning. Morphologically? Words also change in spelling: would you use the British or the American spelling of a word? Even if you argue the differences there are minimal, what happens in 100, 300 or 500 years, when different dialects of English start becoming languages of their own? I may be looking too far into the future, but you're the one who said "words are eternal."
Which brings me to your other weird claims: that language change among black speakers is an exception rather than the rule, and that a social issue can be separated from language, which is itself the main vehicle and reflection of any social issue whatsoever. It isn't only black speakers or some specific community that evolves language; every single one does, and just as disciplines develop their own lingo, so does every community that forms for any reason.
Furthermore, how would you catalogue the words? How would you make room for the new words coined each year, or for those inevitably overlooked by the initial cataloguing scheme, without haphazardly appending them to the end of the list? How would you devise sane boundaries and placement for the different words, especially considering homophones and words that can serve as different syntactic elements (both a noun and a verb, or an adjective and an adverb)? I can think of nothing short of a full phylogenetic tree. But what about words whose origin has no consensus? What if new information reveals that a word has been misplaced in the tree?
I can see a few (contrived) ways the approach could arguably work in the real world:
1. The dictionary is essentially the same as the Oxford English Dictionary, updated periodically (say every year, or every 5 years at most) to reflect current usage of the language.
2. The dictionary describes an official version of the English language, to which each entity adds its own extensions to accommodate the idiosyncrasies of its own usage (thereby inhibiting exchange between groups set apart by geography or even by domain of discourse).
3. The dictionary is but an extension to a character encoding scheme, a sort of library of words that can be inserted into a character stream wherever the library contains the word.
4. The dictionary is made for a language such as Lojban, where most of these issues are simply nonexistent and there are well-defined rules for making new words and the meanings attached to them.
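Option 3 is the easiest one to make concrete. Below is a minimal sketch, assuming a purely hypothetical scheme in which each dictionary word is assigned a code point in Unicode's Basic Multilingual Plane Private Use Area (U+E000 to U+F8FF), and a reader expands those code points back into words. The `WORDS` list and the helpers `word_code` and `expand` are invented for illustration; nothing like this exists in any standard.

```python
# Hypothetical word library: each word is mapped to one code point
# in the BMP Private Use Area (U+E000..U+F8FF). Not a real standard.
PUA_START = 0xE000
PUA_END = 0xF8FF

# Hypothetical, tiny dictionary; a word's code point is its list position.
WORDS = ["the", "encoding", "word", "eternal"]


def word_code(word: str) -> str:
    """Return the single PUA character standing for `word`."""
    cp = PUA_START + WORDS.index(word)  # raises ValueError if word is unknown
    if cp > PUA_END:
        raise ValueError("dictionary overflows the Private Use Area")
    return chr(cp)


def expand(stream: str) -> str:
    """Replace each dictionary code point with its word; keep other characters."""
    out = []
    for ch in stream:
        cp = ord(ch)
        if PUA_START <= cp < PUA_START + len(WORDS):
            out.append(WORDS[cp - PUA_START])
        else:
            out.append(ch)
    return "".join(out)


# Plain characters and word code points mixed in one stream.
text = "no " + word_code("word") + " is " + word_code("eternal")
print(expand(text))  # no word is eternal
```

The BMP Private Use Area holds only 6,400 code points, so a scheme like this can cover a trivial collection of high-frequency words but not a living vocabulary. And because a word's code is just its position in a list, new words can only be appended at the end, which is exactly the cataloguing problem described above.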
All things considered, making such a scheme beyond a trivial collection of high-frequency words seems way more complex than every encoding standard put together.
Sorry for the rant, I just couldn't not take the bait.
