Afonso Xavier Canosa Rodriguez

On philology, potatoes and construction.
Well, this is just my first approach to blog-writing. I want it to be the way to keep in touch with colleagues and friends.

info at canosarodriguez dotnet

Is human language a system made of an infinite number of expressions? (and II)
When I allowed all possible combinations of the whole English vocabulary, taking one of the longest sentences ever recorded in the language as the length limit, my calculation initially came out as infinity. Since that was the result I expected, and I was in a hurry to write my presentation, I sent a draft to Miro Moman, who kindly and quickly pointed out that the result of my last operation was rather 1.63585694 × 10^23086. This is a huge number indeed (compare it with the upper bound for the volume of the physical universe, 1 × 10^113 m^3), though still finite. All right, again: who cares about a number that is bigger than the volume of the physical universe? Isn't that infinite enough?

Well, it is all right not to care that much. As I told you when this blog started six years ago, when we were only 6 billion people on this planet, in this section I deal with subjects that could be of interest to four or five people in the whole of humanity at the present time. We are more than 7 billion now, so the number could have increased slightly... though only by one at most, if we keep the proportion we had six years ago.

Yet the issue is that 1.63585694 × 10^23086 is the result of applying maxima that over-represent the combinatory potential of any human language. That is, a closer approximation will always be smaller as soon as we apply syntactic rules alone! An easy example: this unrestricted combinatorics would allow the same word to be repeated up to 4391 times and would still consider the resulting string a sentence.

Let's go small to understand this better. Take the first branch of the Mabinogi from our corpus: 1605 word-types, a very small lexicon, and a longest sentence of 64 words (a high value for sentence length). If we allow all possible combinations, that is, any word in any position of the sentence, we get 1605^64 ≈ 1.4 × 10^205, still bigger than the volume of the universe! However, you will soon notice that with such a small lexicon the number of grammatical sentences (not to mention what happens if we add semantics) must be finite, and surely much smaller!
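For anyone who wants to check the order of magnitude, here is a minimal Python sketch. It assumes the upper bound is simply the lexicon size raised to the sentence length, as described above; the function names `upper_bound` and `scientific` are my own, chosen for illustration. Python integers are exact at any size, so nothing overflows to infinity.

```python
def upper_bound(lexicon_size: int, sentence_length: int) -> int:
    """Number of strings of exactly `sentence_length` words when any
    word-type may fill any position (no grammar, no semantics)."""
    return lexicon_size ** sentence_length


def scientific(n: int, digits: int = 2) -> str:
    """Format a large positive integer as 'm x 10^e' by reading its
    decimal digits, so no floating-point conversion is needed."""
    s = str(n)
    exponent = len(s) - 1
    mantissa = s[0] + "." + s[1:digits]  # truncated, not rounded
    return f"{mantissa} x 10^{exponent}"


# Figures from the post: first branch of the Mabinogi.
mabinogi = upper_bound(1605, 64)
# Upper bound for the volume of the physical universe quoted in the post (m^3).
universe_volume = 10 ** 113

print(scientific(mabinogi))
print(mabinogi > universe_volume)
```

The same two functions work for other lexicon sizes and sentence lengths, with one caveat: recent Python versions refuse by default to convert integers of more than 4300 digits to text, so a number the size of the English maximum would need that limit raised (`sys.set_int_max_str_digits`) or a logarithm instead.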

You may ask: what about recursion, adding a loop that infinitely embeds a sentence within a sentence (using a complementizer in English, for instance)? All right, go on, you can move towards infinity as much as you want, and indeed create the longest sentence ever... but only when you reach an end (a sentence has to be complete to be a sentence) will you have a sentence unit, a single one. More importantly, at that point, even if you got a result bigger than 1.63585694 × 10^23086, it would be finite again.
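The point about recursion can be made concrete with a toy sketch in Python. The function `embed` is hypothetical, and so are the example clause and the frame "she says that"; it embeds a sentence within a sentence a chosen number of times using the complementizer "that". Whatever depth is picked, the output is one complete sentence with a finite word count.

```python
def embed(depth: int, core: str = "the story ends") -> str:
    """Wrap `core` in `depth` layers of 'she says that ...'."""
    sentence = core
    for _ in range(depth):
        sentence = "she says that " + sentence
    return sentence


# Each depth yields exactly one finished sentence; its length grows
# by three words per layer but is always a definite number.
for depth in (0, 1, 1000):
    sentence = embed(depth)
    print(depth, len(sentence.split()))
```

Each call stops at some depth, and only then is there a sentence to count. A loop that never stops never delivers one.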

So, sentences are more similar to the lexicon than I previously thought. As far as I understand, the set of sentences in any human language is finite, though boundless like the vocabulary of a language, and huge, yet much smaller than the maxima given above. Taking the more manageable example of the Mabinogi corpus as a rough extrapolation, it would be a matter of adding rules to come down to our solar system and begin to get a number that is at least smaller than the volume of our physical universe.
Subject: philology - Published 05-05-2014 15:45
© by Abertal
