[tlhIngan Hol] Klingonia corpus

Iikka Hauhio fergusq at protonmail.com
Sat Jan 22 04:05:00 PST 2022


I'm happy that my corpus is useful.

I'm trying to keep it up to date and add new sources regularly. It's great that the Klingon community is active and that so many Klingon texts are produced each year. The corpus currently has almost 500,000 words, which is a lot for a constructed language. The largest Esperanto corpus has about 10 million words and the largest Lojban corpus 7 million, so we are not there yet, but at the rate Klingon text is produced we are not far from one million words. (To see how much text is added each year, see this picture: https://korpus.klingonia.fi/timeline.svg)

The Tatoeba corpus contains a lot of typos, which is understandable given how big it is. If you find a typo, click the link to go to the sentence's page on tatoeba.org and correct the sentence (or comment if you don't have rights to edit sentences).

As a general warning, many of the sources in my corpus contain errors. Some of the texts are very old, some are poetic, and others just have typos or grammatical errors. The corpus is useful for scientific study of Klingon usage and could also be used as an educational tool when learning Klingon, but it's important to know that it doesn't try to be a corpus of good Klingon; it tries to be a corpus of all Klingon. I have tried to organize the sources so that the ones with the most errors are shown in red.

Iikka "fergusq" Hauhio
https://klingonia.fi/en

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, January 22nd, 2022 at 01.42, James Landau <savegraduation at yahoo.com> wrote:

> I just found a link to Iikka Hauhio's Klingonia Corpus from the Klingon Wiki. For those who haven't seen it before, it's up at: https://korpus.klingonia.fi/
>
> It's great to see *HarqIn* (which was my request) getting use. (I've also found *HarqIn* in a Klingon blog by googling, and I saw 'enru mentioning the word in the comments section for a chabal tetlh request lately.) I didn't get any hits for *DannI'* nor for *rosmaH*, though.
>
> Very cool to see mayqel's texts on the Greek gods in the corpus! I've come across them by chance when doing Google searches before, so obviously they're from the webpages subcorpus.
>
> I think I may have found a mistake in the Tatoeba sentences, though. When I did a search on *loDHom*, I found this sentence. "Both boys have autism" is translated as *ngor cha' loDHompu'vam*. Shouldn't *ngor* be *ngur*?
>
> majQa', Iikka!