[tlhIngan Hol] Klingon corpus tool

kechpaja at kechpaja.com
Fri Dec 18 01:02:34 PST 2020


This sounds like a great idea!

While I'm most definitely not a lawyer, one thing you might consider
doing to avoid creating copyright issues would be to store only
individual sentences in the corpus, without the information needed to
reconstruct their original order in the source work. I've heard of
linguists using that strategy for other corpora, although I have no
idea whether it would actually hold up in court or convince someone not
to sue you.
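
To make the idea concrete, here is a rough sketch in Python of what I
have in mind. Everything in it is my own assumption rather than anything
taken from the actual corpus tool: the function name, the record layout,
and the very naive sentence splitter (it just breaks after ".", "!" or
"?") are all hypothetical. The point is only that each record keeps the
sentence and its source, but no position or index.

```python
import random
import re


def to_unordered_sentences(text, source, rng=None):
    """Split text into sentences and return them shuffled, with no position info.

    Hypothetical helper for illustration only; the splitter is deliberately naive.
    """
    rng = rng or random.Random()
    # Break after sentence-final punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Discard the original order before anything is stored.
    rng.shuffle(sentences)
    # Each record carries only the source label and the sentence itself.
    return [{"source": source, "sentence": s} for s in sentences]


sample = "nuqneH. qaleghneS! nuqDaq 'oH puchpa''e'?"
for record in to_unordered_sentences(sample, source="example"):
    print(record)
```

Whether discarding the order like this actually changes anything
legally is exactly the part I can't vouch for.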

  - SapIr

On Thu, Dec 17, 2020 at 10:45:21PM +0000, Iikka Hauhio wrote:
>Some time ago I assembled a corpus containing approximately 240,000 Klingon words. I included publicly available texts and a couple of texts with permission. It includes most of the Okrandian canon. You can search the corpus here: https://klingon-corpus.herokuapp.com/ . It allows limiting the search to only some sources and using regexes as search queries. It also has a built-in dictionary that can be used to check the meanings of included words. I hope it is useful for language learners and researchers alike. If any of you here have suggestions for improving this tool, I'd be happy to hear them.
>Also, I would be pleased if any of you donated texts to me to include them in the corpus. To protect the copyright of the authors, I have limited the number of search results to one hundred. This way it is not possible to get an entire text using an empty search query. The purpose of the website is to be a search engine, not a way to download copyrighted material.
>For this reason, I will not publicly share the whole corpus, but I'm willing to do analysis on it if someone wants. I have already published frequency lists of words, morphemes and syllables on the website, which I hope can be used, for example, when crafting beginner's word lists.
>Best regards,
>Iikka "fergusq" Hauhio

