This tests the idea that if you randomly generate words with something close to a language's natural distribution of sounds, you will generate cuss words most frequently. My non-professional theory is that cuss words are especially representative of the phonotactics of a language. This is especially relevant to toki pona because, with only 125 or so words, the cuss words can be discovered long before anyone feels the urge to coin them.
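Here is a minimal sketch of this kind of experiment in Python. It assumes that each letter is drawn independently from the frequencies observed at that position in real words, and that word length is drawn from the lengths in the word list. The word list `VOCAB` and all function names are hypothetical illustrations, not the code behind this page.

```python
import random
from collections import Counter

# Hypothetical sample vocabulary for illustration only; a real run would use
# the full toki pona word list (or word frequencies taken from the corpus).
VOCAB = ["nena", "pakala", "sina", "mani", "lena", "pona", "jan", "mute", "kama", "nasa"]


def position_tables(words):
    """Count how often each letter appears at each (0-based) position."""
    tables = [Counter() for _ in range(max(len(w) for w in words))]
    for w in words:
        for i, ch in enumerate(w):
            tables[i][ch] += 1
    return tables


def sample_word(tables, lengths, rng):
    """Pick a length seen in the vocabulary, then draw each letter
    independently from the frequencies observed at that position."""
    n = rng.choice(lengths)
    letters = []
    for i in range(n):
        counter = tables[i]  # never empty: some word has length n > i
        letters.append(rng.choices(list(counter), weights=list(counter.values()))[0])
    return "".join(letters)


def most_common_words(words, trials=50_000, top=5, seed=0):
    """Generate `trials` random words and return the `top` most frequent ones."""
    rng = random.Random(seed)
    tables = position_tables(words)
    lengths = [len(w) for w in words]
    tally = Counter(sample_word(tables, lengths, rng) for _ in range(trials))
    return tally.most_common(top)


if __name__ == "__main__":
    for word, count in most_common_words(VOCAB):
        print(word, count)
```

Each position is drawn independently. As a result, a real word list that contains vowel-initial words (such as "ona") can yield strings that break toki pona's syllable structure. The position tables only approximate the language's distribution of sounds.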

If I run the generator with a very large number of repetitions, "nena" consistently comes out on top. That isn't the official cuss word "pakala", but it is still used in "nena meli sinpin". It isn't surprising that "nena" is the most common, because its letters have the highest odds for their positions. Someday I need to rewrite this so that each letter's probability depends on the previous letter, but at the moment I'm too lazy to calculate the transition matrix. With a transition matrix, it is much harder to know in advance which randomly generated cuss word will be the most likely. Or maybe not; math is the forte of the tokiponist.
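The transition-matrix approach mentioned above can be sketched as a first-order Markov chain over letters. The sketch counts which letter follows which, using `^` and `$` as assumed word-boundary markers. It then samples each letter given the previous one. This is a hypothetical illustration under those assumptions, not the page's actual code; the function names are made up.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # assumed boundary markers; must not occur inside words


def transition_counts(words):
    """Count letter-to-letter transitions, including word start and end."""
    counts = defaultdict(Counter)
    for w in words:
        padded = START + w + END
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_word(counts, rng, max_len=12):
    """Walk the chain from START, choosing each letter given the previous one,
    until END is drawn (or max_len letters are produced, as a safety cap)."""
    prev, out = START, []
    while len(out) < max_len:
        row = counts[prev]  # every letter seen was followed by something
        nxt = rng.choices(list(row), weights=list(row.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


def word_probability(counts, word):
    """Probability that one walk of the chain produces exactly `word`
    (ignoring the max_len cap used when sampling)."""
    padded = START + word + END
    p = 1.0
    for prev, nxt in zip(padded, padded[1:]):
        row = counts.get(prev)
        if not row:
            return 0.0
        p *= row[nxt] / sum(row.values())
    return p
```

A tiny example shows why the most likely output is hard to predict. Trained on the single word "nena", the chain gives "na" a probability of 0.5 and "nena" only 0.25. The top generated word therefore need not appear in the training list at all.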

This is a fan site. The creator of toki pona is jan Sonja, who isn't me. All content created by me is licensed Creative Commons, by Attribution. Feel free to make derivatives to the extent you can or want to. The toki pona corpus texts come from a variety of sources, and I believe their use here is acceptable, noncommercial fair use. If you don't think so, email me, tell me which document is yours, and I will remove it.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.