How does it work?

You can see the full grammar for this bot (and even play around with it) here! If you would like to download and adapt it, you can get the source code for the site here.

This website uses Kate Compton's Tracery to generate the sentences. Tracery uses what are called context-free grammars (or replacement grammars) to define sentences using both fixed text and replacement targets.

e.g. "my favourite colour is #colour#" will replace #colour# with an element from the "colour" rule (e.g. "colour": ["red","orange","yellow","green","blue","indigo","violet","#colour#y-#colour#"]). The text used to replace the target is chosen randomly. Importantly, these targets can be nested, as seen in the colour example above. You could get "my favourite colour is red" or "my favourite colour is orangey-green" or, if you're very lucky, "my favourite colour is bluey-orangey-red". This happens because #colour# has a 1 in 8 chance of picking #colour#y-#colour# as its value, in which case each #colour# is replaced again, potentially even choosing #colour#y-#colour# again!
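To make the replacement process concrete, here is a minimal sketch of the same idea in plain Python. It is not Tracery itself and not the code this site runs: the `expand` function, the `TARGET` pattern and the "origin" rule name are assumptions made for illustration, and real Tracery offers more (such as modifiers) than this handles.

```python
import random
import re

# Illustrative grammar mirroring the colour example above.
# The "origin" rule name is an assumption for this sketch.
grammar = {
    "origin": ["my favourite colour is #colour#"],
    "colour": ["red", "orange", "yellow", "green", "blue", "indigo", "violet",
               "#colour#y-#colour#"],
}

# A replacement target is a rule name wrapped in '#' characters.
TARGET = re.compile(r"#(\w+)#")

def expand(text, rules, rng):
    """Replace each #name# target with a random option from rules[name].

    The chosen option is expanded again, so nested targets such as
    "#colour#y-#colour#" keep being replaced until only plain text is left.
    """
    def pick(match):
        option = rng.choice(rules[match.group(1)])
        return expand(option, rules, rng)
    return TARGET.sub(pick, text)

rng = random.Random()
sentence = expand("#origin#", grammar, rng)
print(sentence)
```

Each run prints one randomly generated sentence. Because the nested option is only one of eight choices, the recursion nearly always bottoms out after a step or two.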

The first and last names were collated using the Python pandas library to rearrange datasets of the 2,000 most popular first and last names in the US. A similar technique was used to get the city names (a list of all cities in the US with a population of over 10,000). The hardest part was getting the list of words. This is the 10,000 most used sequences of characters found when crawling the web. I had to replace all the 1-, 2-, 3- and 4-letter sequences because most of them were just meaningless keymashes. These have been replaced with valid 1-, 2-, 3- and 4-letter Scrabble words. Finally, I had to use a dataset of rude words to filter out all the profanity. There was a lot. There may still be some very obscure rude words in the dataset, so let me know if you run into any.
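The word-cleaning steps described above could be sketched roughly as follows. This is a plain-Python stand-in rather than the pandas code actually used, and the `clean_words` function, its parameters and the tiny sample lists are all made up for illustration.

```python
def clean_words(crawled, valid_short, rude, max_short=4):
    """Drop short crawled sequences, add vetted short words, filter rude words.

    crawled     -- character sequences found when crawling the web
    valid_short -- known-good short words (e.g. from a Scrabble word list)
    rude        -- words to exclude entirely
    """
    rude_set = {w.lower() for w in rude}
    # Keep only crawled sequences longer than max_short letters.
    kept = [w for w in crawled
            if len(w) > max_short and w.lower() not in rude_set]
    # Replace the dropped short sequences with valid short words.
    short = [w for w in valid_short
             if len(w) <= max_short and w.lower() not in rude_set]
    return kept + short

# Made-up sample data, standing in for the real datasets.
crawled = ["asdf", "qq", "example", "darn", "window", "darnedest"]
valid_short = ["cat", "a", "tree"]
rude = ["darnedest"]

words = clean_words(crawled, valid_short, rude)
print(words)  # ['example', 'window', 'cat', 'a', 'tree']
```

An exact-match filter like this only catches words that appear in the rude-word list, which is one reason obscure entries can slip through.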

The source code for both this site and the generator, as well as any blaseballers generated, is CC0. This means it is in the public domain and you may freely reuse it, for profit or not, without attribution (though I'd appreciate it!). This does not extend to the technologies used for either, and those licenses may change.

(coming eventually: interactive examples)