Labels

Monday, May 30, 2016

[Java] Use Markov chain to generate text

Generate text based on a Markov chain

Can we write a method to generate a text based on the texts that we have already seen ? The answer is yes, in fact, we can use a Markov chain to generate text based on texts that we used to train the Markov chain. In mathematique, we use Markov chain to describe a random process on a state space and it has a famous property that the probability distribution of transition to the next state depends only on the current state and not on the sequence of events that before it, which is called "memorylessness".
So we can split our task in two steps, training and generating:
  • In training step, we use texts to build a Markov chain
  • In generating step, we start from the first state of the Markov chain and transist to next state following the probability distribution of transition in each state untill the final state.
For the aim of simplicity, the interface of MarkovTextGenerator is given as below
public interface MarkovTextGenerator {

    /** Train the generator by adding the sourceText */
    public void train(String sourceText);

    /** Generate the text with the specified number of words */
    public String generateText(int numWords);

    /** Retrain the generator from scratch on the source text */
    public void retrain(String sourceText);
}
we can see that, training and generating steps correspond to method train and generateText separately, besides, retrain is used to retraining the existing Markov chain.

An implementation of MarkovTextGenerator

My implementation of the interface MarkovTextGenerator during the project can be found here
Besides, we have several test cases in the main method of MarkovTextGeneratorLoL class which gives us an intuitive way to know what exactly the Markov chain is playing in text generating step.

No comments:

Post a Comment