This diagram represents a Markov chain with two possible states/events: Sunny and Rainy.
And in each state, it doesn’t matter how many times it has been Sunny or Rainy before: a Markov chain has no memory, so no transition will make the probabilities change.
With the last example you can start to imagine that a lot of real world phenomena, like weather and speech, are not Markovian, because the weather of the past does influence the current, and what’s going to be said depends a lot on what’s been said earlier. For this reason, don’t expect to program the new Cervantes with a Markov chain.
However, this is an interesting starting point and, also, an approach with multiple use cases, like:
We can model words as events in a Markov process.
Possible strings generated by this model are:
In the last example, we only took into account a single word per state, so we say that the chain is of first-order. For a N-th order chain, each state represents the last N words. But doesn’t it break the Markov principle? Not quite, because we are still using only the present state to decide the next one, only that we define present in a broader sense, as if instead of using “now” to refer to the current hour, you refered to the current day.
This concept will play a very important role in the training process (when we build the chain from existing text). We will see this in more detail when we talk about the implementation.
Markov chains have a wide variety of uses. With a very simple model, easy to build and easy to use, we can simulate rather complex behaviors. Now that we’ve covered the theory basics, I will explain it’s implementation step by step in different programming languages.
Let me know your thoughts in the comments!