5.1 Tokenizing

The first thing we’ll need to do is split each narrative into words, so we can keep track of which words appear in which situations. Instead of treating "SUSP SWUNG UMBRELLA WITH METAL TIP AT V2" as a single string, we’ll make a list of the words it contains:

  • SUSP
  • SWUNG
  • UMBRELLA
  • WITH
  • METAL
  • TIP
  • AT
  • V2

This process is called tokenization.
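
As a minimal sketch of this step in Python, assuming that splitting on whitespace is enough for narratives like the one above (the function name `tokenize` is hypothetical, chosen only for illustration):

```python
def tokenize(narrative):
    """Split a narrative string into a list of word tokens.

    Assumption: words are separated by whitespace, as in the example
    narrative. Punctuation handling is deliberately left out.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return narrative.split()


tokens = tokenize("SUSP SWUNG UMBRELLA WITH METAL TIP AT V2")
print(tokens)
# ['SUSP', 'SWUNG', 'UMBRELLA', 'WITH', 'METAL', 'TIP', 'AT', 'V2']
```

Real narratives may contain punctuation or inconsistent spacing, in which case a more careful splitting rule would probably be needed.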