Published on Wed Jul 20 2016

Incremental Learning for Fully Unsupervised Word Segmentation Using Penalized Likelihood and Model Selection

Ruey-Cheng Chen


Abstract

We present a novel incremental learning approach for unsupervised word segmentation that combines features from probabilistic modeling and model selection. This includes super-additive penalties to address the cognitive burden imposed by long word formation, and new model selection criteria based on higher-order generative assumptions. Our approach is fully unsupervised; it relies on a small set of parameters that permit flexible modeling, together with a mechanism that automatically learns these parameters from the data. Through experiments, we show that this design leads to top-tier performance in both phonemic and orthographic word segmentation.
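
To make the abstract's ideas concrete, the sketch below shows one way a penalized likelihood with a super-additive word-length penalty could drive incremental, fully unsupervised segmentation. The IncrementalSegmenter class, the smoothed unigram lexicon model, the quadratic penalty, and the weight lam are illustrative assumptions for this example, not the formulation used in the paper.

# Illustrative sketch only: an incremental, penalized-likelihood word segmenter.
# The smoothed unigram lexicon model, the quadratic (super-additive) length
# penalty, the penalty weight `lam`, and the character base measure are all
# assumptions made for this example, not the paper's exact formulation.
import math
from collections import Counter


class IncrementalSegmenter:
    def __init__(self, lam=0.2, max_word_len=10):
        self.counts = Counter()       # running word counts (learned lexicon)
        self.total = 0                # total number of word tokens seen
        self.lam = lam                # weight of the length penalty (assumed)
        self.max_word_len = max_word_len

    def word_cost(self, w):
        # Negative log-probability under a smoothed unigram model with a crude
        # per-character base measure, plus a penalty that grows quadratically
        # with word length, i.e. super-additively: splitting a long candidate
        # into shorter pieces always lowers the penalty term.
        base = (1.0 / 50.0) ** len(w)
        p = (self.counts[w] + base) / (self.total + 1.0)
        return -math.log(p) + self.lam * len(w) ** 2

    def segment(self, utterance):
        # Viterbi-style dynamic program: best[i] is the minimal cost of
        # segmenting the first i symbols; back[i] is the split achieving it.
        n = len(utterance)
        best = [0.0] + [math.inf] * n
        back = [0] * (n + 1)
        for i in range(1, n + 1):
            for j in range(max(0, i - self.max_word_len), i):
                cost = best[j] + self.word_cost(utterance[j:i])
                if cost < best[i]:
                    best[i], back[i] = cost, j
        words, i = [], n
        while i > 0:
            words.append(utterance[back[i]:i])
            i = back[i]
        return list(reversed(words))

    def observe(self, utterance):
        # Incremental learning: segment the new utterance under the current
        # model, then fold the chosen words back into the lexicon counts.
        words = self.segment(utterance)
        self.counts.update(words)
        self.total += len(words)
        return words


if __name__ == "__main__":
    seg = IncrementalSegmenter()
    for utt in ["thedogsawthecat", "thecatsawthedog"]:
        print(seg.observe(utt))

With only a couple of utterances the sketch falls back to very short units; recovering plausible word-like segments would require a larger corpus and a tuned lam, and the paper's own penalties and model selection criteria differ from the simple choices made here.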