We propose a novel system for the automatic detection and recognition of text in traffic signs. Scene structure is used to define search regions within the image, in which traffic sign candidates are then found. Maximally stable extremal regions (MSERs) and hue, saturation, and value (HSV) color thresholding are used to locate a large number of candidates, and this set is then reduced by applying constraints based on temporal and structural information.
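To make the HSV color-thresholding step concrete, here is a minimal sketch using Python's standard `colorsys` module. It marks each RGB pixel that falls inside a hue window and exceeds saturation and value floors. The thresholds (a roughly blue hue band) and the `hsv_mask` helper are hypothetical placeholders, not the values or code used by the authors. A real pipeline would operate on whole image arrays rather than a list of pixels.

```python
import colorsys

# Hypothetical thresholds: hue in degrees, saturation and value in [0, 1].
# The abstract does not state the values actually used.
HUE_RANGE = (200.0, 250.0)  # roughly blue
MIN_SAT = 0.4
MIN_VAL = 0.3


def hsv_mask(pixels):
    """Return one boolean per (r, g, b) pixel with 0-255 channels:
    True if the pixel lies inside the hypothetical HSV window."""
    mask = []
    for r, g, b in pixels:
        # colorsys expects channels in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hue_deg = h * 360.0
        inside = (HUE_RANGE[0] <= hue_deg <= HUE_RANGE[1]
                  and s >= MIN_SAT and v >= MIN_VAL)
        mask.append(inside)
    return mask
```

Thresholding on hue, with separate floors on saturation and value, keeps the color test fairly insensitive to brightness changes. Pixels that pass would then be grouped into connected regions and merged with the MSER candidates.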
A recognition stage interprets the text contained within detected candidate regions. Individual text characters are detected as MSERs and grouped into lines before being interpreted using optical character recognition (OCR). Recognition accuracy is vastly improved through the temporal fusion of text results across consecutive frames. The method is evaluated comparatively and achieves an overall $F_{\mathrm{measure}}$ of 0.87.
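The two quantities behind the final claims can be illustrated with a short sketch. `f_measure` computes the balanced F-measure, $F = 2PR/(P+R)$, from precision $P$ and recall $R$. This assumes the reported $F_{\mathrm{measure}}$ is the standard F1 score. `fuse_text` shows one simple form of temporal fusion: a majority vote over the OCR strings read for the same tracked sign in consecutive frames. The voting rule and both function names are hypothetical illustrations. The abstract does not specify how the authors fuse results across frames.

```python
from collections import Counter


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true-positive, false-positive and
    false-negative counts; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fuse_text(frame_results):
    """Hypothetical temporal fusion: majority vote over per-frame OCR
    strings for a single tracked sign. Empty readings are ignored."""
    readings = [text for text in frame_results if text]
    if not readings:
        return ""
    # most_common(1) returns [(string, count)] for the most frequent reading.
    return Counter(readings).most_common(1)[0][0]
```

Voting across frames lets a consistent reading outvote occasional OCR errors (for example, one frame misreading "O" as "0"). This is one plausible reason that fusing results over time improves recognition accuracy.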