Deep learning models achieve state-of-the-art performance across many speech and image processing tasks, often significantly outperforming earlier methods. In this paper, we further improve the performance of these models by introducing multi-task training, in which a combined deep learning model is trained on two inter-related tasks. We show that introducing a secondary task (such as shape identification alongside object classification) significantly improves performance on the main task for which the model is trained. Using public datasets, we evaluate our approach on two image understanding tasks: image segmentation and object classification.
On the image segmentation task, the multi-task model nearly doubled pixel-level segmentation accuracy (from 18.7% to 35.6%) compared to the single-task model, and improved face-detection performance by 10.2% (from 70.1% to 80.3%). On the object classification task, we observed a 2.1% improvement in classification accuracy (from 91.6% to 93.7%) over the single-task model. The proposed multi-task models also obtained significantly higher accuracies than previously published results on these datasets: 22.0% and 6.2% higher on the face-detection and object classification tasks, respectively. These results demonstrate the effectiveness of multi-task training of deep learning models for image understanding tasks.
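To make the core idea concrete, the following is a minimal, hypothetical sketch of multi-task training (all names, dimensions, and the auxiliary-loss weight are illustrative assumptions, not details from the paper): a shared trunk produces features consumed by two task-specific heads, and gradients from both losses update the shared weights, so the secondary task shapes the representation used by the primary task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 64 samples, 10 features; the primary task is 3-class
# (e.g. object classification), the secondary task is binary
# (e.g. shape identification). Labels here are random placeholders.
X = rng.normal(size=(64, 10))
y_main = rng.integers(0, 3, size=64)
y_aux = rng.integers(0, 2, size=64)

W_shared = rng.normal(scale=0.1, size=(10, 16))  # shared trunk
W_main = rng.normal(scale=0.1, size=(16, 3))     # primary-task head
W_aux = rng.normal(scale=0.1, size=(16, 2))      # secondary-task head

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def step(lr=0.1, aux_weight=0.5):
    """One gradient step on the combined (weighted) multi-task loss."""
    global W_shared, W_main, W_aux
    H = np.maximum(X @ W_shared, 0.0)  # shared ReLU features
    P_main = softmax(H @ W_main)
    P_aux = softmax(H @ W_aux)
    n = len(X)
    # Combined loss: primary cross-entropy plus weighted auxiliary term.
    loss = (-np.log(P_main[np.arange(n), y_main]).mean()
            - aux_weight * np.log(P_aux[np.arange(n), y_aux]).mean())
    # Cross-entropy gradients at each head's logits.
    G_main = P_main.copy(); G_main[np.arange(n), y_main] -= 1; G_main /= n
    G_aux = P_aux.copy();   G_aux[np.arange(n), y_aux] -= 1
    G_aux *= aux_weight / n
    # Both tasks contribute gradient signal to the shared trunk.
    dH = (G_main @ W_main.T + G_aux @ W_aux.T) * (H > 0)
    W_main -= lr * H.T @ G_main
    W_aux -= lr * H.T @ G_aux
    W_shared -= lr * X.T @ dH
    return loss

losses = [step() for _ in range(50)]
```

The key design point is the line computing `dH`: the shared weights receive gradients from both heads, which is what lets an auxiliary task regularize and enrich the features used by the main task. The `aux_weight` hyperparameter controls how strongly the secondary task influences the shared representation.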