We investigated the effects of verbal and visuo-spatial working memory (WM) capacity on the construction of spatial mental representations during second language listening, manipulating the length of the materials. In the experiment, advanced Chinese learners of Japanese were divided into four groups according to their verbal and visuo-spatial WM capacities. The learners performed a judgment task in which they decided whether a picture shown on a computer screen matched the sentences they had just listened to. Each trial used either two or four sentences. The correct response rate and reaction time served as dependent variables. Only a main effect of material length was observed, which suggests that, for learners of Japanese, constructing a spatial mental representation during listening becomes more difficult as the number of sentences increases. Neither the correct response rate nor the reaction time differed significantly between learners with large and small verbal or visuo-spatial WM capacities. A possible reason for this result is that the experimental materials were not very difficult for advanced learners; therefore, a certain amount of processing resources could be distributed efficiently and appropriately between verbal and visuo-spatial WM during task execution.