This study examined phonological retention and semantic processing during the dictation of Japanese sentences by intermediate-level Chinese learners of Japanese. In the experiment, two of the learners’ cognitive abilities, working memory (WM) capacity and phonological short-term memory (PSTM) capacity, were treated as individual-difference factors. The results showed the following: (a) Learners with larger WM capacity carry out phonological retention and semantic processing of the input speech almost simultaneously, so their reproduction is based on the semantic representations they have formed, whereas learners with smaller WM capacity immediately reproduce the first information they perceive, so they rely on the phonological information they have retained. (b) Regardless of PSTM capacity, reproduction starts after the input speech has been perceived and its morphological representations accessed; at that point, however, learners with larger PSTM capacity retain phonology in chunks while attending to the connections between words, whereas learners with smaller PSTM capacity retain phonology in individual word units. (c) Regardless of WM capacity and PSTM capacity, “editing” takes place at the reproduction stage, drawing on phonological retention and semantic processing. These findings suggest that the longer the units of phonological and semantic information that can be retained and processed at the listening comprehension stage, the easier it is to understand the meaning of the sentence.