This paper first classifies utterances in a social activity into two kinds according to how they contribute to the progression of the activity: utterances built into the activity and utterances produced in parallel with it. The necessity and frequency of utterances in an activity vary not only with its goal or framework but also with each phase of its progression. In addition, some utterances seem to be decoupled from the bodily behaviors performed simultaneously to complete a requisite action, such as gestures, gaze, and the manipulation of tools. Through a multimodal analysis of interactions in two Japanese social activities (child-oriented Karate training and hairstyling in a barber shop), this paper aims to demonstrate how these utterances are used differently, how they shift from one kind to the other, and how their relations with other means vary. It then discusses the cognitive distance between utterances and other environmental elements coupled with the current activity.