This study uses Generalizability Theory to investigate how many test items and testers are required to maintain reliability in DLA (Dialogic Language Assessment). DLA is a language testing method that observes language ability against three criteria: basic conversation, dialogue, and cognition. Generalizability Theory is a statistical framework comprising two stages: a G study and a D study. The G study estimates the variance components attributable to each source of variation (pupils, testers, and test items). Using these estimates, the D study predicts how reliability changes with the number of testers and test items, which informs the design of an effective test. The results indicate that a) with fewer than four testers, reliability is difficult to maintain even when the number of test items is large, and b) some variation remains that cannot be explained by testers, pupils, or test items. The results also suggest that, given that teachers in schools are burdened with school duties, administering twelve test items with five testers is a realistic solution.
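The D-study step can be illustrated with a minimal sketch. It assumes a fully crossed pupils × items × testers random-effects design, which the abstract does not state, and it uses the standard relative generalizability coefficient for that design. The function name `g_coefficient` and all variance component values are hypothetical placeholders, not estimates from this study.

```python
def g_coefficient(var_p, var_pi, var_pr, var_pir_e, n_items, n_testers):
    """Relative G coefficient for an assumed crossed p x i x r design.

    var_p      : variance due to pupils (the object of measurement)
    var_pi     : pupil-by-item interaction variance
    var_pr     : pupil-by-tester interaction variance
    var_pir_e  : residual variance (three-way interaction plus error)
    """
    # Each error term shrinks as more items or testers are averaged over.
    relative_error = (
        var_pi / n_items
        + var_pr / n_testers
        + var_pir_e / (n_items * n_testers)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components, chosen only for illustration.
components = dict(var_p=1.0, var_pi=0.3, var_pr=0.5, var_pir_e=1.2)

# D study: tabulate the coefficient over candidate numbers of testers and items.
for n_testers in (1, 2, 4, 5):
    for n_items in (6, 12, 24):
        g = g_coefficient(**components, n_items=n_items, n_testers=n_testers)
        print(f"testers={n_testers} items={n_items} G={g:.3f}")
```

In this form, the pupil-by-tester term is divided only by the number of testers, so adding test items cannot reduce it. That is one way a small number of testers can cap reliability regardless of how many items are used.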