Please translate to simple English:

Spoken and written language
One of the prominent themes in studies of differences between spoken and written language has been “disentangling the numerous factors that codetermine differences between spoken and written language” (Redeker 1984), of which the most important are “the amount of planning, the conventionally expected level of formality in the situation, the nature and size of the audience, and the subject matter”. In order to study specific differences, researchers have often opted to control for these codetermining factors in various ways: for example, a study of lexical differences in Dutch by Drieman (1962) was based on the assumption that topic, participants and the circumstances of obtaining data from participants should not vary (Akinnaso 1982), while Redeker (1984) studied the differences in degree of involvement/detachment as well as fragmentation/integration by keeping plannedness, formality and audience constant.
In this work, our aim is not to study features that differ between written and spoken isiZulu in a general way, but to understand the nature of the differences between the kind of language data for isiZulu that is readily available (namely written corpora) and the kind of isiZulu that voice-enabled applications would be expected to model. This reduces the need to control for various codetermining factors, since the goal of the work is not primarily a linguistic or discourse analytic result, but a characterisation of required resources in relation to available resources.
What language modelling resources would be ideal for the development of voice-enabled applications for isiZulu? To answer this, we need to understand typical use cases for such applications.
While it is almost impossible to predict the ways in which technology may be applied to improve the lives of people, a useful starting point is to consider where written and spoken language are typically used. As Akinnaso (1982) notes, the two modalities are often found in “complementary distribution” in society: “natural conversations are always carried out in spoken language, whereas, in modern industrial societies, speech is inappropriate for much bureaucratic communication such as applying for a job, requesting social services, filling out tax and credit application forms, and so on.” From this description it is clear that the “modern industrial societies” in view are assumed to have high levels of literacy in the language in question. In South Africa, however, literacy rates are low and home language literacy rates even more so (Posel 2011), which seems to indicate that spoken isiZulu is used beyond the “natural conversations” mentioned by Akinnaso. Presumably, therefore, voice-enabled applications for isiZulu could prove useful in a larger variety of domains than might be the case for the languages of societies with high levels of literacy. This conclusion does not point to the requirement of a very specific kind of spoken language modelling resource, and therefore, presumably, any data comprising spontaneous spoken isiZulu, and perhaps especially spoken dialogue, would be suitable.

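Researchers who study the differences between spoken and written language often try to separate the many factors that cause these differences. The most important factors are how much the speaker or writer plans ahead, how formal the situation is expected to be, who the audience is and how large it is, and the topic. To study one specific difference, researchers often keep these factors the same. In this work, we do not want to study the general differences between written and spoken isiZulu. Instead, we want to understand how the isiZulu data that is easy to find (written texts) differs from the kind of isiZulu that voice-enabled applications need to model. To find out what data these applications need, we have to think about where spoken and written language are used. In societies where most people can read and write, people use spoken language for everyday conversation and written language for many formal tasks, such as applying for a job or filling out tax forms. But in South Africa, many people cannot read and write well, especially in their home language. This suggests that people use spoken isiZulu in more situations than just everyday conversation. So voice-enabled applications for isiZulu could be useful in many different areas. Because of this, we do not need one very specific kind of spoken data. Any data of natural, unplanned spoken isiZulu, especially spoken conversations, would probably be suitable.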