Translate into simple English:

In this study, we performed a quantitative comparison between a corpus of written isiZulu and a corpus of spontaneous spoken isiZulu. The comparison was mainly done on morphological analyses of the corpora obtained via a finite-state morphological analyser, and the methodology followed allowed for estimates of relative occurrences of morpheme tags in the corpora. The morpheme tags were chosen to represent or relate to features that are known to differ between written and spoken English. Broadly speaking, it was found that isiZulu exhibits many of the differences in its spoken and written modalities that languages such as English (and Dutch) exhibit. Our results also provide a quantitative characterisation of these differences, which could inform the development of voice-enabled applications for isiZulu in a resource scarce context.
One aspect of the resource scarcity of isiZulu is the available tools for analysing corpora. While the ZulMorph analyser was able to provide reliable morphological analyses of tokens in the corpora, no disambiguation tool currently exists, and this had a significant impact on the methodology and the kinds of conclusions that could be drawn, namely that we had to express the differences between the corpora in relative rather than absolute terms. Additionally, as evidenced by the results obtained by the approximation of gerunds in English by infinitive nouns in isiZulu, a purely morphological approach is not sufficient to investigate some grammatical features, and hence a syntactically informed tool, such as a parser, would enable more complete and accurate results.
Currently, however, morphological analysers exist for some of the other Nguni languages, including isiXhosa (Pretorius & Bosch 2009), as well as Setswana (Pretorius et al. 2005), both of which are also included in the multilingual soap opera corpus, and so similar morphology-based investigations could also be performed for these languages.
Another possibility would be to investigate social media text in isiZulu, in order to compare it with both the written corpus and the spontaneous spoken corpus used in this work. In his doctoral thesis, Wikström (2017) investigates “talk-like tweeting” in English as part of a study of “linguistic and metalinguistic practices in everyday Twitter discourse in relation to aspects of speech and writing”. A comparison of social media text to corpora that represent the speech and writing modalities in a more traditional way could shed light on the extent to which social media text corpora could provide useful data for language modelling in voice-enabled applications for the resource scarce languages of South Africa.

In this study, we compared a collection of written isiZulu with a collection of natural, unplanned spoken isiZulu. We used a tool that breaks words into their parts and measured how often different kinds of word parts appear. We found that, like English and Dutch, isiZulu is used differently in writing and in speech. Our results put numbers on these differences, which could help people build voice technology for isiZulu, a language that has few resources.

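One challenge with isiZulu is the lack of tools to analyze text. The ZulMorph tool gave reliable breakdowns of words, but a word can often be broken down in more than one way, and no tool exists yet to pick the right one. Because of this, we could only describe the differences between the two collections in relative terms, not as exact counts. Looking at word parts alone is also not enough to study some grammar features. A tool that understands sentence structure, such as a parser, would give more complete and accurate results.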

Similar word-analysis tools exist for isiXhosa, another Nguni language, and for Setswana. Both languages are also part of the multilingual soap opera collection, so the same kind of study could be done for them. We could also study how people write isiZulu on social media and compare it with the written and spoken collections. This could show how useful social media text is for building voice technology for South African languages that have few resources.