When a text is presented aurally, how are we accessing it?

Options are:
We are reading it
We are viewing it
We are hearing it
We are watching it

Answer: We are hearing it ("aurally" means by ear).