What do we mean by OOV and CLIP?
OOV stands for Out of Vision. The presenter is speaking, but we can’t see them because pictures of the story are on screen.
CLIP is another word for soundbite. It’s somebody other than the presenter speaking.
So an OOV and CLIP has three elements: 1. a script, 2. some pictures that cover part of the script, and 3. an interview CLIP.

The overall duration should be around forty seconds: a twenty-second script and a twenty-second soundbite.
Look at this MAIJ OOV and CLIP. First of all, here’s the script:

And here’s what it looks like on screen:
The presenter reads the first sentence (the Top Line) of the story IN VIS (In Vision: we can see her). The rest of the script is OOV (Out of Vision: we can’t see her, but we can hear her). Then we have the CLIP.

You might find this old Zoom session on OOV and CLIPS useful:
So that’s an OOV and CLIP. It’s really quite straightforward.
If you’re still confused, here’s a bulletin from the MAJ archives. Every story is in the OOV and CLIP format.