To my surprise, I realized that the ffmpeg-specific build for OpenCV (at least in 3.4.4; perhaps it changed later) purposely outputs only 3-channel frames from VideoCapture, even if the source has 4 channels. That's not an ffmpeg limitation, but a build decision by the OpenCV guys.
I can see 2 options here:
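1. If the foreground video (as in the case you sent me) has no areas of solid black in its actual content, treat its black pixels as the transparent region: build a mask from them, invert it, and overlay the remaining foreground pixels on top of the other video.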
(Using the background and foreground videos you sent me, the result is)
2. For a more general use case (i.e., the foreground video can contain solid black), I'd use command-line ffmpeg to compose them. Perhaps @moster67 could help in this case.
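
For option 1, here is a minimal sketch of the per-frame compositing, not the exact code I ran. It assumes both frames are same-sized 3-channel `uint8` arrays, which is what `VideoCapture.read()` returns. The function name `overlay_non_black` and the `threshold` parameter are my own, made up for illustration. The threshold is there because lossy compression often leaves "black" pixels slightly above zero, so you may need to raise it for real footage.

```python
import numpy as np


def overlay_non_black(foreground, background, threshold=0):
    """Composite foreground over background, treating black as transparent.

    Hypothetical helper: both frames are H x W x 3 uint8 arrays of equal shape.
    """
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    # A pixel counts as opaque if any of its channels is brighter than threshold.
    opaque = np.any(foreground > threshold, axis=2)
    # Broadcast the 2-D mask over the channel axis and choose per pixel.
    return np.where(opaque[..., None], foreground, background)


# Tiny synthetic frames standing in for frames read from the two videos.
fg = np.zeros((2, 2, 3), dtype=np.uint8)
fg[0, 0] = (0, 0, 255)  # one non-black pixel, the rest is "transparent" black
bg = np.full((2, 2, 3), 50, dtype=np.uint8)
result = overlay_non_black(fg, bg)
```

In a real loop you would call this once per pair of frames read from the two captures and write `result` to a `VideoWriter`. It only works under the assumption stated in option 1: any genuinely black content in the foreground will be punched through to the background.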