Another idea is to have the user poke their tongue out, or wink left and right, or make some other movement that's easy to do but hard to fake with a printed photo.
It would probably need a video camera, though. Otherwise the scammer could show several photos one after the other without the computer "seeing" the swap, i.e. that more than just the eyes is moving between shots.
And then that movement verification could be defeated by holding up a tablet playing a video of the target doing the required movement.
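For what it's worth, here is a rough Python sketch of what the movement check itself might look like. Everything in it is an assumption on my part: the gesture names, the idea of picking the sequence at random each time, and above all the existence of some detector that labels every video frame with a gesture (or "neutral"). It only shows the matching logic, not the hard part of recognising the gestures, and it does not fix the tablet problem; at best a random sequence makes one pre-recorded clip less likely to match.

```python
import secrets
from itertools import groupby

# Hypothetical gesture labels; a real detector would define its own.
GESTURES = ["tongue_out", "wink_left", "wink_right"]


def make_challenge(length=3):
    """Pick a random sequence of gestures for the user to perform."""
    return [secrets.choice(GESTURES) for _ in range(length)]


def matches_challenge(challenge, observed):
    """Return True if the challenge gestures appear, in order, in `observed`.

    `observed` is the per-frame label stream from an assumed gesture
    detector. Runs of identical labels are collapsed first, so holding one
    wink over several frames counts as a single wink, not several.
    """
    collapsed = [label for label, _ in groupby(observed)]
    frames = iter(collapsed)
    # `step in frames` consumes the iterator up to the first match,
    # so each later step must be found after the previous one.
    return all(step in frames for step in challenge)
```

You would call `make_challenge()` to decide what to ask for, show that to the user, then feed the detector's frame labels to `matches_challenge()`.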
How about requiring the user to have their phone with them, with Bluetooth active? Although... presumably it's possible to spoof a Bluetooth device ID too?