Ok, some of my results using your code and the file you provided:
Win 10 via Parallels on an i7 Mac Mini with 16GB RAM and an SSD:
First in B4J in RELEASE mode:
Run#1: 4855, 30611, 31047 -> Batch execution time (T2-T1): 25756
Run#2: 4661, 29557, 29776 -> 24896
Run#3: 4408, 28124, 28541 -> 23716
From command line on same machine:
Run#1: 52195, 69451, 69868 -> 17256
Run#2: 45068, 62978, 63394 -> 17910
Run#3: 47548, 64705, 65107 -> 17157
Ok, first, the terminal in an emulation sucks (that's why the first time is so much larger from the command line than from B4J). Yet notice that the difference between the second time and the first time (the ExecNonQueryBatch) is 6 to 8.5 seconds smaller when the program is executed from the command line than from B4J itself. Keep that in mind when benchmarking.
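To make that comparison easy to re-check, here is a small Python sketch (not B4J, just quick arithmetic) that recomputes T2-T1 for the six runs above. It assumes the three numbers in each run are the timestamps T1, T2 and T3 in milliseconds; the variable and function names are made up for illustration.

```python
# Timestamps copied from the runs above, assumed to be (T1, T2, T3) in ms.
b4j_release = [(4855, 30611, 31047), (4661, 29557, 29776), (4408, 28124, 28541)]
command_line = [(52195, 69451, 69868), (45068, 62978, 63394), (47548, 64705, 65107)]


def batch_times(runs):
    """Return T2 - T1 (the batch execution time) for each run."""
    return [t2 - t1 for t1, t2, _t3 in runs]


b4j = batch_times(b4j_release)    # batch times when started from B4J
cli = batch_times(command_line)   # batch times when started from the command line

# How much shorter the batch was from the command line, run by run.
savings = [a - b for a, b in zip(b4j, cli)]

for run, (a, b, s) in enumerate(zip(b4j, cli, savings), start=1):
    print(f"Run#{run}: B4J {a} ms, command line {b} ms, difference {s} ms")
```

The same helper can be pointed at the other machines' numbers to compare them on batch time alone, ignoring startup.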
Ok, the same program run from the command line on an i3, regular hard drive, 6GB memory, Windows 10:
Run#1: 10767, 22726, 23049
Run#2: 10814, 22650, 22911
Run#3: 10601, 22564, 22903
Now we are sub 12 seconds (virtualization hurts me here).
For fun, I created a non-UI version of the app (attached) and ran it on the i3-3320:
Run#1: 1782, 14083, 14208
Run#2: 1609, 13466, 16045
Run#3: 1558, 16045, 16154
A tad over 12 seconds.
Now for a change, I have a little Celeron 1037U with 4GB RAM, Ubuntu 16.04 LTS and an SSD, and it gets (using the non-UI app):
Run#1: 1577, 10868, 11056
Run#2: 1472, 10830, 11020
Run#3: 1548, 11343, 11533
Yeah, sub 10 seconds!
Some notes: DEBUG mode would not work for me, so all results are from release mode code. Even in release mode, there is a difference between running from within B4J and running directly from the command line. Virtualization will hurt your benchmarks. An SSD will also bring down your time, more so than a faster CPU: in my case above, the i3 is almost twice as fast as the Celeron, yet the Celeron with the SSD finished sooner.
Question: What is your D: drive? A local drive? An external USB drive? A network drive? This could also make a difference.