Looking at that data, and assuming that for some reason you can't extract the name column by its fixed width, I'd split each line into fields based on these observations:
- the first column is a number with at least one digit
- the second column is surname, comma, first name
- the third column always starts with a ( followed by a digit or an underscore
and so a regex like:
^[ ]*[0-9]+[ ]+([A-Z][^,]*),[ ]{0,2}([A-Z].*?)[ ]+\([0-9_]
should work. It's not tested, but here's how it breaks down:
^ = start of line
[ ]* = optional leading spaces
[0-9]+ = one or more digits
[ ]+ = separation space(s)
( = capture surname
[A-Z] = begins with capital letter
[^,]* = then everything up to the comma
) = end of surname capture
, = the comma between last and first name
[ ]{0,2} = zero to two spaces after the comma (covers both a missing space and a doubled one)
( = capture first name
[A-Z] = begins with capital letter
.*? = then everything else, up to the spaces + bracket + digit-or-underscore that follow (non-greedy, so it does NOT grab the trailing spaces)
) = end of first name capture
[ ]+ = one or more spaces
\( = followed by an opening bracket
[0-9_] = and then a digit or an underscore
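If you want to try the breakdown out, here's a minimal Python sketch (Python is my assumption; the question's language isn't stated here). It writes the same pattern in re.VERBOSE mode so each token sits next to its explanation, using [^,]* for the surname as the breakdown describes. The function name split_name and the sample line are hypothetical, since the real data isn't reproduced here.

```python
import re

# The pattern from above in re.VERBOSE mode. Whitespace outside a character
# class is ignored in VERBOSE mode, but inside [ ] it is still a literal space.
NAME_RE = re.compile(r"""
    ^[ ]*          # optional leading spaces
    [0-9]+         # first column: one or more digits
    [ ]+           # separation space(s)
    ([A-Z][^,]*)   # group 1: surname, capital letter then everything up to the comma
    ,              # the comma between last and first name
    [ ]{0,2}       # zero to two spaces after the comma
    ([A-Z].*?)     # group 2: first name, non-greedy so trailing spaces are left out
    [ ]+           # one or more spaces
    \(             # followed by an opening bracket
    [0-9_]         # and then a digit or an underscore
""", re.VERBOSE)


def split_name(line):
    """Return (surname, first_name), or None if the line doesn't match.

    split_name is a hypothetical helper name, not from the original answer.
    """
    m = NAME_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)


# Hypothetical sample line; the real data wasn't shown.
print(split_name("  42  Smith, John Paul   (7 more columns"))  # → ('Smith', 'John Paul')
```

Note how the non-greedy .*? keeps "John Paul" intact while leaving the three spaces before the bracket for [ ]+ to consume.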
What could possibly go wrong? I guess we'll find out when you log the lines that DON'T match ;-)
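One way to do that logging, again sketched in Python with hypothetical names (split_lines and the sample lines are made up for illustration): keep the matches, and send every miss to a log together with its line number, so you can see which of the assumptions above it broke.

```python
import logging
import re

logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
log = logging.getLogger("namesplit")

# Same pattern as in the answer, on one line.
NAME_RE = re.compile(r"^[ ]*[0-9]+[ ]+([A-Z][^,]*),[ ]{0,2}([A-Z].*?)[ ]+\([0-9_]")


def split_lines(lines):
    """Split each line into (surname, first_name); log and collect misses."""
    matched, missed = [], []
    for lineno, line in enumerate(lines, start=1):
        m = NAME_RE.match(line)
        if m:
            matched.append((m.group(1), m.group(2)))
        else:
            # %r shows the raw text, so stray tabs or odd characters are visible
            log.warning("line %d did not match: %r", lineno, line)
            missed.append(line)
    return matched, missed


# Hypothetical input: the second line has a lowercase first name, so it misses.
sample = [
    "  1  Smith, John   (_a",
    "  2  Jones, mary   (3",
    " 10  Brown,  Ann Marie  (42",
]
matched, missed = split_lines(sample)
print(matched)      # → [('Smith', 'John'), ('Brown', 'Ann Marie')]
print(len(missed))  # → 1
```

A line whose columns are separated by tabs would end up in the log as well, since [ ] only matches a literal space.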