Hi everybody,
I am using TextReader with files whose encoding I don't know in advance. It can be either "Windows-1252" or "UTF-8". The problem arises when these files contain special characters. In theory the encoding is declared in the file itself, but in practice the declaration is not reliable: some files are read correctly by TextReader with one encoding, while others, under the same conditions, are not. In other words, although all the files normally declare "Windows-1252", very often the TextReader must be initialized with UTF8 to read them correctly.

Besides checking the explicit (often false) declaration inside the file, I also tried analysing the very first characters of the file, with no success. The only way to get a sure answer seems to be reading the raw bytes to find out whether and where special characters appear. This creates new problems, because the files are often huge (over 1 GB) and cannot be loaded entirely into memory with File.ReadBytes. The alternative might be to read them byte by byte.

I use TextReader because, given the file size, I read the files line by line. Moreover, the structure of these files requires line-by-line processing. Is there any way to read raw bytes from a text file, line by line, without using TextReader? Thanks in advance for any suggestions.
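To make the idea concrete, here is a minimal sketch in Python (chosen only for illustration, not a TextReader equivalent) of reading raw bytes line by line and deciding the encoding per line. It assumes the only two candidates are UTF-8 and Windows-1252, and it treats any line that is not valid UTF-8 as Windows-1252, which is a heuristic rather than a guarantee. The function name `iter_lines` is hypothetical.

```python
def iter_lines(path):
    """Yield decoded lines from a file that is either UTF-8 or Windows-1252.

    The file is opened in binary mode and read one line at a time, so memory
    use stays bounded even for very large files. Splitting on the byte 0x0A is
    safe here because both encodings are ASCII-compatible.
    """
    with open(path, "rb") as f:
        for raw in f:  # iterating a binary file yields bytes up to each b"\n"
            raw = raw.rstrip(b"\r\n")
            try:
                # Strict decoding raises on any byte sequence invalid in UTF-8.
                yield raw.decode("utf-8")
            except UnicodeDecodeError:
                # Assumption: anything that is not valid UTF-8 is Windows-1252.
                # Bytes undefined in that code page become U+FFFD.
                yield raw.decode("cp1252", errors="replace")
```

Deciding per line instead of per file means only one pass over the data is needed, and a file that mixes both encodings is still read sensibly. The trade-off is that a Windows-1252 line whose accented bytes happen to form a valid UTF-8 sequence would be misread, although that is rare in ordinary text.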