Android Question Decoding quoted-printable html

William Hunter

Active Member
Licensed User
Longtime User
I normally have success decoding quoted-printable text using the sub below. I have tried to display Last Week's Most Popular Topics in a WebView, which I have done in previous weeks without a problem. This time I cannot display the decoded HTML properly. I am attaching the unprocessed raw mail for this week's topics. I expect that something has changed in the way it has been encoded. I cannot decode it correctly in the way I have done previously, and I hope someone can tell me why, and also how I might change my method of decoding quoted-printable. Any help is greatly appreciated.

Best regards :)
B4X:
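Sub DecodeQuotePrintable(q As String) As String
    Dim m As Matcher
    ' Matches a header-style encoded-word: =?charset?Q?data?=
    ' Input that does not match is returned unchanged (see the Else branch).
    m = Regex.Matcher("=\?([^?]*)\?Q\?(.*)\?=$", q)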
    If m.Find Then
        Dim charset As String
        Dim data As String
        charset = m.Group(1)
        data = m.Group(2)
        Dim bytes As List
        bytes.Initialize
        Dim i As Int
        Do While i < data.Length
            Dim c As String
            c = data.CharAt(i)
            If c = "_" Then
                bytes.AddAll(" ".GetBytes(charset))
            Else If c = "=" Then
                Dim hex As String
                hex = data.CharAt(i + 1) & data.CharAt(i + 2)
                i = i + 2
                bytes.Add(Bit.ParseInt(hex, 16))
            Else
                bytes.AddAll(c.GetBytes(charset))
            End If
            i = i + 1
        Loop
        Dim b(bytes.Size) As Byte
        For i = 0 To bytes.Size - 1
            b(i) = bytes.Get(i)
        Next
        Return BytesToString(b, 0, b.Length, charset)
    Else
        Return q
    End If
End Sub
 

Attachments

  • RawMailB4A.txt
    15.3 KB · Views: 433

William Hunter

"This regex will only catch single-line quoted-printable text, such as in the subject field."
Thank you for your reply. This has me at a bit of a loss. This morning I received seven messages in quoted-printable HTML. The sub in post #1 decoded them flawlessly, and each was displayed correctly in a WebView. The message B4A sent me in connection with post #2 is also quoted-printable HTML, and it is one of the seven that decoded and displayed correctly. I have attached the unprocessed message and an image showing how it was displayed.

I should be able to decode and display the message attached in post #1. Until this latest issue, I have been able to decode and display Last Week's Most Popular Topics using this same sub. So I think the encoding of the current issue of Last Week's Most Popular Topics has changed in some way that I cannot see. This is the reason for my question: "What has changed in the encoding?"
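
To make the quoted remark about the regex concrete, here is a minimal sketch of what the pattern from post #1 does and does not match. The two sample strings are made up for illustration, and TestEncodedWordRegex is a hypothetical helper name, not something from the attached mails.
B4X:
Sub TestEncodedWordRegex
    Dim pattern As String = "=\?([^?]*)\?Q\?(.*)\?=$"
    ' Made-up header value in encoded-word form: =?charset?Q?data?=
    Dim subject As String = "=?utf-8?Q?Last_Week=27s_Topics?="
    ' Made-up quoted-printable body line: it has =3D escapes but no =?charset?Q?...?= wrapper
    Dim body As String = "<a href=3D""https://www.b4x.com"">B4X</a>"
    Dim m As Matcher
    m = Regex.Matcher(pattern, subject)
    Log(m.Find) ' header form: the pattern is expected to match
    m = Regex.Matcher(pattern, body)
    Log(m.Find) ' body form: no match, so DecodeQuotePrintable returns its input unchanged
End Sub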

Best regards
Basic4Android.png
 

Attachments

  • RawMailB4A2.txt
    10.5 KB · Views: 354

William Hunter

"Nothing in the forum software or configuration was changed."
Thank you, Erel. I’ve been working on the parsing of email for quite some time. I can correctly decode and extract the majority of emails received, but not all. The two subs I have tried for decoding quoted-printable were found on the forum. Exhibit A, while intended to decode single-line text as in a header, is far more effective at decoding a message body than Exhibit B, which is intended for that purpose. Neither sub will decode every quoted-printable message body.

I have done a lot of searching on the Internet looking for a sample of an all-embracing mail parser. While I have found examples of MIME parsers in Java, there is nothing beyond that. The only common theme I have found in postings by others is that the parsing of email is a nightmare. I think I’m at the point of agreeing, and will have to give up my quest. Maybe one day I’ll have an epiphany, or one of our resident experts will create a fully functional mail parser. In any event, thank you for your help.

Regards
B4X:
' Exhibit A - https://www.b4x.com/android/forum/threads/using-pop3-to-communicate-with-android-devices.11310/
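Sub DecodeQuotePrintable(q As String) As String
    Dim m As Matcher
    ' Matches a header-style encoded-word: =?charset?Q?data?=
    ' Input that does not match is returned unchanged (see the Else branch).
    m = Regex.Matcher("=\?([^?]*)\?Q\?(.*)\?=$", q)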
    If m.Find Then
        Dim charset As String
        Dim data As String
        charset = m.Group(1)
        data = m.Group(2)
        Dim bytes As List
        bytes.Initialize
        Dim i As Int
        Do While i < data.Length
            Dim c As String
            c = data.CharAt(i)
            If c = "_" Then
                bytes.AddAll(" ".GetBytes(charset))
            Else If c = "=" Then
                Dim hex As String
                hex = data.CharAt(i + 1) & data.CharAt(i + 2)
                i = i + 2
                bytes.Add(Bit.ParseInt(hex, 16))
            Else
                bytes.AddAll(c.GetBytes(charset))
            End If
            i = i + 1
        Loop
        Dim b(bytes.Size) As Byte
        For i = 0 To bytes.Size - 1
            b(i) = bytes.Get(i)
        Next
        Return BytesToString(b, 0, b.Length, charset)
    Else
        Return q
    End If
End Sub
B4X:
' Exhibit B - https://www.b4x.com/android/forum/threads/using-pop3-and-mailparser.66978/#post-424042
Sub DecodeQuotePrintable(q As String) As String
    Dim bytes As List
    bytes.Initialize
    Dim i As Int
    Do While i < q.Length
        Dim c As String
        c = q.CharAt(i)
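        If c = "_" Then
            ' Treats "_" as a space (the header encoded-word convention).
            bytes.AddAll(" ".GetBytes("utf8"))
        Else If c = "=" And i < q.Length - 2 Then
            ' Two more characters must follow "=" so that CharAt(i + 2) stays in range.
            Dim hex As String
            hex = q.CharAt(i + 1) & q.CharAt(i + 2)
            i = i + 2
            Try
                bytes.Add(Bit.ParseInt(hex, 16))
            Catch
                ' Not a hex pair (for example "=" followed by a line break): keep the two characters.
                bytes.AddAll(hex.GetBytes("utf8"))
            End Try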
        Else
            bytes.AddAll(c.GetBytes("utf8"))
        End If
        i = i + 1
    Loop
    Dim b(bytes.Size) As Byte
    For i = 0 To bytes.Size - 1
        b(i) = bytes.Get(i)
    Next
    Return BytesToString(b, 0, b.Length, "utf8")
End Sub
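
For comparison with Exhibit B, here is a minimal sketch of a body decoder that follows the quoted-printable rules for message bodies: a "=" at the end of a line is a soft line break and is removed, "=XX" is a single byte, and "_" is left alone (the underscore-means-space rule belongs to header encoded-words only). DecodeQPBody and its charset parameter are hypothetical names, and the sketch assumes it is given only the body of a part whose Content-Transfer-Encoding is quoted-printable, with the charset taken from that part's Content-Type header. It has not been run against the attached mails.
B4X:
' Sketch only: decodes a quoted-printable message body, not a header encoded-word.
Sub DecodeQPBody(q As String, charset As String) As String
    ' Remove soft line breaks: "=" (plus optional trailing spaces/tabs) at the end of a line.
    q = Regex.Replace("=[ \t]*\r?\n", q, "")
    Dim bytes As List
    bytes.Initialize
    Dim i As Int = 0
    Do While i < q.Length
        Dim c As String = q.CharAt(i)
        Dim decoded As Boolean = False
        If c = "=" Then
            ' Only read a hex pair when two more characters are available.
            If i + 2 < q.Length Then
                Dim hex As String = q.SubString2(i + 1, i + 3)
                If Regex.IsMatch("[0-9A-Fa-f]{2}", hex) Then
                    bytes.Add(Bit.ParseInt(hex, 16))
                    i = i + 2
                    decoded = True
                End If
            End If
        End If
        ' Anything that is not a valid =XX escape is copied through unchanged.
        If decoded = False Then bytes.AddAll(c.GetBytes(charset))
        i = i + 1
    Loop
    Dim b(bytes.Size) As Byte
    For i = 0 To bytes.Size - 1
        b(i) = bytes.Get(i)
    Next
    Return BytesToString(b, 0, b.Length, charset)
End Sub

A call might look like DecodeQPBody(rawBody, "utf8"), where rawBody is a hypothetical variable holding the undecoded body text. Validating the hex pair with Regex.IsMatch instead of Try/Catch is a design choice of this sketch, not something taken from either exhibit.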