B4J Code Snippet [Class] [B4X] wmHtml - remove html tags and replace html entities with their corresponding characters

As the title says, this class removes html tags and replaces html entities with their corresponding characters. The entity data was obtained from https://symbl.cc/en/html-entities/, plus a few additional entries. The attached demo project contains the class; there are no library dependencies. Example code from the project:
B4X:
    Private wmHtml1 As wmHtml
    wmHtml1.Initialize

    ' Look up the wmHtmlCode type variable for the passed value (the lookup is case-sensitive).
    ' If the value is not found, all string members of the returned variable are set to "".
    Log("GetByCSScode '\0028': " & wmHtml1.GetByCSScode("\0028"))
    'Logged: GetByCSScode '\0028': [IsInitialized=true, id=76, symbol=(, htmlCode=&#40;, CSScode=\0028, unicode=U+0028, entity=&lpar;, name=Left Parenthesis, group=Punctuation Symbols]
    Log("GetByEntity '&lpar;': " & wmHtml1.GetByEntity("&lpar;"))
    Log("GetByName 'Left Parenthesis': " & wmHtml1.GetByName("Left Parenthesis"))
    Log("GetBySymbol '(': " & wmHtml1.GetBySymbol("("))
    Log("GetByUnicode 'U+0028': " & wmHtml1.GetByUnicode("U+0028"))
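
    ' Sketch (not part of the demo project): detecting a failed lookup.
    ' Assumes the Get* methods return the wmHtmlCode type and that its members are
    ' named as in the logged output above; "No Such Name" is a made-up value.
    Dim notFound As wmHtmlCode = wmHtml1.GetByName("No Such Name")
    If notFound.symbol = "" Then Log("GetByName: no match")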

    ' Get only the symbol (character) for the passed value (the lookup is case-sensitive).
    ' If the value is not found, the default value passed as the second argument is returned.
    Log("GetSymbolByCSScode '\0028': " & wmHtml1.GetSymbolByCSScode("\0028", ""))
    'Logged: GetSymbolByCSScode '\0028': (
    Log("GetSymbolByEntity '&lpar;': " & wmHtml1.GetSymbolByEntity("&lpar;", ""))
    Log("GetSymbolByName 'Left Parenthesis': " & wmHtml1.GetSymbolByName("Left Parenthesis", ""))
    Log("GetSymbolByUnicode 'U+0028': " & wmHtml1.GetSymbolByUnicode("U+0028", ""))

    ' Decode an html string and replace the found encoded entities with their symbols (characters).
    Log("DecodeHtml: " & wmHtml1.DecodeHtml("This text&rsquo;s html has been decoded. &quot;Quote&quot; has been enclosed in quotes."))
    'Logged: DecodeHtml: This text’s html has been decoded. "Quote" has been enclosed in quotes.

    ' Remove all html tags from the given string and return the result.
    Log("RemoveHtmlTags: " & wmHtml1.RemoveHtmlTags("<A>No tags remain <some html stuff here> after this<br></A>"))
    'Logged: RemoveHtmlTags: No tags remain  after this
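
For readers who want a rough idea of how tag removal and entity decoding can be done in B4X, here is a minimal sketch. It is not the code from the wmHtml class: the sub names StripTagsSketch and DecodeEntitiesSketch are hypothetical, the entity table holds only five entries for illustration, and the regex approach is an assumption about one possible way to do it.

```b4x
' Hypothetical helper: remove every "<...>" run with a single regex replace.
Sub StripTagsSketch (Text As String) As String
    Return Regex.Replace("<[^>]*>", Text, "")
End Sub

' Hypothetical helper: replace known entities with their characters.
' Unknown entities are left unchanged.
Sub DecodeEntitiesSketch (Text As String) As String
    ' Tiny illustrative table; the real class covers far more entities.
    Dim entities As Map = CreateMap("&quot;": QUOTE, "&amp;": "&", "&lt;": "<", _
        "&gt;": ">", "&rsquo;": Chr(0x2019))
    Dim sb As StringBuilder
    sb.Initialize
    Dim m As Matcher = Regex.Matcher("&[A-Za-z0-9#]+;", Text)
    Dim last As Int = 0
    Do While m.Find
        ' Copy the text before the match, then the decoded character (or the match itself).
        sb.Append(Text.SubString2(last, m.GetStart(0)))
        sb.Append(entities.GetDefault(m.Match, m.Match))
        last = m.GetEnd(0)
    Loop
    ' Copy whatever follows the last match.
    sb.Append(Text.SubString(last))
    Return sb.ToString
End Sub
```

A simple pattern like "<[^>]*>" will also strip any text that merely looks like a tag, and it does not handle a ">" inside an attribute value, so treat it as a starting point only.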
 

Attachments

  • wmHtmlDemo.zip
    19.5 KB

peacemaker

v0.2 is attached; it now works in Debug mode. Include it in your project if needed.
 

Attachments

  • wmHtml.bas
    88 KB