Android Question Regex http:// ...

Sergey_New

Tell me, is there a regular expression to determine whether a text is a link in the most general form “http://” plus the characters allowed in the address?
 

JohnC

ChatGPT would probably work very well for this...

Here is ChatGPT's reply to your exact question:

B4X:
^http:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[^\s]*)?$

Here's the breakdown of the regex:
  • ^ asserts the start of the line.
  • http:\/\/ matches the "http://" part of the URL. The backslashes escape the forward slashes; this is only needed in languages where "/" delimits the regex (such as JavaScript) and is harmless in B4X.
  • [a-zA-Z0-9\-\.]+ matches one or more of the characters allowed in domain names, which include letters, numbers, hyphens, and dots.
  • \.[a-zA-Z]{2,} matches a dot followed by two or more letters, which typically signify a domain's top-level domain (like .com, .org, etc.).
  • (:[0-9]+)? is an optional group that matches a colon followed by one or more digits, which would signify a port number.
  • (\/[^\s]*)? is an optional group that matches a forward slash followed by any number of non-whitespace characters, which would constitute the path in a URL.
This regex will match most common URLs that start with "http://" and will not match those that start with "https://". To also include "https://", you can modify the regex slightly:

B4X:
^https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[^\s]*)?$

The s? after http allows for an optional "s", covering both "http" and "https" URLs.

Keep in mind that URLs can be complex and may include a wide range of characters, including those allowed in query strings and fragment identifiers. This regex is a general one and might not cover every valid URL according to the official specifications, but it will work for many common cases.
 

Sergey_New

B4X:
    Dim pat, str As String
    pat="^https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[^\s]*)?$"
    str="https://www.b4x.com/android/forum/forums/android-questions.26/"
    Dim m As Matcher = Regex.Matcher(pat,str)
How do I check the result of the match in B4A?
 

JohnC

ChatGPT answered:
B4X:
Dim m As Matcher = Regex.Matcher(pat, str)
If m.Find() Then
   Log("URL is valid")
Else
   Log("URL is not valid")
End If
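
A possible shortcut, offered only as a sketch: B4X also has Regex.IsMatch, which returns True only when the whole text matches the pattern, so no Matcher is needed for a plain yes/no check. The sub name IsHttpLink is made up for this example and does not come from the thread.

B4X:
'Hypothetical helper; the name IsHttpLink is an assumption, not part of the thread.
'Regex.IsMatch requires the whole text to match the pattern.
Public Sub IsHttpLink(str As String) As Boolean
    Dim pat As String = "^https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[^\s]*)?$"
    Return Regex.IsMatch(pat, str)
End Sub

'Usage, with the address from the question above:
Log(IsHttpLink("https://www.b4x.com/android/forum/forums/android-questions.26/"))

Because the pattern is already anchored with ^ and $, Matcher.Find and Regex.IsMatch should agree for single-line input.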
 
Solution

Daestrum

But why did ChatGPT escape the forward slashes? That seems odd.
 

drgottjr

if you're using javascript or perl (and possibly others), the "/" character is the pattern delimiter, so it needs to be escaped. for example, to match "hello" in javascript, you write the pattern as /hello/. in b4x, the pattern is a quoted string, so "/" needs no escaping.

also, note that inside a regex character class you don't have to escape . or -, as the pattern shown does. in a class, a dot (.) is a literal dot, so you don't need \. and a hyphen (-) is literal if positioned either right after the [ or right before the closing ]. eg, "[a-zA-Z0-9.-]"
(from regular-expressions.info: The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen.)
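
To illustrate the two points above, here is the thread's pattern next to a version with the unneeded escapes removed. This is a sketch: the variable names are arbitrary, and the expectation that both forms behave the same assumes the Java regex engine that B4A uses.

B4X:
'Pattern as posted in the thread.
Dim escaped As String = "^https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[^\s]*)?$"
'Same pattern with "/" unescaped and with "." and "-" left bare inside the class.
'The "\." outside the brackets stays escaped: there a bare dot means "any character".
Dim plain As String = "^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(:[0-9]+)?(/[^\s]*)?$"
Dim url As String = "https://www.b4x.com/android/forum/"
'Both calls should log the same result.
Log(Regex.IsMatch(escaped, url))
Log(Regex.IsMatch(plain, url))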
 

JohnC

Because in some regex flavors "/" is the pattern delimiter rather than an ordinary character, so adding the "\" tells the regex engine to treat the following "/" as a literal character. In B4X the escape is harmless but not required.
 

OliverA

Sergey_New said: is there a regular expression to determine whether a text is a link in the most general form “http://” plus the characters allowed in the address?
I understand this is already flagged as solved, but why not let Java/Android do the heavy lifting of checking if a URL is compliant?
B4X:
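'https://www.baeldung.com/java-validate-url
'Requires the JavaObject library.
Public Sub isValidURL(URL As String) As Boolean
	Dim success As Boolean = True
	Try
		Dim joURL As JavaObject
		'Fails for an unknown scheme or text that cannot be parsed as a URL.
		joURL.InitializeNewInstance("java.net.URL", Array(URL))
		'Fails if the URL is not a syntactically valid URI.
		joURL.RunMethod("toURI", Null)
	Catch
		success = False
		'Log(LastException)
	End Try
	Return success
End Sub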

URL checking is actually one of the HARD things, and regex will not catch every case...

Note: See the post below (#13 - https://www.b4x.com/android/forum/threads/regex-htpp.157380/post-966299) for an explanation of why this is not a universal URL validation routine after all. :-(
 

OliverA

Looks like I'm shooting myself in the foot with this one and should have left it alone:
B4X:
Public Sub isValidHttpURL(URL As String) As Boolean
	Dim success As Boolean
	Try
		If URL.ToLowerCase.StartsWith("http") Then
			success = isValidURL(URL)
		End If
	Catch
		Log(LastException)
	End Try
	Return success
End Sub
 

Sergey_New

OliverA said: Looks like I'm shooting myself in the foot
If you go with that approach, it is better to write the check like this:
B4X:
If URL.ToLowerCase.StartsWith("https://") Or URL.ToLowerCase.StartsWith("http://") Then
 

OliverA

Sergey_New said: then it’s better like this:
Yes, that would future-proof it in case java.net.URL ever supports some odd scheme that starts with HTTP but is neither HTTP nor HTTPS. In reality, my solution may not be that good after all, since it accepts some odd but standards-compliant addresses that the OP (you, @Sergey_New) may not want to treat as valid:

"http://.com"
"http://com"
"http:// "
(see https://stackoverflow.com/a/5965755)

These seem odd, but they are RFC 2396 (https://www.ietf.org/rfc/rfc2396.txt) compliant (see the comments here: https://stackoverflow.com/a/17894617). So in the end, the recommendation is to either filter out these odd cases or write your own validation routine that performs to your satisfaction, which in this case would be the accepted regex answer (post #6 - https://www.b4x.com/android/forum/threads/regex-htpp.157380/post-966136) in this thread.

As another side note, even though the sub above is called isValidURL, it only works for the schemes FILE, FTP, JAR, HTTP, and HTTPS (see https://android.googlesource.com/pl...r/ojluni/src/main/java/java/net/URL.java#1277). For other schemes to be validated, "[y]ou need to implement your own handler and register it through URL.setURLStreamHandlerFactory(). Quite overkill if you just want to validate the URL syntax, a regexp seems to be a simpler solution." (see https://stackoverflow.com/a/4897124)
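
As a rough check of the point about odd cases, the sketch below runs the three addresses listed above through the accepted regex using Regex.IsMatch. The variable names are arbitrary. Reading the pattern by hand, all three should be rejected: the first has no host label before the dot, the second has no dot at all, and the third contains a space.

B4X:
Dim pat As String = "^https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[^\s]*)?$"
For Each candidate As String In Array As String("http://.com", "http://com", "http:// ")
    'Logs each address together with the regex verdict.
    Log($"'${candidate}' -> ${Regex.IsMatch(pat, candidate)}"$)
Next

This is one way to see why the regex may be the more practical filter here, even though it is stricter than the RFC.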
 