URL validation is deceptively complex because the URL specification (RFC 3986) is more permissive than most developers expect. The right validation approach depends on your use case — a form input has different requirements than an API endpoint.
A Practical HTTP/HTTPS Pattern
`^https?://[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*(?::\d{1,5})?(?:/[^\s]*)?$` validates scheme, domain labels with DNS rules, optional port, and optional path.
URL Detection in Text
For detecting URLs in free text, `https?://\S+` catches virtually every URL with minimal complexity. Refine with `https?://[^\s<>"\)]+` to exclude common trailing punctuation.
Edge Cases
Handle IP address URLs (`http://192.168.1.1/path`), internationalized domain names (Punycode `xn--` prefixes), query strings, and fragments. Each adds complexity, so only handle what your application actually needs.
Security Considerations
Always whitelist accepted schemes rather than blacklisting dangerous ones. `javascript:` URLs in user-generated content execute code when clicked. Restrict to `http` and `https` to eliminate an entire class of injection attacks.
Frontend vs. Backend
Accept user input permissively on the frontend, normalize it (auto-add `https://`), validate the normalized form, then optionally verify domain resolution on the backend.
Testing URL Patterns
RegExpress provides instant feedback as you test URL patterns against various examples, helping you find the right balance between strictness and flexibility.