URL Encoder/Decoder

URL Encoding/Decoding Basics At A Glance

URL encoding, sometimes referred to as percent-encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI). It is also widely used when submitting HTML form data. The term URI covers both the Uniform Resource Locator (URL) and the Uniform Resource Name (URN), so the same rules apply to each.
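As a rough illustration of the form-submission case, the sketch below uses Python's standard urllib.parse module. The field names and values are invented for the example and do not come from any particular form.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields; names and values are invented for illustration.
form_fields = {"name": "Ada Lovelace", "query": "a&b=c"}

# urlencode produces application/x-www-form-urlencoded data:
# spaces become "+", and "&" and "=" inside values are percent-encoded.
encoded = urlencode(form_fields)
print(encoded)  # name=Ada+Lovelace&query=a%26b%3Dc

# parse_qs reverses the process and returns a list of values per field.
decoded = parse_qs(encoded)
print(decoded)  # {'name': ['Ada Lovelace'], 'query': ['a&b=c']}
```

Note that the "+" for a space is specific to form data; elsewhere in a URI a space is written as %20.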

Types of URI Characters

URI characters fall into two main categories: reserved and unreserved. Reserved characters sometimes have a special meaning attached to them. A good example is the forward slash, which separates the different parts of a URI or URL. Unreserved characters, on the other hand, have no special meaning and can appear in a URI as they are. When a reserved character has to be used without its special meaning, it is represented by a special character sequence instead.
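The two categories can be written down as plain character sets. The sketch below assumes the sets listed in RFC 3986, the current generic URI syntax; the classify helper is a hypothetical name used only for this example.

```python
import string

# Character sets as listed in RFC 3986 (assumed here as the reference).
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def classify(ch: str) -> str:
    """Return 'reserved', 'unreserved', or 'other' for a single character."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    # Anything else (a space, non-ASCII text) has to be percent-encoded.
    return "other"

for ch in "/a~ ":
    print(repr(ch), classify(ch))
```

Running it labels "/" as reserved, "a" and "~" as unreserved, and the space as other.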

Note that URL encoding is not static. The sets of reserved and unreserved characters have changed slightly with each revision of the specifications that govern URIs, and so have the circumstances under which certain reserved characters carry a special meaning.

Percent Encoding Reserved Characters

The concept behind percent-encoding reserved characters is simple. When a character from the reserved set has a special meaning in a given context, and a URI scheme requires that character to be used for some other purpose, the character has to be percent-encoded. The process involves converting the character to its corresponding byte value in ASCII. That value is then written as a pair of hexadecimal digits preceded by a percent sign.
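The conversion can be sketched in a few lines of Python. The function name percent_encode_char is hypothetical, and the sketch uses UTF-8 to obtain the byte values, which gives the same single byte as ASCII for every reserved character.

```python
def percent_encode_char(ch: str) -> str:
    """Percent-encode one character via its byte value(s)."""
    # An ASCII character yields exactly one byte; each byte is written
    # as "%" followed by two uppercase hexadecimal digits.
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))

print(percent_encode_char("/"))  # %2F
print(percent_encode_char("?"))  # %3F
print(percent_encode_char(" "))  # %20
```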

The slash character (/) shows how this works in practice. When used in the path component of a URI, it acts as a delimiter between path segments. If a specific URI scheme needs a / to appear inside a path segment, the three characters %2F are used in the segment instead of a literal /.
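A short sketch of the %2F case, using Python's urllib.parse. The path segments are made up for illustration; the point is only that the slash inside a segment is encoded while the delimiting slashes are not.

```python
from urllib.parse import quote, unquote

# Hypothetical path segments; the second one contains a literal slash.
segments = ["reports", "2023/2024", "summary"]

# safe="" tells quote() to encode "/" as well (by default it is left alone).
path = "/" + "/".join(quote(seg, safe="") for seg in segments)
print(path)  # /reports/2023%2F2024/summary

# Split on the real delimiters first, then decode each segment.
decoded = [unquote(part) for part in path.lstrip("/").split("/")]
print(decoded)  # ['reports', '2023/2024', 'summary']
```

Decoding before splitting would turn %2F back into a delimiter and produce four segments instead of three, which is why the order matters.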

[Figure: Characters before and after encoding/decoding]

Percent Encoding Unreserved Characters

This is simple because characters from the unreserved set never need URL encoding. A URI in which an unreserved character is percent-encoded is equivalent to one in which the same character appears literally. What stands out is that URI processors do not always recognize this equivalence in practice, so for the best interoperability it is safer not to percent-encode unreserved characters at all.
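The behavior for unreserved characters can be checked with Python's urllib.parse. This is a small sketch that assumes Python 3.7 or later, where quote() treats "~" as unreserved.

```python
from urllib.parse import quote, unquote

unreserved_sample = "Az09-._~"

# quote() leaves unreserved characters untouched, even with safe="".
print(quote(unreserved_sample, safe=""))  # Az09-._~

# A percent-encoded unreserved character decodes to the literal one,
# so "%41" and "A", or "%7E" and "~", stand for the same character.
print(unquote("%41%7E"))  # A~
```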

[Figure: URL encoding example]

Current URI Encoding Standards

URL encoding standards change from time to time, as already hinted. The current generic URI syntax (RFC 3986) recommends that new URI schemes represent unreserved characters without translating them. All other characters should first be converted to bytes according to UTF-8, and those byte values should then be percent-encoded.
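For text outside the ASCII range, the approach recommended by the generic URI syntax is to convert to UTF-8 bytes first and then percent-encode each byte. A minimal sketch with Python's urllib.parse, using an arbitrary sample word:

```python
from urllib.parse import quote, unquote

text = "café"

# Step 1: convert the character data to bytes using UTF-8.
raw = text.encode("utf-8")
print(raw)  # b'caf\xc3\xa9'

# Step 2: percent-encode every byte that is not an unreserved character.
encoded = quote(text, safe="")
print(encoded)  # caf%C3%A9

# Decoding reverses both steps (unquote assumes UTF-8 by default).
print(unquote(encoded))  # café
```

The single character "é" becomes two escapes because its UTF-8 form is two bytes long.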

[Figure: URL decoding example]

Then there are nonstandard implementations for encoding Unicode characters, such as %uXXXX. XXXX, in this case, is a UTF-16 code unit represented by four hexadecimal digits. It is important to note that this format is not specified by any RFC and has been rejected by the W3C. Formats change from time to time, as already hinted, but a few things remain constant. Your best bet is therefore to keep up to date with what has changed, why it changed, and the rules behind each URL encoding mechanism.
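Standard decoders such as Python's urllib.parse.unquote do not understand the nonstandard %uXXXX form, so legacy input that uses it needs separate handling. The helper below, decode_percent_u, is a hypothetical sketch for that purpose and not part of any library.

```python
import re

def decode_percent_u(text: str) -> str:
    """Decode nonstandard %uXXXX escapes (hypothetical helper)."""
    # Each escape carries one UTF-16 code unit as four hex digits.
    # Characters outside the Basic Multilingual Plane would arrive as
    # surrogate pairs, which this sketch does not recombine.
    return re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

print(decode_percent_u("caf%u00E9"))  # café
print(decode_percent_u("%u20AC5"))    # €5
```

New data should still be produced with the standard UTF-8 based encoding; a decoder like this is only useful for reading older input.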