URL Normalization

URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs may be equivalent.

For this mission we use only normalizations that preserve semantics. Normalize the given URL using the rules below, and only these rules; they differ slightly from the RFC.

Normalization Rules:
  • 1. Converting the scheme and host to lower case.
    HTTP://www.Example.com/ → http://www.example.com/
  • 2. Capitalizing letters in escape sequences. All letters within a percent-encoding triplet (e.g., "%3B") are case-insensitive, and should be capitalized.
    http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b
  • 3. Decoding percent-encoded octets of unreserved characters. For consistency, percent-encoded octets in the ranges of ALPHA (%41–%5A and %61–%7A), DIGIT (%30–%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by Uniform Resource Identifiers (URI) producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
    http://www.example.com/%7Eusername/ → http://www.example.com/~username/
  • 4. Removing the default port. The default port (port 80 for the “http” scheme) should be removed from a URL.
    http://www.example.com:80/bar.html → http://www.example.com/bar.html
  • 5. Removing dot-segments. The segments “..” and “.” can be removed from a URL according to the algorithm described in RFC 3986 (or a similar algorithm). “..” refers to the parent directory and “.” to the current directory.
    http://www.example.com/a/b/../c/./d.html → http://www.example.com/a/c/d.html

Additional links: To learn more about URL normalization (not required for this task), see Wikipedia and RFC 3986.

Input: A URL as a Unicode string.

Output: Normalized URL, a string.


checkio("Http://Www.Checkio.org") == "http://www.checkio.org"
checkio("http://www.checkio.org/%cc%b1bac") == "http://www.checkio.org/%CC%B1bac"
checkio("http://www.checkio.org/task%5F%31") == "http://www.checkio.org/task_1"
checkio("http://www.checkio.org:80/home/") == "http://www.checkio.org/home/"
checkio("http://www.checkio.org:8080/home/") == "http://www.checkio.org:8080/home/"
checkio("http://www.checkio.org/task/./1/../2/././name") == "http://www.checkio.org/task/2/name"

How it is used: This concept helps with parsing and analytical processing. URL normalization is needed when you compare URLs or run a system that is case-sensitive.

Precondition: All input URLs are valid.

