VirusTotal - URI parsing errors

During my work, I stumbled across a phishing mail. It was clearly malicious: bad grammar, a weird link, etc.

The link led to a PDF document hosted somewhere, which in turn contained a link to a URI that looked a little strange. More specifically, it had percent-encoded (URL-encoded) characters in the path, but not ones you would normally see:
hxxp://maliciousdomain.com/@$%23$%5e%25/
I put it into VirusTotal (VT) to see if it would detect anything, but the site came back clean.

However, entering the URI in Urlscan.io returned a screenshot of what was clearly a OneDrive phishing page, and it was flagged as such. A strange difference...

Upon inspecting the VirusTotal scan details, I noticed that VT got a 404 when trying to scan the site, hence the "clean" result. Looking at the VT details again confirmed that VT had scanned a different URI than Urlscan.io did. The URI had become
hxxp://maliciousdomain.com/@$/
which is actually what you would expect if the URI is decoded first! The "%23" after the "$" decodes to a number sign ("#"), which marks the start of the fragment identifier, so everything after it is treated as the fragment rather than as part of the path.
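Python's urllib.parse can reproduce this. The sketch below is only an approximation of whatever VT does internally (which I have not seen): it decodes every escape first and only then splits the URI, and the path comes out as "/@$", the same truncation seen in VT's scan.

```python
from urllib.parse import unquote, urlsplit

url = "http://maliciousdomain.com/@$%23$%5e%25/"

# Assumed VT-like order: decode every percent-escape first...
# %23 -> "#", %5e -> "^", %25 -> "%"
decoded = unquote(url)
print(decoded)  # http://maliciousdomain.com/@$#$^%/

# ...then split. The decoded "#" now ends the path and starts the fragment.
parts = urlsplit(decoded)
print(parts.path)      # /@$
print(parts.fragment)  # $^%/
```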


So VT parses it correctly, right? Nope. VT should parse URIs the way a browser does, since it is most often used to check things a user is about to visit. The browser does not decode the percent-encoding before splitting the URI, so "/@$%23$%5e%25/" is interpreted literally as the path instead!

According to RFC 3986 (section 3.3), "The path is terminated by the first question mark ("?") or number sign ("#") character, or by the end of the URI."
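RFC 3986 Appendix B gives a regular expression for breaking a URI reference into its components. A small Python sketch using it (the helper names URI_RE and components are just mine) shows that on the raw link the whole "/@$%23$%5e%25/" is the path: there is no literal "#" in the string, and "%23" is just three ordinary characters.

```python
import re

# Regular expression from RFC 3986, Appendix B. Group 5 is the path, and it is
# [^?#]*, so it stops only at a literal "?" or "#" in the string.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def components(uri):
    """Hypothetical helper: map the Appendix B groups to component names."""
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


raw = components("http://maliciousdomain.com/@$%23$%5e%25/")
print(raw["path"])      # /@$%23$%5e%25/
print(raw["fragment"])  # None
```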

This is further clarified in section 2.4, "When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters."

So VT made a mistake: it decoded the percent-encoded octets before splitting the URI into its components. Chrome, by contrast, separates the components first and leaves the escaped octets in the path it requests, so "%23" never acts as a delimiter.
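For contrast, here is a sketch of the order section 2.4 asks for, again using Python's urllib.parse. It only approximates browser behaviour and is not how Chrome is implemented: split the still-encoded URI first, and only decode inside a component afterwards.

```python
from urllib.parse import unquote, urlsplit

url = "http://maliciousdomain.com/@$%23$%5e%25/"

# Step 1: split the still-encoded URI into components (RFC 3986, section 2.4).
parts = urlsplit(url)

# In this sketch the path used for the request keeps its escapes,
# so the server is asked for the full path.
request_path = parts.path
print(request_path)          # /@$%23$%5e%25/
print(repr(parts.fragment))  # ''

# Step 2: only now decode, per component. "#" is path data here, not a delimiter.
print(unquote(parts.path))   # /@$#$^%/
```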

VT returns a clean scan, yet when I click the link I land on a malicious site. VT does not parse URIs correctly here, and this can be used to trick users who scan a URL with VT and take the result at face value.

After a long silence from the VT team, I have now noticed that the bug has been fixed.
