Source Code: Investigating Websites Beneath the Surface

Source code is the set of human-readable instructions, written in programming and markup languages, behind any web page or piece of software. It lies beneath the surface of a web page and contains the page's text along with references to its embedded images and videos. For an investigator, source code can provide valuable insights, including information that is not displayed on the rendered page.

The data contained in source code include everything visible on the web page and may also include hidden data. The anonymity offered by the Internet has made digitally focused criminal actors difficult to trace. By exploring a website's source code, internet investigators might be able to identify who is behind it. The source code may include relevant information, such as the name of the individual or company that owns or maintains the website, owns the domain name, or registered the website. Personally identifying information is not always visible or associated with a website, as some owners choose to hide their names when registering a domain name.
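As a rough illustration of where such details can sit, the sketch below pulls meta tags and HTML comments out of a page's source using Python's standard library. The sample HTML, the names in it, and the helper `extract_hidden_details` are hypothetical; real pages may carry none of these fields, and anything found is a lead to verify rather than proof of ownership.

```python
from html.parser import HTMLParser


class HiddenDetailParser(HTMLParser):
    """Collects <meta> name/content pairs and HTML comments from page source."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Some pages use name="...", others property="..." (an assumption
        # about common practice, not a rule every site follows).
        name = attr_map.get("name") or attr_map.get("property")
        content = attr_map.get("content")
        if name and content:
            self.meta[name.lower()] = content

    def handle_comment(self, data):
        text = data.strip()
        if text:
            self.comments.append(text)


def extract_hidden_details(html_source):
    """Return (meta_dict, comment_list) for the given HTML source string."""
    parser = HiddenDetailParser()
    parser.feed(html_source)
    parser.close()
    return parser.meta, parser.comments


# Hypothetical page source used only for demonstration.
SAMPLE_SOURCE = """
<html><head>
<meta name="author" content="Jane Example">
<meta name="generator" content="ExampleCMS 1.0">
<!-- Built by Example Web Studio -->
</head><body><p>Welcome</p></body></html>
"""

meta, comments = extract_hidden_details(SAMPLE_SOURCE)
print(meta)
print(comments)
```

Neither the meta tags nor the comment are shown on the rendered page, which is why they are easy to overlook without opening the source.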

A web developer or development service builds a web page by writing source code. The languages used include HTML, CSS, and JavaScript, which appear as readable lines of text. The bulk of the content within the source code of a web page is the information that is visible on the page itself; however, the source code will also identify the scripts and services that are active in the background.

Examining the source code of relevant web pages should be part of every internet investigator's process, even for those who do not understand HTML and CSS. The text contained within source code can provide investigators with opportunities to extract media, identify owners, and recover useful information hidden from view, such as social media account details. Web page source code may also contain plain text strings that investigators can use to identify further lines of inquiry.
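The kind of extraction described above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the list of social media domains is illustrative and incomplete, the helper `extract_links_and_media` is hypothetical, and the page source is assumed to be already saved as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative, incomplete list of social media domains (an assumption).
SOCIAL_DOMAINS = (
    "facebook.com", "twitter.com", "x.com",
    "instagram.com", "linkedin.com", "youtube.com",
)


class LinkAndMediaParser(HTMLParser):
    """Collects links to social media profiles and URLs of embedded media."""

    def __init__(self):
        super().__init__()
        self.social_links = []
        self.media_urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            href = attr_map.get("href") or ""
            try:
                host = (urlparse(href).hostname or "").lower()
            except ValueError:
                return  # skip malformed URLs
            if any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS):
                self.social_links.append(href)
        elif tag in ("img", "video", "audio", "source"):
            src = attr_map.get("src")
            if src:
                self.media_urls.append(src)


def extract_links_and_media(html_source):
    """Return (social_links, media_urls) found in the HTML source string."""
    parser = LinkAndMediaParser()
    parser.feed(html_source)
    parser.close()
    return parser.social_links, parser.media_urls


# Hypothetical page source used only for demonstration.
SAMPLE_SOURCE = """
<body>
<a href="https://www.instagram.com/example_user">Follow us</a>
<a href="/about">About</a>
<img src="/images/team-photo.jpg">
<video src="https://cdn.example.com/intro.mp4"></video>
</body>
"""

social, media = extract_links_and_media(SAMPLE_SOURCE)
print(social)
print(media)
```

Relative media paths such as `/images/team-photo.jpg` would need to be joined with the site's base URL before downloading.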

Web browsers allow investigators to view website source code, enabling them to see the HTML and CSS behind the page and understand how it was built. You can easily see the source code of any web page in your browser by following these steps:

PC

Press Ctrl + U on your keyboard, or right-click on a web page and select either “View Source” or “Page Source” from the menu, depending upon your browser.

Mac

Press Option + Command + U (Safari and Chrome) or Command + U (Firefox) on your keyboard. Alternatively, right-click on the web page and select either “Page Source”, “View Page Source”, or “View Source” from the menu, depending upon your browser.

Investigators can quickly search website source code for keywords by pressing Ctrl + F (PC) or Command + F (Mac) and typing a word or phrase into the ‘Find’ box, which scans the current page for matches.
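When several keywords need checking across many saved pages, the same search can be scripted. The sketch below is one possible approach, not a prescribed method: `fetch_source` and `find_keywords` are hypothetical helper names, the sample source is invented, and the fetch helper is defined but not called here so the example runs offline.

```python
from urllib.request import urlopen


def fetch_source(url, timeout=10):
    """Download a page's raw source as text (roughly what 'View Source' shows)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def find_keywords(source, keywords):
    """Return (line_number, keyword, line_text) for every case-insensitive match."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits.append((line_number, keyword, line.strip()))
    return hits


# Hypothetical page source used only for demonstration.
SAMPLE_SOURCE = """<html>
<body>
<p>Contact: admin@example.com</p>
<!-- TODO: remove staging login -->
</body>
</html>"""

for line_number, keyword, text in find_keywords(SAMPLE_SOURCE, ["@", "login"]):
    print(f"line {line_number}: '{keyword}' in {text}")
```

Searching for a character such as `@` is a quick way to surface email addresses, while terms like "login" or "admin" may point to parts of the site worth a closer look.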

Lawrence Alexander of Bellingcat previously showed how investigators can extract a Google Analytics ID from the source code of a web page and attempt to link separate sites by locating the same ID in the source code of those pages. Google Analytics is a popular service that allows website administrators to track who visited the website, how long they stayed, which web pages they viewed, and which browser and operating system they used. Website administrators often manage multiple websites and frequently use the same Google Analytics account for all of them, so the same account identifier can appear across otherwise unconnected sites.

When a website owner or administrator is obscuring their identity on the website that an investigator is researching, locating another site with the same Google Analytics ID through services like SpyOnWeb can help unmask them.
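The matching step can be sketched as follows. This example assumes the classic `UA-<account>-<property>` ID format, in which the middle number identifies the account; newer `G-` measurement IDs are not handled. The site names, IDs, and helper functions are hypothetical, and a shared ID is a lead to corroborate rather than conclusive proof of common ownership.

```python
import re
from collections import defaultdict

# Classic Universal Analytics IDs look like UA-1234567-1; the capture group
# holds the account number, and the trailing number is the property index.
UA_PATTERN = re.compile(r"\bUA-(\d{4,10})-\d{1,4}\b")


def extract_ua_accounts(source):
    """Return the set of account-level IDs (e.g. 'UA-1234567') in the source."""
    return {f"UA-{match.group(1)}" for match in UA_PATTERN.finditer(source)}


def group_sites_by_account(pages):
    """Map each account ID to the sites sharing it, keeping only shared IDs.

    `pages` is a dict of {site_name: html_source}.
    """
    sites_by_account = defaultdict(set)
    for site, source in pages.items():
        for account in extract_ua_accounts(source):
            sites_by_account[account].add(site)
    return {
        account: sorted(sites)
        for account, sites in sites_by_account.items()
        if len(sites) > 1
    }


# Hypothetical page sources used only for demonstration.
PAGES = {
    "site-a.example": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "site-b.example": "<script>ga('create', 'UA-1234567-2', 'auto');</script>",
    "site-c.example": "<script>ga('create', 'UA-7654321-1', 'auto');</script>",
}

print(group_sites_by_account(PAGES))
```

Comparing at the account level rather than the full ID matters because two sites run from one account usually differ in the final property number.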

Investigators can also analyze source code to help establish who authored a web page. Embedded strings within the source code of a web page can act as a digital fingerprint that links sites to their creators and owners. Source code author identification is a vital process in investigating cyber crimes and involves identifying the most likely author by comparing the code against undisputed samples previously written by candidate authors.

Skopenow is an analytical search engine that uses social media, surface web, deep web, and dark web data to generate actionable intelligence. Skopenow instantly and anonymously collects and analyzes web pages and social media activity, including scanning source code for actionable keywords to flag behaviors and identify hidden connections. Skopenow also produces automated court-ready reports, collating images, text, videos, and metadata. For more information, please e-mail us at sales@skopenow.com.