Google Search document leak reveals inner workings of ranking algorithm

A trove of leaked documents exposed the inner workings of Google Search, revealing key elements and factors that Google uses to rank content.

On March 13, thousands of internal documents from Google’s Content API Warehouse were released on GitHub by an automated bot named yoshi-code-bot. Erfan Azimi, an SEO practitioner and the founder of EA Eagle Digital, shared the leak with Rand Fishkin, co-founder of SparkToro. The documents offer detailed insights into the components that generate Google's search results.

The leak, named “Google API Content Warehouse,” contains over 2,500 pages of internal API documentation. Although some of the documents describe older systems, most appear to be current and detail how Google Search’s ranking algorithms function.

The leaked documents were made publicly available on GitHub on March 27 and remained accessible until May 7. During this period, they were indexed by a third-party service, which means that a copy remains available even after Google removed the original files.

The modules detailed in the leak cover various aspects of Google's operations, including YouTube, Assistant, Books, video search, links, web documents, crawl infrastructure, an internal calendar system, and the People API, according to SEO expert Mike King. Each module is broken down into summaries, types, functions, and attributes, providing a comprehensive view of how these components interact to produce search results.

While the leaked data clarifies what factors Google Search might consider when ranking content, it does not specify the relative importance or “weight” of each factor in the final ranking.

Here's a short summary of the internal Google documents:

  • Twiddlers: Functions that adjust the ranking of documents in search results based on various factors.

  • Demotions: Content can be demoted for reasons like link mismatches, user dissatisfaction, product reviews, location, exact match domains, and pornography.

  • Change History: Google tracks every change ever made to a page but uses only the last 20 changes for link analysis. Google also uses various click metrics (badClicks, goodClicks, lastLongestClicks, unsquashedClicks).

  • Author Information: Google stores author information to determine document authorship.

  • Site Authority: Google uses a metric called “siteAuthority” to assess how content quality affects a site's ranking, even though it has denied having a specific website authority score.

  • Chrome Data: Google uses data from its Chrome browser for ranking via a module called ChromeInTotal.

  • Whitelists: Google has whitelists for certain domains, particularly related to elections and COVID.

  • Small Sites: There is a feature called smallPersonalSite, speculated to boost or demote small personal sites.

  • Freshness: Google considers dates in bylines, URLs, and on-page content.

  • Core Topics: Google uses page and site embeddings to determine whether a piece of content is central to the website's topic.

  • Domain Registration: Google stores registration information.

  • Page Titles: The titlematchScore measures how well a title matches a query.

  • Term Weight: Measures the average weighted font size of terms and anchor text in documents.
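
To make the idea of twiddlers and demotions more concrete, the following is a minimal Python sketch of how re-ranking functions could adjust a base score. It is purely illustrative and is not taken from the leaked documentation: the Result class, its fields, the function names, and every numeric factor are assumptions, since the leak does not disclose how much weight any factor carries.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    """Hypothetical search result; all fields are invented for illustration."""
    url: str
    score: float                  # assumed base relevance score
    exact_match_domain: bool = False
    bad_click_share: float = 0.0  # assumed share of clicks counted as "bad"


# In this sketch a "twiddler" is a function that returns a score multiplier.
Twiddler = Callable[[Result], float]


def demote_exact_match_domain(result: Result) -> float:
    # The 0.5 factor is an arbitrary assumption, not a leaked value.
    return 0.5 if result.exact_match_domain else 1.0


def demote_user_dissatisfaction(result: Result) -> float:
    # The threshold and the 0.7 factor are arbitrary assumptions.
    return 0.7 if result.bad_click_share > 0.5 else 1.0


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply every twiddler to every result, then sort by adjusted score."""
    adjusted = []
    for result in results:
        factor = 1.0
        for twiddler in twiddlers:
            factor *= twiddler(result)
        adjusted.append(replace(result, score=result.score * factor))
    return sorted(adjusted, key=lambda r: r.score, reverse=True)


results = [
    Result("https://best-cheap-widgets.example", 0.90, exact_match_domain=True),
    Result("https://widgets.example/guide", 0.80, bad_click_share=0.1),
    Result("https://clickbait.example/widgets", 0.85, bad_click_share=0.8),
]

ranked = rerank(results, [demote_exact_match_domain, demote_user_dissatisfaction])
for r in ranked:
    print(f"{r.score:.3f}  {r.url}")
```

In this toy setup, the result with the highest base score ends up last because of the exact match domain demotion, while the untouched guide page moves to the top.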

Some SEO experts who have reviewed the documentation allege that the insights contradict Google's public statements about how Search functions.

As of now, Google has not issued a public comment regarding the leak.

