Great News for SEOs: Google 2024 Algorithm Leaked

Last Modified: Jun 5th, 2024 - Category: News, SEO, SEO Blog
Cover Image for the article "Great news for SEOs: Google 2024 algo leaked"

In a surprising revelation, recent leaks of Google’s search algorithm and internal documentation have provided SEOs with an unprecedented glimpse into the mechanics behind search rankings.

The leak came from an accidental bot post by yoshi-code-bot on March 13, 2024. Given the magnitude of the discovery, it quickly came under scrutiny from many SEOs. The bot posted a link to a GitHub repository, which is itself a fork of another repository.

These documents, dissected by Andrew Ansley, Director of Helium SEO and Founder of ContentSprout AI, unveil several critical components and methodologies used by Google. Ansley, who previously analyzed the Yandex leak, has meticulously translated and interpreted this latest leak, revealing significant insights for SEO professionals. His findings aim to demystify Google’s algorithm, which has often been shrouded in secrecy and speculation.

Article Takeaways

  1. Neural Search and PageRank_NS: Google has enhanced its classic PageRank algorithm with a neural search variant, PageRank_NS, which improves document understanding and incorporates neural network models for more relevant and accurate search results.
  2. Business Model Identification: The algorithm can identify and rank various business models (e.g., News Websites, YMYL, Personal Blogs, E-Commerce, Video Sites) using specific methods tailored to each type, ensuring appropriate evaluation metrics.
  3. Page Quality (PQ): Google uses a large language model to estimate the “effort” put into creating content, rewarding high-quality, original content with higher rankings. This metric evaluates elements like tools, images, videos, and the depth of information provided.
  4. Advanced Scoring Techniques: Google uses page embeddings, site embeddings, site focus, and site radius to assess a page’s context and relevance. Click data (good clicks, bad clicks, longest clicks, and site-wide impressions) plays a crucial role in determining search rankings.
  5. Google’s Use of Clicks and User Interaction: Confirms that user engagement metrics, such as clicks, are integral to the ranking algorithm, particularly through systems like NavBoost, which adjusts rankings based on user interaction and engagement.

One note of caution: Both the original and forked repositories on GitHub are public. This is, to say the least, incredibly naive. It’s highly doubtful that even the least experienced engineer at Google (and no, Google doesn’t employ incompetent engineers, just in case you’re wondering) would make such a mistake. Personally, I believe this was leaked from another source and then posted as an “original.” The other possibility is that everything is a fake, but I doubt anyone would invest the time to create such an elaborate hoax. There are over 14,000 variables and thousands of archives involved. While technically possible, it’s highly unlikely and borders on conspiracy theory.

Detail of the leaked Google algo for SEOs
A small detail of the leaked Google 2024 algorithm

But… why is it a nightmare for SEOs?

Once you read the content of this lengthy article, you’ll realize there are so many variables to keep track of that it’s virtually impossible to follow everything. Furthermore, you probably won’t be able to track even 10 or 20% of these variables unless you use automation and machine learning.

Google’s 2024 algorithm in detail

Key Discoveries

Neural Search and PageRank

The leak reveals that Google has modified its classic PageRank algorithm with a neural search variant, termed PageRank_NS. This new version enhances document understanding and incorporates neural network models. Interestingly, there are seven different types of PageRank, including the well-known ToolBarPageRank. PageRank_NS appears to be specifically associated with improving the understanding and ranking of documents, making the search results more relevant and accurate.

Google’s adaptation of PageRank through neural search marks a significant shift in how it processes and ranks content. This evolution underscores Google’s commitment to leveraging advanced AI technologies to refine its search capabilities. The neural search variant integrates deep learning techniques, allowing Google to analyze and understand the content at a much deeper level than traditional methods.

Business Model Identification

The leak also reveals that Google’s algorithm includes specific methods to identify various business models such as News Websites, YMYL (Your Money or Your Life), Personal Blogs, E-Commerce, and Video Sites. The rationale behind filtering Personal Blogs remains unclear, but it suggests a targeted approach to categorize and rank different types of content appropriately.

This identification mechanism indicates that Google tailors its ranking criteria based on the business model, ensuring that different types of websites are evaluated using relevant metrics. For instance, E-Commerce sites might be assessed more on transaction-related metrics, while News Websites might be judged on content freshness and relevance.

Core Components

SEOs: Image of Google 2024 Algo leaked
Detail of the leaked Google Algo

The algorithm’s most critical components include NavBoost, NSR (Neural Search Rank), and ChardScores. These components directly conflict with what Google has publicly disclosed, highlighting a significant gap between Google’s internal mechanisms and its public statements. NavBoost, for example, measures user engagement and rewards documents that generate better clicks. NSR utilizes neural search to rank documents based on their relevance and content quality, while ChardScores predict site and page quality based on various factors.

Google employs a sitewide authority metric, with traffic from Chrome browsers being one of the key site-wide signals. This metric assesses the overall authority of a website, taking into account factors such as user engagement and site performance. The inclusion of traffic data from Chrome browsers suggests that Google monitors user behavior at a granular level to gauge site credibility and relevance.

Advanced Scoring Techniques

Google utilizes page embeddings, site embeddings, site focus, and site radius in its scoring function. Page embeddings and site embeddings help Google understand the content’s context and relevance. Site focus measures how concentrated a website is on a specific topic, while site radius evaluates how much individual page content deviates from the site’s overall theme.

These metrics, along with various types of click data (good clicks, bad clicks, longest clicks, and site-wide impressions), play a crucial role in determining search rankings. By analyzing these factors, Google can accurately rank pages based on their relevance, quality, and user engagement.

Most Interesting Insights for SEOs

Page Quality (PQ)

One of the most important discoveries is the concept of PageQuality (PQ). Google uses a large language model (LLM) to estimate the “effort” put into article pages. This metric helps Google determine whether a page can be easily replicated. Elements that contribute to high scores in effort calculations include tools, images, videos, unique information, and the depth of information provided. These components are also proven to enhance user satisfaction.

The LLM-based effort estimation highlights Google’s focus on rewarding high-quality, original content. Pages that demonstrate significant effort in terms of content creation and presentation are likely to rank higher.

For example, although this article has a lot of content, it summarizes findings from another page, so it would likely receive a lower Page Quality score.

Topic Borders and Authority

Diagram of Google Algo Indexing System explained
Google Algo Indexing System explained

The concept of topical authority, based on reverse engineering Google Patents, is substantiated in the leak. Google uses metrics like SiteFocusScore, SiteRadius, SiteEmbeddings, and PageEmbeddings to rank pages. The focus measures how much a site concentrates on a specific topic, while the radius evaluates how much page embeddings deviate from the site embedding. Essentially, Google creates a topical identity for a website, against which each page is measured.

This detailed approach ensures that websites with a clear and consistent focus on specific topics are rewarded. By maintaining a strong topical identity, websites can improve their rankings and visibility in search results.
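To make the focus and radius idea more concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not Google’s implementation: the function names, the use of a simple average as the site embedding, the cosine distance, the "1 minus average radius" focus formula, and the three-dimensional toy vectors are all my own assumptions.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def site_profile(page_embeddings):
    """Return (site_embedding, per-page radius, focus) for a url -> vector dict."""
    # Assumption: the site embedding is the plain average of page embeddings.
    site_embedding = mean_vector(list(page_embeddings.values()))
    # Assumption: a page's "radius" is its cosine distance from the site embedding.
    radius = {
        url: cosine_distance(vector, site_embedding)
        for url, vector in page_embeddings.items()
    }
    # Assumption: "focus" is 1 minus the average radius (higher = more focused).
    focus = 1.0 - sum(radius.values()) / len(radius)
    return site_embedding, radius, focus

# Toy embeddings (made up): two SEO pages and one off-topic recipe page.
pages = {
    "/seo-guide": [0.9, 0.1, 0.0],
    "/link-building": [0.8, 0.2, 0.0],
    "/banana-bread": [0.0, 0.1, 0.9],
}
site_embedding, radius, focus = site_profile(pages)
off_topic = max(radius, key=radius.get)  # page farthest from the site's theme
```

In this toy example the recipe page ends up with the largest radius, which is the kind of page the leak suggests would weaken a site’s topical identity.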

Image Quality and Host NSR

Image Quality

Google assesses image quality using click signals, including usefulness, presentation, appeal, and engagement. These metrics are part of Search CPS Personal data and play a role in ranking. High-quality images that engage users and enhance the content’s value are likely to improve a page’s ranking.

The focus on image quality underscores the importance of visual content in search rankings. Websites should prioritize using high-resolution, relevant images that contribute positively to the user experience.

Host NSR

Host NSR is a site rank computed for host-level (website) site chunks. It includes nsr, site_pr, and new_nsr data. This chunking system measures various site sections to create a site rank, similar to how Google assesses individual pages and topics. By evaluating different chunks of a domain, Google can more accurately determine the overall quality and relevance of a site.

Strategic Insights for SEOs

Image displaying a diagram of Google Mustang algorithm
The new Google Mustang algorithm in a nutshell

The new ranking system, internally known as Mustang, has some strategic insights any Search Engine Optimization expert should consider very carefully.

Website Design

Investing in a well-designed site with intuitive architecture is crucial for optimizing NavBoost. This component rewards documents that generate better user engagement through clicks. A user-friendly web design can significantly enhance navigation and interaction, leading to better engagement metrics and higher rankings.

Content Relevance

SEOs should remove or block pages that aren’t topically relevant. Establishing and reinforcing topical connections is essential. Each page should be optimized according to the components revealed in the leak to ensure high scores. Creating a clear topical focus and maintaining relevance across all pages can improve a site’s overall ranking.

Frequent Updates

Regularly updating content with unique information, new images, and videos is vital. This practice not only keeps content fresh but also enhances the “effort” scores, thereby improving rankings. Google values regularly updated content and prioritizes it in search results. SEOs should aim to keep their content current and engaging to maintain high rankings.

Ranking and Indexing

Image displaying How the new Google 2024 algorithm interacts with Panda.
How the new Google 2024 algo interacts with Panda.

Indexing System

Google’s indexing system, named Alexandria, categorizes documents into tiers using SegIndexer and TeraGoogle. Important and regularly updated content is stored in flash memory, while less significant content is stored on solid-state drives or hard drives. This tiered approach ensures that high-value content is readily accessible and prioritized in search results.

Memory Management

Google maintains a record of every page version but primarily uses the last 20 versions for active consideration. Frequent updates can effectively push out older versions, potentially improving current rankings. This memory management system allows Google to keep track of changes and ensure that the most current and relevant version of a page is ranked.
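The "keep everything, but only actively consider the last 20 versions" behavior can be pictured with a bounded queue. This is a rough sketch under my own assumptions: the `PageHistory` class and its attribute names are hypothetical, and the leak does not describe how Google actually stores versions.

```python
from collections import deque

class PageHistory:
    """Hypothetical store: full archive plus a window of recent versions."""

    def __init__(self, max_active=20):
        self.all_versions = []                  # every version ever seen
        self.active = deque(maxlen=max_active)  # only the most recent ones

    def add_version(self, content):
        self.all_versions.append(content)
        # When the deque is full, appending drops the oldest entry automatically.
        self.active.append(content)

history = PageHistory()
for i in range(1, 26):  # simulate 25 updates to the same page
    history.add_version(f"version {i}")

oldest_active = history.active[0]
```

After 25 updates, the first five versions are still in the archive but have fallen out of the active window, which is the "pushing out older versions" effect described above.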

Ranking Factors and Demotions

Weight and Short Content

Google considers the literal weight (boldness and size) of text in its scoring. Short content can rank well if it is substantial and not considered thin, having a unique scoring system. This finding confirms that concise, high-quality content can perform well in search rankings.

Guest Posts vs. Niche Edits

Guest posts from newer pages receive a higher value multiplier compared to older niche edits, emphasizing the importance of fresh, relevant links. While niche edits are still valuable, guest posts offer a more significant boost due to their recency and relevance.

Twiddlers and NavBoost

Twiddlers

Twiddlers are re-ranking methods, like FreshnessTwiddler and Panda, used to adjust rankings based on various factors. These methods help Google refine search results and improve user satisfaction. By applying these twiddlers, Google can dynamically adjust rankings to reflect the most current and relevant content.
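Conceptually, a twiddler can be thought of as a function that adjusts an already computed score. The sketch below is purely illustrative: the `Result` class, the 30-day threshold, and the 1.2 multiplier are assumptions I made up for the example, not values from the leak.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    score: float   # initial ranking score before re-ranking
    age_days: int  # days since the page was last updated

def freshness_twiddler(result):
    """Hypothetical rule: boost pages updated within the last 30 days by 20%."""
    return 1.2 if result.age_days <= 30 else 1.0

def apply_twiddlers(results, twiddlers):
    """Multiply each score by every twiddler's factor, then sort best-first."""
    reranked = []
    for result in results:
        multiplier = 1.0
        for twiddler in twiddlers:
            multiplier *= twiddler(result)
        reranked.append((result.score * multiplier, result.url))
    reranked.sort(reverse=True)
    return [url for _, url in reranked]

results = [
    Result("/old-guide", score=0.80, age_days=400),
    Result("/new-guide", score=0.75, age_days=10),
]
order = apply_twiddlers(results, [freshness_twiddler])
```

Here the fresher page overtakes the older one even though its initial score was lower, which is the general effect a freshness re-ranker would have.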

NavBoost

NavBoost is a critical component that rewards documents generating better clicks. It measures user engagement and compares performance against expected values. Google balances user signals with content understanding, using internal link counts, salient terms, and click data. This approach ensures that pages providing a good user experience are rewarded with higher rankings.
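The idea of comparing performance against expected values can be sketched as a simple ratio. Everything here is an assumption for illustration: the `click_boost` function, the expected click-through rates per position, and the choice of a ratio are mine, and the leak does not publish the real formula.

```python
# Hypothetical expected click-through rates by result position.
# These numbers are illustrative assumptions, not values from the leak.
EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.10}

def click_boost(position, impressions, good_clicks):
    """Ratio of observed good-click rate to the expected rate for a position.

    Above 1.0 means the page attracts more good clicks than expected;
    below 1.0 means fewer.
    """
    if impressions == 0:
        return 1.0  # no data yet: treat as neutral
    observed_ctr = good_clicks / impressions
    return observed_ctr / EXPECTED_CTR[position]

# Same click numbers, different positions: 15% observed CTR in both cases.
overperformer = click_boost(position=3, impressions=1000, good_clicks=150)
underperformer = click_boost(position=1, impressions=1000, good_clicks=150)
```

With these made-up numbers, a 15% click rate is strong for position 3 but weak for position 1, so the same raw clicks could lead to opposite adjustments.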

Site Interaction and Quality

Subchunks and Page Weight

Google assesses content clarity and relevance in smaller sections (subchunks), which impact the overall page quality. Ensuring each section is topically relevant and concise is crucial. By breaking down content into subchunks, Google can evaluate the quality and relevance of specific sections, leading to more accurate rankings.

Site Quality and Page Quality (PQ)

Google applies standard deviation limits to site quality variations, emphasizing the importance of consistency. Page quality is determined by factors such as effort estimation, incoming/outgoing links, and page relevance. Maintaining high-quality content across all pages can improve a site’s overall ranking.
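One possible reading of "standard deviation limits" is that a page’s quality score is not allowed to stray too far from the site average. The sketch below shows that interpretation only; the `clamp_to_site` function, the one-standard-deviation limit, and the sample scores are all assumptions of mine, and the leak does not spell out the actual mechanism.

```python
import statistics

def clamp_to_site(page_scores, k=1.0):
    """Clamp each page score to within k standard deviations of the site mean."""
    scores = list(page_scores.values())
    site_mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)  # population standard deviation
    low, high = site_mean - k * spread, site_mean + k * spread
    return {url: min(max(score, low), high) for url, score in page_scores.items()}

# Made-up quality scores: three consistent pages and one weak outlier.
page_scores = {"/a": 0.80, "/b": 0.70, "/c": 0.75, "/d": 0.10}
clamped = clamp_to_site(page_scores)
```

Under this interpretation, the outlier is pulled toward the site average while the consistent pages are left alone, which fits the article’s point that consistency across a site matters.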

Additional Insights from the Leaked Documentation

Internal Documentation and API References

The leaked internal documentation for Google’s Content Warehouse API reveals the extensive data Google stores for content, links, and user interactions. This documentation includes summaries, types, functions, and attributes for various features, offering a comprehensive look into the internal workings of Google’s systems.

Content Warehouse API and Document AI Warehouse

The documentation for Google’s now-deprecated Document AI Warehouse was accidentally published, providing insights into Google’s internal microservices. This API includes data on ranking systems, features, and modules, reflecting Google’s complex infrastructure for content storage and processing.

Google’s Use of Clicks and User Interaction

Contrary to public statements, the documentation confirms that Google uses clicks and user interactions as part of its ranking algorithm. The NavBoost system, in particular, employs click-driven measures to adjust rankings. This system, along with historical patents and recent testimonies, underscores the importance of user engagement in search rankings.

The Myth of No Domain Authority

Despite Google’s claims, the leaked documentation reveals that Google does calculate and use a site authority metric. This metric, stored as part of the Compressed Quality Signals, contradicts public statements denying the use of domain authority.

The Sandbox and Chrome Data

The concept of a “sandbox” for new websites is confirmed in the documentation, which mentions an attribute called hostAge used to sandbox fresh spam. Additionally, the use of Chrome data for ranking is also validated, despite previous denials by Google representatives.

The Architecture of Google’s Ranking Systems

Google’s ranking systems operate as a series of microservices, each handling specific aspects of the search process. These systems, outlined in the documentation, include components for crawling, indexing, rendering, processing, ranking, and serving search results. This architecture allows for scalable and efficient handling of search queries.

Implications for SEOs

Focus on Quality and Relevance

The documentation emphasizes the importance of high-quality, relevant content. SEOs should prioritize creating valuable content that engages users and aligns with Google’s ranking metrics.

Regular Updates and Fresh Content

Frequent updates and fresh content are critical for maintaining high rankings. SEOs should continuously update their content with new information, images, and videos to stay relevant and improve their rankings.

Understanding Twiddlers and Boost Systems

SEOs should be aware of the various twiddlers and boost systems that Google uses to adjust rankings. Understanding these systems can help SEOs optimize their strategies and improve their search performance.

And what about EEAT?

In my opinion, this content is likely to have a significant impact on SEO practices, especially for SEOs with highly automated resources. Here’s a breakdown of how it might affect EEAT (Experience, Expertise, Authoritativeness, and Trustworthiness):

Positive Impact on EEAT

  1. Detailed Explanation of Ranking Factors: The leak sheds light on various ranking factors like Page Quality (PQ), user engagement metrics (NavBoost), topical relevance, and content freshness. By understanding these factors, I can create content that clearly meets Google’s criteria for high-quality and trustworthy pages.
  2. Insights into Google’s Internal Workings: The leak reveals details about Google’s internal systems like the Content Warehouse API and Document AI Warehouse. This transparency allows me to gain insights into how Google processes and evaluates content, potentially leading to more informed SEO strategies.
  3. Confirmation of Existing SEO Best Practices: The leak validates many established SEO practices like creating high-quality content, maintaining a user-friendly website design, and regularly updating content. This reinforces the importance of these practices for building an authoritative and trustworthy website.

Negative Impact on EEAT

  1. Focus on Manipulation: The content emphasizes ranking factors like click-through rates (CTR) and user engagement metrics (NavBoost). Over-optimizing for these metrics could lead to tactics that prioritize short-term gains over creating genuinely informative and valuable content. This could potentially harm a website’s trustworthiness in the long run.
  2. Misinterpretation of Information: The leak is a complex document, and I might misinterpret some of the information. For instance, the emphasis on “effort” in content creation (judged by LLMs) could lead to practices like keyword stuffing to inflate content length, which would ultimately be detrimental to user experience and EEAT.

Overall

The impact of the leak on EEAT depends on how we interpret and use the information. If we use it strategically to create high-quality, relevant content that prioritizes user experience, the leak can be a valuable tool for improving a website’s EEAT. However, if we focus solely on manipulating ranking factors, it could have negative consequences, since the leaked Google algorithm includes SEO manipulation scoring.

How can SEO professionals use this to improve results?

The leak of Google’s algorithm and internal documentation provides valuable insights into the secretive algorithms used for search rankings. As long as we understand how to use these algorithms, we can optimize content for quality, relevance, and consistent updates, staying competitive in search results.

In my opinion, the leak of Google’s 2024 algorithm not only demystifies the search ranking process but also offers actionable strategies to improve SEO performance. However, it won’t be universally applicable. Not every SEO can leverage this information, especially considering the involvement of highly automated systems using AI or Quantum UX. This material is likely most suitable for SEOs with a medium-to-high level of expertise. For beginners or site owners, it might be less useful. That being said, it could be a valuable learning tool for them to understand how search works and the tools available.

Nevertheless, the leak offers some important information for SEOs with fewer resources who still want to know how to improve SEO rankings for their websites.
