In May, Google published guidelines for building high-quality sites following the Panda algorithm update.
Now, I don't profess to know the inner workings of Google's algorithm, but I do have some hypotheses about variables Google might be using to determine whether a site is high quality.
My theories are based on the recently published Google guidelines, as well as some brainstorming on what variables I might look at if I were developing the Panda algorithm.
Image credit: Kevin Dooley on Flickr
Here are my guesses as to some variables Google may be using:
- Google authorship—posts by an author with many Google+ followers and a history of strong posts are likely to be preferred
- Consistent voice—is the site written by one authority in the area (or several writers who are authorities), or is it outsourced to no-name writers?
- Top-level domain—pages on .edu and .gov sites are likely to be treated with more authority
- Domain age—newly launched sites typically sit in a sort of "sandbox" for about six months before they start seeing significant organic traffic
- Word count—longer posts are likely to rank better. An analysis by serpIQ found that the average content length for a top-10 page is 2,500 words.
- Spelling—spell-check before you hit publish. Google will likely ding sites with spelling and grammatical errors.
- Images—does the site have high-quality, relevant images with title tags, alt tags, and relevant filenames?
- Original images—is the site using original images the owner created from scratch, or does it use stock photos?
- Multimedia—unique video content really makes a website stand out.
- Original content—sites with original content, original information, original reporting, original research, or original analysis will rank better than sites that simply summarize research done by other sites.
- Links out—everybody knows that backlinks drive rankings. But I think that links out (sites that your site links to) are also an important factor. A website with many links to journal articles, professors' websites, Wikipedia, New York Times articles, etc., is likely to be seen as high-quality content.
- SSL—Matt Cutts, head of webspam at Google, has indicated that he would like to see SSL-enabled sites get a rankings boost, although the timing of this is unclear.
- Reading level—Google could figure out the grade level of an article using a variety of automated techniques. (Here is a website that calculates various readability scores.) I think it would be nice if they gave extra credit to articles written at an advanced level with a wide vocabulary, compared to the shallow articles found on content farms. They could even take this a step further: figure out each searcher's preferred reading level, then customize the search results accordingly. That said, this is a tad creepy, and I could see some ethical concerns with the approach.
- Vocabulary variety—a site that uses a wide variety of relevant words is likely going into more detail (and providing richer information) than a content-farm site that speaks in general, high-level terms. For example, a page about carburetors that uses the phrases "inlet manifold," "venturi," and "butterfly valve" is likely to rank higher than a short page that does not go into much detail.
- Fair and balanced—Google has indicated that they are looking for articles that present "both sides of a story." Now, developing an algorithm that can read text and figure out which side of a story is being presented is no trivial task. But you could imagine Google clustering political websites into "Democratic-leaning" and "Republican-leaning" buckets (with a clustering algorithm), then giving bonus points to a news article that links to both Democratic and Republican sites.
- Personal flavor—Articles written in the first person (like this one here) tend to be more personal and engaging than dry articles written in the third person. So it is possible that articles with a healthy sprinkling of "I" are ranked favorably.
- Social cues—especially the number of Google+ +1s and shares. What did you think the point of Google+ was? It's to make Google's core cash cow (paid search) more accurate and more profitable.
- Ad content—Google will ding sites with too many ads on them, especially ads that are "above the fold."
- Time on page—if searchers stay on the page for a long time, that means it is likely to be more engaging and higher quality.
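To make the reading-level idea above concrete, here is a minimal sketch of the standard Flesch-Kincaid grade-level formula in Python. The syllable counter is a crude vowel-group heuristic of my own, purely for illustration; whatever Google actually does (if anything) is certainly more sophisticated.

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of consecutive vowels,
    # subtracting one for a silent trailing "e".
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    # Flesch-Kincaid grade level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)
```

Running this on a simple sentence ("The cat sat on the mat.") yields a much lower grade than a sentence packed with polysyllabic jargon, which is exactly the kind of signal a search engine could compute at scale.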
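The vocabulary-variety point can also be sketched with one classic metric, the type-token ratio: unique words divided by total words. Again, this is just my illustrative guess at the kind of measurement involved, not a known Google signal.

```python
import re

def type_token_ratio(text):
    # Ratio of unique words ("types") to total words ("tokens");
    # higher values suggest a richer, more varied vocabulary.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

A detailed carburetor page ("the venturi narrows the inlet manifold ahead of the butterfly valve") scores noticeably higher than repetitive filler ("good good good very good stuff"), matching the intuition in the bullet above.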