Open Penguin Data Project
mission: to leverage the SEO community to build a model that reasonably approximates the likelihood that a site will be flagged by Penguin 2.0-style updates.
methodology: build a training set of URL/keyword pairs, some impacted and some not impacted by Penguin 2.0-style updates. Open up the training set to the community for creating features / variables. Allow the community to run various statistical methods on the cumulative data.
assumptions: as with any study, I made several assumptions. First, I assume that a URL/keyword pair was hit by Penguin if it lost 7 or more ranking positions on the 22nd, landed on page 2 or worse after holding steady first-page rankings for the 5 previous days, and was neither a local nor a time-sensitive post. This means the data could be missing entries that were hit by Penguin but lost fewer than 7 positions or stayed on page 1. It also means the data could wrongly include entries that were penalized for other reasons on the same day.
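The labeling rule above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name and arguments are hypothetical, page 1 is assumed to mean positions 1 through 10, "holding steady" is read as staying on page 1 for all 5 prior days, and the drop is assumed to be measured against the last of those days.

```python
from typing import Sequence

PAGE_ONE_MAX = 10  # assumption: ten results per page
MIN_DROP = 7       # from the rule: lost 7 or more positions


def looks_like_penguin_hit(
    prior_positions: Sequence[int],
    position_on_22nd: int,
    is_local: bool = False,
    is_time_sensitive: bool = False,
) -> bool:
    """Apply the labeling rule to one URL/keyword pair.

    prior_positions holds the rankings for the 5 days before the 22nd,
    oldest first (hypothetical input layout).
    """
    # Local and time-sensitive entries are excluded outright.
    if is_local or is_time_sensitive:
        return False
    # The rule needs exactly 5 prior days, all on page 1.
    if len(prior_positions) != 5:
        return False
    if any(p > PAGE_ONE_MAX for p in prior_positions):
        return False
    # Assumption: the drop is measured against the most recent prior day.
    drop = position_on_22nd - prior_positions[-1]
    return drop >= MIN_DROP and position_on_22nd > PAGE_ONE_MAX
```

For example, a pair that sat around position 3 and fell to position 14 would be labeled as hit, while one that fell from 1 to 8 would not, because it stayed on page 1.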
data:
- Keyword List: [csv]
- URL List: [csv]
- Ranking Data Set: [csv]
- Current Variable Set: [csv]
studies:
- Current Mean Spearman Correlations
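For readers unfamiliar with the statistic, here is a minimal sketch of a Spearman correlation between one variable's scores and the Penguin status column, using only the Python standard library. The function names are hypothetical, and how the project averages coefficients into a "mean" is not stated here, so the sketch stops at a single coefficient.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```

Because Penguin status takes only two values, most of its ranks are ties; the tie-averaging step above is what keeps the coefficient well defined in that case.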
providers:
Everyone who provides data to this project will receive recognition here.
Upload Data:
Instructions: Download the full data set from above and create a new CSV where the first 3 columns are still Penguin Status | Keyword | URL and the fourth and subsequent columns are your variable scores. The first row should be a header row that includes the name of each of your variables.
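As a rough sketch of what such a file might look like when generated in Python: the helper name, the example variable "url_length", and the exact header strings are assumptions for illustration, and the status values are passed through unchanged from whatever the downloaded data set contains.

```python
import csv
import io


def build_upload_csv(rows, variable_name, score_fn):
    """Build upload CSV text from (status, keyword, url) rows.

    score_fn(keyword, url) returns the score for one hypothetical variable.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row: the three original columns, then the new variable name.
    writer.writerow(["Penguin Status", "Keyword", "URL", variable_name])
    for status, keyword, url in rows:
        # Keep the first three columns untouched and append the score.
        writer.writerow([status, keyword, url, score_fn(keyword, url)])
    return out.getvalue()


# Example with a made-up row and a made-up variable (URL length).
example_rows = [("1", "blue widgets", "http://example.com/a")]
upload_text = build_upload_csv(example_rows, "url_length", lambda kw, url: len(url))
```

Writing `upload_text` to a file with a `.csv` extension gives a file in the layout described above; additional variables would simply add more columns after the fourth.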