Sunday, July 15, 2012

Crowdsourcing journal/data information

In academia, researchers often work with and learn from data and methods that have been used before. I think there are new projects that could fill existing gaps in this process and therefore speed up learning and research by reducing redundancy. I'll highlight three here. If they already exist, please let me know. If they don't, hopefully someone could start the project and allow crowdsourcing of this information (it could be as simple as setting up a wiki and collecting links to places where some of this data already exists):

  1. Data scripts: Publicly available datasets are often not in the best of shape. They often need to be cleaned up, labeled, linked to other data sources, or processed in standard ways. Additionally, there may be external information about data quality or other facts that researchers should document and understand. Documentation and scripts (in multiple languages) would be the goal here. See for example
  2. Study replication: Researchers often try to replicate existing studies in order to understand a method, for a class project, or to conduct extensions. As many authors do not contribute accompanying code for working with the data, there is a lot of reverse-engineering that has to be done. Sharing replication code would not only save time, but also disseminate important information about the implicit assumptions in papers.
  3. Typo corrections: Ever puzzled over an equation in a journal article and gone through the trouble of finding that there is a typo in it? Published material is never perfect, and unless it is a large mistake, authors normally don't post corrections. But noting small (non-controversial) corrections could still save a lot of time. Obvious spelling mistakes are not worth the time to correct, but even when conclusions aren't overturned it is still helpful to correct intermediate steps.
    Edit: PubPeer seems to provide this function.
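
To make the "data scripts" idea from item 1 a bit more concrete, here is a minimal sketch of what one shared script might look like. Everything in it is made up for illustration: the `RAW_SURVEY` extract, the `-9` missing-value code, the `REGION_LABELS` lookup table, and the `clean_survey` function are all assumptions, not taken from any real dataset. A real contribution would also document where the raw file comes from and any known quality issues.

```python
import csv
import io

# Hypothetical raw extract: messy column names, stray whitespace,
# and a coded missing value (-9 is an assumed convention).
RAW_SURVEY = """ID, Region ,income
1, NE ,52000
2, S ,-9
3, NE ,61000
"""

# Hypothetical lookup table linking region codes to readable labels.
REGION_LABELS = {"NE": "Northeast", "S": "South"}

# Assumed codes that mean "no answer" in the income column.
MISSING_CODES = {"-9"}


def clean_survey(raw_text):
    """Return cleaned records: tidy names, typed values, labeled regions."""
    reader = csv.reader(io.StringIO(raw_text))
    # Normalize column names: strip whitespace and lowercase.
    header = [name.strip().lower() for name in next(reader)]
    rows = []
    for values in reader:
        if not values:  # skip blank lines
            continue
        record = dict(zip(header, (v.strip() for v in values)))
        record["id"] = int(record["id"])
        # Replace the coded missing value with None, otherwise parse as int.
        income = record["income"]
        record["income"] = None if income in MISSING_CODES else int(income)
        # Link to the external lookup table to attach a readable label.
        record["region_label"] = REGION_LABELS.get(record["region"], "Unknown")
        rows.append(record)
    return rows


if __name__ == "__main__":
    for row in clean_survey(RAW_SURVEY):
        print(row)
```

The point is less the code itself than having the cleaning decisions (which codes count as missing, how labels are attached) written down in one place where others can check and reuse them.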
