Data Management

Data-driven research has become increasingly common, leaving many researchers with questions about what to do with all their data. Federal grants in particular often require that data be made accessible, and other funders and organizations have similar rules in place. Even if a grant or institution does not mandate access to data, a researcher may still want to consider sharing it. And because file formats and software change over time, ensuring the long-term survival of research data is also a concern.

Creating a Data Management Plan
The first step in long-term data management is to create a data management plan, and a librarian may be able to help you with this. Writing a plan means thinking through your data to determine the best format and method for storing it. For example, what is the context of your data? What format is it in? How could it be used in the future? Does it contain sensitive or confidential information? A data management plan involves many other considerations as well, and all of them can guide you and anyone working with you on how best to keep your data. If you'd like to read more on data management plans or see examples, check out DMPTool.

Storing Research Data
Once you have a plan, you'll have a better idea of where and how to store your data. At Ithaca College, our Digital Commons can host data sets. Data on the Digital Commons is search-engine optimized, shared with the wider Digital Commons community, and easily accessible. The Digital Commons also tracks usage, so you can tell how often people view your data. You can look at an example data set on the Digital Commons to decide whether it would work for you. If you prefer another option, many are available online. Sites like can help you find other data repositories.

Preparing Research Data

If you plan to make your data open, either by choice or because of a funding mandate, you may want to take a few steps to make sure it is usable and understandable to others. Preparing your data doesn't mean altering it; you should always share your raw data. However, there are some things you can do to increase its value. You can make a data set more usable by:
  • Using non-proprietary formats (.csv instead of Excel, for example)
  • Using clear names for files, tabs, and columns/rows
  • Keeping data and metadata terminology consistent throughout the data set (and using standardized terminology where possible)
  • Cleaning up notes that might be unclear to someone unfamiliar with your work
  • Eliminating or explaining any stray figures, notes, or calculations
For more on making data usable, check out "Some simple guidelines for effective data management" (Borer et al., 2009).
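Some of the practices listed above can be partly automated. Below is a minimal Python sketch, using only the standard library's csv module, that renames cryptic column headers to descriptive ones, writes the data as CSV (a non-proprietary format), and builds a small data dictionary describing each column. The column names, descriptions, and sample records are hypothetical illustrations, not drawn from any real data set. Note that only the headers are renamed; the recorded values are left exactly as they were.

```python
import csv
import io

# Hypothetical example: cryptic headers from a field spreadsheet,
# mapped to clear, descriptive names that include units.
COLUMN_MAP = {
    "sp": "species",
    "wt": "body_mass_g",
    "dt": "collection_date",
}

# Hypothetical descriptions for a data dictionary that travels with the data.
DESCRIPTIONS = {
    "species": "Scientific name of the specimen",
    "body_mass_g": "Body mass in grams",
    "collection_date": "Date collected (YYYY-MM-DD)",
}


def rename_columns(rows, column_map):
    """Return new rows with keys renamed; values are not modified.

    Keys missing from column_map are kept under their original name.
    """
    return [{column_map.get(k, k): v for k, v in row.items()} for row in rows]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def data_dictionary_csv(descriptions):
    """Build a two-column CSV (column, description) documenting the data set."""
    rows = [{"column": name, "description": text} for name, text in descriptions.items()]
    return rows_to_csv(rows, ["column", "description"])


if __name__ == "__main__":
    raw = [
        {"sp": "Peromyscus leucopus", "wt": "21.4", "dt": "2023-06-01"},
        {"sp": "Tamias striatus", "wt": "96.0", "dt": "2023-06-02"},
    ]
    clean = rename_columns(raw, COLUMN_MAP)
    print(rows_to_csv(clean, list(COLUMN_MAP.values())))
    print(data_dictionary_csv(DESCRIPTIONS))
```

To save the results, you could write each string to its own file opened with `open(path, "w", newline="", encoding="utf-8")`, keeping the data file and its dictionary side by side. The csv module also quotes any value that contains a comma, so free-text fields survive the conversion intact.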

Data Management Tools

Creating a Data Management Plan

DMPTool

Data Management Plan Resources and Examples

Guidelines for Effective Data Management Plans

Prepping your Data

Open Refine (formerly Google Refine)
An open source tool for data transformation and cleanup.

Nesstar Publisher
Data and metadata conversion tools to prepare your data for publication.

Analyzing your Data

Free statistical analysis program, with a bit of a learning curve.

Statistical analysis program available on library computers.

Data Visualization

Tableau Public
Free data visualization software. Allows you to create interactive, embeddable graphics.

A fairly simple web-based application for creating and customizing data visualizations.

Open source software to create interactive, data-rich websites. Best for location-related datasets.

Finding Data Sets

There are many, many sources of open data on the internet, but finding the type and level of data set you're looking for can be a challenge. Here are some suggestions of places to start:

DataONE is a data repository containing biological and environmental data sets. It is easy to search and use.

 is a good place to go for data from the federal government. It contains lots of public data, although not always in the most usable format.

DataDryad provides free data sets and other educational resources. Its data is primarily from biology.

Institutional repositories
Many colleges and universities host data sets in their institutional repositories (IRs). Searching several IRs takes a little more work, but it can lead to great finds. You can check out Michigan's Deep Blue repository to see an example.

Ithaca College Library guide to Statistics and Data Sets