Human Rights Web Archive

Center for Human Rights Documentation & Research at Columbia University

Frequently Asked Questions


  1. What is the Human Rights Web Archive?
  2. How are websites selected for inclusion in the archive?
  3. Can I suggest a website for archiving in this project?
  4. What is your copyright and permissions policy for archiving websites?
  5. How do you collect and store websites?
  6. Do website owners have to change or alter websites to be included in the crawls?
  7. Will your crawling interfere with access to our website? Who do I contact if your crawler causes problems?
  8. Are you able to capture media such as audio and video files?
  9. How can I view the websites that have been archived?
  10. Why do some websites appear to be incomplete?
  11. I would like my organization's website to be removed from the HRWA. Who do I contact?

What is the Human Rights Web Archive?

The Human Rights Web Archive @ Columbia University is a searchable collection of archived copies of human rights websites created by non-governmental organizations, national human rights institutions, tribunals and individuals. The HRWA is an initiative of the Center for Human Rights Documentation & Research and is a key focus of the Columbia University Libraries' Web Resources Collection Program. The HRWA was made possible by generous support from the Andrew W. Mellon Foundation.

How are websites selected for inclusion in the archive?

Subject specialists at Columbia University Libraries (and from Cornell University Libraries via our 2CUL collaboration) with regional and language expertise select websites for inclusion in the archive. We also invite suggestions from researchers, students, scholars and human rights advocates. Criteria for selection include relevance of the website to current research, teaching, and advocacy, perceived risk of a website disappearing, and likelihood that a website will not be archived or preserved by other means. Organizations whose paper archives are held at Columbia are another priority for web archiving.

Can I suggest a website for archiving in this project?

Yes, we welcome your suggestions. Please use this form to nominate a website for inclusion in the archive. We are especially interested in hearing from human rights organizations that want to nominate their own websites. Website owners should use this form.

What is your copyright and permissions policy for archiving websites?

We follow principles and techniques of non-intrusive harvesting. We attempt to notify all organizations and/or website owners of our interest in archiving their websites. We will refrain from harvesting websites whose owners do not wish to participate in this project. Some websites may contain material that is produced by other parties who may claim copyright ownership of such materials. The HRWA reserves the right to remove any material that in our reasonable opinion may violate copyright or other intellectual property rights. Third-party copyright holders who believe their rights have been infringed by inclusion of their content in our archive may contact us at: culhrweb@libraries.cul.columbia.edu

How do you collect and store websites?

We use the open-source web crawler Heritrix to create archival-quality copies of websites. We currently manage Heritrix through the Internet Archive's Archive-It service. All data created using the Archive-It service (including the archived websites accessible via this portal) is hosted and stored by the Internet Archive. Eventually we will also store the data in Columbia University Libraries' digital repository.

Do website owners have to change or alter websites to be included in the crawls?

No, website owners do not have to change the content, structure, or appearance of their websites to be included in the crawls.

Will your crawling interfere with access to our website? Who do I contact if your crawler causes problems?

We crawl websites at a polite rate so as not to interfere with access to your website. For actively updated websites, crawls generally run quarterly or semi-annually and last a few days. Once a crawl is complete, the crawler no longer interacts with your server. If you encounter any issues or have additional questions, please contact us at: culhrweb@libraries.cul.columbia.edu
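
For readers curious about what a "polite rate" can look like in practice, here is a minimal Python sketch of one common approach: enforcing a minimum pause between successive requests to the same host. It is an illustration only, not Heritrix's or Archive-It's actual implementation. The class name PoliteScheduler, its methods, and the five-second delay are all hypothetical.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Tracks, per host, the earliest time the next request may be sent.

    Hypothetical illustration; not how Heritrix is actually configured.
    """

    def __init__(self, min_delay_seconds=5.0, clock=time.monotonic):
        self.min_delay = min_delay_seconds  # assumed value, for illustration
        self.clock = clock                  # injectable so it can be tested
        self._next_allowed = {}             # host -> earliest next request time

    def wait_time(self, url):
        """Return how many seconds to wait before requesting `url`."""
        host = urlsplit(url).netloc
        now = self.clock()
        return max(0.0, self._next_allowed.get(host, now) - now)

    def record_request(self, url):
        """Note that a request to `url`'s host was just sent."""
        host = urlsplit(url).netloc
        self._next_allowed[host] = self.clock() + self.min_delay
```

A crawler built this way would sleep for wait_time(url) before each fetch and call record_request(url) afterwards, so that no single server receives requests faster than the chosen delay allows, while requests to other hosts are not held up.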

Are you able to capture media such as audio and video files?

Yes, downloadable media, audio and video files can usually be captured, although YouTube videos are challenging. Our crawler follows links in order to discover and capture content, so links to content must exist on a website in order for that content to be included in the archive. Streaming audio and video can't be captured at all by the current generation of web crawlers.
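
To illustrate why links matter, the following Python sketch shows link-based discovery in general terms: starting from a seed page, it collects href and src values and visits each new address once. It is a simplified illustration, not the Heritrix implementation. The example.org site, its file names, and the function names are hypothetical, and the "site" is an in-memory dictionary rather than a live server.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href and src attribute values found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def discover(seed, fetch):
    """Breadth-first walk from `seed`; `fetch(url)` returns HTML or None."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # nothing to parse (binary file or unknown address)
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen


# Hypothetical in-memory site: report.pdf and interview.mp3 are linked,
# hidden.pdf exists on the server but no page links to it.
SITE = {
    "http://example.org/": '<a href="/about.html">About</a> <a href="/report.pdf">Report</a>',
    "http://example.org/about.html": '<a href="/">Home</a> <audio src="/interview.mp3"></audio>',
    "http://example.org/report.pdf": None,
    "http://example.org/interview.mp3": None,
    "http://example.org/hidden.pdf": None,
}

found = discover("http://example.org/", SITE.get)
```

In this sketch, found contains the home page, the about page, report.pdf, and interview.mp3, but not hidden.pdf: the file exists, yet no page points to it, so a link-following crawler never learns of it.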

How can I view the websites that have been archived?

Archived websites can be viewed on this portal by date of capture. Browse or search website descriptions on the portal home page, then click the "View Website Captures" link on a website's detail page. Individual pages or documents from archived websites can also be viewed directly by clicking on results from a “Page full text” search. Please note that archived websites may take longer to load and render than live websites. Archived websites are hosted at the Internet Archive, and access may occasionally be interrupted temporarily by server maintenance there.

Why do some websites appear to be incomplete?

There are several reasons why an archived website may appear to be incomplete. Some types of content are challenging or impossible to capture and/or reproduce, including JavaScript-driven navigation menus, streaming audio and video, and dynamic content driven by forms or databases.

We can't capture files that are not linked from a page and can only be retrieved from a database via a user query. (For example, a publications database that requires one to run a search in order to access publications.)

This also means that archived websites do not retain all of their original functionality; internal search boxes and comment functions will not work. In addition, portions of a website may be restricted or password-protected. We collect only public content, so password-protected material will not be crawled.

I would like my organization's website to be removed from the HRWA. Who do I contact?

We will honor requests to remove archived content. Please contact us at: culhrweb@libraries.cul.columbia.edu