This simple folder structure helps us to define our media processing workflow on each project in order to produce sustainable, archival, and accessible data.

When I first started working with CoDA on the Between the Caves (BtC) Project in 2012, I was introduced to a folder structure that we called the “Processing Pipeline”. It consisted of numbered folders running from 00_Sources through to 08_Portfolio. As I digitized and catalogued images from the 20+ year history of the BtC Project, the system helped me recall where my media were and how far along they were in the workflow. At the time I followed that workflow from rote memory, with no concept of why it was set up that way.


BtC Processing Pipeline in 2012

Later, when I continued on as an official media specialist (aka Metador) at CoDA after graduation, I learned more concepts of archival media processing and techniques of non-destructive editing, the ins and outs of master versus access formats, and digital curation. The biggest constant among all projects we work on, no matter the digital formats or end products, is this “Processing Pipeline”. This simple folder structure is set up at the beginning of all projects we work on and it helps us to define our workflow for each project in order to produce results that meet the needs of clients and funding institutions for sustainable, archival, and accessible data.

The media processing folder management system (aka Processing Pipeline) has evolved since 2012. It helps me to consult on any project we are working on in CoDA with little orientation. On any project, I know where the original media are, can understand which Digital Asset Management systems and local or online databases are in use, and access the end products with ease. Here is what the bare bones of our Processing Pipelines look like today:


A sample processing pipeline for managing images

There are a lot of parallels between the original Between the Caves pipeline and the image processing pipeline above, and this structure is standard for most of our projects. You will notice that there are gaps in the numbering. These allow for flexibility, since no two projects follow exactly the same workflow. Here is the philosophy behind the folders:

  • 00_Sources is where files are gathered from various sources. We hope that in this step there is just enough metadata with the files to identify their provenance, but this will not be their permanent location or sort order.
  • 01 & 02 are typically folders for staging data. For example, if a database needs to be set up for a legacy project, we may store data sets and source sheets in these folders while we clean up and organize the data and entities so that they can be ingested cleanly into the database. (In the BtC folders above, 01_IN-staging is used for exactly this purpose!)
  • 03_Media-IN This is where the original files from sources get transferred to for sorting and organization. Every institution will have their own sorting method. At CoDA we typically sort according to source device or person, then into subfolders according to date of acquisition.
  • 04 & 05 are reserved for any other sorted inboxes, for example data sets that are ready to go into Codifi databases or Mukurtu CMS.
  • 07_Lightroom is where our Lightroom Catalog lives. You can also have other folders for different DAMs you use. Lightroom is our preferred place to manage images locally. Some projects also use Extensis Portfolio or CatDV for other file types.
  • 10_Media is where you will copy your files as you import them into your Lightroom Catalog or other DAM. They should keep the same folder structure here that you gave them in 03_Media-IN. This is where the original media will remain. Since we practice completely non-destructive processing, we never save any changes made within DAMs to the files in this folder. Any derivatives go into the next folder.
  • 12_Published is where your processed media will go. Any products, such as derivatives, archival formats with embedded metadata, crops, artistic representations, etc. will live in 12_Published.
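The folder list above can be sketched as a small Python script. This is only a minimal illustration, not the tooling CoDA uses: the function names `create_pipeline` and `copy_to_media` are hypothetical, and only the folders named in the list are created, because the names for 01, 02, 04 and 05 vary by project.

```python
from pathlib import Path
import shutil

# Folder names taken from the list above. Numbers 01, 02, 04 and 05 are
# left out on purpose: their names differ from project to project.
PIPELINE_FOLDERS = [
    "00_Sources",
    "03_Media-IN",
    "07_Lightroom",
    "10_Media",
    "12_Published",
]


def create_pipeline(project_root):
    """Create the bare-bones pipeline folders under project_root (hypothetical helper)."""
    root = Path(project_root)
    created = []
    for name in PIPELINE_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created


def copy_to_media(project_root):
    """Copy sorted files from 03_Media-IN into 10_Media, keeping the subfolders.

    Files that already exist in 10_Media are skipped rather than overwritten,
    so the originals stored there are never changed.
    """
    root = Path(project_root)
    source = root / "03_Media-IN"
    target = root / "10_Media"
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        destination = target / path.relative_to(source)
        if destination.exists():
            continue  # never touch an original that is already in place
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)  # copy2 also keeps file timestamps
        copied.append(destination)
    return copied
```

Calling `create_pipeline("MyProject")` and then `copy_to_media("MyProject")` would set up the skeleton and mirror whatever has been sorted into 03_Media-IN. In practice the copy usually happens during import into Lightroom or another DAM, so treat the second function as a stand-in for that step.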

The Processing Pipeline helps us comply with the OAIS reference model by making us define, at the beginning of a project, the SIP or Submission Information Package (folders 00-03, sometimes 12), the DIP or Dissemination Information Package (folders 12 and beyond), and the AIP or Archival Information Package (a subfolder of 12). It also follows the principles of the DCC Curation Lifecycle Model to create digital objects for access and reuse (stored in 12).
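As a rough illustration, the folder-to-OAIS mapping described above can be written down as a lookup. The function name `oais_roles` is hypothetical and the rules simply restate the paragraph; each project still has to define its own packages.

```python
def oais_roles(folder_number):
    """Return the OAIS roles a numbered pipeline folder can play (hypothetical helper)."""
    roles = []
    # SIP: folders 00-03, and sometimes 12.
    if 0 <= folder_number <= 3 or folder_number == 12:
        roles.append("SIP")
    # DIP: folders 12 and beyond.
    if folder_number >= 12:
        roles.append("DIP")
    # AIP: lives in a subfolder of 12.
    if folder_number == 12:
        roles.append("AIP")
    return roles


print(oais_roles(0))   # ['SIP']
print(oais_roles(10))  # []
print(oais_roles(12))  # ['SIP', 'DIP', 'AIP']
```

Folders such as 07_Lightroom and 10_Media return an empty list here because the paragraph does not assign them to a package.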

If you like the Processing Pipeline and would like to see if you can adopt it for your office, we offer a zipped folder structure for you to download on GitHub. Let us know how it works out for you!

Author: Kelley Shanahan

Apr 18, 2016 | Tips | 1 comment

1 Comment

  1. Chris "The Digital Evangelist" Webster

    Now I understand! Nice. I’ve been using a standard folder structure for projects simply so I can find anything anywhere. I like this structure too. Thanks for the article, Kelly!

