Tue. Apr 23rd, 2024
raw data folder

Research centers and labs frequently use a single file share with one shared username and password. Even where each user has a distinct login, it is not uncommon to find READ/WRITE access enabled for all authenticated users. This is poor file management practice: any user can modify any of the data, which, most importantly, puts the integrity of the raw data at risk. Researchers need to make sure that collected raw data sets are stored in a systematic, secure fashion so that the trust, provenance, and overall integrity of the raw data are maintained.

A better approach is to designate, at the point of acquisition, one or a few lead curators responsible for moderating the data sets [1]. Give this data manager a special-use account and grant WRITE permission to that account only. The data manager is tasked with placing incoming data into a directory structure reserved exclusively for raw data. That structure is editable only by the designated data manager and READ-only to researchers, enforced through role-based access control (RBAC) and access control lists (ACLs). Researchers' daily user accounts receive READ permission only as needed, following the systems administration principle of least privilege (PoLP): grant the minimum privileges required to perform the needed function. When creating permission structures, always assign permissions to groups rather than to individuals. This makes the access control lists easier to manage en masse.
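
As a rough illustration of the idea, the sketch below applies manager-write, group-read permissions to a raw data directory using plain POSIX file modes. This is a simplified stand-in for full ACLs or RBAC, not a complete implementation. It assumes the directory is already owned by the data manager's account and that its group is a researchers group; the function names `restrict_raw_data` and `mode_of` and the chosen modes are hypothetical.

```python
import os
import stat
from pathlib import Path

# Assumption: owner = data manager account, group = researchers group.
# Directories: owner rwx, group r-x (list and enter), others nothing.
DIR_MODE = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP   # 0o750
# Files: owner rw-, group r-- (read only), others nothing.
FILE_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP  # 0o640


def restrict_raw_data(root: Path) -> None:
    """Apply owner-write, group-read, no-other access to everything under root."""
    os.chmod(root, DIR_MODE)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            os.chmod(Path(dirpath) / name, DIR_MODE)
        for name in filenames:
            os.chmod(Path(dirpath) / name, FILE_MODE)


def mode_of(path: Path) -> int:
    """Return only the permission bits of a path, e.g. 0o640."""
    return stat.S_IMODE(path.stat().st_mode)
```

Ownership and group membership are deliberately left out, since changing them usually requires administrator rights. On systems that support richer ACLs, the same group-based intent would normally be expressed with the platform's own ACL tooling rather than mode bits.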

* * *

Remember to curate the storage for incoming raw data sets in order to preserve data integrity. The key to success is incorporating RBAC, with PoLP in mind, early in the research data pipeline. Storage systems and file structures will be explored in more detail in future posts.

Footnotes

  1. Wells, Dave. “What Is Data Curation?” Alation, https://www.alation.com/blog/what-is-data-curation/
