Data Storage

The Data Team hosts different kinds of data in several locations: data intended for research scientists at the Center or beyond is on Box, and data internal to the Data Team that is needed for the data processing pipeline is on GIN. Sensitive credentials are stored on Box.

Box

The Data Team maintains two primary folders on Box: Data and data-team. All deputy directors hold “Co-Owner” permissions for each of these folders.

Data

All datasets for end-use are located in Data. The folder structure for Data follows this scheme:

Data/
  [project-slug]/
    raw/
      ...
    exports/
      ...
    nightly/
      ...
    rc/
      ...
    reuse/
      ...

The folder names within each project folder follow the data stage scheme. Access guidelines for each data stage are also provided in the handbook.
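As an illustration of the scheme above, the following is a minimal sketch of how a script might build and check a stage folder path. The helper name `stage_path` and the constant `DATA_STAGES` are hypothetical and not part of any existing Data Team tooling; the stage names are taken from the folder tree above, and the root name `Data` is assumed to be the Box folder of the same name.

```python
from pathlib import PurePosixPath

# Stage folder names as they appear in the folder scheme above.
DATA_STAGES = ("raw", "exports", "nightly", "rc", "reuse")


def stage_path(project_slug: str, stage: str, root: str = "Data") -> PurePosixPath:
    """Return the folder path for one data stage of a project.

    Hypothetical helper for illustration only: it builds a path string
    and does not contact Box or check that the folder exists.
    """
    if stage not in DATA_STAGES:
        raise ValueError(
            f"unknown data stage {stage!r}; expected one of {DATA_STAGES}"
        )
    if not project_slug or "/" in project_slug:
        raise ValueError(f"invalid project slug {project_slug!r}")
    return PurePosixPath(root) / project_slug / stage


print(stage_path("ptl", "raw"))  # Data/ptl/raw
```

Rejecting unknown stage names early keeps a typo such as `export` from silently creating a folder outside the scheme.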

Project IDs

All project slugs listed here are consistent with the slugs used in data-processing.

  • gobee: Gobee EdTech application pilot study in 2021
  • peru-md: Measurement development project in cooperation with the Peruvian Ministry of Education in 2020
  • ptl: 2023 Bangladeshi iRRRd longitudinal and CDA studies within the Play to Learn portfolio, in partnership with icddr,b
  • qitabi: QITABI and QITABI 2 projects, in partnership with World Learning
  • rti-lego: PLAY 1.0
  • rul: Reach Up and Learn data archiving efforts

data-team

All Data Team auxiliary files (credentials, planning documents, reports, etc.) are stored in data-team. The folder structure for data-team is as follows:

credentials/
  ...
data/
  ...
misc/
  ...
planning/
  okrs/
    ...
  scoping/
    ...
reports/
  ...

credentials

The most significant folder in data-team, credentials holds all sensitive information needed to access or render data, e.g. passwords, secret keys, and access tokens. Three rules for this folder must be followed:

  1. Do not grant unrestricted access to this folder. Instead, share individual files or subfolders on an as-needed basis.
  2. Do not make any copies of the files in this folder unless you are working with data-processing.
  3. Do not create any shared links to contents within this folder. Instead, use Box’s “Share” option to email-invite individuals to access contents.

The contents of each subfolder are outlined below:

  1. aura-db [inactive]: Neo4j Aura database credentials used when investigating a graph database backend for the Item Bank project
  2. data-credentials_pele [inactive]: Data encryption keys for 3EA, Lebanon Teacher Professional Development, and Peru Measurement Development projects
  3. data-proc-creds: Credential files needed to operate the data-processing project
  4. iati-credentials [inactive]: User account credentials for IATI, needed for the ERICC inception period
  5. ukds-credentials: UK Data Service credentials, needed for the 3EA Niger year 2 data deposit

GIN

GIN is a data version control system freely available for researchers in neuroscience and related fields, hosted at the Ludwig-Maximilians University of Munich. This data store is primarily used for data versioning and intermediate processing for the data-processing project. Its primary audience is those who wish to contribute code to data-processing, so that all data asset production is kept in the same location.

The data files are located in this repository. The directory structure follows the Data folder scheme described above.

Dataverse

Data published for reuse or replication is hosted on the Harvard Dataverse, in line with our data curation standards. All members of the deputy directorate have administrative access to TIES’ dataverse.
