Data Storage
The Data Team hosted different kinds of data in several locations: data to be used by research scientists at the Center or beyond is on Box, and data internal to the Data Team that is needed for the data processing pipeline is on GIN. Sensitive credentials are stored on Box.
Box
The Data Team maintained two primary folders on Box: Data and data-team. All deputy directors maintain “Co-Owner” permissions for each of these folders.
Data
All datasets for end-use are located in Data. The folder structure for Data follows this scheme:
Data/
    [project-slug]/
        raw/
            ...
        exports/
            ...
        nightly/
            ...
        rc/
            ...
        reuse/
            ...
The nomenclature employed within each project folder follows the data stage scheme. Access guidelines for each data stage are also provided in the handbook.
Project IDs
All project slugs listed here are consistent with the slugs used in data-processing.
- gobee: Gobee EdTech application pilot study in 2021
- peru-md: Measurement development project in cooperation with the Peruvian Ministry of Education in 2020
- ptl: 2023 Bangladeshi iRRRd longitudinal and CDA studies within the Play to Learn portfolio, in partnership with icddr,b
- qitabi: QITABI and QITABI 2 project, in partnership with World Learning
- rti-lego: PLAY 1.0
- rul: Reach Up and Learn data archiving efforts
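As an illustration of the folder scheme and slugs above, the following minimal sketch builds the Box path for a given project and data stage. The helper name `stage_path` is hypothetical and not part of data-processing; the slugs and stage folder names are copied from this page, and the sketch assumes the five stage folders sit directly under each project folder.

```python
from pathlib import PurePosixPath

# Project slugs and stage folder names copied from this page.
PROJECT_SLUGS = {"gobee", "peru-md", "ptl", "qitabi", "rti-lego", "rul"}
DATA_STAGES = {"raw", "exports", "nightly", "rc", "reuse"}


def stage_path(slug: str, stage: str, root: str = "Data") -> PurePosixPath:
    """Return the folder path for one project and data stage (hypothetical helper)."""
    if slug not in PROJECT_SLUGS:
        raise ValueError(f"unknown project slug: {slug!r}")
    if stage not in DATA_STAGES:
        raise ValueError(f"unknown data stage: {stage!r}")
    # Assumed layout: Data/[project-slug]/[stage]/
    return PurePosixPath(root) / slug / stage


print(stage_path("qitabi", "raw"))  # Data/qitabi/raw
```

Rejecting unknown slugs and stages early keeps a typo from silently producing a path outside the documented scheme.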
data-team
All Data Team auxiliary files (credentials, planning documents, reports, etc.) are stored in data-team. The folder structure for data-team is as follows:
credentials/
    ...
data/
    ...
misc/
    ...
planning/
    okrs/
        ...
    scoping/
        ...
reports/
    ...
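For anyone working from a locally synced copy of data-team, a small check like the one below can confirm that the expected subfolders are present. This is only a sketch: the function name `missing_dirs` is hypothetical, and it assumes that okrs/ and scoping/ are nested under planning/ as the listing above suggests.

```python
from pathlib import Path

# Expected subfolders of data-team, copied from the listing above.
# Assumption: okrs/ and scoping/ are children of planning/.
EXPECTED_DIRS = [
    "credentials",
    "data",
    "misc",
    "planning/okrs",
    "planning/scoping",
    "reports",
]


def missing_dirs(root: Path) -> list[str]:
    """Return the expected subfolders that do not exist under `root`."""
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs` with the path to the local data-team folder returns an empty list when the layout matches; where that local folder lives depends on the Box sync setup and is not specified here.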
credentials
The most significant folder in data-team, credentials, holds all sensitive data necessary for accessing or rendering data, e.g. passwords, secret keys, and access tokens. Two rules for this folder must be adhered to:
- Unrestricted access to this folder is not advised. Instead, share individual files or subfolders on an as-needed basis.
  - Moreover, unless you are working with data-processing, do not make any copies of the files contained within.
- Do not create any shared links to contents within this folder. Instead, use Box’s “Share” option to email-invite individuals to access contents.
The contents of each subfolder are outlined below:
- aura-db [inactive]: Neo4j Aura database credentials used when investigating graph database backend for Item Bank project
- data-credentials_pele [inactive]: Data encryption keys for 3EA, Lebanon Teacher Professional Development, and Peru Measurement Development projects
- data-proc-creds: Credential files needed to operate the data-processing project
- iati-credentials [inactive]: User account credentials for IATI, needed for ERICC inception period
- ukds-credentials: UK Data Service credentials, needed for 3EA Niger year 2 data deposit
GIN
GIN is a data version control system freely available for researchers in neuroscience and related fields, hosted at the Ludwig-Maximilians University of Munich. This data store is primarily used for data versioning and intermediate processing for the data-processing project. Its primary audience is those who wish to contribute code to data-proc, so that all data asset production is kept in the same location.
The data files are located in this repository. The directory structure follows that of the data scheme used above.
Dataverse
Published data, for reuse or replication, is located on the Harvard Dataverse following our data curation standards. Administrative access to TIES’ dataverse is provided to all members of the deputy directorate.