Datasets and Data Access Statements

It is best practice, and often a requirement of both publishers and funders, that the data underlying your research is made openly available for at least 10 years from the last time it was accessed.

Datasets can have embargo periods, which restrict access to the dataset for a period of time. The generally accepted embargo period (though not by all funders) is 12 months. If you require the dataset to be embargoed for longer than this, you will need to provide evidence of the reason to the Head of REF & Research Support. If your dataset needs to be restricted due to the nature of the data (e.g., human trafficking data), the data can be restricted or closed. Datasets, like research outputs, can be licensed using Creative Commons licences, which let others know what they can or cannot do with the data.

Once your dataset has been uploaded to Pure with relevant metadata (e.g., a description of the data), the University will provide you with a DOI for your dataset. You can then use this DOI in the data access statement of any research output you submit for publication that relates to the data within the dataset.

After your dataset has been checked and validated, it will be pushed through to our digital preservation platform, Arkivum, to ensure that the data is kept securely and that there is no file degradation. Unless otherwise requested, datasets will be kept within Arkivum for a period of 10 years from publication or last access/use.
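The embargo and retention periods described above are simple date arithmetic. The sketch below is illustrative only and is not part of Pure or Arkivum. It assumes that an embargo runs from the publication date and that the 10-year retention clock restarts from whichever is later, publication or last access. The function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional

RETENTION_YEARS = 10         # retention period stated in this guidance
DEFAULT_EMBARGO_MONTHS = 12  # generally accepted embargo length


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (e.g. 29 February + 12 months gives 28 February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_end(published: date, months: int = DEFAULT_EMBARGO_MONTHS) -> date:
    """Date on which an embargo lifts (assumption: it starts at publication)."""
    return add_months(published, months)


def retention_end(published: date, last_accessed: Optional[date] = None) -> date:
    """Earliest date the 10-year retention period could end.

    Assumption: the period is counted from the later of publication
    and last access.
    """
    anchor = max(published, last_accessed) if last_accessed else published
    return add_months(anchor, RETENTION_YEARS * 12)
```

For example, `embargo_end(date(2018, 2, 1))` gives 1 February 2019, and a dataset published on that date and never accessed again would reach the end of its retention period ten years later.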

Sample Data Access Statements

Purpose of Data Statement

A data access statement should direct the reader of a publication to underpinning data, or explain how underpinning data can be accessed. Researchers should include a data statement in their publications for the purposes of clarity and transparency and in accordance with the open data ethos advocated by UKRI and other research funders.

Data statements should outline:

  • the data that underpins a publication/research (what would be needed to reproduce the research);
  • the source of the data (i.e. host repository and persistent link);
  • any access/use conditions (e.g. access subject to completion of a non-disclosure agreement);
  • if data is restricted, the reason for the restriction;
  • if there is no data underpinning the paper, a statement confirming this, to aid transparency;
  • a digital object identifier (DOI).

    Data statements need to be included in any submitted manuscripts, so the best time to create your dataset is as soon as your research project is completed and before you begin thinking about publishing. To obtain a DOI (Digital Object Identifier) for your dataset, the dataset record must first be validated in Pure.

    The data access statement (sometimes referred to as a data availability statement) should be placed immediately before or after the acknowledgements section, and before the reference list, of the publication.

    The list below provides some example data statements. Please note that data access statements should be tailored to suit each publication, checking that they meet all funder and publisher requirements.

    • Openly available data

    “All data underpinning this publication are openly available from the University of Northampton Research Explorer at http://doi.org/10.15000/a789457”

    • Embargoed data

    “All data underpinning this publication will be available from the University of Northampton Research Explorer at http://doi.org/10.15002/a1234a56 from 01/02/2019 onwards, following the cessation of an embargo period.”

    • Restricted data

    “Due to ethical/commercial issues, data underpinning this publication cannot be made openly available. Further information about the data and conditions for access are available from the University of Northampton Research Explorer at http://doi.org/10.15000/a1234b56”

    • Partially restricted data

    “Due to the sensitive nature of this research, only a subset of the participants consented to their anonymised data being retained and shared. Anonymised interview transcripts and survey results from participants who provided consent, other supporting data, and further details relating to the restricted data, are available from the University of Northampton Research Explorer at http://doi.org/10.15129/a1234b56”

    • Physical data

    “Physical data supporting this publication are stored by the University of Northampton. Details of the data and how they can be accessed are available from the University of Northampton Research Explorer at http://doi.org/10.15129/a1234b56”

    • Secondary data

    “Pre-existing data underpinning this publication are openly available from UKDS at http://doi.org/10.12345/54321. Further information about data processing, and additional new supporting data, are available from the University of Northampton Research Explorer at http://doi.org/10.15129/a1234b56”

    • No new data created

    “No new data were created during this study. Pre-existing data underpinning this publication were obtained from NPL and are subject to licence restrictions. Full details on how these data were obtained are available in the documentation available from the University of Northampton Research Explorer at http://doi.org/10.15129/a1234b56”

    • No data

    “This work is entirely theoretical; there is no data underpinning this publication.”

    Note that very few studies truly have no data. Email pure@northampton.ac.uk for further advice.
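The open, embargoed and restricted examples above share a common shape, so a draft can be assembled from a few inputs. The sketch below is an illustration only: the function name and parameters are hypothetical, the wording is copied from the samples on this page, and any generated statement still needs tailoring to the publication, funder and publisher.

```python
from datetime import date
from typing import Optional

REPOSITORY = "University of Northampton Research Explorer"


def data_access_statement(
    doi_url: str,
    embargo_until: Optional[date] = None,
    restriction_reason: Optional[str] = None,
) -> str:
    """Draft a data access statement modelled on the samples in this guidance.

    restriction_reason: e.g. "ethical/commercial issues"; if given, a
        restricted-data statement is produced.
    embargo_until: date the embargo lifts; if given (and the data are not
        restricted), an embargoed-data statement is produced.
    """
    if restriction_reason:
        return (
            f"Due to {restriction_reason}, data underpinning this publication "
            "cannot be made openly available. Further information about the "
            "data and conditions for access are available from the "
            f"{REPOSITORY} at {doi_url}"
        )
    if embargo_until:
        return (
            "All data underpinning this publication will be available from "
            f"the {REPOSITORY} at {doi_url} from {embargo_until:%d/%m/%Y} "
            "onwards, following the cessation of an embargo period."
        )
    return (
        "All data underpinning this publication are openly available from "
        f"the {REPOSITORY} at {doi_url}"
    )
```

Calling `data_access_statement("http://doi.org/10.15002/a1234a56", embargo_until=date(2019, 2, 1))` reproduces the embargoed sample above. The other cases (partially restricted, physical, secondary or no data) are too varied for a template and are best written by hand.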

    Creating a dataset record in PURE

    • Select ‘+’ from the left-hand menu, or alternatively click on the green Add New button, and then select Datasets.
    • Complete all relevant metadata fields in the template. Be as detailed as possible.
    • The description of the dataset should be the equivalent of an abstract for an article – e.g., data taken from seven high schools with 3000 – 3500 students in a rural setting, consisting of interviews with 50 children from each school who were in year 10 and studying both maths and history. Initial one-hour interviews asked a series of open-ended questions in relation to the school curriculum, with two follow-up one-hour interviews in consecutive years monitoring the change in views and opinions. The data was analysed using SPSS version 11.2.
    • Upload all relevant data files from a networked university computer and set the required visibility, licence and access options.
    • Set the status of the record to ‘for validation’ and save the record.
    • The record will be reviewed by a member of the Research Support Team and, if it complies with minimum metadata requirements, a Digital Object Identifier (DOI) will be created for the dataset and the record validated. Unless otherwise requested, the dataset will remain closed until the date of the research output’s publication, or closed/embargoed if necessary (always respecting the publisher’s policies and meeting the requirements of any funder).
    • If you need additional support, please email pure@northampton.ac.uk with the subject “Datasets”.
    Guidance on completing metadata fields in PURE

    Details

    This section collects basic information about the dataset. The information you provide here will help you and others find the record again in Pure and other systems.

    Title

    Provide a name for your dataset that would make sense to someone searching for data, for example ‘Dataset for [journal article name]’

    Description

    Describe your dataset so another researcher could decide if it was relevant to their research. Note, you will have an opportunity to provide a lay summary and detailed documentation when you edit the public facing record, so you do not have to provide them here.

    Collection date(s)

    Provide the start and end date of your period of data collection or of your study period.

    People

    This section collects information about the people responsible for the dataset.

    Select ‘Add person’, or ‘Add organisational unit’ if you want to add a whole organisation as a contributor.

    For each person you add you will be asked to provide their role and affiliation.

    Role

    Use ‘Creator’ to include that person’s name in the citation for the dataset. Use another role (for example Data Collector, Project Leader, Rights Holder) to acknowledge a person’s contribution without including them in the citation.

    You can add the same person multiple times if they had several roles.
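The effect of the ‘Creator’ role on the citation can be pictured with a short sketch. It is a hypothetical model, not Pure’s own logic: the `Contribution` class and `cited_names` function are invented for illustration, and the assumption that a person listed twice appears only once in the citation is ours.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Contribution:
    name: str
    role: str  # e.g. "Creator", "Data Collector", "Project Leader", "Rights Holder"


def cited_names(contributions: List[Contribution]) -> List[str]:
    """Return the names that would appear in the dataset citation.

    Only people holding the 'Creator' role are included, in the order
    they were added; a person added several times is listed once.
    """
    seen = set()
    names = []
    for contribution in contributions:
        if contribution.role == "Creator" and contribution.name not in seen:
            seen.add(contribution.name)
            names.append(contribution.name)
    return names
```

A person added once as ‘Creator’ and again as ‘Project Leader’ is therefore cited once, while someone added only as ‘Data Collector’ is acknowledged on the record but not cited.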

    Affiliation on the dataset

    For each person provide their affiliation at the time that the dataset was created. This field should be automatically generated but you may need to update it if the person has since moved to a different department.

    Dataset managed by

    Check that the ‘Managing organisational unit’ is the department responsible for the dataset.

    Data availability

    This section collects information about where the data will be kept long term, and from where it will be available.

    Publisher

    This refers to where you will archive your dataset. If your data are deposited with an external repository, or you plan to deposit them there, select ‘Change publisher’ and provide the repository name.

    Digital Object Identifier (DOI)

    • if your dataset already has a DOI, include it here;
    • if your dataset does not have a DOI, leave this section blank;
    • if you are depositing your data in Pure and it does not yet have a DOI, the Research Support Team will ensure a DOI is minted for your dataset after you have completed this form.
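If a dataset already has a DOI, it may have been copied as a bare identifier, with a `doi:` prefix, or as a full link. The sketch below shows one way to tidy such a value before entering it. It is a convenience for your own checking, not a Pure feature. The function names are hypothetical, and the pattern is a widely used heuristic for modern DOIs rather than a complete definition of DOI syntax.

```python
import re

# Heuristic for modern DOIs: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste along with the identifier.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(value: str) -> str:
    """Strip any resolver or 'doi:' prefix and return the bare DOI.

    Raises ValueError if the remainder does not look like a DOI.
    """
    doi = value.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a recognisable DOI: {value!r}")
    return doi


def doi_url(value: str) -> str:
    """Return the persistent link form of a DOI."""
    return "https://doi.org/" + normalise_doi(value)
```

For instance, `normalise_doi("http://doi.org/10.15129/a1234b56")` returns `10.15129/a1234b56`, which can then be used consistently in the record and in the data access statement.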

    Locally Held Data

    If you have non-digital or other local data that cannot be uploaded to Pure provide:

    • a description of the data
    • where that data will be kept
    • how that data can be accessed
    • the medium on which they are recorded (e.g. paper)

    Provide links to any secondary versions of the dataset, such as copies on a project website or another repository, or links to closely related resources.

    Publication date

    If your dataset has already been published, give the date here. Otherwise, input the current year.

    Access contact details

    This is the main contact person for the dataset and must be one of the people listed as a ‘Creator’ in the ‘People’ section. If this is left blank we will assume that the first named ‘Creator’ of the project is the main contact.
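The rule for the access contact can be written out as a small check. This is a sketch of the rule as stated above, with a hypothetical function name; it is not how Pure itself enforces the rule.

```python
from typing import List, Optional


def access_contact(creators: List[str], nominated: Optional[str] = None) -> str:
    """Resolve the main contact for a dataset.

    creators: names listed with the 'Creator' role, in the order entered.
    nominated: the name entered as access contact, or None/blank if left empty.
    """
    if not creators:
        raise ValueError("At least one 'Creator' is needed to act as contact.")
    if nominated is None or not nominated.strip():
        # Left blank: the first named Creator is assumed to be the contact.
        return creators[0]
    if nominated not in creators:
        raise ValueError(f"{nominated!r} is not listed as a 'Creator'.")
    return nominated
```

With creators `["A. Researcher", "C. Author"]`, leaving the contact blank resolves to A. Researcher, while nominating someone who is not a Creator is rejected.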

    Temporal coverage

    If your dataset relates to a period of time before you collected it (for example, animal records from 1850-1905), enter the date range here.

    Geographical location

    The information in this section enables your dataset to be found in searches for data about a particular area/place. Leave this section blank if your dataset contains information about vulnerable groups (for example, endangered species) or if location is not relevant.

    Geographical coverage

    Use this field to describe the location of data in words (e.g. Northampton, UK).

    Geospatial point and Area

    If it is appropriate and the data were collected at a specific point, select ‘Point’ and enter the latitude and longitude separated by a comma. If the data were collected over an area, select ‘Area’ and enter its coordinates in the same way.
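A point is entered as latitude first, then longitude, in decimal degrees. The sketch below checks such a value before it is pasted into the field. The function name is hypothetical, the use of decimal degrees is an assumption, and the coordinates in the example are approximate and purely illustrative.

```python
from typing import Tuple


def parse_point(text: str) -> Tuple[float, float]:
    """Parse 'latitude, longitude' in decimal degrees and range-check it.

    Raises ValueError if the text is not two comma-separated numbers or
    if either value is outside the valid range.
    """
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 2:
        raise ValueError("Expected two values separated by a single comma.")
    latitude, longitude = float(parts[0]), float(parts[1])
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"Latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"Longitude out of range: {longitude}")
    return latitude, longitude
```

For example, `parse_point("52.24, -0.90")` returns `(52.24, -0.9)`, a rough location for Northampton. Swapping the two values is a common slip, and the range check catches it whenever the longitude is outside ±90.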

    Legal / ethical

    In this section you must tell us about any data protection, ethical, commercial or sensitivity issues that relate to your dataset. If you are unsure, email pure@northampton.ac.uk with the subject “Datasets – Ethics” for advice before saving your item for validation, and save the dataset as ‘entry in progress’ until the matter is resolved.

    Data protection

    Tick this box if your dataset contains personal data unless the data has been fully anonymised.

    Provide details about whether you have gained consent from participants for data sharing, and if so under what terms (for example, data anonymisation or data sharing agreements).

    Ethical approval

    Tick this box if your research required ethical approval.

    Provide details on whether this affects your ability to share the data.

    Commercial constraints

    Tick this box if commercial considerations constrain your ability to share the data.

    In the text box that appears provide further details of these constraints (e.g. embargo time to allow commercialisation). If you have been working with a commercial partner, the constraints should be clear from the collaboration agreements.

    Consult with RIFS (Research Information & Funding Services) for more information.

    Sensitive

    Tick this box if your dataset contains material which may be considered to be sensitive for any reason not already covered (for example, locations of endangered species, fragile ecosystems, genetically-modified crops or animal testing units or data subject to non-disclosure agreements).

    Provide details of the nature of this sensitivity and the restrictions that should be put in place for the dataset or dataset record.

    Relations to other content

    Use the ‘Project’ button to link your dataset to a project. We need to know this in order to tell who funded the work.

    You can also link your dataset to other outputs:

    • publications (articles, chapters) that make use of the data
    • related datasets (for example, from the same project or programme of work)

    Visibility

    This section determines whether the dataset will appear on your Research Profile. Set it to ‘Campus’ to begin with. We will change it to ‘Public’ for you later.

    If you have indicated in the ‘Legal/ethical’ section that the dataset is so sensitive that even the record describing it cannot be made public, change this setting to ‘Confidential – restricted to associated users and editors’. This will mean that the record will remain in Pure but will not be publicly accessible.

    Does data have to be open?

    In most cases the default setting for deposited data is open access. However, there are circumstances where access to data can be restricted either permanently or for a limited embargo period.

    Common circumstances for restriction include:

    • Data Protection
    • Ethical approval
    • Commercial constraints
    • Sensitive content
    • Moratorium on thesis data
    • Period of exclusivity for data creators.

    The University of Northampton works to the ethos of FAIR (Findable, Accessible, Interoperable, Reusable) Data, ensuring that data is as open as possible, but as closed as necessary.

    Pure offers a range of data visibility and access options to support your data and research requirements. Email the Research Support Team at pure@northampton.ac.uk for further advice.

    External data deposit

    Research data produced at the University of Northampton should be deposited in PURE unless circumstances make another repository more appropriate. These include:

    • The funder of the research mandates a specific data repository.
    • In joint research projects, the data is deposited in the repository of a collaborator.
    • A specialist or discipline-focused repository exists that gives the data greater impact.
    • Certain types of data, typically software or code, are better served by a version control repository such as GitHub.

    The only local requirement for external deposit is that a corresponding dataset record must be created in Pure which references and links to the external repository.

    If you are uncertain where to deposit your dataset(s), please contact the Research Support Team at pure@northampton.ac.uk with the subject “Datasets”.