<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
         xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd"
         system="hfr"
         scope="system"
         packageId="knb-lter-hfr.91.28">
   <access authSystem="knb" order="allowFirst" scope="document">
      <allow>
         <principal>uid=HFR,o=lter,dc=ecoinformatics,dc=org</principal>
         <permission>all</permission>
      </allow>
      <allow>
         <principal>public</principal>
         <permission>read</permission>
      </allow>
   </access>
   <dataset id="HF091">
      <alternateIdentifier system="https://doi.org">doi:10.6073/pasta/c622edd9114927f407cd55adb323fee7</alternateIdentifier>
      <title>Software Tools to Collect and Use Provenance in R</title>
      <creator>
         <individualName>
            <givenName>Emery</givenName>
            <surName>Boose</surName>
         </individualName>
         <userId directory="https://orcid.org">https://orcid.org/0000-0003-4820-0231</userId>
      </creator>
      <creator>
         <individualName>
            <givenName>Aaron</givenName>
            <surName>Ellison</surName>
         </individualName>
         <userId directory="https://orcid.org">https://orcid.org/0000-0003-4151-6081</userId>
      </creator>
      <creator>
         <individualName>
            <givenName>Elizabeth</givenName>
            <surName>Fong</surName>
         </individualName>
      </creator>
      <creator>
         <individualName>
            <givenName>Matthew</givenName>
            <surName>Lau</surName>
         </individualName>
         <userId directory="https://orcid.org">https://orcid.org/0000-0003-3758-2406</userId>
      </creator>
      <creator>
         <individualName>
            <givenName>Barbara</givenName>
            <surName>Lerner</surName>
         </individualName>
      </creator>
      <creator>
         <individualName>
            <givenName>Thomas</givenName>
            <surName>Pasquier</surName>
         </individualName>
         <userId directory="https://orcid.org">https://orcid.org/0000-0001-6876-1306</userId>
      </creator>
      <creator>
         <individualName>
            <givenName>Margo</givenName>
            <surName>Seltzer</surName>
         </individualName>
         <userId directory="https://orcid.org">https://orcid.org/0000-0002-2165-4658</userId>
      </creator>
      <pubDate>2024</pubDate>
      <language>English</language>
      <abstract>
         <section>
            <para>The software tools that scientists use to process and analyze data are typically optimized for performance and ease of use. Few, if any, such tools are designed to capture and record the details of what happens as the tool performs its task. This detailed information, and more generally the history of an item of data from its creation to its present state, is known as provenance. Provenance has the potential to make science more transparent, reliable, and reproducible.</para>
            <para>This project focused on collecting and using provenance for scripts written in the R statistical language, which is widely used by ecologists and environmental scientists for data analysis and visualization. Our tools include a provenance collector (rdtLite), which collects provenance as an R script executes (or during a console session), and other tools that use the collected provenance to document and visualize the execution or to support activities such as script debugging. The R packages included here are also available on CRAN. For more details, see the project website on GitHub (https://end-to-end-provenance.github.io).</para>
         </section>
      </abstract>
      <keywordSet>
         <keyword>analytical tools</keyword>
         <keyword>modeling</keyword>
         <keywordThesaurus>LTER controlled vocabulary</keywordThesaurus>
      </keywordSet>
      <keywordSet>
         <keyword>disturbance</keyword>
         <keywordThesaurus>LTER core area</keywordThesaurus>
      </keywordSet>
      <keywordSet>
         <keyword>Harvard Forest</keyword>
         <keyword>HFR</keyword>
         <keyword>LTER</keyword>
         <keyword>USA</keyword>
         <keywordThesaurus>HFR default</keywordThesaurus>
      </keywordSet>
      <intellectualRights>
         <section>
            <para>This dataset is released to the public under Creative Commons CC0 1.0 (No Rights Reserved). Please keep the dataset creators informed of any plans to use the dataset. Consultation with the original investigators is strongly encouraged. Publications and data products that make use of the dataset should include proper acknowledgement.</para>
         </section>
      </intellectualRights>
      <licensed>
         <licenseName>Creative Commons Zero v1.0 Universal</licenseName>
         <url>https://spdx.org/licenses/CC0-1.0.html</url>
         <identifier>CC0-1.0</identifier>
      </licensed>
      <distribution>
         <online>
            <url function="information">https://harvardforest.fas.harvard.edu/exist/apps/datasets/showData.html?id=hf091</url>
         </online>
      </distribution>
      <coverage>
         <geographicCoverage>
            <geographicDescription>Global. Coordinates based on WGS84 datum.</geographicDescription>
            <boundingCoordinates>
               <westBoundingCoordinate>-180</westBoundingCoordinate>
               <eastBoundingCoordinate>+180</eastBoundingCoordinate>
               <northBoundingCoordinate>+90</northBoundingCoordinate>
               <southBoundingCoordinate>-90</southBoundingCoordinate>
            </boundingCoordinates>
         </geographicCoverage>
      </coverage>
      <maintenance>
         <description>
            <para>complete</para>
         </description>
      </maintenance>
      <contact scope="document">
         <positionName>Information Manager</positionName>
         <organizationName>Harvard Forest</organizationName>
         <address scope="document">
            <deliveryPoint>324 North Main Street</deliveryPoint>
            <city>Petersham</city>
            <administrativeArea>MA</administrativeArea>
            <postalCode>01366</postalCode>
            <country>USA</country>
         </address>
         <phone phonetype="voice">(978) 724-3302</phone>
         <electronicMailAddress>hf-im@lists.fas.harvard.edu</electronicMailAddress>
      </contact>
      <publisher scope="document">
         <organizationName>Harvard Forest</organizationName>
         <address scope="document">
            <deliveryPoint>324 North Main Street</deliveryPoint>
            <city>Petersham</city>
            <administrativeArea>MA</administrativeArea>
            <postalCode>01366</postalCode>
            <country>USA</country>
         </address>
         <phone phonetype="voice">(978) 724-3302</phone>
         <phone phonetype="fax">(978) 724-3595</phone>
         <onlineUrl>https://harvardforest.fas.harvard.edu</onlineUrl>
      </publisher>
      <methods>
         <methodStep>
            <description>
               <section>
                  <para>1. rdtLite collects provenance as an R script executes (or during a console session) and saves it in extended PROV-JSON format to the file prov.json. By default, this file is written to the R session's temporary directory (to meet CRAN requirements) and is overwritten by subsequent executions of the same script (or console session). You can instead choose to save it elsewhere and to save time-stamped versions. Simple data values are automatically saved in the prov.json file. Complex data values (e.g., R lists or data frames) may optionally be saved, wholly or in part, as separate snapshot files.</para>
                  <para>2. provDebugR uses the provenance collected by rdtLite to support time-traveling debugging of an R script without the need to set breakpoints or insert print statements and rerun the script.</para>
                  <para>3. provExplainR uses the provenance collected by rdtLite from two different executions of a script to help explain why the script results differ.</para>
                  <para>4. provGraphR creates an adjacency matrix from the provenance object created by provParseR. The adjacency matrix can then be used to quickly traverse the provenance graph. This package supports other packages and is not intended to be used directly.</para>
                  <para>5. provParseR facilitates access to the provenance information collected by rdtLite. The prov.parse function accepts this information as a string or file in extended PROV-JSON format and returns it as an R object. Access functions then extract the desired information from this object and return it as a data frame. This package supports other packages and is not intended to be used directly.</para>
                  <para>6. provSummarizeR creates a concise high-level summary of the provenance collected by rdtLite, including information about the computing environment, loaded libraries, sourced scripts, and inputs and outputs.</para>
                  <para>7. provTraceR uses the provenance collected by rdtLite for a single R script or a series of R scripts to identify input files, output files, and exchanged files based on file hash values.</para>
                  <para>8. provViz provides an R interface to a visualization tool, written in Java, that allows you to view and query the provenance graph directly. Java must be installed for provViz to work.</para>
                  <para>Note: the R packages included here have been renamed for archival purposes. After downloading, rename each package file to remove the archival prefix before installing it as an R package (e.g., rename "hf091-01-rdtLite_1.4.tar.gz" to "rdtLite_1.4.tar.gz").</para>
               </section>
            </description>
         </methodStep>
      </methods>
      <project>
         <title>Harvard Forest Long-Term Ecological Research</title>
         <personnel>
            <organizationName>Harvard Forest</organizationName>
            <address>
               <deliveryPoint>324 North Main Street</deliveryPoint>
               <city>Petersham</city>
               <administrativeArea>MA</administrativeArea>
               <postalCode>01366</postalCode>
               <country>USA</country>
            </address>
            <phone phonetype="voice">(978) 724-3302</phone>
            <phone phonetype="fax">(978) 724-3595</phone>
            <onlineUrl>https://harvardforest.fas.harvard.edu</onlineUrl>
            <userId directory="https://ror.org">https://ror.org/059cpzx98</userId>
            <role>pointOfContact</role>
         </personnel>
         <abstract>The Harvard Forest Long-Term Ecological Research (LTER) program examines ecological dynamics in the New England region resulting from natural disturbances, environmental change, and human impacts.</abstract>
         <funding>National Science Foundation LTER grants: DEB-8811764, DEB-9411975, DEB-0080592, DEB-0620443, DEB-1237491, DEB-1832210.</funding>
      </project>
      <otherEntity id="HF091-01">
         <entityName>hf091-01-rdtLite_1.4.tar.gz</entityName>
         <entityDescription>rdtLite v. 1.4</entityDescription>
         <physical>
            <objectName>hf091-01-rdtLite_1.4.tar.gz</objectName>
            <size unit="byte">368269</size>
            <authentication method="MD5">7f1229f80b8c93159edb337d758c9909</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-01-rdtLite_1.4.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
      <otherEntity id="HF091-02">
         <entityName>hf091-02-provDebugR_1.0.1.tar.gz</entityName>
         <entityDescription>provDebugR v. 1.0.1</entityDescription>
         <physical>
            <objectName>hf091-02-provDebugR_1.0.1.tar.gz</objectName>
            <size unit="byte">91110</size>
            <authentication method="MD5">cef27d4b0dc60f61f464b291b98d2526</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-02-provDebugR_1.0.1.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
      <otherEntity id="HF091-03">
         <entityName>hf091-03-provExplainR_1.1.1.tar.gz</entityName>
         <entityDescription>provExplainR v. 1.1.1</entityDescription>
         <physical>
            <objectName>hf091-03-provExplainR_1.1.1.tar.gz</objectName>
            <size unit="byte">44552</size>
            <authentication method="MD5">112a656b2fd05f36f97cb5efb2f233ee</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-03-provExplainR_1.1.1.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
      <otherEntity id="HF091-04">
         <entityName>hf091-04-provGraphR_1.0.1.tar.gz</entityName>
         <entityDescription>provGraphR v. 1.0.1</entityDescription>
         <physical>
            <objectName>hf091-04-provGraphR_1.0.1.tar.gz</objectName>
            <size unit="byte">24132</size>
            <authentication method="MD5">30edec6eb55d88f3ed56063fab7e9e49</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-04-provGraphR_1.0.1.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
      <otherEntity id="HF091-05">
         <entityName>hf091-05-provParseR_1.0.tar.gz</entityName>
         <entityDescription>provParseR v. 1.0</entityDescription>
         <physical>
            <objectName>hf091-05-provParseR_1.0.tar.gz</objectName>
            <size unit="byte">57502</size>
            <authentication method="MD5">5055ea270449431a9b1b50763c499184</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-05-provParseR_1.0.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
      <otherEntity id="HF091-06">
         <entityName>hf091-06-provSummarizeR_1.5.1.tar.gz</entityName>
         <entityDescription>provSummarizeR v. 1.5.1</entityDescription>
         <physical>
            <objectName>hf091-06-provSummarizeR_1.5.1.tar.gz</objectName>
            <size unit="byte">26021</size>
            <authentication method="MD5">9847833b38cd69d67942cdf6d3c592d9</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-06-provSummarizeR_1.5.1.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
      <otherEntity id="HF091-07">
         <entityName>hf091-07-provTraceR_1.0.tar.gz</entityName>
         <entityDescription>provTraceR v. 1.0</entityDescription>
         <physical>
            <objectName>hf091-07-provTraceR_1.0.tar.gz</objectName>
            <size unit="byte">20232</size>
            <authentication method="MD5">41170c2260392d8364e67d143b5927c7</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-07-provTraceR_1.0.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
      <otherEntity id="HF091-08">
         <entityName>hf091-08-provViz_1.0.9.tar.gz</entityName>
         <entityDescription>provViz v. 1.0.9</entityDescription>
         <physical>
            <objectName>hf091-08-provViz_1.0.9.tar.gz</objectName>
            <size unit="byte">3163921</size>
            <authentication method="MD5">c48cff5b6eff1f03b71bbdc4f0c6800f</authentication>
            <compressionMethod>tar.gz</compressionMethod>
            <dataFormat>
               <externallyDefinedFormat>
                  <formatName>R package</formatName>
               </externallyDefinedFormat>
            </dataFormat>
            <distribution>
               <online>
                  <url function="download">https://harvardforest.fas.harvard.edu/data/p09/hf091/hf091-08-provViz_1.0.9.tar.gz</url>
               </online>
            </distribution>
         </physical>
         <entityType>script</entityType>
      </otherEntity>
   </dataset>
   <additionalMetadata>
      <metadata>
         <additionalClassifications>
            <researchTopic>informatics</researchTopic>
            <studyType>modeling</studyType>
         </additionalClassifications>
      </metadata>
   </additionalMetadata>
   <additionalMetadata>
      <metadata>
         <additionalLinks>
            <url name="Project website on GitHub">https://end-to-end-provenance.github.io</url>
         </additionalLinks>
      </metadata>
   </additionalMetadata>
</eml:eml>
