Use Case #1 - Downloading data files 

The ONC researcher community is very diverse, and each researcher has a different set of questions that they want to answer using ONC data. To answer those questions, they need to be able to find out what data is available and what data products are available for it, and then download it in a format they can use.

 

A typical workflow might be as follows:

  1. Identify the type of instrument(s) that observe the phenomenon of interest (e.g., water temperature or pressure)
  2. Narrow down the geographic area and time period
  3. Identify the data products and file formats that meet their research needs
  4. Identify the data product options (like QAQC or Resampling) that need to be applied to the data
  5. Download the data products as files
  6. Perform analysis on the data
  7. Repeat the process for another area, time range or instrument

 

The following sample Python script provides three classes, each with a matching command, to support this workflow:

  1. createXLS class - Used to create a spreadsheet of data product download definitions for the Locations and/or Devices that meet the filter criteria.
    1. Each line represents a unique data product delivery request.
    2. Contains a tab for configurations, where the researcher can modify the specific parameters used to run the download process, such as the file output location, the size of the queue, the number of download threads, or how to break up the requests into smaller pieces.
    3. The researcher can manually modify the spreadsheet to further refine the download criteria.
    4. Can be called from the command line using downloadONC.py createXLS.
  2. process class - Used to create or update a json file that represents the dataProductDelivery requests.
    1. The json file contains the configuration information needed to run the runDownloadProcess.py script and the list of data product requests needed to download the data.
    2. Script is rerunnable - The script will identify new request definitions in the spreadsheet and add them to the json file as new data product requests.
    3. Can be called from the command line using downloadONC.py updateFromXLS.
  3. download class
    1. The script will read the data product requests from the json file, run them, download the files, and save the process back to the file.
    2. Multithreaded - uses a request queue, allowing multiple requests to be run and polled while files are downloaded on a separate pool of threads.
    3. Script is rerunnable - It will pick up the process from the last time it was saved.
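
The configuration tab mentioned above lets the researcher break up requests into smaller pieces. As a rough illustration of that idea only, the sketch below splits one date range into shorter windows. The function names and the days_per_request parameter are hypothetical and are not taken from downloadONC.py; the timestamp format matches the one used in the examples on this page.

```python
from datetime import datetime, timedelta, timezone


def to_timestamp(moment):
    """Format a UTC datetime as e.g. 2017-05-01T00:00:00.000Z (millisecond precision)."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


def split_range(begin, end, days_per_request):
    """Yield (begin, end) timestamp pairs covering begin..end in windows of at most days_per_request days."""
    step = timedelta(days=days_per_request)
    start = begin
    while start < end:
        stop = min(start + step, end)  # the last window may be shorter
        yield to_timestamp(start), to_timestamp(stop)
        start = stop


begin = datetime(2017, 5, 1, tzinfo=timezone.utc)
end = datetime(2017, 6, 1, tzinfo=timezone.utc)
windows = list(split_range(begin, end, days_per_request=7))
for window in windows:
    print(window)
```

Each resulting pair could then become its own line in the spreadsheet, so that a long period is requested as several smaller data product requests.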

downloadONC.py classes and commands

Requirements

Requires the onc python library

Script

downloadONC.py

createXLS class & command

The purpose of this class & command is to discover ONC data for download and save the data product download input to a spreadsheet. The spreadsheet can be manually modified and then used with the updateFromXLS class or command to create a download process log file. The download class or command uses that file to request data products from the dataProductDownload service and to track the download process.

CLI Usage:

downloadONC.py createXLS -t <token> -o <outpath> -b <begin> -e <end> [options]

 

| Parameter | Description |
| --- | --- |
| -h, --help | Show Help |
| -t, --token <token> | Personal token, which can be obtained from the 'Web Services API' tab at http://dmas.uvic.ca/Profile |
| -o, --outpath <outpath> | The folder that file(s) will be saved to |
| -b, --begin <begin> | The beginning date/time of the data you would like to download |
| -e, --end <end> | The ending date/time of the data you would like to download |
| [options] | |
| -a, --append | Append configurations to the spreadsheet if it already exists; otherwise, create a new spreadsheet |
| -f, --file <filename> | The destination Excel spreadsheet file name. If omitted, the default file name is <outpath>\download.xlsx |
| -l, --locationCode <locationCode> | The code for the parent location you would like to use to search for instruments |
| -c, --deviceCategoryCode <deviceCategoryCode> | The code for the device category of the instrument you would like to download data from |
| -d, --deviceCode <deviceCode> | The code for a specific device you would like to download data from |
| -v, --propertyCode <propertyCode> | The code for the property you would like to download data for |
| -p, --dataProductCode <dataProductCode> | The code for the data product you would like to download |
| -x, --extension <extension> | The file extension for the data product you would like to download |

Example:
c:\onc>downloadONC.py createXLS --token=YOUR_TOKEN_HERE --outPath="c:/ONC/Data" --locationCode=USDDL --deviceCategoryCode=CTD --deviceCode=SBECTD19p4686 --begin="2017-05-01T00:00:00.000Z" --end="2017-06-01T00:00:00.000Z" 

Python Example:

from downloadONC import createXLS
 
x = createXLS('<YOUR_TOKEN_HERE>',"c:/ONC/Data")
 
x.locationCode='USDDL'
x.deviceCategoryCode='CTD'
x.deviceCode='SBECTD19p4686'
x.begin='2017-05-01T00:00:00.000Z'
x.end='2017-06-01T00:00:00.000Z'
 
x.createWorkbook('C:/ONC/Data/download.xlsx')

process class / updateFromXLS command

The purpose of this class and command is to take a list of root locations, a list of device categories and a list of date ranges, generate all of the unique requests, and store them in a process log json file. Once the process log file has been generated, it can be manually updated to include new root location codes, device category codes and time ranges, and this script rerun to update the download processes. The download processes can then be executed, and the generated files downloaded, using the runDownloadProcess.py script.

CLI Usage:

downloadONC.py updateFromXLS -f <file>

 

| Parameter | Description |
| --- | --- |
| -h, --help | Show Help |
| -f, --file=<file> | The Excel spreadsheet file name |

Example:
c:\onc>downloadONC.py updateFromXLS --file="C:/ONC/Data/download.xlsx"

Python Example:

from downloadONC import process
p = process()
p.updateFile('C:/ONC/Data/download.xlsx')
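
The rerunnable behaviour described for this class can be pictured with the following minimal sketch. It is not the code in downloadONC.py: the JSON layout (a top-level "requests" list), the "status" field and the function names are assumptions made for illustration. It generates one request per unique combination of location, device category and date range, and appends only the requests that the log does not already contain.

```python
import itertools
import json
import os
import tempfile


def build_requests(location_codes, device_category_codes, date_ranges):
    """Return one request dict per unique (location, category, date range) combination."""
    combos = itertools.product(
        sorted(set(location_codes)), sorted(set(device_category_codes)), sorted(set(date_ranges))
    )
    return [
        {"locationCode": loc, "deviceCategoryCode": cat, "begin": begin, "end": end, "status": "new"}
        for loc, cat, (begin, end) in combos
    ]


def request_key(request):
    """Fields that identify a request; two requests with the same key are duplicates."""
    return (request["locationCode"], request["deviceCategoryCode"], request["begin"], request["end"])


def update_log(path, requests):
    """Append requests not already in the JSON log and return how many were added."""
    log = {"requests": []}
    if os.path.exists(path):
        with open(path) as handle:
            log = json.load(handle)
    known = {request_key(r) for r in log["requests"]}
    added = [r for r in requests if request_key(r) not in known]
    log["requests"].extend(added)
    with open(path, "w") as handle:
        json.dump(log, handle, indent=2)
    return len(added)


log_path = os.path.join(tempfile.mkdtemp(), "download.json")
may = ("2017-05-01T00:00:00.000Z", "2017-06-01T00:00:00.000Z")
first_run = update_log(log_path, build_requests(["USDDL", "LSDDL"], ["CTD"], [may]))
second_run = update_log(log_path, build_requests(["USDDL", "LSDDL"], ["CTD", "HYDROPHONE"], [may]))
print(first_run, second_run)
```

On the second run only the HYDROPHONE combinations are new, so existing requests (and any status already recorded for them) are left untouched.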

download class / command

The purpose of this script is to read data product requests from a json file, run them, download the files, and save the process state back to the file. Requests are run and their status polled on multiple threads, while files are downloaded simultaneously on a separate pool of threads. The process log file can be manually updated to include new locations, device categories and date ranges; runDownloadProcessLogUpdate.py can then be run to create new download processes, which will be picked up the next time this script is run.

CLI Usage:

downloadONC.py download -f <file>

 

| Parameter | Description |
| --- | --- |
| -h, --help | Show Help |
| -f, --file <file> | The data product definition json file name |

Example:
c:\onc>downloadONC.py download --file="C:/temp/download/download.json"

Python Example:

from downloadONC import download
d = download()
d.execute('C:/temp/download/download.json')
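
To illustrate how a queue of requests, worker threads and a saved process file can fit together, here is a simplified sketch. It is an assumption-laden stand-in rather than the download class itself: fake_download is a hypothetical placeholder for running a request and fetching its files, the JSON layout is invented for the example, and a single pool of workers is used instead of separate request and download pools.

```python
import json
import os
import queue
import tempfile
import threading


def fake_download(request):
    """Hypothetical stand-in for running a request and downloading its files."""
    return f"{request['locationCode']}_{request['dataProductCode']}.{request['extension']}"


def run_pending(path, worker_count=2):
    """Run every request not yet marked complete, saving progress after each one."""
    with open(path) as handle:
        log = json.load(handle)

    todo = [r for r in log["requests"] if r.get("status") != "complete"]
    pending = queue.Queue()
    for request in todo:
        pending.put(request)

    lock = threading.Lock()

    def worker():
        while True:
            try:
                request = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            file_name = fake_download(request)
            with lock:  # one writer at a time keeps the json file consistent
                request["status"] = "complete"
                request["file"] = file_name
                with open(path, "w") as out:
                    json.dump(log, out, indent=2)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(todo)


log_path = os.path.join(tempfile.mkdtemp(), "download.json")
with open(log_path, "w") as handle:
    json.dump({"requests": [
        {"locationCode": "NEP", "dataProductCode": "AD", "extension": "mp3", "status": "complete"},
        {"locationCode": "SOG", "dataProductCode": "TSSD", "extension": "json"},
        {"locationCode": "ONC", "dataProductCode": "MP4V", "extension": "mp4"},
    ]}, handle)

first = run_pending(log_path)   # processes the two unfinished requests
second = run_pending(log_path)  # nothing left, so a rerun is harmless
print(first, second)
```

Because the status of each request is written back after it finishes, an interrupted run can be restarted and will only process what is still outstanding.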

 

Examples

A typical whale researcher wants to be able to use the audio files from all of the ONC hydrophones in the North East Pacific to generate spectra for a complete migration cycle.

Example

  1. Root ONC Tree node location code: NEP - North East Pacific
  2. Device Category: HYDROPHONE
  3. Data Product: AD - Audio Data
  4. File format: mp3
  5. Date Range: June 1st, 2016 to May 31st, 2017
c:\onc>downloadONC.py createXLS --token=YOUR_TOKEN_HERE --outPath="c:/temp" --locationCode=NEP --deviceCategoryCode=HYDROPHONE --dataProductCode=AD --extension=mp3 --begin="2016-06-01T00:00:00.000Z" --end="2017-05-31T23:59:59.999Z" 

A typical Marine Geoscience Researcher needs to be able to download all of the data, for all of the instruments at a specific location, in MATLAB format.

  1. Root ONC Tree node location code: LSDDL - Lower Slope Delta Dynamics Laboratory
  2. Device Category: all
  3. Data Product: all
  4. File format: mat
  5. Date Range: Oct 23, 2013 - May 31st, 2017

 

c:\onc>downloadONC.py createXLS --token=YOUR_TOKEN_HERE --outPath="c:/temp" --locationCode=LSDDL --extension=mat --begin="2013-10-23T00:00:00.000Z" --end="2017-05-31T23:59:59.999Z"

A typical Ocean and Atmospheric Sciences Researcher wants to be able to download all available scalar data for the Strait of Georgia, in json format, for the past year.

  1. Root ONC Tree node location code: SOG - Strait of Georgia
  2. Device Category: all
  3. Data Product: TSSD
  4. File format: all
  5. Date Range: June 1st, 2016 - May 31st, 2017

 

c:\onc>downloadONC.py createXLS --token=YOUR_TOKEN_HERE --outPath="c:/temp" --locationCode=SOG --dataProductCode=TSSD --begin="2016-06-01T00:00:00.000Z" --end="2017-05-31T23:59:59.999Z" 


A typical video researcher wants to be able to download 1 day's worth of videos and ancillary data from every deployed ONC video camera so that they can validate their algorithms.

  1. Root ONC Tree node location code: ONC - Ocean Networks Canada
  2. Device Category: VIDEOCAM
  3. Data Product: MP4V,TSSD
  4. File format: mp4,json
  5. Date Range: May 31st, 2017 00:00:00 - 23:59:59

c:\onc>downloadONC.py createXLS --token=YOUR_TOKEN_HERE --outPath="c:/temp" --locationCode=ONC --deviceCategoryCode=VIDEOCAM --dataProductCode=MP4V,TSSD --extension=mp4,json --begin="2017-05-31T00:00:00.000Z" --end="2017-05-31T23:59:59.999Z" 
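
The four commands above differ only in their filter values, so they could be assembled from a small helper when scripting many scenarios. The helper below is a hypothetical convenience and not part of downloadONC.py. It uses only the long option names that appear in the examples on this page, including the --outPath spelling from the examples (the parameter table lists it as --outpath), and it only builds the argument list without running anything.

```python
FILTER_NAMES = (
    "locationCode", "deviceCategoryCode", "deviceCode",
    "propertyCode", "dataProductCode", "extension",
)


def build_create_xls_command(token, out_path, begin, end, **filters):
    """Return the argument list for a createXLS call; filters set to None are omitted."""
    unknown = set(filters) - set(FILTER_NAMES)
    if unknown:
        raise ValueError(f"Unknown filter(s): {sorted(unknown)}")
    command = ["downloadONC.py", "createXLS", f"--token={token}", f"--outPath={out_path}"]
    for name in FILTER_NAMES:
        value = filters.get(name)
        if value is not None:
            command.append(f"--{name}={value}")
    command += [f"--begin={begin}", f"--end={end}"]
    return command


# The whale researcher scenario: all hydrophone audio in the North East Pacific.
command = build_create_xls_command(
    "YOUR_TOKEN_HERE", "c:/temp", "2016-06-01T00:00:00.000Z", "2017-05-31T23:59:59.999Z",
    locationCode="NEP", deviceCategoryCode="HYDROPHONE", dataProductCode="AD", extension="mp3",
)
print(" ".join(command))
```

The returned list could be prefixed with the Python interpreter and handed to subprocess.run, which avoids shell quoting problems with paths and timestamps.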

 

Use Case #2 - Near Real-Time display

Use Case Scenarios

  •  A researcher wants to discover ONC, finds out about ONC's APIs and starts playing. He wants to build his own web application but doesn't know what is where, so he would like to receive a list of locations and device types with data available in ONC's archive. After discovering hydrophone data in the Arctic he wants to know if there were data from 2012 available, and what other sensors were recording at the same locations. After finding both temperature and ice-thickness data to be of interest, he wants to see what data formats these data could be delivered in. Finally, he wants to download one year's worth of wav hydrophone files, and also temperature and ice-thickness as CSV files.
  • A researcher has heard that many fin whale calls have been identified on ONC hydrophone data and would like to download the corresponding spectra, both as data and as PNG plots. Ideally his service can link the annotations to the download request, or else he would need to know the times and locations of the annotated data so his script can download the spectra automatically.
  • A researcher operates a mobile platform and wants it to navigate autonomously by having the mobile platform access nearby sonar data on a scheduled request, ideally in real time, in order to check the sonar data for the latest position of the mobile platform. A Python script run as a cron job on the mobile platform's system determines when the last sonar scan took place; if it was subsequent to the latest sonar data download, a new sonar scan data download is initiated.
  • A researcher has been conducting a long-running experiment in partnership with ONC. He has devised the instrument platform and gradually improved the design over the years. He maintains his own archive of data, which is populated daily by scripts that pull the latest data from his platform. He mashes up ONC data with other data pulled from other sources like Environment Canada and the Pacific Geosciences Centre. He has also created a sort of clunky dashboard, which shows a collection of latest pre-generated plots from ONC Data Preview. He wants his scripts to run without having to be updated after every expedition or configuration change – he doesn't really want to mess with his scripts once they are working. He also wants to know if data get reprocessed by ONC, so he can be sure he's not doing analyses on flawed data.
  • A researcher checks daily whether his data are showing anything interesting, and he has a MATLAB script that automatically downloads the last 24 hours of data every midnight so he has them on his local machine when he comes to work every morning. His script does some processing on his machine which he would rather do on an ONC server, so that he doesn't have to download the entire raw data but could instead download spectra as PNG files, down-sampled data, or only those full data snippets associated with detected events. This will enable him to see and react to events such as a large turbidity event so he can mobilize a field survey to collect additional data.
  • A researcher wants to see his data as near real-time as possible, both on the monitors in his office and on a public display he has running for outreach purposes. He has a Python script running every 15 minutes to extract the raw scalar and complex data, which then need to be processed, combined with some external data he accesses using ERDDAP, and turned into PNG files. All of this effort could be saved if ONC were able to run his Python code and had a flexible dashboard that could show both these ONC data and non-ONC data. He usually publishes new versions of his Python code on GitHub.
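
Several of the scenarios above rely on a scheduled script that only downloads what is new since its last run. The decision logic can be sketched as below. How the time of the latest scan is obtained from ONC is not covered here; the function name and the 24 hour default for a first run are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def download_window(last_scan, last_download, default_span=timedelta(hours=24)):
    """Return (begin, end) for the data still to fetch, or None when nothing is new."""
    if last_download is not None and last_scan <= last_download:
        return None  # the latest scan has already been downloaded
    # On the very first run there is no previous download, so fall back to a fixed span.
    begin = last_download if last_download is not None else last_scan - default_span
    return begin, last_scan


last_download = datetime(2017, 5, 31, 0, 0, tzinfo=timezone.utc)
newer_scan = datetime(2017, 5, 31, 0, 15, tzinfo=timezone.utc)

print(download_window(newer_scan, last_download))     # window covering the new 15 minutes
print(download_window(last_download, last_download))  # None: nothing new to fetch
print(download_window(newer_scan, None))              # first run: the preceding 24 hours
```

A cron job or scheduled task would call this on every run and start a download only when a window is returned.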