Gemini Observatory Archive APIs

Introduction

This page gives information on accessing the Gemini Observatory Archive programmatically, for example through scripts or on the command line. There are two main aspects to this. Firstly, it is easy to construct (either manually or programmatically) URLs that lead to useful searches or features within the archive system; these could return HTML that you might want to load directly into a browser, for example. Secondly, you can construct URLs that return information in JSON which you can then easily process within your own software. We discourage processing HTML from the archive in your own software (i.e. scraping the HTML): we offer no guarantee that we will maintain the format of the HTML, so your software may stop working if we change it, and it is generally easier, more robust, and faster to use one of the JSON APIs to obtain that information.

Basic Concepts

URLs on the archive are of the form https://archive.gemini.edu/FEATURE/SELECTION, where FEATURE and SELECTION are the archive feature you want to access and the data selection criteria respectively, as explained below. This applies to the human-oriented features of the archive as well as the APIs.

Generally you should fetch these URLs with an HTTP GET. A small number of API functions can accept an HTTP POST with additional information for the query; this is described for those individual cases. The archive uses https:// URLs. If you try to access the archive with an http:// URL you will receive an HTTP redirect to the https:// URL, so it is more efficient to use https:// directly.

Archive Features

You have probably used the regular search form for the archive, and noticed that its URL is https://archive.gemini.edu/searchform. "searchform" is a good example of an archive feature in this context, and we'll use it in a lot of the examples below. A complete list of features is given later.

Selection Criteria

This is a key concept of the archive system. Search terms used to select data in the archive are placed directly in the URL, separated by slashes, just like directories in a normal URL. Many things that you will want to search by in the archive, such as Gemini Program IDs or UT dates, have a prescribed standard format, and the archive will recognize these automatically for what they are. For example, 8-digit numbers that refer to valid dates, such as 20140410, will be recognized automatically as UT dates, and strings such as GN-2010B-Q-123 will be recognized automatically as Gemini Program IDs.

So, using these examples, giving GN-2010B-Q-123 as the selection would select data from that program ID, and GN-2010B-Q-123/20140410 would select data from that program ID that was taken on that UT date. It does not matter what order you specify these in.
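For example, these two searchform URLs select the same data:

https://archive.gemini.edu/searchform/GN-2010B-Q-123/20140410
https://archive.gemini.edu/searchform/20140410/GN-2010B-Q-123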

In this way, we can build up URLs that refer to various searches, and their results, on the archive. All of these should be accessed with a regular HTTP GET; there is no need to HTTP POST search parameters to the system, as we provide a RESTful interface to the data.

If you add /object=foobar to the URL (it is a selection criterion just like the others) then it will only return files with that object name. The catch is that the object name in this sense is whatever the observer entered in the OT, and it is free form: if they wanted to call the target pauls_kewl_galaxy or candidate_12, they could, and they do. So if you did the Phase 2 yourself and know what you called the target, this works well, as it does if you are lucky and your object has only one name that can only be spelled one way.

What you almost certainly really want to do, though, is resolve the name and search by RA and Dec. The API does not do that directly; when you do this in the web search form, it is the search form that resolves the name for you, not the back-end search engine. So if you resolve the name yourself (or call some service to do it), you can then search by RA and Dec in decimal degrees as simply as:

/ra=40.6698792/dec=-0.0132889

This will default to a 3 arcminute radius cone search. You can specify the cone search radius by adding /sr=10 (units are arcseconds).
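For example, here is a minimal sketch that resolves a name with astropy (which is not part of the archive; any name resolver or service will do) and builds the corresponding selection for the jsonsummary feature described below. The target name and search radius are purely illustrative:

from astropy.coordinates import SkyCoord

# Resolve the target name to ICRS coordinates using an external name resolver
coord = SkyCoord.from_name("M42")   # illustrative target name

# Build a cone search selection with a 10 arcsecond radius
url = "https://archive.gemini.edu/jsonsummary/canonical/ra=%.7f/dec=%.7f/sr=10" % (
    coord.ra.deg, coord.dec.deg)
print(url)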

Features

Here is a list of the available features on the system:

searchform

The regular search form for the archive. When you pass selection criteria to the search form, it will pre-populate the fields of the form with the criteria you have passed and will execute that search. This is how it works when you do a search and then bookmark the resulting web page or copy the URL for future use: when you do the search, the form simply places the search terms into the URL and redirects your browser to that URL. If you re-visit this URL later, it will re-run the search and return the filled-out search form with the results. You can then, of course, modify the fields in the form and search again if you wish.

summary

This works much like searchform, except that it just sends you the results table. You don't get the search form itself, nor the tabs to see associated calibrations or observation logs; you simply get the main results table showing the results of the search specified by the selection criteria in the URL. You can also use ssummary (short summary) and lsummary (long summary) to get versions with fewer or more columns respectively.
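For example, a summary URL using the illustrative program ID and UT date from earlier would be:

https://archive.gemini.edu/summary/GN-2010B-Q-123/20140410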

jsonfilelist

This returns you a JSON representation of the files in the archive that match your selection criteria. You should almost always include /canonical as one of your selection criteria (see note below). The JSON returned contains a list of dictionaries, where each dictionary describes a file in the archive. The keys in the dictionary and their meanings are as follows:

name: The name of the dataset. This does not include the .bz2 if the file is compressed.
filename: The filename of the file. Most of the files in the archive are bzip2 compressed, so this will likely end in .bz2.
compressed: Boolean value saying whether the file is compressed or not.
lastmod: Last modification timestamp of the file that was ingested.
path: Path to the file within the data store. This is normally empty on the archive.
mdready: Boolean value that says whether the metadata of the file passes validation.
file_size: The size of the file as stored in the archive, in bytes.
data_size: The size of the FITS data in the file. If the file is compressed, this represents the size of the uncompressed data; if the file is not compressed, this will be the same as file_size.
size: The same as file_size. Deprecated.
file_md5: The MD5 hash of the file as stored in the archive, in hexdigest format.
data_md5: The MD5 hash of the FITS data in the file. If the file is compressed, this represents the MD5 hash of the uncompressed data; if the file is not compressed, this will be the same as file_md5.
md5: The same as file_md5. Deprecated.
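For example, the following jsonfilelist URL (also used in the sample code at the end of this page) lists the files for GN-2010B-Q-22 taken with GMOS-N on 2010-12-31:

https://archive.gemini.edu/jsonfilelist/canonical/GN-2010B-Q-22/GMOS-N/20101231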

jsonsummary

This returns you a JSON representation of your search results that includes all the information (and more) that you get in the summary and searchform results tables. You should almost always include /canonical as one of your selection criteria (see the note below). As with jsonfilelist, you get a list of dictionaries, one dictionary per file. You get all the keys from jsonfilelist, plus many more that contain details of the data, as follows:

telescope: The telescope where the data were taken.
instrument: The name of the instrument that took the data.
observation_class: Gemini Observation Class.
observation_type: Gemini Observation Type.
program_id: Gemini Program ID.
observation_id: Gemini Observation ID.
data_label: Gemini Data Label.
engineering: Boolean that says whether this is engineering data.
science_verification: Boolean that says whether this is science verification data.
procmode: Type of data for processed science/calibrations. This is one of 'sq' for science quality, 'ql' for quick look, or '' for raw.
calibration_program: Boolean that says whether this is calibration program data.
requested_iq, requested_cc, requested_bg, requested_wv: The requested site quality percentiles (IQ = Image Quality, CC = Cloud Cover, BG = Sky Background, WV = Water Vapor). The value is the integer percentile; 100 implies "Any".
raw_iq, raw_cc, raw_bg, raw_wv: The measured or estimated delivered site quality percentiles (IQ = Image Quality, CC = Cloud Cover, BG = Sky Background, WV = Water Vapor). The value is the integer percentile; 100 implies "Any".
ut_datetime: UT date and time of the observation, in ISO 8601 format.
local_time: Local date and time of the observation, in ISO 8601 format.
ra, dec: RA and Dec in decimal degrees.
object: Object name as specified in the OT.
azimuth, elevation: Azimuth and elevation in decimal degrees.
cass_rotator_pa: Position angle of the Cassegrain Rotator, in decimal degrees.
airmass: Airmass of the observation.
adaptive_optics: Boolean to say whether adaptive optics was in use.
laser_guide_star: Boolean to say whether the laser guide star was in use.
wavefront_sensor: Says which wavefront sensor was in use.
qa_state: Quality state of the file: Pass, Usable, Fail, Undefined, CHECK.
mode: imaging, spectroscopy, LS (Longslit Spectroscopy), MOS (Multi Object Spectroscopy) or IFS (Integral Field Spectroscopy).
spectroscopy: Boolean to say whether this is spectrally dispersed data.
types: The AstroData types of this file. This is basically a tag list.
release: The date on which the proprietary period of these data expires or expired.
reduction: The reduction state of the data. "RAW" for raw data.
phot_standard: Boolean to say whether a Gemini standard star is in the field.
gcal_lamp: Which GCAL lamp was being observed, if any.
exposure_time: Exposure time in seconds. If the data are co-added, this is the total of all the coadds.
detector_roi_setting: The requested Detector Region of Interest (subarray) name.
detector_config: A string summarizing the detector configuration.
camera: The camera in use.
detector_binning: A string representation of the detector binning in use.
wavelength_band: The wavelength band.
central_wavelength: Central wavelength of spectroscopy data.
filter_name: Name of the filter in use.
focal_plane_mask: Name of the focal plane mask (typically a slit mask for spectroscopy) in use.
pupil_mask: Name of the pupil plane mask (aka Apodizer for GPI) in use. Few instruments have these.
disperser: Name of the disperser (usually a diffraction grating or grism) in use.
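For example, the following jsonsummary URL (also used in the sample code at the end of this page) describes the OBJECT files taken with GMOS-N on 2010-12-31:

https://archive.gemini.edu/jsonsummary/canonical/OBJECT/GMOS-N/20101231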

Downloading Data

Two URLs are available to download data from the archive.

Firstly, the /download URL uses the same selection criteria as detailed above, and returns a tar archive containing the data that match the selection. The files inside the tar archive are compressed with bzip2. Also inside the tar archive are a README.txt file with some details of the download and an md5sums.txt file containing the MD5 hash of each file in the download, so that file integrity can easily be checked using the md5sum program installed on most UNIX-like systems.

Secondly, the /file URL accepts a single filename and will return you just that file, uncompressed and ready to use.
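As an illustration, here is a minimal sketch (using the GN-2010B-Q-22/GMOS-N/20101231 selection from the examples later on this page, and assuming md5sums.txt sits at the top level of the tar archive in standard md5sum format) that downloads a tar archive of public data and verifies each file against its listed MD5 hash:

import hashlib
import tarfile
from urllib.request import urlopen

# Fetch a tar archive of the public data matching the selection criteria
url = "https://archive.gemini.edu/download/GN-2010B-Q-22/GMOS-N/20101231"
with urlopen(url) as u, open('data.tar', 'wb') as f:
    f.write(u.read())

# Check each file in the tar archive against the hashes in md5sums.txt
with tarfile.open('data.tar') as tar:
    md5sums = tar.extractfile('md5sums.txt').read().decode()
    for line in md5sums.splitlines():
        expected, filename = line.split()
        data = tar.extractfile(filename).read()
        actual = hashlib.md5(data).hexdigest()
        print(filename, 'OK' if actual == expected else 'MISMATCH')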

Authentication

Note that these URLs will provide public data without any need to authenticate. However, they do require authentication in order to download proprietary data. If you do not authenticate, the /download URL will send you any public files that match your selection criteria, and the README.txt file will include a list of the files to which you were denied access.

To authenticate to these services, you need to supply your archive session cookie. A web cookie is a small piece of data sent from a web server to your browser, which your browser stores and presents back to that web server. When you log in to your account on the archive, the server sends you a cookie called gemini_archive_session. The value of this cookie identifies you to the system and verifies that you have successfully logged in. In order to authenticate yourself to the download service, you need to send the same cookie to the server with your request. If you want to do this from a script or the command line, you will first have to find the value of that cookie in your browser, and then include it in your download request headers.

Finding the value of the cookie depends on your browser. In Firefox, navigate to any archive page, go to Tools - Page Info, and inside Security there is a View Cookies button. In Safari, first go to Preferences - Advanced and turn on "Show Develop menu", then navigate to an archive page, click "Show Web Inspector" in the Develop menu, and select Cookies in the Resources tab. The cookie you need is called gemini_archive_session.

The value of the cookie will be a long, apparently random string of characters. You should be able to copy and paste it into your script. Be sure to keep that cookie value confidential: anyone who has it can access your archive account. If you need to reset it (for example because it became known to someone else), simply use a web browser to log out of the archive and the old value will no longer be accepted by the server. When you log back in to the archive, a new value will be sent to the browser where you logged in.

An example of how you might use this cookie in Python is the following (thanks to Erik Dennihy):

import requests

# Supply the value of your gemini_archive_session cookie here
cookies = dict(gemini_archive_session='my_archive_cookie')

# Stream the download so the whole tar archive is not held in memory
r = requests.get('https://archive.gemini.edu/download/my_archive_search....',
                 cookies=cookies, stream=True)
with open('data.tar', 'wb') as handle:
    for chunk in r.iter_content(chunk_size=256):
        handle.write(chunk)

Selection Criteria

Here are details of the selection criteria you can use; each entry below gives the property, its format or allowed values, an example, and any notes. Multiple criteria are separated by slashes in the URL, just as directories would be.

The present and canonical selection criteria are worthy of special explanation. When the archive ingests a new data file, that file is marked as both present (physically present in the file store) and canonical (it is the canonical version of that file). If the file is modified at Gemini (for example the QA state is updated) and re-ingested into the archive, the archive adds a new database record for the updated file, but the entry for the old file is not deleted; it is simply marked as not present and not canonical. This means that the archive retains some history of previous file versions. We use two separate flags because we use the same software for internal data management, where old files may be scrubbed off disk to free up space for new ones; in that case the database record will show that a file version is canonical but no longer present on disk.

The search form and data summaries assume canonical as one of the search criteria, as you are almost always looking for the most recent (canonical) version of a file. The JSON APIs do not, so as to allow you to deliberately look for information on previous versions of a file. Assuming you don't want to do that, you should always include /canonical as one of your selection criteria with the JSON APIs.

Single UT Date: YYYYMMDD, e.g. 20100401. Note that Chilean observing nights span multiple UT dates.
UT Date Range: yyyymmdd-YYYYMMDD, e.g. 20100401-20100420. Inclusive.
Telescope: Gemini-South or Gemini-North.
Instrument: e.g. GNIRS. To get both GMOS-N and GMOS-S, just use GMOS.
Program ID: (GN|GS)-(Semester)-(Program Type)-(Program Number), e.g. GN-2009B-Q-51. If the program ID is non-standard, you can use e.g. progid=GN-GNIRS-Engineering.
Observation ID: (Program ID)-(Observation Number), e.g. GN-2009B-Q-51-9. If the observation ID is non-standard, you can use e.g. obsid=GN-GNIRS-Engineering-003.
Data Label: (Observation ID)-(Dataset Number), e.g. GN-2009B-Q-51-28-001.
Observation Type: e.g. OBJECT.
Observation Class: e.g. science.
Filename: e.g. S20091028S0097.fits. Works with or without the .fits. For non-standard filenames, use filename=some_odd_filename.fits.
Filename prefix: e.g. N201203. Selects all files that have names beginning with that prefix. For non-standard filenames, use e.g. filepre=00AUG.
QA state: Pass, Usable, Fail, NotFail, Win or Lucky, e.g. Pass. Win means Pass or Usable (i.e. not Fail or Undefined), NotFail is literally every state other than Fail, and Lucky means Pass or Undefined.
Mode: imaging or spectroscopy, e.g. imaging.
Adaptive Optics: AO, NOTAO, NGS or LGS, e.g. AO. AO means any adaptive optics in use, NOTAO means not AO, NGS means NGS AO, and LGS means LGS AO.
Science Quality: 'sq' for science quality or 'ql' for quick look, e.g. sq.
File curation: present, canonical, notpresent or notcanonical, e.g. canonical. See the note above.
GMOS grating name: e.g. B600. Selects only files using that GMOS grating.
GMOS mask name: e.g. GN2009BC009-04. Selects only files using that GMOS mask name.
Binning: NxM, e.g. 2x2. Pixel binning; unbinned data shows as 1x1.
Detector config: the words low, high, slow and fast are interpreted as detector configurations (i.e. read gain setting or read speed setting), e.g. high.
Detector ROI: the words FullFrame, CentralSpectrum and CentralStamp are interpreted as detector Region of Interest (ROI) names, e.g. FullFrame. FullFrame will include instruments that do not have configurable ROIs.
Calibration Type: e.g. ARC. This is only useful with the calibration systems.
Reduction State: e.g. RAW, PREPARED, PROCESSED_BIAS, PROCESSED_FLAT. Most data in the archive are RAW, but this is how you specify that you are looking for a certain type of reduced data.
Photometric Standard: photstandard. Selects only data that have a Gemini photometric standard in the field of view.
Filter: filter=(filtername), e.g. filter=g. Select by filter name. Don't include any _G1234 type suffix.
Twilight observation: Twilight or NotTwilight, to select or exclude twilight sky observations (e.g. twilight flat fields).
Exposure time: exposure_time=(number or range), e.g. exposure_time=59-61. Due to floating point ambiguity with some of the instruments, it's best to use a range.
Coadds: coadds=(integer), e.g. coadds=10. Select by number of coadds; integer exact match.
Telescope sky position: ra=(min-max)/dec=(min-max), e.g. ra=123.5-123.6/dec=21.1-22.2; for negative Dec, dec=-22.3--24.5. ICRS RA and Dec, in decimal degrees or HH:MM:SS.sss / [-]DD:MM:SS.sss.
Telescope position: az=(min-max)/el=(min-max)/crpa=(min-max), e.g. az=155.0-155.5/el=88.0-89.0. Telescope azimuth, elevation and Cassegrain Rotator position angle, in decimal degrees. You must supply two numbers (decimal point and negative sign optional) separated by a hyphen. Note that it does not do intelligent range wrapping: the smaller number must come first, and there is currently no way to select everything in the 20 degree range between 350 and 10 degrees. Also, it takes the numbers from the header literally; in the az and crpa systems these may not be limited to 0:360 or -180:180 due to the 540 degree range of these systems.
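As an illustration, several of these criteria can be combined into a single (hypothetical) query, for example:

https://archive.gemini.edu/summary/GMOS-N/20100401-20100420/science/Pass/FullFrame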

Sample Python code

Simple file list using jsonfilelist

This is a simple example in Python that constructs a jsonfilelist URL, fetches the JSON document, parses it into a list of dictionaries, and then loops through, showing some details of the files found.


from urllib.request import urlopen
import json

# Construct the URL. We'll use the jsonfilelist service
url = "https://archive.gemini.edu/jsonfilelist/"

# List the files for GN-2010B-Q-22 taken with GMOS-N on 2010-12-31
url += "canonical/GN-2010B-Q-22/GMOS-N/20101231"

# Open the URL and fetch the JSON document text
u = urlopen(url)
jsondoc = u.read()
u.close()

# Decode the JSON
files = json.loads(jsondoc)

# This is a list of dictionaries, each containing info about a file
for f in files:
    print("Filename: %s" % f['filename'])
    print("-- file size: %d, data size: %d" % (f['file_size'], f['data_size']))

Gives the following output:


Filename: N20101231S0338.fits.bz2
-- file size: 21314852, data size: 57548160
Filename: N20101231S0339.fits.bz2
-- file size: 21130627, data size: 57548160
Filename: N20101231S0340.fits.bz2
-- file size: 21022898, data size: 57548160
Filename: N20101231S0341.fits.bz2
-- file size: 21154419, data size: 57548160
Filename: N20101231S0342.fits.bz2
-- file size: 21086479, data size: 57548160
Filename: N20101231S0343.fits.bz2
-- file size: 21018470, data size: 57548160

More details of the files using jsonsummary

Similar to the example above, but getting more details about the observations using the jsonsummary API.


from urllib.request import urlopen
import json

# Construct the URL. We'll use the jsonsummary service
url = "https://archive.gemini.edu/jsonsummary/"

# List the OBJECT files taken with GMOS-N on 2010-12-31
url += "canonical/OBJECT/GMOS-N/20101231"

# Open the URL and fetch the JSON document text
u = urlopen(url)
jsondoc = u.read()
u.close()

# Decode the JSON
files = json.loads(jsondoc)

# This is a list of dictionaries, each containing info about a file
total_data_size = 0
print("%20s %22s %10s %8s %s" % ("Filename", "Data Label", "ObsClass",
                                 "QA state", "Object Name"))
for f in files:
    total_data_size += f['data_size']
    print("%20s %22s %10s %8s %s" % (f['name'], f['data_label'],
                                     f['observation_class'], f['qa_state'],
                                     f['object']))

print("Total data size: %d" % total_data_size)

Gives the following output:


            Filename             Data Label   ObsClass QA state Object Name
 N20101231S0222.fits  GN-2010B-Q-51-196-001        acq   Usable CDFS
 N20101231S0223.fits  GN-2010B-Q-51-196-002        acq   Usable CDFS
 N20101231S0224.fits  GN-2010B-Q-51-196-003        acq   Usable CDFS
 N20101231S0274.fits  GN-2010B-Q-51-196-004        acq     Pass CDFS
 N20101231S0275.fits  GN-2010B-Q-51-196-005        acq     Pass CDFS
 N20101231S0276.fits  GN-2010B-Q-51-196-006        acq     Pass CDFS
 N20101231S0278.fits  GN-2010B-Q-51-169-006    science     Pass CDFS
 N20101231S0333.fits  GN-2010B-C-10-103-001        acq     Pass SDSSJ0841+2042
 N20101231S0334.fits  GN-2010B-C-10-103-002        acq     Pass SDSSJ0841+2042
 N20101231S0335.fits  GN-2010B-C-10-103-003        acq     Pass SDSSJ0841+2042
 N20101231S0337.fits   GN-2010B-C-10-82-004    science     Pass SDSSJ0841+2042
 N20101231S0338.fits    GN-2010B-Q-22-4-001    science     Pass L5c02
 N20101231S0339.fits    GN-2010B-Q-22-4-002    science     Pass L5c02
 N20101231S0340.fits    GN-2010B-Q-22-4-003    science     Pass L5c02
 N20101231S0341.fits    GN-2010B-Q-22-4-004    science     Pass L5c02
 N20101231S0342.fits    GN-2010B-Q-22-4-005    science     Pass L5c02
 N20101231S0343.fits    GN-2010B-Q-22-4-006    science     Pass L5c02
 N20101231S0369.fits   GN-2010B-Q-37-56-001        acq     Pass Cluster F - East
 N20101231S0370.fits   GN-2010B-Q-37-56-002        acq     Pass Cluster F - East
 N20101231S0371.fits   GN-2010B-Q-37-56-003        acq     Pass Cluster F - East
 N20101231S0373.fits   GN-2010B-Q-37-17-008    science     Pass Cluster F - East
 N20101231S0374.fits   GN-2010B-Q-37-17-009    science     Pass Cluster F - East
 N20101231S0377.fits   GN-2010B-Q-37-17-012    science     Pass Cluster F - East
 N20101231S0378.fits    GN-2010B-Q-64-1-001    science     Pass PTF10cwr
 N20101231S0379.fits    GN-2010B-Q-64-1-002    science     Pass PTF10cwr
 N20101231S0380.fits    GN-2010B-Q-64-1-003    science     Pass PTF10cwr
 N20101231S0381.fits    GN-2010B-Q-64-1-004    science     Pass PTF10cwr
 N20101231S0382.fits    GN-2010B-Q-64-1-005    science     Pass PTF10cwr
 N20101231S0383.fits    GN-2010B-Q-64-1-006    science     Pass PTF10cwr
 N20101231S0397.fits  GN-CAL20101231-11-001 partnerCal     Pass PG1323-086
 N20101231S0398.fits  GN-CAL20101231-11-002 partnerCal     Pass PG1323-086
 N20101231S0399.fits  GN-CAL20101231-11-003 partnerCal     Pass PG1323-086
 N20101231S0400.fits  GN-CAL20101231-11-004 partnerCal     Pass PG1323-086
 N20101231S0401.fits  GN-CAL20101231-11-005 partnerCal     Pass PG1323-086
 N20101231S0402.fits  GN-CAL20101231-11-006 partnerCal     Pass PG1323-086
 N20101231S0403.fits  GN-CAL20101231-11-007 partnerCal     Pass PG1323-086
 N20101231S0404.fits  GN-CAL20101231-11-008 partnerCal     Pass PG1323-086
 N20101231S0405.fits  GN-CAL20101231-11-009 partnerCal     Pass PG1323-086
 N20101231S0406.fits  GN-CAL20101231-11-010 partnerCal     Pass PG1323-086
 N20101231S0407.fits  GN-CAL20101231-11-011 partnerCal     Pass PG1323-086
 N20101231S0408.fits  GN-CAL20101231-11-012 partnerCal     Pass PG1323-086
 N20101231S0409.fits  GN-CAL20101231-11-013 partnerCal     Pass PG1323-086
 N20101231S0410.fits  GN-CAL20101231-11-014 partnerCal     Pass PG1323-086
 N20101231S0411.fits  GN-CAL20101231-11-015 partnerCal     Pass PG1323-086
 N20101231S0412.fits  GN-CAL20101231-11-016 partnerCal     Pass PG1323-086
 N20101231S0413.fits  GN-CAL20101231-12-001     dayCal     Pass Twilight
 N20101231S0414.fits  GN-CAL20101231-12-008     dayCal     Pass Twilight
 N20101231S0415.fits  GN-CAL20101231-12-009     dayCal     Pass Twilight
 N20101231S0416.fits  GN-CAL20101231-12-010     dayCal     Pass Twilight
 N20101231S0417.fits  GN-CAL20101231-12-011     dayCal     Pass Twilight
 N20101231S0418.fits  GN-CAL20101231-12-012     dayCal     Pass Twilight
 N20101231S0419.fits  GN-CAL20101231-12-013     dayCal     Pass Twilight
 N20101231S0420.fits  GN-CAL20101231-12-014     dayCal     Pass Twilight
 N20101231S0421.fits  GN-CAL20101231-12-015     dayCal     Pass Twilight
 N20101231S0422.fits  GN-CAL20101231-12-016     dayCal     Pass Twilight
 N20101231S0423.fits  GN-CAL20101231-12-017     dayCal     Pass Twilight
 N20101231S0424.fits  GN-CAL20101231-12-018     dayCal     Pass Twilight
 N20101231S0425.fits  GN-CAL20101231-12-019     dayCal     Pass Twilight
 N20101231S0426.fits  GN-CAL20101231-12-020     dayCal     Pass Twilight
 N20101231S0427.fits  GN-CAL20101231-12-021     dayCal     Pass Twilight
 N20101231S0428.fits  GN-CAL20101231-12-022     dayCal     Pass Twilight
Total data size: 1856059200