This page gives information on accessing the Gemini Observatory Archive programmatically, for example through scripts or on the command line. There are two main aspects to this. Firstly, it is easy to construct (either manually or programmatically) URLs that lead to useful searches or features within the archive system; these return HTML that you might, for example, want to load directly into a browser. Secondly, you can construct URLs that return information as JSON, which you can then easily process within your own software. We discourage processing HTML from the archive in your own software (i.e. scraping the HTML): we offer no guarantee that we will maintain the format of the HTML, so your software may stop working if we change it, and it is generally easier, more robust, and faster to use one of the JSON APIs to obtain the same information.
URLs on the archive are of the form https://archive.gemini.edu/FEATURE/SELECTION, where FEATURE is the archive feature you want to access and SELECTION is the data selection criteria, as explained below. This applies to the human-oriented features of the archive as well as the APIs.
Generally you should fetch these URLs with an HTTP GET. A small number of API functions can accept an HTTP POST with additional information for the query; this is described for those individual cases. The archive uses https:// URLs. If you try to access the archive with an http:// URL you will receive an HTTP redirect to the https:// URL, so it is more efficient to use https:// directly.
You have probably used the regular search form for the archive, and noticed that its URL is https://archive.gemini.edu/searchform. "searchform" is a good example of an archive feature in this context, and we'll use it in a lot of the examples below. We'll discuss a complete list of features later.
This is a key concept of the archive system. Search terms used to select data in the archive are placed directly in the URL, separated by slashes, just like directories in a normal URL. Many of the things you will want to search by in the archive, such as Gemini Program IDs or UT dates, have a prescribed standard format, and the archive will recognise these automatically for what they are. For example, 8-digit numbers that form valid dates, such as 20140410, will be recognised automatically as UT dates, and strings such as GN-2010B-Q-123 will be recognised automatically as Gemini Program IDs.
So, using these examples, GN-2010B-Q-123 as the selection would select data from that program ID, and GN-2010B-Q-123/20140410 would select data from that program ID taken on that UT date. It does not matter what order you specify these in.
In this way, we can build up URLs that refer to various searches and their results on the archive. All of these should be accessed with a regular HTTP GET; there is no need to POST search parameters to the system - we provide a RESTful interface to the data.
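As a minimal sketch, here is one way you might assemble such a URL in Python. The feature and criteria are the examples from above; the helper code itself is ours, not part of the archive API:
# Build an archive URL by joining a feature and some selection criteria.
# GN-2010B-Q-123 and 20140410 are the example program ID and UT date used above.
base = "https://archive.gemini.edu"
feature = "searchform"                     # or "summary", "jsonfilelist", etc.
criteria = ["GN-2010B-Q-123", "20140410"]  # the order does not matter
url = "/".join([base, feature] + criteria)
print(url)  # https://archive.gemini.edu/searchform/GN-2010B-Q-123/20140410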
If you add /object=foobar to the URL (it's a selection criterion just like the others) then it will only return files with that object name. The catch is that the object name in this sense is whatever the observer called it in the OT, and it's free-form, so if they wanted to call it pauls_kewl_galaxy or candidate_12 they could, and they do... So if you did the Phase II yourself and know what you called it, this works great - or if you're lucky and your object only has one name that can only be spelled one way...
But what you almost certainly really want to do is resolve the name and search by RA and Dec. The API doesn't do that directly; when you do it in the web searchform, it is the searchform that resolves the name for you, not the backend search engine. So if you resolve the name yourself (or call some service to do that), you can then search by RA and Dec in decimal degrees as simply as /ra=40.6698792/dec=-0.0132889. This will default to a 3 arcmin radius cone search; you can specify the cone search radius by adding /sr=10 (units are arcseconds).
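For example, assuming you have already resolved your target to decimal-degree coordinates (the values below are the ones from the example above, and the 10 arcsec radius is arbitrary), a cone-search URL could be built like this:
# Sketch of a cone-search URL using the jsonsummary API; the values are illustrative.
# Omit the /sr=... part to get the default 3 arcmin search radius.
ra, dec, sr = 40.6698792, -0.0132889, 10
url = f"https://archive.gemini.edu/jsonsummary/canonical/ra={ra}/dec={dec}/sr={sr}"
print(url)  # .../jsonsummary/canonical/ra=40.6698792/dec=-0.0132889/sr=10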
Here is a list of the available features on the system:
searchform: The regular search form for the archive. When you pass selection criteria to the search form, it will pre-populate the fields of the form with the criteria you have passed, and will execute that search. This is how it works when you do a search and then bookmark the resulting web page or copy the URL for future use: when you do the search, the form simply places the search terms into the URL and redirects your browser to that URL. If you re-visit this URL later, it will re-do the search and return the filled-out search form with the results. You can then, of course, modify the fields in the form and search again if you wish.
summary: This works a lot like searchform, except that it sends you just the simple results table. You don't get the search form itself, nor the tabs to see associated calibrations or observation logs; you simply get the main results table showing the results of the search specified by the selection criteria in the URL. You can also use ssummary (short summary) and lsummary (long summary) to get versions with fewer and more columns respectively.
jsonfilelist: This returns a JSON representation of the files in the archive that match your selection criteria. You should almost always include /canonical as one of your selection criteria (see the note below). The JSON returned contains a list of dictionaries, where each dictionary describes a file in the archive. The keys in each dictionary and their meanings are as follows:
Key | Description |
---|---|
name | The name of the dataset. This does not include the .bz2 if the file is compressed. |
filename | The filename of the file. Most of the files in the archive are bzip2 compressed, so this will likely end in .bz2 |
compressed | Boolean value saying if the file is compressed or not |
lastmod | Last modification timestamp of the file that was ingested |
path | Path to the file within the data store. This is normally empty on the archive |
mdready | Boolean value that says if the metadata of the file passes validation |
file_size | The size of the file as stored in the archive, in bytes |
data_size | The size of the FITS data in the file. If the file is compressed, this represents the size of the uncompressed data. If the file is not compressed, this will be the same as file_size |
size | The same as file_size. Deprecated |
file_md5 | The MD5 hash of the file as stored in the archive, in hexdigest format |
data_md5 | The MD5 hash of the FITS data in the file. If the file is compressed, this represents the MD5 hash of the uncompressed data. If the file is not compressed, this will be the same as file_md5 |
md5 | The same as file_md5. Deprecated |
jsonsummary: This returns a JSON representation of your search results that includes all the information (and more) that you get in the summary and searchform results tables. You should almost always include /canonical as one of your selection criteria (see the note below). As with jsonfilelist, you get a list of dictionaries, one per file. You get all the keys from jsonfilelist, plus many more containing details of the data, as follows:
Key | Description |
---|---|
telescope | The telescope where the data were taken |
instrument | The name of the instrument that took the data |
observation_class | Gemini Observation Class |
observation_type | Gemini Observation Type |
program_id | Gemini Program ID |
observation_id | Gemini Observation ID |
data_label | Gemini Data Label |
engineering | Boolean that says if this is engineering data |
science_verification | Boolean that says if this is science verification data |
procmode | Type of data for processed science/calibrations. This is one of 'sq' for science quality, 'ql' for quick look, or '' for raw. |
calibration_program | Boolean that says if this is calibration program data |
requested_iq requested_cc requested_bg requested_wv | The requested site quality percentiles - IQ = Image Quality, CC = Cloud Cover, BG = Sky Background, WV = Water Vapor. The value is the integer percentile, 100 implies "Any" |
raw_iq raw_cc raw_bg raw_wv | The measured or estimated delivered site quality percentiles - IQ = Image Quality, CC = Cloud Cover, BG = Sky Background, WV = Water Vapor. The value is the integer percentile, 100 implies "Any" |
ut_datetime | UT Date and Time of the observation. ISO 8601 format |
local_time | Local Date and Time of the observation. ISO 8601 format |
ra dec | RA and Dec in decimal degrees |
object | Object name as specified in the OT |
azimuth elevation | Azimuth and Elevation in decimal degrees |
cass_rotator_pa | Position Angle of the Cassegrain Rotator, in decimal degrees |
airmass | Airmass of the observation |
adaptive_optics | Boolean to say if adaptive optics was in use |
laser_guide_star | Boolean to say if the laser guide star was in use |
wavefront_sensor | Says which wavefront sensor was in use |
qa_state | Quality State of the file - Pass, Usable, Fail, Undefined, CHECK |
mode | imaging, spectroscopy, LS (Longslit Spectroscopy), MOS (Multi Object Spectroscopy) or IFS (Integral Field Spectroscopy) |
spectroscopy | Boolean to say if this is spectrally dispersed data |
types | The AstroData types of this file. This is basically a tag list. |
release | The date on which the proprietary period of this data expires or expired |
reduction | The reduction state of the data. "RAW" for raw data. |
phot_standard | Boolean to say if a Gemini Standard star is in the field. |
gcal_lamp | Which GCAL lamp was being observed, if any |
exposure_time | Exposure Time in seconds. If the data are co-added, this is the total of all the coadds. |
detector_roi_setting | The requested Detector Region of Interest (subarray) name |
detector_config | A string summarizing the detector configuration |
camera | The camera in use |
detector_binning | A string representation of the detector binning in use |
wavelength_band | The wavelength band |
central_wavelength | Central Wavelength of spectroscopy data |
filter_name | Name of the Filter in use |
focal_plane_mask | Name of the focal plane mask (typically a slit mask for spectroscopy) in use |
pupil_mask | Name of the pupil plane mask (aka Apodizer for GPI) in use. Few instruments have these. |
disperser | Name of the disperser (usually a diffraction grating or grism) in use |
Two URLs are available to download data from the archive.
Firstly, the /download URL uses the same selection criteria as detailed above, and will return a tar archive containing the data that match the selection. The files inside the tar archive will be compressed with bzip2. Also inside the tar archive is a README.txt file with some details of the download, and an md5sums.txt file that contains the MD5 hash of each file in the download, so that file integrity can easily be checked using the md5sum program installed on most UNIX-like systems.
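As an illustration, a sketch of fetching such a tar file and checking it in Python might look like the following. The selection in the URL is just an example, and we assume that README.txt, md5sums.txt and the data files sit at the top level of the tar archive:
import hashlib
import tarfile
import urllib.request

# Download a tar of all (public) matching files; the selection is an example only.
url = "https://archive.gemini.edu/download/canonical/GN-2010B-Q-22/GMOS-N/20101231"
urllib.request.urlretrieve(url, "gemini_data.tar")

# Check each file listed in md5sums.txt against its MD5 hash.
with tarfile.open("gemini_data.tar") as tar:
    for line in tar.extractfile("md5sums.txt").read().decode().splitlines():
        md5, member = line.split()
        data = tar.extractfile(member).read()
        status = "OK" if hashlib.md5(data).hexdigest() == md5 else "MISMATCH"
        print(member, status)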
Secondly, the /file URL accepts a single filename and will return you just that file, uncompressed and ready to use.
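For example, a minimal fetch of a single public file (one of the files from the examples later on this page) might be:
import urllib.request

# /file returns exactly one named file, uncompressed and ready to use.
urllib.request.urlretrieve("https://archive.gemini.edu/file/N20101231S0338.fits",
                           "N20101231S0338.fits")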
Note that these URLs will provide public data without any need to authenticate. However, they do require authentication in order to download proprietary data. In the case where you do not authenticate, the /download URL will send you any public files that match your selection criteria, and will include in the README.txt file a list of files to which you were denied access.
To authenticate to these services, you need to supply your archive session cookie. A web cookie is a small piece of data sent from a web server to your browser, which your browser stores and presents back to that server. When you log in to your account on the archive, the server sends you a cookie called gemini_archive_session. The value of this cookie identifies you to the system and verifies that you have successfully logged in. In order to authenticate yourself to the download service, you need to send the same cookie to the server with your request. If you want to do this from a script or the command line, you will first have to find the value of that cookie from your browser, and then include it in your download request headers.
Finding the value of the cookie depends on your browser. In Firefox, navigate to any archive page, then go to Tools - Page Info; inside Security there is a View Cookies button. In Safari, first go to Preferences - Advanced and turn on "Show Develop menu", then navigate to an archive page, click "Show Web Inspector" in the Develop menu, and select Cookies in the Resources tab. The cookie you need is called gemini_archive_session.
The value of the cookie you need will be a long, apparently random string of characters. You should be able to copy and paste it into your script. Be sure to keep that cookie value confidential; anyone who has it can access your archive account. If you need to reset it (for example because it became known to someone else), simply use a web browser to log out of the archive and the old value will no longer be accepted by the server. When you log back in to the archive, a new value will be sent to the browser where you logged in.
An example of how you might use this cookie in Python would be the following (thanks to Erik Dennihy):
import requests

cookies = dict(gemini_archive_session='my_archive_cookie')
r = requests.get('https://archive.gemini.edu/download/my_archive_search....',
                 cookies=cookies, stream=True)
with open('data.tar', 'wb') as handle:
    for chunk in r.iter_content(chunk_size=256):
        handle.write(chunk)
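Note that stream=True together with iter_content() writes the download to disk in chunks rather than holding the whole tar archive in memory, which is worthwhile for large selections.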
Here are details of the selection criteria you can use. Multiple criteria are separated by slashes in the URL, just like directories would be.
The present and canonical selection criteria are worthy of special explanation. When the archive ingests a new data file, that file is marked as both present (physically present in the file store) and canonical (it is the canonical version of that file). If the file is modified at Gemini (for example the QA state is updated) and it is re-ingested into the archive, the archive adds a new database record for the updated file, but the entry for the old file is not deleted; it is simply marked as not present and not canonical. This means that the archive retains some history of previous file versions. We use two separate flags because we use the same software for internal data management, where old files may be scrubbed off disk to free up space for new ones - in which case the database record will reflect that a file version can be canonical but no longer present on disk.
The search form and data summaries assume canonical as one of the search criteria, as you are almost always looking for the most recent (canonical) version of a file. The JSON APIs do not, so as to allow you to deliberately look for information on previous versions of a file. Assuming you don't want to do that, you should always include /canonical as one of your selection criteria with the JSON APIs.
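As a small sketch of the difference, the query below deliberately omits /canonical, so it returns one record per version of the named file that the archive knows about. The filename is one of the examples used later on this page:
import json
import urllib.request

# Without /canonical, every database record for this filename is returned,
# one per file version; add /canonical to get only the current version.
url = "https://archive.gemini.edu/jsonfilelist/N20101231S0338.fits"
with urllib.request.urlopen(url) as u:
    records = json.loads(u.read())
for rec in records:
    print(rec['filename'], rec['lastmod'], rec['data_md5'])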
Property | Format | Example | Notes |
---|---|---|---|
Single UT Date | YYYYMMDD | 20100401 | Note that Chilean observing nights span multiple UT dates. |
UT Date Range | yyyymmdd-YYYYMMDD | 20100401-20100420 | Inclusive |
Telescope | | Gemini-South | or Gemini-North |
Instrument | | GNIRS | To get both GMOS-N and GMOS-S, just use GMOS |
Program ID | (GN\|GS)-(Semester)-(Program Type)-(Program Number) | GN-2009B-Q-51 | If the program ID is non-standard, you can use e.g. progid=GN-GNIRS-Engineering |
Observation ID | (Program ID)-(Observation Number) | GN-2009B-Q-51-9 | If the observation ID is non-standard, you can use e.g. obsid=GN-GNIRS-Engineering-003 |
Data Label | (Observation ID)-(Dataset number) | GN-2009B-Q-51-28-001 | |
Observation Type | | OBJECT | |
Observation Class | | science | |
Filename | | S20091028S0097.fits | Works with or without the .fits. For non-standard filenames, use filename=some_odd_filename.fits |
Filename prefix | | N201203 | Selects all files that have names beginning with the given prefix. For non-standard filenames, use e.g. filepre=00AUG |
QA state | | Pass | Pass, Usable, Fail, NotFail, Win or Lucky - Win means Pass or Usable (ie not Fail or Undefined), NotFail is literally every state other than Fail, Lucky means Pass or Undefined. |
Mode | | imaging | imaging or spectroscopy. |
Adaptive Optics | | AO | AO means any adaptive optics in use. NOTAO means not AO. NGS means NGS AO, LGS means LGS AO |
Science Quality | | sq | Can be 'sq' for science quality or 'ql' for quick look |
File curation | | canonical | present, canonical, notpresent, notcanonical. See note above. |
GMOS grating name | | B600 | Selects only files using that GMOS grating |
GMOS mask name | | GN2009BC009-04 | Selects only files using that GMOS mask name |
Binning | NxM | 2x2 | Pixel binning. Unbinned data shows as 1x1 |
Detector config | | high | The words low, high, slow, fast will be interpreted as detector configurations - ie read gain setting or read speed setting |
Detector ROI | | FullFrame | The words FullFrame, CentralSpectrum, CentralStamp are interpreted as detector Region of Interest (ROI). FullFrame will include instruments that do not have configurable ROIs. |
Calibration Type | | ARC | This is only useful with the calibration systems |
Reduction State | | RAW | e.g. RAW, PREPARED, PROCESSED_BIAS, PROCESSED_FLAT. Most data in the archive is RAW data, but this is how you specify if you are looking for a certain type of reduced data. |
Photometric Standard | photstandard | photstandard | Selects only data that have a Gemini photometric standard in the field of view. |
Filter | filter=(filtername) | filter=g | Select by filter name. Don't include any _G1234 type suffix. |
Twilight observation | | Twilight | Use Twilight or NotTwilight to select or exclude twilight sky observations (eg twilight flat fields) |
Exposure time | exposure_time=(number or range) | exposure_time=59-61 | Select by exposure time. Due to floating point ambiguity with some of the instruments, it's best to use a range. |
Coadds | coadds=(integer) | coadds=10 | Select by number of coadds. Integer exact match |
Telescope sky position | ra=(min-max)/dec=(min-max) | ra=123.5-123.6/dec=21.1-22.2. For negative Dec, dec=-22.3--24.5. | ICRS RA and Dec, decimal degrees, or HH:MM:SS.sss / [-]DD:MM:SS.sss |
Telescope position | az=(min-max)/el=(min-max)/crpa=(min-max) | az=155.0-155.5/el=88.0-89.0 | Telescope Azimuth, Elevation and Cassegrain Rotator Position Angle. Decimal degrees. You must supply two numbers (decimal point optional and negative sign) separated by a hyphen. Note that it does not do intelligent range wrapping, the smaller number must be first and there's currently no way to select everything in the 20 degree range between 350 and 10 degrees. Also it takes the numbers from the header literally, in the az and crpa systems these may not be limited to 0:360 or -180:180 due to the 540 degree range of these systems. |
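Combining several of these criteria is simply a matter of joining them with slashes, in any order. For instance (these are just illustrative combinations of the criteria above), https://archive.gemini.edu/summary/GMOS-N/20100401-20100420/imaging/filter=g/Pass would show a results table of GMOS-N g-band imaging with QA state Pass from that date range, and https://archive.gemini.edu/jsonsummary/canonical/GN-2009B-Q-51/OBJECT/science would return the JSON summary of the science frames from that program.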
This is a simple example in Python that constructs a jsonfilelist URL, fetches the JSON document text into a string, parses that into a list of dictionaries, and then loops through showing some details of the files we found.
import urllib.request
import json
# Construct the URL. We'll use the jsonfilelist service
url = "https://archive.gemini.edu/jsonfilelist/"
# List the files for GN-2010B-Q-22 taken with GMOS-N on 2010-12-31
url += "canonical/GN-2010B-Q-22/GMOS-N/20101231"
# Open the URL and fetch the JSON document text into a string
u = urllib.request.urlopen(url)
jsondoc = u.read()
u.close()
# Decode the JSON
files = json.loads(jsondoc)
# This is a list of dictionaries each containing info about a file
for f in files:
    print("Filename: %s" % f['filename'])
    print("-- file size: %d, data size: %d" % (f['file_size'], f['data_size']))
Gives the following output:
Filename: N20101231S0338.fits.bz2
-- file size: 21314852, data size: 57548160
Filename: N20101231S0339.fits.bz2
-- file size: 21130627, data size: 57548160
Filename: N20101231S0340.fits.bz2
-- file size: 21022898, data size: 57548160
Filename: N20101231S0341.fits.bz2
-- file size: 21154419, data size: 57548160
Filename: N20101231S0342.fits.bz2
-- file size: 21086479, data size: 57548160
Filename: N20101231S0343.fits.bz2
-- file size: 21018470, data size: 57548160
Similar to the example above, but getting more details about the observations using the jsonsummary API.
import urllib.request
import json
# Construct the URL. We'll use the jsonsummary service
url = "https://archive.gemini.edu/jsonsummary/"
# List the OBJECT files taken with GMOS-N on 2010-12-31
url += "canonical/OBJECT/GMOS-N/20101231"
# Open the URL and fetch the JSON document text into a string
u = urllib.request.urlopen(url)
jsondoc = u.read()
u.close()
# Decode the JSON
files = json.loads(jsondoc)
# This is a list of dictionaries each containing info about a file
total_data_size = 0
print("%20s %22s %10s %8s %s" % ("Filename", "Data Label", "ObsClass",
                                 "QA state", "Object Name"))
for f in files:
    total_data_size += f['data_size']
    print("%20s %22s %10s %8s %s" % (f['name'], f['data_label'],
                                     f['observation_class'], f['qa_state'],
                                     f['object']))
print("Total data size: %d" % total_data_size)
Gives the following output:
Filename Data Label ObsClass QA state Object Name
N20101231S0222.fits GN-2010B-Q-51-196-001 acq Usable CDFS
N20101231S0223.fits GN-2010B-Q-51-196-002 acq Usable CDFS
N20101231S0224.fits GN-2010B-Q-51-196-003 acq Usable CDFS
N20101231S0274.fits GN-2010B-Q-51-196-004 acq Pass CDFS
N20101231S0275.fits GN-2010B-Q-51-196-005 acq Pass CDFS
N20101231S0276.fits GN-2010B-Q-51-196-006 acq Pass CDFS
N20101231S0278.fits GN-2010B-Q-51-169-006 science Pass CDFS
N20101231S0333.fits GN-2010B-C-10-103-001 acq Pass SDSSJ0841+2042
N20101231S0334.fits GN-2010B-C-10-103-002 acq Pass SDSSJ0841+2042
N20101231S0335.fits GN-2010B-C-10-103-003 acq Pass SDSSJ0841+2042
N20101231S0337.fits GN-2010B-C-10-82-004 science Pass SDSSJ0841+2042
N20101231S0338.fits GN-2010B-Q-22-4-001 science Pass L5c02
N20101231S0339.fits GN-2010B-Q-22-4-002 science Pass L5c02
N20101231S0340.fits GN-2010B-Q-22-4-003 science Pass L5c02
N20101231S0341.fits GN-2010B-Q-22-4-004 science Pass L5c02
N20101231S0342.fits GN-2010B-Q-22-4-005 science Pass L5c02
N20101231S0343.fits GN-2010B-Q-22-4-006 science Pass L5c02
N20101231S0369.fits GN-2010B-Q-37-56-001 acq Pass Cluster F - East
N20101231S0370.fits GN-2010B-Q-37-56-002 acq Pass Cluster F - East
N20101231S0371.fits GN-2010B-Q-37-56-003 acq Pass Cluster F - East
N20101231S0373.fits GN-2010B-Q-37-17-008 science Pass Cluster F - East
N20101231S0374.fits GN-2010B-Q-37-17-009 science Pass Cluster F - East
N20101231S0377.fits GN-2010B-Q-37-17-012 science Pass Cluster F - East
N20101231S0378.fits GN-2010B-Q-64-1-001 science Pass PTF10cwr
N20101231S0379.fits GN-2010B-Q-64-1-002 science Pass PTF10cwr
N20101231S0380.fits GN-2010B-Q-64-1-003 science Pass PTF10cwr
N20101231S0381.fits GN-2010B-Q-64-1-004 science Pass PTF10cwr
N20101231S0382.fits GN-2010B-Q-64-1-005 science Pass PTF10cwr
N20101231S0383.fits GN-2010B-Q-64-1-006 science Pass PTF10cwr
N20101231S0397.fits GN-CAL20101231-11-001 partnerCal Pass PG1323-086
N20101231S0398.fits GN-CAL20101231-11-002 partnerCal Pass PG1323-086
N20101231S0399.fits GN-CAL20101231-11-003 partnerCal Pass PG1323-086
N20101231S0400.fits GN-CAL20101231-11-004 partnerCal Pass PG1323-086
N20101231S0401.fits GN-CAL20101231-11-005 partnerCal Pass PG1323-086
N20101231S0402.fits GN-CAL20101231-11-006 partnerCal Pass PG1323-086
N20101231S0403.fits GN-CAL20101231-11-007 partnerCal Pass PG1323-086
N20101231S0404.fits GN-CAL20101231-11-008 partnerCal Pass PG1323-086
N20101231S0405.fits GN-CAL20101231-11-009 partnerCal Pass PG1323-086
N20101231S0406.fits GN-CAL20101231-11-010 partnerCal Pass PG1323-086
N20101231S0407.fits GN-CAL20101231-11-011 partnerCal Pass PG1323-086
N20101231S0408.fits GN-CAL20101231-11-012 partnerCal Pass PG1323-086
N20101231S0409.fits GN-CAL20101231-11-013 partnerCal Pass PG1323-086
N20101231S0410.fits GN-CAL20101231-11-014 partnerCal Pass PG1323-086
N20101231S0411.fits GN-CAL20101231-11-015 partnerCal Pass PG1323-086
N20101231S0412.fits GN-CAL20101231-11-016 partnerCal Pass PG1323-086
N20101231S0413.fits GN-CAL20101231-12-001 dayCal Pass Twilight
N20101231S0414.fits GN-CAL20101231-12-008 dayCal Pass Twilight
N20101231S0415.fits GN-CAL20101231-12-009 dayCal Pass Twilight
N20101231S0416.fits GN-CAL20101231-12-010 dayCal Pass Twilight
N20101231S0417.fits GN-CAL20101231-12-011 dayCal Pass Twilight
N20101231S0418.fits GN-CAL20101231-12-012 dayCal Pass Twilight
N20101231S0419.fits GN-CAL20101231-12-013 dayCal Pass Twilight
N20101231S0420.fits GN-CAL20101231-12-014 dayCal Pass Twilight
N20101231S0421.fits GN-CAL20101231-12-015 dayCal Pass Twilight
N20101231S0422.fits GN-CAL20101231-12-016 dayCal Pass Twilight
N20101231S0423.fits GN-CAL20101231-12-017 dayCal Pass Twilight
N20101231S0424.fits GN-CAL20101231-12-018 dayCal Pass Twilight
N20101231S0425.fits GN-CAL20101231-12-019 dayCal Pass Twilight
N20101231S0426.fits GN-CAL20101231-12-020 dayCal Pass Twilight
N20101231S0427.fits GN-CAL20101231-12-021 dayCal Pass Twilight
N20101231S0428.fits GN-CAL20101231-12-022 dayCal Pass Twilight
Total data size: 1856059200