XMM-13hr master catalogue

This notebook presents the merge of the pristine catalogues to produce the HELP master catalogue on XMM-13hr.

from herschelhelp_internal import git_version
print("This notebook was run with herschelhelp_internal version: \n{}".format(git_version()))

This notebook was run with herschelhelp_internal version: 
04829ed (Thu Nov 2 16:57:19 2017 +0000)

%matplotlib inline
#%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt
plt.rc('figure', figsize=(10, 6))

import os
import time

from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.table import Column, Table
import numpy as np
from pymoc import MOC

from herschelhelp_internal.masterlist import merge_catalogues, nb_merge_dist_plot, specz_merge
from herschelhelp_internal.utils import coords_to_hpidx, ebv, gen_help_id, inMoc

TMP_DIR = os.environ.get('TMP_DIR', "./data_tmp")
OUT_DIR = os.environ.get('OUT_DIR', "./data")
SUFFIX = os.environ.get('SUFFIX', time.strftime("_%Y%m%d"))

# Create the output directory if it does not already exist.
os.makedirs(OUT_DIR, exist_ok=True)

I - Reading the prepared pristine catalogues

uhs = Table.read("{}/UHS.fits".format(TMP_DIR))
legacy = Table.read("{}/LegacySurvey.fits".format(TMP_DIR))

II - Merging tables

We start from the near-infrared UHS catalogue and then add the optical Legacy Survey catalogue.

At every step, we look at the distribution of the distances separating the sources from one catalogue to the other (within a maximum radius) to determine the best cross-matching radius.
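As a rough illustration of this step, the sketch below computes, for synthetic positions, the distance from each source in one catalogue to its nearest neighbour in the other and bins the result. It is not the implementation of nb_merge_dist_plot: the function name nearest_separations_arcsec, the brute-force haversine computation and the 0.2 arcsec positional scatter are all assumptions made for the example.

```python
import numpy as np


def nearest_separations_arcsec(ra1, dec1, ra2, dec2):
    """Distance (arcsec) from each source of catalogue 1 to its nearest
    neighbour in catalogue 2, using a brute-force haversine computation.
    Hypothetical helper, only suitable for small catalogues."""
    ra1, dec1, ra2, dec2 = (
        np.radians(np.asarray(x, dtype=float)) for x in (ra1, dec1, ra2, dec2)
    )
    dra = ra1[:, None] - ra2[None, :]
    ddec = dec1[:, None] - dec2[None, :]
    a = (np.sin(ddec / 2) ** 2
         + np.cos(dec1)[:, None] * np.cos(dec2)[None, :] * np.sin(dra / 2) ** 2)
    sep = 2 * np.arcsin(np.sqrt(a))
    return np.degrees(sep.min(axis=1)) * 3600.0


# Synthetic catalogue: 200 sources in a 0.5 x 0.5 degree patch.
rng = np.random.default_rng(0)
n = 200
ra = 203.5 + 0.5 * rng.random(n)
dec = 37.5 + 0.5 * rng.random(n)

# Second catalogue: the same sources with an assumed 0.2 arcsec scatter.
sigma_deg = 0.2 / 3600
ra_b = ra + rng.normal(0, sigma_deg, n) / np.cos(np.radians(dec))
dec_b = dec + rng.normal(0, sigma_deg, n)

seps = nearest_separations_arcsec(ra, dec, ra_b, dec_b)

# Histogram of the separations in 0.1 arcsec bins, up to 2 arcsec.
counts, edges = np.histogram(seps, bins=np.arange(0, 2.1, 0.1))
```

In a histogram like this, true counterparts pile up at small separations; the cross-matching radius is chosen where that peak has died away and before chance alignments start to dominate.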

UHS

master_catalogue = uhs
master_catalogue['uhs_ra'].name = 'ra'
master_catalogue['uhs_dec'].name = 'dec'

Add Legacy Survey

nb_merge_dist_plot(
    SkyCoord(master_catalogue['ra'], master_catalogue['dec']),
    SkyCoord(legacy['legacy_ra'], legacy['legacy_dec'])
)

# Given the graph above, we use 0.8 arc-second radius
master_catalogue = merge_catalogues(master_catalogue, legacy, "legacy_ra", "legacy_dec", radius=0.8*u.arcsec)

Cleaning

When we merge the catalogues, astropy masks the non-existent values (e.g. when a row comes from only one catalogue and has no counterpart in the other, the columns from the latter are masked for that row). We fill the masked values with NaN for float columns, False for flag columns and -1 for ID columns.

for col in master_catalogue.colnames:
    if "m_" in col or "merr_" in col or "f_" in col or "ferr_" in col or "stellarity" in col:
        master_catalogue[col].fill_value = np.nan
    elif "flag" in col:
        master_catalogue[col].fill_value = 0
    elif "id" in col:
        master_catalogue[col].fill_value = -1
        
master_catalogue = master_catalogue.filled()

master_catalogue[:10].show_in_notebook()
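The fill rules above can be tried out on plain NumPy masked arrays, which avoids needing a full astropy table. This is only a sketch: the three arrays and their values are invented for the example.

```python
import numpy as np

# Three sources; the second one has no counterpart, so its values are masked.
mask = [False, True, False]
flux = np.ma.masked_array([12.5, 0.0, 3.1], mask=mask)
flag = np.ma.masked_array([True, False, False], mask=mask)
src_id = np.ma.masked_array([101, 0, 103], mask=mask)

# Same convention as in the notebook: NaN for floats, False for flags,
# -1 for identifiers.
flux_filled = flux.filled(np.nan)
flag_filled = flag.filled(False)
id_filled = src_id.filled(-1)
```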

III - Merging flags and stellarity

Each pristine catalogue contains a flag indicating if the source was associated with another nearby source that was removed during the cleaning process. We merge these flags into a single one.

flag_cleaned_columns = [column for column in master_catalogue.colnames
                        if 'flag_cleaned' in column]

flag_column = np.zeros(len(master_catalogue), dtype=bool)
for column in flag_cleaned_columns:
    flag_column |= master_catalogue[column]
    
master_catalogue.add_column(Column(data=flag_column, name="flag_cleaned"))
master_catalogue.remove_columns(flag_cleaned_columns)

Each pristine catalogue contains a flag indicating the probability of a source being a Gaia object (0: not a Gaia object, 1: possibly, 2: probably, 3: definitely). We merge these flags taking the highest value.

flag_gaia_columns = [column for column in master_catalogue.colnames
                     if 'flag_gaia' in column]

master_catalogue.add_column(Column(
    data=np.max([master_catalogue[column] for column in flag_gaia_columns], axis=0),
    name="flag_gaia"
))
master_catalogue.remove_columns(flag_gaia_columns)
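The two merging rules used so far (logical OR for the cleaning flags, maximum for the Gaia flags) can be checked on a toy example. The arrays below are invented for illustration; only the column names come from this notebook.

```python
import numpy as np

# Toy per-catalogue flags for three sources.
flag_cleaned = {
    "uhs_flag_cleaned": np.array([False, True, False]),
    "legacy_flag_cleaned": np.array([False, False, True]),
}
flag_gaia = {
    "uhs_flag_gaia": np.array([0, 3, 1]),
    "legacy_flag_gaia": np.array([2, 0, 1]),
}

# A source is flagged as cleaned if any catalogue flagged it.
merged_cleaned = np.logical_or.reduce(list(flag_cleaned.values()))

# The Gaia flag keeps the highest value found in any catalogue.
merged_gaia = np.max(list(flag_gaia.values()), axis=0)
```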

Each pristine catalogue may contain one or several stellarity columns indicating the probability (0 to 1) of each source being a star. We merge these columns taking the highest value. We keep track of the origin of the stellarity.

stellarity_columns = [column for column in master_catalogue.colnames
                      if 'stellarity' in column]

print(", ".join(stellarity_columns))

uhs_stellarity, legacy_stellarity

# We create a masked array with all the stellarities and get the maximum value, as well as its
# origin.  Some sources may not have an associated stellarity.
stellarity_array = np.array([master_catalogue[column] for column in stellarity_columns])
stellarity_array = np.ma.masked_array(stellarity_array, np.isnan(stellarity_array))

max_stellarity = np.max(stellarity_array, axis=0)
max_stellarity.fill_value = np.nan

no_stellarity_mask = max_stellarity.mask

master_catalogue.add_column(Column(data=max_stellarity.filled(), name="stellarity"))

stellarity_origin = np.full(len(master_catalogue), "NO_INFORMATION", dtype="S20")
stellarity_origin[~no_stellarity_mask] = np.array(stellarity_columns)[np.argmax(stellarity_array, axis=0)[~no_stellarity_mask]]

master_catalogue.add_column(Column(data=stellarity_origin, name="stellarity_origin"))

master_catalogue.remove_columns(stellarity_columns)
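The same "maximum value plus origin" logic can be sketched without masked arrays by temporarily replacing missing values with minus infinity. This is an alternative formulation written for illustration, not the code used above, and the stellarity values are invented.

```python
import numpy as np

names = np.array(["uhs_stellarity", "legacy_stellarity"])

# One row per stellarity column, one column per source; NaN means "no value".
values = np.array([
    [0.99, np.nan, np.nan],
    [0.05, 0.80, np.nan],
])

has_value = ~np.isnan(values)
any_value = has_value.any(axis=0)

# Missing values must never win the comparison.
comparable = np.where(has_value, values, -np.inf)
best = comparable.argmax(axis=0)

stellarity = np.where(any_value, comparable.max(axis=0), np.nan)
origin = np.where(any_value, names[best], "NO_INFORMATION")
```

The third source has no stellarity in either column, so it ends up with NaN and the NO_INFORMATION origin.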

IV - Adding E(B-V) column

master_catalogue.add_column(
    ebv(master_catalogue['ra'], master_catalogue['dec'])
)

V - Adding HELP unique identifiers and field columns

master_catalogue.add_column(Column(gen_help_id(master_catalogue['ra'], master_catalogue['dec']),
                                   name="help_id"))
master_catalogue.add_column(Column(np.full(len(master_catalogue), "XMM-13hr", dtype='<U18'),
                                   name="field"))

# Check that the HELP Ids are unique
if len(master_catalogue) != len(np.unique(master_catalogue['help_id'])):
    print("The HELP IDs are not unique!!!")
else:
    print("OK!")

OK!
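If the uniqueness check were to fail, the next question would be which identifiers collide. A minimal sketch of that follow-up, using invented placeholder identifiers rather than real HELP IDs:

```python
import numpy as np

# Placeholder identifiers; "SRC_A" appears twice on purpose.
ids = np.array(["SRC_A", "SRC_B", "SRC_A", "SRC_C"])

# Count the occurrences of each identifier and keep the repeated ones.
unique_ids, counts = np.unique(ids, return_counts=True)
duplicated = unique_ids[counts > 1]
```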

VI - Cross-matching with spec-z catalogue

There is currently no spec-z catalogue available; the cells below are left commented out.

#specz =  Table.read("../../dmu23/dmu23_SA13/data/SA13-specz-v2.1.fits")

#nb_merge_dist_plot(
#    SkyCoord(master_catalogue['ra'], master_catalogue['dec']),
#    SkyCoord(specz['ra'] * u.deg, specz['dec'] * u.deg)
#)

#master_catalogue = specz_merge(master_catalogue, specz, radius=1. * u.arcsec)

VII - Choosing between multiple values for the same filter

There are no duplicate filters.

VIII.a Wavelength domain coverage

We add a binary flag_optnir_obs indicating that a source was observed in a given wavelength domain:

1 for observation in optical;
2 for observation in near-infrared;
4 for observation in mid-infrared (IRAC).

It's an integer binary flag, so a source observed both in optical and near-infrared but not in mid-infrared would have this flag at 1 + 2 = 3.

Note 1: The observation flag is based on the creation of multi-order coverage maps from the catalogues; this may not be accurate, especially on the edges of the coverage.

Note 2: Being within the observation coverage does not mean having fluxes in that wavelength domain. For sources observed in one domain but having no flux in it, one must take into consideration the different depths of the catalogues we are using.

uhs_moc = MOC(filename="../../dmu0/dmu0_UHS/data/UHS-DR1_XMM-13hr_MOC.fits")
legacy_moc = MOC(filename="../../dmu0/dmu0_LegacySurvey/data/LegacySurvey-dr4_XMM-13hr_MOC.fits")

was_observed_optical = inMoc(
    master_catalogue['ra'], master_catalogue['dec'],
    legacy_moc) 

was_observed_nir = inMoc(
    master_catalogue['ra'], master_catalogue['dec'],
    uhs_moc
)

was_observed_mir = np.zeros(len(master_catalogue), dtype=bool)

master_catalogue.add_column(
    Column(
        1 * was_observed_optical + 2 * was_observed_nir + 4 * was_observed_mir,
        name="flag_optnir_obs")
)
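For readers using the catalogue, the integer flag can be decoded back into wavelength domains with bitwise tests. The helper names decode_domains and encode_domains are hypothetical and exist only for this sketch; the bit values are the ones listed above.

```python
# Bit values used by flag_optnir_obs.
DOMAINS = {1: "optical", 2: "near-infrared", 4: "mid-infrared"}


def decode_domains(flag):
    """Return the wavelength domains whose bit is set in an integer flag."""
    return [name for bit, name in DOMAINS.items() if flag & bit]


def encode_domains(optical, nir, mir):
    """Build the integer flag from three booleans, as done in the notebook."""
    return 1 * optical + 2 * nir + 4 * mir


flag = encode_domains(True, True, False)   # 1 + 2 = 3
domains = decode_domains(flag)             # ['optical', 'near-infrared']
```

The same bitwise test works on a whole column, for example `(flags & 2) != 0` selects the sources covered in the near-infrared.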

VIII.b Wavelength domain detection

We add a binary flag_optnir_det indicating that a source was detected in a given wavelength domain:

1 for detection in optical;
2 for detection in near-infrared;
4 for detection in mid-infrared (IRAC).

It's an integer binary flag, so a source detected both in optical and near-infrared but not in mid-infrared would have this flag at 1 + 2 = 3.

Note 1: We use the total flux columns to know if the source has flux; in some catalogues, we may have aperture flux and no total flux.

To get rid of artefacts (chip edges, star flares, etc.) we consider that a source is detected in one wavelength domain when it has a flux value in at least two bands. That means that good sources will be excluded from this flag when they are on the coverage of only one band.

# Count, for each source, the Legacy Survey optical bands (g, r, z) that
# have a total flux.
nb_optical_flux = (
    1 * ~np.isnan(master_catalogue['f_bass_g']) +
    1 * ~np.isnan(master_catalogue['f_bass_r']) +
    1 * ~np.isnan(master_catalogue['f_bass_z'])
)

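# Only the UHS J band is available in the near-infrared, so this count is at
# most 1 and the two-band threshold applied below is never met.
nb_nir_flux = (
    1 * ~np.isnan(master_catalogue['f_wfcam_j'])
)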

nb_mir_flux = np.zeros(len(master_catalogue), dtype=float)

has_optical_flux = nb_optical_flux >= 2
has_nir_flux = nb_nir_flux >= 2
has_mir_flux = nb_mir_flux >= 2

master_catalogue.add_column(
    Column(
        1 * has_optical_flux + 2 * has_nir_flux + 4 * has_mir_flux,
        name="flag_optnir_det")
)
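The two-band rule can be illustrated on a small array of fluxes, where NaN stands for "no total flux in that band". The flux values are invented; the sketch only shows how the band count and the threshold interact.

```python
import numpy as np

# One row per source, one column per band of a given wavelength domain.
fluxes = np.array([
    [1.2, np.nan, 0.8],        # flux in two bands
    [np.nan, np.nan, 2.0],     # flux in a single band
    [np.nan, np.nan, np.nan],  # no flux at all
])

# Number of bands with a flux value, then the "at least two bands" rule.
n_bands = np.sum(~np.isnan(fluxes), axis=1)
detected = n_bands >= 2

# With a single band in the domain, the count cannot exceed 1, so the
# rule can never be satisfied.
single_band = fluxes[:, :1]
detected_single = np.sum(~np.isnan(single_band), axis=1) >= 2
```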

IX - Cross-identification table

We produce a table associating each HELP identifier with the identifiers of the sources in the pristine catalogues. This can be used to easily get additional information from them.

There is no SDSS on XMM-13hr.

id_names = []
for col in master_catalogue.colnames:
    if '_id' in col:
        id_names += [col]
    if '_intid' in col:
        id_names += [col]
        
print(id_names)

['uhs_id', 'legacy_id', 'help_id']

master_catalogue[id_names].write(
    "{}/master_list_cross_ident_xmm-13hr{}.fits".format(OUT_DIR, SUFFIX), overwrite=True)
id_names.remove('help_id')
master_catalogue.remove_columns(id_names)

X - Adding HEALPix index

We are adding a column with a HEALPix index at order 13 associated with each source.

master_catalogue.add_column(Column(
    data=coords_to_hpidx(master_catalogue['ra'], master_catalogue['dec'], order=13),
    name="hp_idx"
))
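To give a sense of what order 13 means, the sketch below derives the number of HEALPix cells and their approximate size from the standard HEALPix relations (nside = 2^order, 12 nside^2 equal-area cells). It does not call coords_to_hpidx and says nothing about the pixel ordering scheme that function uses.

```python
import math

order = 13
nside = 2 ** order        # number of subdivisions along a base-pixel side
npix = 12 * nside ** 2    # total number of equal-area cells on the sky

# Whole sky in square degrees, divided evenly between the cells.
sky_area_deg2 = 4 * math.pi * (180 / math.pi) ** 2
pixel_area_deg2 = sky_area_deg2 / npix

# Side of a square with the same area, in arcseconds.
pixel_size_arcsec = math.sqrt(pixel_area_deg2) * 3600
```

At this order each cell is roughly 26 arcsec across, far larger than the 0.8 arcsec cross-matching radius, so the index is a coarse spatial key rather than a source identifier.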

XI - Saving the catalogue

columns = ["help_id", "field", "ra", "dec", "hp_idx"]

bands = [column[5:] for column in master_catalogue.colnames if 'f_ap' in column]
for band in bands:
    columns += ["f_ap_{}".format(band), "ferr_ap_{}".format(band),
                "m_ap_{}".format(band), "merr_ap_{}".format(band),
                "f_{}".format(band), "ferr_{}".format(band),
                "m_{}".format(band), "merr_{}".format(band),
                "flag_{}".format(band)]    
    
columns += ["stellarity", "stellarity_origin", "flag_cleaned", "flag_merged", "flag_gaia", "flag_optnir_obs", 
            "flag_optnir_det", "ebv"]

# We check for columns in the master catalogue that we will not save to disk.
print("Missing columns: {}".format(set(master_catalogue.colnames) - set(columns)))

Missing columns: set()

master_catalogue[columns].write("{}/master_catalogue_xmm-13hr{}.fits".format(OUT_DIR, SUFFIX), overwrite=True)

First ten rows of the merged catalogue, as displayed by master_catalogue[:10].show_in_notebook() in the Cleaning section:

idx	uhs_id	ra	dec	uhs_stellarity	m_wfcam_j	merr_wfcam_j	m_ap_wfcam_j	merr_ap_wfcam_j	f_wfcam_j	ferr_wfcam_j	flag_wfcam_j	f_ap_wfcam_j	ferr_ap_wfcam_j	uhs_flag_cleaned	uhs_flag_gaia	flag_merged	legacy_id	f_bass_g	ferr_bass_g	f_ap_bass_g	ferr_ap_bass_g	f_bass_r	ferr_bass_r	f_ap_bass_r	ferr_ap_bass_r	f_bass_z	ferr_bass_z	f_ap_bass_z	ferr_ap_bass_z	legacy_stellarity	m_bass_g	merr_bass_g	flag_bass_g	m_ap_bass_g	merr_ap_bass_g	m_bass_r	merr_bass_r	flag_bass_r	m_ap_bass_r	merr_ap_bass_r	m_bass_z	merr_bass_z	flag_bass_z	m_ap_bass_z	merr_ap_bass_z	legacy_flag_cleaned	legacy_flag_gaia
		deg	deg															uJy	uJy			uJy	uJy			uJy	uJy
0	459962597806	203.634257096	37.6128091648	0.993865	9.52459	0.000225584	10.774	0.000403111	562555.0	116.882	False	177993.0	66.085	False	3	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
1	459759173072	203.561302458	37.8588342513	0.993865	10.4768	0.00035526	10.8724	0.0004206	234041.0	76.5798	False	162576.0	62.9799	False	3	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
2	459794088554	203.891486635	37.7029211578	0.993865	10.102	0.000294178	11.185	0.000474269	330527.0	89.5556	False	121901.0	53.2487	False	2	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
3	459794088563	203.891334451	37.7022947104	0.993865	11.2589	0.000502479	11.4634	0.000540217	113881.0	52.7039	False	94324.5	46.9319	False	0	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
4	459962597575	203.523072396	37.6367612303	0.993865	9.19147	0.000192638	11.4839	0.000556324	764563.0	135.653	False	92564.5	47.4294	False	3	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
5	460022925494	203.7808129	38.3064108558	0.993865	11.9473	0.000729989	11.695	0.000612743	60405.2	40.6131	False	76208.6	43.0089	False	2	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
6	460022925486	203.781228794	38.3061050376	0.993865	10.7917	0.000445594	11.8857	0.000670084	175108.0	71.8656	False	63930.8	39.4562	False	2	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
7	459804401241	203.520581489	38.4084631756	0.993865	12.2818	0.000857591	12.2877	0.000793781	44388.7	35.0614	False	44149.7	32.2778	False	3	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
8	459794088709	203.714055763	37.7719889488	0.993865	11.7202	0.000693575	12.3384	0.000825777	74460.1	47.5656	False	42135.0	32.0466	False	2	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0
9	459804401268	203.606207668	38.4192543636	0.993865	12.386	0.000907918	12.3914	0.000832182	40327.9	33.7232	False	40128.3	30.757	False	3	False	-1	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	nan	nan	False	nan	nan	False	0