sxs.load

One of the main goals of the sxs package is to provide a simple and consistent interface to the data provided by the SXS Collaboration. The sxs.load function is the primary way to access the data, ensuring that users do not need to understand the underlying data formats, while also allowing the collaboration to change those formats without affecting the user interface.

Load an SXS-format dataset, optionally downloading and caching

The dataset can be the full catalog of all SXS simulations, or metadata, horizon data, or a waveform from an individual simulation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| location | (str, Path) | A local file path, URL, SXS path, or SXS path pattern. See Notes below. | str |
| download | (None, bool) | If this is True and the data is recognized as starting with an SXS ID but cannot be found in the cache, the data will be downloaded automatically. If this is None (the default) and an SXS configuration file is found with a download key, that value will be used. If this is False, any configuration will be ignored, and no files will be downloaded. Note that if this is True but cache is None, cache will automatically be switched to True. | None |
| cache | (None, bool) | The cache directory is determined by sxs.sxs_directory, and any downloads will be stored in that directory. If this is None (the default) and download is True it will be set to True. If this is False, any configuration will be ignored and any files will be downloaded to a temporary directory that will be deleted when python exits. | None |
| progress | (None, bool) | If True, full file names will be shown and, if a nonzero Content-Length header is returned, a progress bar will be shown during any downloads. Default is None, which just reads the configuration value with read_config("download_progress", True), defaulting to True. | None |
| truepath | (None, str) | If the file is downloaded, this allows the output path to be overridden, rather than selected automatically. The output path will be stored in truepath relative to the cache directory. | None |
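
The way the three-valued `download` and `cache` arguments combine can be easier to follow in code. The following is a minimal sketch of the rules described in the parameter descriptions, not the library's implementation: the function name `resolve_flags` and the plain `config` dictionary are hypothetical stand-ins for the package's configuration handling, and the value used when nothing is configured is deliberately left as `None`.

```python
def resolve_flags(download=None, cache=None, config=None):
    """Combine explicit arguments with configured defaults (illustrative only).

    `config` is a hypothetical dict standing in for the SXS configuration file.
    Returns (download, cache, persistent_cache).
    """
    config = config or {}

    # An explicit True/False always wins; None defers to the configuration.
    if download is None:
        download = config.get("download")  # may remain None
    if cache is None:
        cache = config.get("cache")  # may remain None

    # Turning downloads on also turns the cache on, unless cache was set.
    if download is True and cache is None:
        cache = True

    # Only a literal False selects a temporary (non-persistent) cache.
    persistent_cache = cache is not False
    return download, cache, persistent_cache


# Explicit download=True switches the cache on as well.
print(resolve_flags(download=True))
# cache=False asks for a temporary directory instead of the persistent cache.
print(resolve_flags(cache=False))
```

Note that the test for a persistent cache compares against the literal `False`, so `None` and `True` both select the persistent cache directory.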
Keyword Parameters

All remaining parameters are passed to the load function responsible for the requested data.

See Also

sxs.sxs_directory : Locate configuration and cache files
sxs.write_config : Set defaults for download and cache parameters

Notes

This function can load data in various ways.

1) If truepath is set, and points to a file that exists — whether absolute, relative to the current working directory, or relative to the cache directory — that file will be loaded.

2) Given an absolute or relative path to a local file, it just loads the data directly.

3) If location is one of "simulations" or "dataframe" (or the deprecated "catalog"), the corresponding catalog data is loaded. The "simulations" option returns a dictionary mapping SXS IDs to raw metadata, while "dataframe" returns a pandas DataFrame indexed by SXS ID, with the metadata processed into a more consistent form.

4) If location is a valid URL including the scheme (https://, or http://), it will be downloaded regardless of the download parameter and optionally cached.

5) Given an SXS simulation specification — like "SXS:BBH:1234", "SXS:BBH:1234v2.0", "SXS:BBH:1234/Lev5", or "SXS:BBH:1234v2.0/Lev5" — the simulation is loaded as an sxs.Simulation object.

6) Given an SXS path (like "SXS:BBH:1234/Lev5/h_Extrapolated_N2.h5"), the file is looked up in the catalog to get its details. This function then looks in the local cache directory and loads the file if it is present.

7) If the SXS path is not found in the cache directory and download is set to True (when this function is called, or in the sxs config file) this function attempts to download the data. Note that download must be explicitly set in this case, or a ValueError will be raised.

If the file is downloaded, it will be stored in the cache according to the location, unless truepath is set as noted above, in which case it is stored there. Note that downloading is switched off by default, but if it is switched on (set to True), the cache is also switched on by default.
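
The order of the cases above can be sketched as a small classifier. This is an illustration under stated assumptions, not the library's logic: `classify_location` is a hypothetical helper, and the two regular expressions are simplified stand-ins written from the example strings in these notes, not the patterns the package actually uses.

```python
import re
from pathlib import Path

# Simplified, assumed patterns based only on the examples in the notes.
SIMULATION_RE = re.compile(r"^SXS:[A-Z]+:\d+(v\d+(\.\d+)?)?(/Lev-?\d+)?$")
SXS_FILE_RE = re.compile(r"^SXS:[A-Z]+:\d+(v\d+(\.\d+)?)?(/Lev-?\d+)?/.+$")
CATALOG_NAMES = {"simulations", "dataframe", "catalog"}


def classify_location(location):
    """Return a label for how a `location` string would be treated."""
    text = str(location)
    if Path(text).expanduser().exists():
        return "local file"  # an existing local path takes priority
    if text in CATALOG_NAMES:
        return "catalog data"
    if re.match(r"^https?://", text):
        return "url"
    if SIMULATION_RE.match(text):
        return "simulation"
    if SXS_FILE_RE.match(text):
        return "sxs path"
    return "catalog pattern"  # anything else is matched against the catalog


for example in [
    "dataframe",
    "SXS:BBH:1234v2.0/Lev5",
    "SXS:BBH:1234/Lev5/h_Extrapolated_N2.h5",
]:
    print(example, "->", classify_location(example))
```

Because the local-file check comes first, a file in the working directory that happens to share a name with one of the other forms would be loaded as a local file in this sketch.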

Source code in sxs/handlers.py
def load(location, download=None, cache=None, progress=None, truepath=None, **kwargs):
    """Load an SXS-format dataset, optionally downloading and caching

    The dataset can be the full catalog of all SXS simulations, or
    metadata, horizon data, or a waveform from an individual
    simulation.

    Parameters
    ----------
    location : {str, pathlib.Path}
        A local file path, URL, SXS path, or SXS path pattern.  See
        Notes below.
    download : {None, bool}, optional
        If this is True and the data is recognized as starting with an
        SXS ID but cannot be found in the cache, the data will be
        downloaded automatically.  If this is None (the default) and
        an SXS configuration file is found with a `download` key, that
        value will be used.  If this is False, any configuration will
        be ignored, and no files will be downloaded.  Note that if
        this is True but `cache` is None, `cache` will automatically
        be switched to True.
    cache : {None, bool}, optional
        The cache directory is determined by `sxs.sxs_directory`, and
        any downloads will be stored in that directory.  If this is
        None (the default) and `download` is True it will be set to
        True.  If this is False, any configuration will be ignored and
        any files will be downloaded to a temporary directory that
        will be deleted when python exits.
    progress : {None, bool}, optional
        If True, full file names will be shown and, if a nonzero
        Content-Length header is returned, a progress bar will be
        shown during any downloads.  Default is None, which just reads
        the configuration value with `read_config("download_progress",
        True)`, defaulting to True.
    truepath : {None, str}, optional
        If the file is downloaded, this allows the output path to be
        overridden, rather than selected automatically.  The output
        path will be stored in `truepath` relative to the cache
        directory.

    Keyword Parameters
    ------------------
    All remaining parameters are passed to the `load` function
    responsible for the requested data.

    See Also
    --------
    sxs.sxs_directory : Locate configuration and cache files
    sxs.write_config : Set defaults for `download` and `cache`
        parameters

    Notes
    -----
    This function can load data in various ways.

      1) If `truepath` is set, and points to a file that exists —
         whether absolute, relative to the current working directory,
         or relative to the cache directory — that file will be
         loaded.

      2) Given an absolute or relative path to a local file, it just
         loads the data directly.

      3) If `location` is one of "simulations" or "dataframe" (or the
         deprecated "catalog"), the corresponding catalog data is
         loaded.  The "simulations" option returns a dictionary mapping
         SXS IDs to raw metadata, while "dataframe" returns a pandas
         DataFrame indexed by SXS ID, but the metadata is processed
         into more consistent form.

      4) If `location` is a valid URL including the scheme (https://,
         or http://), it will be downloaded regardless of the
         `download` parameter and optionally cached.

      5) Given an SXS simulation specification — like "SXS:BBH:1234",
         "SXS:BBH:1234v2.0", "SXS:BBH:1234/Lev5", or
         "SXS:BBH:1234v2.0/Lev5" — the simulation is loaded as an
         `sxs.Simulation` object.

      6) Given an SXS path — like
         "SXS:BBH:1234/Lev5/h_Extrapolated_N2.h5" — the file is
         located in the catalog for details.  This function then looks
         in the local cache directory and loads it if present.

      7) If the SXS path is not found in the cache directory and
         `download` is set to `True` (when this function is called, or
         in the sxs config file) this function attempts to download
         the data.  Note that `download` must be explicitly set in
         this case, or a ValueError will be raised.

    If the file is downloaded, it will be stored in the cache
    according to the `location`, unless `truepath` is set as noted
    above, in which case it is stored there.  Note that downloading is
    switched off by default, but if it is switched on (set to True),
    the cache is also switched on by default.

    """
    import pathlib
    import urllib.request
    from . import Simulations, Simulation, read_config, sxs_directory, Catalog
    from .utilities import url, download_file, sxs_path_to_system_path, sxs_id_version_lev_exact_re, lev_path_re, sxs_identifier_re

    # Note: `download` and/or `cache` may still be `None` after this
    if download is None:
        download = read_config("download", True)
    if cache is None:
        cache = read_config("cache")
    if progress is None:
        progress = read_config("download_progress", True)

    # We set the cache path to be persistent if `cache` is `True` or `None`.  Thus,
    # we test for whether or not `cache` literally *is* `False`, rather than just
    # if it casts to `False`.
    cache_path = sxs_directory("cache", persistent=(cache is not False))

    path = pathlib.Path(sxs_path_to_system_path(location)).expanduser()  # .resolve()
    h5_path = path.with_suffix('.h5')
    json_path = path.with_suffix('.json')

    if not path.exists():
        if truepath and (testpath := pathlib.Path(sxs_path_to_system_path(truepath)).expanduser()).exists():
            path = testpath

        elif truepath and (testpath := cache_path / sxs_path_to_system_path(truepath)).exists():
            path = testpath

        elif _safe_resolve_exists(path):
            pass  # We already have the correct path

        elif location == "catalog":
            return Catalog.load(download=download)

        elif location in ["simulations", "dataframe"]:
            return sxscatalog.load(location, download=download, **kwargs)

        elif _safe_resolve_exists(h5_path):
            path = h5_path

        elif _safe_resolve_exists(json_path):
            path = json_path

        elif "scheme" in url.parse(location):
            m = url.parse(location)
            truepath = truepath or urllib.request.url2pathname(f"{m['host']}/{m['port']}/{m['resource']}")
            path = cache_path / sxs_path_to_system_path(truepath)
            if not path.resolve().exists():
                if download is False:  # Again, we want literal False, not casting to False
                    raise ValueError(f"File '{truepath}' not found in cache, but downloading turned off")
                download_file(location, path, progress=progress)

        elif sxs_id_version_lev_exact_re.match(location):
            return Simulation(location, download=download, cache=cache, progress=progress, **kwargs)

        else:
            # Try to find an appropriate SXS file in the simulations
            simulations = Simulations.load(
                download=download,
                local=kwargs.get("local", False),
                annex_dir=kwargs.get("annex_dir", None)
            )
            # If we chop off any "/LevN", and it's in the simulations, load it as a simulation
            if lev_path_re.sub("", location) in simulations:
                return Simulation(
                    location, download=download, cache=cache, progress=progress, **kwargs
                )

            # Now we look for a file in `simulations`
            if sxs_identifier_re.match(location):
                split_location = sxs_identifier_re.split(location)
                # Currently we can only handle unversioned SXS ID files
                if split_location[5] is None:
                    sxs_id = split_location[1]
                    file = split_location[-1].lstrip("/").lstrip("\\")
                    if sxs_id in simulations:
                        simulation = simulations[sxs_id]
                        if "files" in simulation:
                            file_info = simulation["files"][file]
                            location = file_info["link"]
                            truepath = truepath or (pathlib.Path(sxs_id) / file)
                            return load(
                                location, truepath=truepath,
                                download=download, cache=cache, progress=progress, **kwargs
                            )

            # Try to find an appropriate SXS file in the catalog
            catalog = Catalog.load(download=download)
            selections = catalog.select_files(location)
            if not selections:
                raise ValueError(f"Nothing found matching '{location}'")
            if progress:
                print("Found the following files to load from the SXS catalog:")
                print("    " + "\n    ".join(selections))
            paths = []
            for sxs_path, file_info in selections.items():
                truepath = truepath or sxs_path_to_system_path(file_info.get("truepath", sxs_path))
                path = cache_path / sxs_path_to_system_path(truepath)
                if not path.resolve().exists():
                    download_url = file_info["download"]
                    download_file(download_url, path, progress=progress)
                paths.append(path)
            loaded = [load(path, download=False, progress=progress, **kwargs) for path in paths]
            if len(loaded) == 1:
                return loaded[0]
            else:
                return loaded

    loader = sxs_loader(path, kwargs.get("group", None))

    loaded = loader(path, **kwargs)
    try:
        loaded.__file__ = str(path)
    except Exception:
        pass
    return loaded
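
The local-path checks near the top of the source above (the `truepath` candidates, followed by the `.h5` and `.json` suffix fallbacks) can be tried out in isolation. The sketch below is a simplified approximation: `resolve_local_path` is a hypothetical helper, it takes the cache directory as an explicit argument, and it omits the catalog names, the `_safe_resolve_exists` check, and the SXS-path translation that the real function performs.

```python
from pathlib import Path


def resolve_local_path(location, cache_dir, truepath=None):
    """Return the first existing candidate path, or None if there is none."""
    path = Path(location).expanduser()
    if path.exists():
        return path

    if truepath:
        # First as given (absolute or relative to the working directory),
        # then relative to the cache directory.
        for candidate in (Path(truepath).expanduser(), Path(cache_dir) / truepath):
            if candidate.exists():
                return candidate

    # Fall back to the same name with an .h5 suffix, then a .json suffix.
    for suffix in (".h5", ".json"):
        candidate = path.with_suffix(suffix)
        if candidate.exists():
            return candidate

    return None
```

A return value of `None` here corresponds to the point where the real function moves on to URLs, simulation specifications, and catalog lookups.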