Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CSV export to NX Line (and other export improvements) #1763

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

axelboc
Copy link
Contributor

@axelboc axelboc commented Feb 28, 2025

CSV export for NX Line

image

The following columns are exported, in order, when the corresponding datasets are present in the NXdata group: values, abscissas, errors, aux1, aux1 errors, aux2, aux2 errors, ... For instance, with mock dataset /nexus_entry/spectrum_with_aux, the CSV file looks like this:

image

Note that the column names are NeXus labels (potentially with long name and unit) rather than dataset names. I will try to solve this in a subsequent PR.

Additional improvements

Built-in, provider-agnostic CSV and JSON exports

CSV and JSON exports are now built into the viewer, which means that consumer applications (VS Code, myHDF5, etc.) no longer have to implement them themselves. All that's needed is for providers to include a very basic implementation of the getExportURL method internally.

Thanks to this, the four provider demos (h5grove, h5wasm, HSDS and mock) now all support exporting to CSV/JSON and do so in a consistent way — i.e. the CSV is always generated locally (it used to be generated remotely in the h5grove demo).

Export compound scalar/array datasets to CSV

image

image

Respect transposition when exporting 2D dataset to CSV in Matrix vis

image

image

Summary

The tables below summarise export capabilities out of the box before/after this PR for every visualization/format/provider combination. "Out of the box" means without any consumer-provided export implementation.

  • ⚠️ only the signal is exported
  • 🚩 transposition is not respected
BEFORE
Visualization Format H5GroveProvider H5WasmProvider HsdsProvider MockProvider
Raw JSON
Matrix CSV ✅ 🚩 ✅ 🚩
NPY ✅ 🚩
Compound CSV
Line CSV
NPY
NX Line CSV ⚠️
NPY ⚠️
Heatmap NPY ✅ 🚩
TIFF ✅ 🚩
NX Heatmap NPY ⚠️ 🚩
TIFF ⚠️ 🚩
AFTER
Visualization Format H5GroveProvider H5WasmProvider HsdsProvider MockProvider
Raw JSON
Matrix CSV
NPY ✅ 🚩
Compound CSV
Line CSV
NPY
NX Line CSV
NPY ⚠️
Heatmap NPY ✅ 🚩
TIFF ✅ 🚩
NX Heatmap NPY ⚠️ 🚩
TIFF ⚠️ 🚩

Implementation

Challenges and early attempts

Providers shouldn't know about NeXus, so I discarded the following solutions quickly:

  • implementing a specific endpoint in h5grove;
  • sending axis/auxiliary/errors values to the provider (e.g. via getExportURL) along with the main dataset's value;
  • sending the NXdata metadata and letting the provider get the values again internally.

👉 NeXus exports have to be generated in the viewer (well, at least the export generation functions have to be prepared in the viewer...)

Currently, providers all accept a getExportURL prop, which enables a lot of different scenarios: it allows consumers to generate the exports themselves, to generate one-time download links, etc. It's also what allows exports to work in the VS Code extension, as the webview has to generate the export payload and send it to the extension host so it can be saved to disk.

Being able to implement NPY and TIFF exports in the viewer to get rid of getExportURL entirely would have made things a lot easier. After all, we already retrieve the data from the providers, so in a way, they've already done their job. I would then have provided another kind of API to allow consumers of the App component to customise the exports and hook onto the exports process — providers would no longer have had anything to do with exports... Of course I discarded this option as well, as generating NPY and TIFF in JS is not trivial, and it's not really something we want to maintain long-term.

So there wasn't really any other option than to find a way to mix both viewer-side generation and provider-side generation, while continuing to provide flexibility for consumers.

I tried generating export entries both in the providers and in the viewers and then merging them, but this made it really hard to comprehend which exports would end up available in each visualization.

Proposed solution

In the end, I settled on a solution that involves passing an "exporter" function to getExportURL instead of the raw value array.

// BEFORE
public getExportURL(..., value: Value): ExportURL;

// AFTER
public getExportURL(..., builtInExporter?: () => string): ExportURL;

I introduce a hook called useExportEntries that I call from the mapped components. Internally, it calls getExportURL, passing the exporter for the given format if any:

// `MappedHeatmapVis`: no viewer-side exporter
const exportEntries = useExportEntries(['tiff', 'npy'], dataset, selection);

// `MappedRawVis`: simple JSON exporter
const exportEntries = useExportEntries(['json'], dataset, undefined, {
  json: () => JSON.stringify(value, null, 2),
});

// `MappedLineVis`: CSV exporter
const exportEntries = useExportEntries(['npy', 'csv'], dataset, selection, {
  csv: () => generateCsv(...),
});

In the providers, I then make sure to return an ExportURL function that calls builtInExporter if it's defined:

public override getExportURL(...): ExportURL | undefined {
    const url = this._getExportURL?.(...); // consumer can still override (e.g. for VS Code)

    if (url) {
      return url;
    }

    if (builtInExporter) {
      return async () => new Blob([builtInExporter()]);
    }
    
    // Provider-side exports go here — e.g.:
    // return new URL(...);

@@ -66,6 +66,15 @@ export interface Dims {

export type IgnoreValue = (val: number) => boolean;

export interface ExportEntry {
format: string;
url: ExportURL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url used to be option; now it isn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about renaming ExportEntry to ExportFunction or Exporter ?

Copy link
Contributor Author

@axelboc axelboc Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExportEntry was already defined in ExportMenu before. It's meant to type an entry in the export menu, so I think the naming is good as is.

BuiltInExporter is the term I used for functions that generate the export payload in the mapped components. The provider then generates an ExportURL async function that invokes this exporter, and the mapped component puts that ExportURL in an ExportEntry to pass to the toolbar.

@axelboc axelboc requested a review from loichuder February 28, 2025 16:38
@axelboc
Copy link
Contributor Author

axelboc commented Feb 28, 2025

/approve

Copy link
Member

@loichuder loichuder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks quite good!

There are a few things that bother me a bit:

  • The implicit coupling between the formats and the record of built-in exporters passed to useExportEntries
  • Depending on where you look in the code, it can be difficult to know the priority between getExportURL, _getExportURL and the built-in exporter.

But all and all, the fact that this PR addresses several long-standing and difficult issues at once is a testimony of its relevance.

@@ -66,6 +66,15 @@ export interface Dims {

export type IgnoreValue = (val: number) => boolean;

export interface ExportEntry {
format: string;
url: ExportURL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about renaming ExportEntry to ExportFunction or Exporter ?

: new URL(`data:text/plain,${encodeURIComponent(finalCsv)}`);
};
}
return async () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you not check that format is csv here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, no because the mock provider relies entirely on the viewer providing a builtInExporter. It doesn't care which format the exporter is for; it just invokes it.

To make that clearer, I've renamed the local variable csv (used to store the result of calling builtInExporter) to payload.

@axelboc axelboc changed the title Add CSV export to _NX Line_ (and other export improvements) Add CSV export to NX Line (and other export improvements) Mar 10, 2025
@axelboc
Copy link
Contributor Author

axelboc commented Mar 10, 2025

I've updated the documentation of the getExportURL prop, if you don't mind having a look again @loichuder: b886444


The implicit coupling between the formats and the record of built-in exporters passed to useExportEntries

Agreed, not perfect, but the types ensure that the exporter record cannot contain formats that are not in the formats array. I think it's better than to try to force the two parameters into a single Record, and to use null or false values as a way to identify formats that may be handled by the provider.

Depending on where you look in the code, it can be difficult to know the priority between getExportURL, _getExportURL and the built-in exporter.

Totally agree as well, it gets quite confusing. This actually prompted me to update the documentation of the getExportURL prop, so thanks! 😁

Eventually I'm hoping to at least remove the builtInExporter parameter from getExportURL. It's only done this way to support the VS Code use case for now. A better way would be to pass an onExportPayload callback to the viewer somehow to get direct access to the built-in export payload when it's generated.

However, simply adding an onExportPayload prop to the App component and drilling it down to all the mapped components was out of the question. It's part of the broader goal of making the viewer more customisable/extendable. I have a few ideas in mind...

@loichuder loichuder self-requested a review March 10, 2025 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants