Skip to content

Non-fancy scattergl to work with dates #1021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 24, 2016
Merged

Conversation

monfera
Copy link
Contributor

@monfera monfera commented Oct 7, 2016

Performance improvement - it avoids the slower fancy scattergl that used to be triggered by the presence of the Date X axis type.
Fixes issue #413

@monfera
Copy link
Contributor Author

monfera commented Oct 11, 2016

I added tests, including an implementation-dependent way of checking which gl renderer (fancy vs fast) is being used; interested in a review.

@@ -118,7 +118,7 @@ proto.handlePick = function(pickResult) {
trace: this,
dataCoord: pickResult.dataCoord,
traceCoord: [
this.pickXData[index],
Number(this.pickXData[index]), // non-fancy scattergl has Dates
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@monfera while pure-js users of plotly.js can use Date objects, plot.ly (and plotly.js users that may want to serialize their plots) use date strings via Lib.dateTime2ms (here is what regular cartesian does). I realize that this will be slower, but can we at least support it for compatibility with the rest of our dates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Alex, good point. Currently, scatter works with ISO formatted dates e.g. 2016-10-13 but I'll need to add a conversion from string to date for scattergl. It'll be done upfront. The line you refer to above probably won't need to change as by the time the tooltip gets data it's already in the standard JS Date format, but the front-end part needs some logic. To avoid having to loop through a possibly large number of array elements, I'm thinking about just checking the type of the first element of the array; if it's Date it's fine; if it's a string, then convert all elements to Date. Is it OK in your opinion and also @etpinard ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me a little wary, but I can't think of a reason users would mix Dates and date strings (at least within one trace - definitely different traces should be checked independently). The first element may not be enough though, in case it's null or something, but the first valid element seems OK. Especially since this trace type is specifically built for performance, I guess we can impose some sane restrictions on the users.

I haven't really looked at the architecture here, but if you're already converting the values you can just go straight to milliseconds, ignore Date objects altogether. Also notice that it's not exactly ISO 8601 we use, I don't know its name but I think of it as SQL date format - ie maximally it's 2016-10-13 12:34:56.789123 but we're permissive about letting you truncate it after any part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex I'd probably reuse the date conversion routine you pointed to above, as if at all possible, I'd shy away from rolling a new string -> Date parser. Yes I agree with using epoch milliseconds as a general principle esp. on greenfield projects, although in plotly it looks like it's not typically done. There's already conversion by default, as WebGL buffers cover float arrays, so it does get converted to epoch ms already.

@etpinard etpinard added this to the v1.19.0 milestone Oct 13, 2016
Copy link
Contributor

@etpinard etpinard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@monfera Looking good!

Let's see by how much Lib.isDateTime will slow down scattergl in fast mode.

@@ -4,6 +4,7 @@ var Plots = require('@src/plots/plots');
var Lib = require('@src/lib');
var Color = require('@src/components/color');
var tinycolor = require('tinycolor2');
var hasWebGLSupport = require('../assets/has_webgl_support');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep axes_test.js for test cases in relation to src/plots/cartesian/axes.js only.

@monfera can you move your new test cases to one of the gl2d_ suite of your choice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test are now separated from axes_test.

expect(gd._fullData[0].type).toBe('scattergl');
expect(gd._fullData[0]._module.basePlotModule.name).toBe('gl2d');

// one way of check which renderer - fancy vs not - we're using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find 👍

@@ -279,7 +279,8 @@ proto.updateFast = function(options) {
yy = y[i];

// check for isNaN is faster but doesn't skip over nulls
if(!isNumeric(xx) || !isNumeric(yy)) continue;
if(!isNumeric(yy)) continue;
if(!isNumeric(xx) && !(xx instanceof Date)) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lib.isDateTime is what you need here.

I'm afraid it will be significantly slower than xx instanceof Date though.

Copy link
Contributor Author

@monfera monfera Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etpinard @alexcjohnson I'd just like to clarify something here. Testing with Lib.isDateTime is useful here as long as we actually do something with the broader set of options, in particular, string representations of dates. So there's got to be logic to check the type and dispatch to either straight-through processing (no conversion, i.e. Date objects that are handled now) or conversion from the string representations of dates to Date objects. Should I go ahead and do these things? As mentioned above, I'd keep the fast path fast by testing the first legitimate value only, and if it's suitable for straight-through processing (STP) then the entire array would be assumed to be so. (As @alexcjohnson mentioned I'd check for the first non-null value).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised that epoch milliseconds are accepted as date data right now, I don't think our svg cartesian axes accept that, do they? But anyway I think your plan sounds good. Lib.dateTime2ms does short-circuit Date objects, but you could certainly shave off a little extra overhead by not going through it at all, and I think it's fine to assume (at least for these performance-oriented types) that all valid values in a given array have the same type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcjohnson sorry I conflated two things in the spur of the moment, acceptance of numeric values and the rendering of dates. If I pass on epoch milliseconds it'll render fine but it won't render the numbers as dates; it'll simply show the epoch milliseconds as numbers. I'll revise my comment.

@@ -135,7 +135,7 @@ proto.handlePick = function(pickResult) {

// check if trace is fancy
proto.isFancy = function(options) {
if(this.scene.xaxis.type !== 'linear') return true;
if(this.scene.xaxis.type !== 'linear' && this.scene.xaxis.type !== 'date') return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@monfera
Copy link
Contributor Author

monfera commented Oct 13, 2016

Note to self: make it work without the valueOf() calls

            'xaxis': {
                'range': [
                    new Date(2016, 0, 1, 2, 0, 0, 0).valueOf(),
                    new Date(2016, 0, 1, 2, 0, 0, 11).valueOf()
                ],
                'autorange': false
            }

@alexcjohnson
Copy link
Collaborator

Note to self: make it work without the valueOf() calls

I'm actually working right now on (for cartesian axes) having ranges (and a few other things that currently work like ranges) use date strings rather than milliseconds. You may want to hold off on that one until mine is done so we don't duplicate efforts.

@monfera
Copy link
Contributor Author

monfera commented Oct 13, 2016

@alexcjohnson also the Date type? I don't know if it's desired, just a thought.

@alexcjohnson
Copy link
Collaborator

also the Date type?

Yes. Pretty sure the way I'm doing it (by hooking into the same machinery we use to convert date data) will make that work automatically for cartesian, but in any event I'll make sure it works.

positions[ptr++] = yy;
if(!fastType) {
xx = Lib.dateTime2ms(xx);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etpinard @alexcjohnson as you see I ended up not skimping on tests, i.e. not just checking the first value, as we already had a check with relatively fast tests (number or Date). It's only when this fails is when the slower path is taken. If you think I can optimize away the isDateTime part, i.e. just assume that once we hit one of those, the rest of them will be like it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might worth exploring with placing the instanceOf Date vs ms vs Lib.isDateTime check as part of the autotype routine here.

Maybe, we could add a Lib.isDateTime alternative that would track what kind of date time the x are input at the defaults step and store that datetime type in e.g. fullLayout.xaxis._datetimeType ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I think the autotype routine does it right. It samples not-more than 1000 pts (that takes ~ 3 ms) to determine the axis type.

In the cartesian code, with that axis type, the data-to-calcData routines (i.e. ax.d2c) simply assume a lone axis type for all items in x.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etpinard thanks for testing and making a decision over whether this approach is fast enough or not. The current code in master already incurred the expense of two isNumeric calls per loop execution and admittedly I haven't attempted to make it faster than that. I agree that we can take this Date rework as an opportunity to speed up the loop a bit. For now, it can only be done probabilistically (for future options, I added a few lines of comment), as you said. In my previous commit I didn't shoot for such speedup, in part because it would be based on a heuristic, and in part because, if we want to increase speed, we might be able to do other things:

  • if we want to speed up plotly.js time, then permitting a typed array input (= no array copy), or at least letting the user specify the value types can speed things up
  • if we want to speed it up when called via Python, then either the Python interface could directly convert the date strings into JavaScript Date objects (or better, directly populate a typed array), or, the Python -> JS WebSockets interface can directly deserialize into a JS typed array (this of course assumes that Python sends epoch milliseconds).

Back to present time: the fast path can be provided if all values can be assumed to be number (or Date) rather than a majority, so I ended up not tampering with the autotype logic (it could be more DRY but wanted to keep loop bodies simple, and direct, for speed).

@etpinard
Copy link
Contributor

@monfera nice job 🎉

Using:

var x = fill(N, (_, i) => (new Date(2016, 8, i % 31)));
var y = fill(N, () => Math.random());

console.time('gl2d dates')
Plotly.plot(Tabs.fresh(), [{
  type: 'scattergl',
  mode: 'markers',
  x: x,
  y: y,
  opacity: 0.6
}], {
 title: `scattergl date coordinates with ${N} pts`
})
.then(() => console.timeEnd('gl2d dates'));

function fill(N, func) {
  var out = new Array(N);
  for(var i = 0; i < N; i++) {
    out[i] = func(null, i);
  }
  return out;
}

and N = 1e6, Plotly.plot clocks in at ~ 2300 ms. At N = 2e5, we get ~ 800ms. That's roughly a factor of 2 greater than with numeric x/y coordinates.

@etpinard
Copy link
Contributor

Looks like your last commit (3c79068) adds about 500ms at N = 1e6 - which is pretty significant.

It's not obvious to me why. Investigating.

@etpinard
Copy link
Contributor

Ignore my last comment, looks like that ~ 500ms at N = 1e6 is part of the run-to-run spread.

@monfera
Copy link
Contributor Author

monfera commented Oct 20, 2016

@etpinard the last two commits made the perf discussion "outdated" in the eyes of github; these commits and this is my answer to your suggestions.

Copy link
Contributor

@etpinard etpinard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is looking great to me. 💃

Looks like we're now able to plot 1e6 dates in 1.5 to 2 seconds . Nice job 🐎

// (for the future, typed arrays can guarantee it, and Date values can be done with
// representing the epoch milliseconds in a typed array;
// also, perhaps the Python / R interfaces take care of String->Date conversions
// such that there's no need to check for string dates in plotly.js)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment.

such that there's no need to check for string dates in plotly.js

That won't happen any time soon unfortunately ...

}
}

return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nicely done.

@etpinard etpinard mentioned this pull request Oct 20, 2016
@etpinard etpinard merged commit e1d440d into plotly:master Oct 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants