Suggest the correct name when no key matches in the dataset #9943
Illviljan merged 9 commits into pydata:main
    except KeyError as e:
        message = f"No variable named {key!r}. Variables on the dataset include {shorten_list_repr(list(self.variables.keys()), max_items=10)}"

        best_guess = utils.did_you_mean(key, self.variables.keys())
Amazing idea. I would print the best guess first, and then any others, so that it's easy to see.
Maybe we should just remove the "Variables on the dataset include ..." part? I think they try to do the same thing.
Yeah you could sort the whole list by similarity and then print that (truncated as above)
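A minimal sketch of that suggestion, assuming similarity is scored with `difflib.SequenceMatcher` from the standard library. The helper name `rank_by_similarity` and the sample variable names are hypothetical and are not taken from the PR.

```python
from difflib import SequenceMatcher


def rank_by_similarity(key, names, max_items=10):
    """Return names ordered from most to least similar to key, truncated."""

    def score(name):
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        return SequenceMatcher(None, str(key), str(name)).ratio()

    ranked = sorted(names, key=score, reverse=True)
    return ranked[:max_items]


# Hypothetical variable names, for illustration only.
names = ["pressure", "time", "temperature", "humidity"]
print(rank_by_similarity("temperatur", names, max_items=2))
```

With this ordering, the closest names survive truncation instead of being hidden at the end of a long listing.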
Now it prioritizes best_guess. If best_guess is empty you could be working in the wrong dataset, so it's still nice to get some kind of clue which dataset you're using.
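One possible reading of this behaviour, sketched with `difflib.get_close_matches`. The function `missing_key_message`, its parameters, and the exact wording of the message are assumptions made for illustration, not the code merged in this PR.

```python
import difflib


def missing_key_message(key, names, max_items=10):
    """Build an error message that leads with close matches, if there are any."""
    names = [str(name) for name in names]
    # Close matches come back ordered best first; the list is empty if nothing
    # clears the cutoff.
    best_guess = difflib.get_close_matches(str(key), names, n=3, cutoff=0.6)

    message = f"No variable named {key!r}."
    if best_guess:
        message += f" Did you mean one of {best_guess}?"

    # Always keep a truncated listing so the user can tell which dataset
    # they are looking at, even when no name is close to the key.
    shown = ", ".join(names[:max_items])
    if len(names) > max_items:
        shown += ", ..."
    message += f" Variables on the dataset include: {shown}"
    return message


# Hypothetical variable names, for illustration only.
print(missing_key_message("temperatur", ["temperature", "pressure", "time"]))
print(missing_key_message("zzz", ["temperature", "pressure", "time"]))
```

Putting the suggestions first keeps them visible even when the listing that follows is long.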
Big +1 on this; I'd also enjoy this as a user. Is there any concern that some processes might be running
We could add an LRU cache over
Though I'm thinking that someone could query whether different keys exist; i.e. Overall I say let's go ahead and we can reassess if we hear reports of slowdowns. Folks can use
I thought about this case as well, my initial idea was to just use
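As an illustration of the caching idea only: the function that would be cached is not named in the surviving text, so this sketch wraps a hypothetical helper, `cached_suggestions`, with `functools.lru_cache`. The names are passed as a tuple because cached arguments must be hashable.

```python
import difflib
from functools import lru_cache


@lru_cache(maxsize=128)
def cached_suggestions(key, names):
    """Return close matches for key; names must be a tuple so it can be hashed."""
    return tuple(difflib.get_close_matches(key, names, n=3, cutoff=0.6))


# Hypothetical variable names, for illustration only.
names = ("temperature", "pressure", "time")
cached_suggestions("temperatur", names)  # computed
cached_suggestions("temperatur", names)  # served from the cache
print(cached_suggestions.cache_info())
```

A cache like this only helps when the same missing key is looked up repeatedly against the same names. A loop that probes many different keys would miss the cache every time, which is the scenario raised in the discussion.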
Extremely cool! Would definitely be up for doing more like this.
* main: (79 commits)
  fix mean for datetime-like using the respective time resolution unit (#9977)
  Add `time_unit` argument to `CFTimeIndex.to_datetimeindex` (#9965)
  remove gate and add a test (#9958)
  Remove repetitive that (replace it with the) (#9994)
  add shxarray to the xarray ecosystem list (#9995)
  Add `shards` to `valid_encodings` to enable sharded Zarr writing (#9948)
  Use flox for grouped first, last (#9986)
  Bump the actions group with 2 updates (#9989)
  Fix some typing (#9988)
  Remove unnecessary a article (#9980)
  Fix test_doc_example on big-endian systems (#9949)
  fix weighted polyfit for arrays with more than 2 dimensions (#9974)
  Use zarr-fixture to prevent thread leakage errors (#9967)
  remove dask-expr from CI runs, fix related tests (#9971)
  Update time coding tests to assert exact equality (#9961)
  cast type to PDDatetimeUnitOptions (#9963)
  Suggest the correct name when no key matches in the dataset (#9943)
  fix upstream dev issues (#9953)
  Relax nanosecond datetime restriction in CF time decoding (#9618)
  Remove outdated quantile test. (#9945)
  ...
I found the error raised when I make a typo in a dataset key not very helpful. The truncated list of variables hides the ones that I wanted to see. Instead, this adds a fuzzy matching function that does the typical "Did you mean X".
whats-new.rst
Further reading:
python/cpython#16850
matplotlib/matplotlib#28115
https://en.wikipedia.org/wiki/Levenshtein_distance
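
For readers following the last link, here is a minimal sketch of the Levenshtein distance using the standard two-row dynamic programming formulation. It only illustrates the linked concept; nothing in this PR confirms that this exact metric is what the new fuzzy matching function uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("temperatur", "temperature"))  # → 1
```

A small distance relative to the length of the key is what makes a name a plausible "Did you mean" candidate.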