I guess I’m saying the problem with testing or linting a notebook is not technical.
Writing a tool that suppresses output cells, infers global parameter blocks, etc., is trivial. Writing a linter with enough configurability to account for notebook presentation styling might be harder, but still straightforward.
Creating the raw tools that can do it is the easy part.
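To give a sense of how small such a tool can be, here is a rough sketch of the "suppress output cells" part. It assumes the notebook is already loaded as a plain dict in the nbformat v4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). The function name `strip_outputs` and the sample notebook are made up for illustration, not part of any existing tool.

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with all code-cell outputs removed.

    Hypothetical helper; assumes the nbformat v4 JSON layout.
    """
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-written notebook, standing in for json.load() on an .ipynb file.
example_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Exploration\n"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(1 + 1)\n"],
            "execution_count": 3,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

cleaned = strip_outputs(example_notebook)
```

A real tool would read and write the `.ipynb` file with `json.load` and `json.dump`, but the core of it is just this loop.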
The hard part is that writing code for re-use and testability and with reasonable low-effort best practices at separating concerns and having modularity — all that is antithetical to the whole purpose of the notebook.
So why bother contorting the testing apparatus to accommodate testing something that is created with throw-away design principles from the start?
As soon as you apply those design principles from the very first prototype or data exploration plot, the value of putting the code in notebook format goes away, and you’re better off using testing tools that were meant for proper modules than shoe-horning notebooks into notebook-specific testing tools.
I’d also argue that starting from a craftsmanship-first approach, even in exploratory data analysis, has compounding benefits: you quickly reach a state where the extra craftsmanship leads to less time spent debugging or backtracking to understand a plotting error or diagnostic bug, and to faster convergence on successful output artifacts, whether that’s a report on model accuracy or production-ready code.
>I’d also argue that starting from a craftsmanship-first approach, even in exploratory data analysis, has compounding benefits: you quickly reach a state where the extra craftsmanship leads to less time spent debugging or backtracking to understand a plotting error or diagnostic bug, and to faster convergence on successful output artifacts, whether that’s a report on model accuracy or production-ready code.
Assuredly, my point is that none of these are incompatible with a notebook-like environment. You can have well crafted, well designed, good code in a notebook, and get the advantages of both craftsmanship and presentation.
Good tooling allows you to focus on the craftsmanship.
>The hard part is that writing code for re-use and testability and with reasonable low-effort best practices at separating concerns and having modularity
These are all hard normally; it's just that we have tooling that makes them somewhat less difficult. To be reductive, you're saying "well crafted software is difficult", which I agree with, "and so since we don't have the tooling to make well crafted software as easy in a notebook environment as in the environment we've used for 20-50 years now, we should not use the notebook", which I disagree with. You can just as well say "and so we should create the tooling to mature the notebook environment".
Basically, to answer your question:
>So why bother contorting the testing apparatus to accommodate testing something that is created with throw-away design principles from the start?
Don't write notebooks with throw-away design principles from the start. Treat them like mature parts of a workflow and in all likelihood, they'll perform like one. Use good tooling, good design, and good craftsmanship when writing your notebooks (much as you would with any other piece of code you wrote) and they won't be created with "throw-away design principles".
Yes, if you treat notebooks like an unstructured second-class citizen you'll get bad results, but that's true of any tool. So don't do that.
It seems to me like this is just a debate over what words mean.
Essentially when you say the phrase “treat notebooks like first-class citizens” you’re baking in all kinds of statements about the design-level thinking that should be used for good craftsmanship when coding in the notebook.
This still won’t address the intrinsic mixing of concerns (especially units of display), but overall it roughly means that “treating notebooks as first class citizens” translates to “treat the notebook like a thin execution environment / IDE, but develop code in exactly the way you would in more standard settings.”
To me this falls flat because that’s not why people want to use a notebook. Generally they want to use it because it’s superficially easier to jumble all concerns into a single context and not think about coupling or separation, and just disregard testing and other best practices.
The notebook is optimized for this way of working, and I’m trying to call into question the underlying claim that it’s ever worthwhile to write code that way if there’s even the slightest need for re-use or reproducibility.
Separately, a large share of this sort of notebook usage is expressly for presenting the notebook to other people, almost like interactive slides (in which case the goals are completely antithetical to good software practices, implying implementation units should be factored out if the priority is presentation).
Basically I’m saying you’re sweeping a bunch of stuff under the rug by lumping testing, linting, tooling, and software craftsmanship all under the term “first-class citizen.”
The other reality is that notebooks aren’t first-class units, at least in Python. You can’t import a notebook like a module, unless you do a lossy export to a .py file (in which case, why weren’t you just writing the .py file to begin with and only putting units of display in the notebook that imports the .py file?) — not to mention that you’d need custom tooling instead of mature tooling to apply linting, testing, packaging, etc., like we discussed above.
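To make the "lossy export" point concrete, here is a minimal sketch of what such an export amounts to. It assumes the nbformat v4 JSON layout, where a cell's `source` may be either a string or a list of lines. The names `export_code_cells`, `load_as_module`, and the demo notebook are hypothetical, invented only for this illustration.

```python
import types


def export_code_cells(notebook: dict) -> str:
    """Concatenate the source of all code cells into one Python source string."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown cells are dropped, and outputs are never read: the lossy part
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


def load_as_module(notebook: dict, name: str = "exported_notebook") -> types.ModuleType:
    """Execute the exported source in a fresh module object (hypothetical helper)."""
    module = types.ModuleType(name)
    exec(compile(export_code_cells(notebook), name, "exec"), module.__dict__)
    return module


# Hand-written stand-in for a notebook loaded with json.load().
demo_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Accuracy report\n"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["def accuracy(hits, total):\n", "    return hits / total\n"],
            "execution_count": 1,
            "outputs": [],
        },
    ],
}

exported_source = export_code_cells(demo_notebook)
mod = load_as_module(demo_notebook)
print(mod.accuracy(9, 10))
```

Everything that survives the export is exactly what could have lived in a `.py` file from the start. The prose and the outputs are what get thrown away.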
>To me this falls flat because that’s not why people want to use a notebook. Generally they want to use it because it’s superficially easier to jumble all concerns into a single context and not think about coupling or separation, and just disregard testing and other best practices.
But this is not unique to notebooks. You can very easily do the exact same things with raw Python files, it's just that in the ecosystem you work in, raw text files are treated more maturely.
I find notebooks very useful as a form of main method. You don't generally go importing your main methods anyway.
You shouldn't go developing all your code in notebooks, much as you shouldn't go developing all of your code in main methods that don't have classes. Shared infrastructure should be factored out no matter what.
Your complaints appear to come down to "people can apply bad software development practices in notebooks, therefore notebooks shouldn't be used". And my point is that no, you can just not apply the bad software practices, and that solves the problem too.
> "You can very easily do the exact same things with raw Python files, it's just that in the ecosystem you work in, raw text files are treated more maturely."
This is a non sequitur to the whole discussion. You can write bad code in any tool. That has no bearing on this.
Instead we should ask, "what does it require to write good code in a given tool."
In plain source files, we know the answer: there's a lot of theory on design, decoupling, architectural rules, refactoring, etc., as well as mature tools for code review, easy diff viewing, and automated testing.
In notebooks, the answer is that you have to jump through a lot of hoops to write things in a non-notebook-way -- that is, specifically in a way where you factor things out into the text files anyway -- if you want those good patterns.
For example, you mention:
> "I find notebooks very useful as a form of main method. You don't generally go importing your main methods anyway."
I totally agree. Viewed this way, the notebook-based "main" method is just a driver of other code. Meaning, you don't do much work in the notebook at all. You factor things out into other modules, etc., and then put as little as is needed to drive the code into the notebook.
Which is what I have been saying all along. It reveals the notebook to be an anti-pattern (because that driver code doesn't need or benefit from any aspects of the notebook environment that are expressly designed to act like a messy linear script of ad hoc implementation units mixed with ad hoc display units).
I would say your suggested way of using notebooks is exactly an example of what reveals that notebooks aren't very good for the intended use cases (like people defining global variables for experiment parameters at the top, and then "running an experiment" becomes changing those values and re-running the cells of the notebook).
This is among the most commonly advertised and praised ways of using a notebook, so it's not like some extremely rare situation that only arises in a place with bad software practices. It's practically the intended use of notebooks.
That's why I'm saying they are self-defeating. Once you take an approach where you factor things out and leave the notebook to be just a simplistic driver script, it's immediately clear that driver scripts can just be scripts, not notebooks, and don't benefit from all the intended ad-hoc-ery that is a first-class, intended workflow of the notebook design.
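As a rough illustration of the contrast, here is what the "global parameters at the top, then re-run the cells" workflow could look like once it's factored into an ordinary script. This is only a sketch: `ExperimentParams`, `run_experiment`, and the toy "experiment" itself (the mean absolute value of Gaussian noise) are made-up stand-ins, not anything from a real project.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentParams:
    """Hypothetical parameter block, replacing globals at the top of a notebook."""
    n_samples: int = 1000
    noise: float = 0.1
    seed: int = 0


def run_experiment(params: ExperimentParams) -> float:
    """Toy experiment: mean absolute value of seeded Gaussian noise."""
    rng = random.Random(params.seed)  # explicit seed makes each run reproducible
    errors = [rng.gauss(0.0, params.noise) for _ in range(params.n_samples)]
    return sum(abs(e) for e in errors) / len(errors)


if __name__ == "__main__":
    # "Running an experiment" is a function call per setting,
    # not editing globals and re-running cells in order.
    for noise in (0.1, 0.5):
        params = ExperimentParams(noise=noise)
        print(params, run_experiment(params))
```

The driver at the bottom is all that would be left for a notebook to do, and a plain script does it just as well. Meanwhile `run_experiment` can be imported and tested like any other function.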
>In notebooks, the answer is that you have to jump through a lot of hoops to write things in a non-notebook-way -- that is, specifically in a way where you factor things out into the text files anyway -- if you want those good patterns.
I guess I still consider this a "notebooky" way. It's just good practice, instead of bad practice. But still notebooky.
You seem to be using notebooky to mean sloppy and disorganized, whereas I mean it as interactive and literate. Those need not be correlated, and I think that "driver scripts" still benefit a lot from the interactivity and literateness.
> "You seem to be using notebooky to mean sloppy and disorganized"
Not exactly. I'm using "notebooky" to mean whatever the prominent, advertised, praised and recommended usage patterns and workflows are for a large body of notebooks and from prominent presentations using the notebook as the lingua franca, especially in circles where the notebook is claimed to be central to "reproducible science" or where the notebook is described like a software equivalent of a "lab notebook."
The way that the notebook community, from academics and prominent leaders, to people who give talks this way, on down to data science practitioners, recommends using notebooks seems to inherently result in what you call "sloppy and disorganized" code. That code is the intended type of workflow, which is why I am trying to distill out principles for why it's arguably not a good idea. (Meaning why the intended way to use notebooks is self-defeating.)