Ananth, I enjoyed reading your post. After many years in various roles across the software engineering profession, I find that most "Interactive Development Environments" are trending toward including the features needed by the segment of the community they target. That said, there remains a fairly big gap in how "Production" is defined, and that is probably the first term to clarify, since different solution characteristics define different "Production" systems. Solution characteristics determine the people, processes, and technologies that should be applied to each Production system. If you pitch a tent for a camping trip, you use a different tool set than when constructing a multi-story building, and a different one again when building a super-collider. Solution characteristics such as reliability, maintainability, security, deliverability, scalability, performance, cost, and customer value usually define the type of "Production" environment and the SDLC (rad, dev, test, stage, prod) applicable to the business goal. Most SDLC governance is focused on ensuring the efficiency of the overall SDLC process and therefore on balancing people's skills and capabilities with process and technology. In summary, if notebook development meets the SDLC goals for the solution characteristics and is the most efficient option, then it is probably OK. If there is strong governance around IDE requirements that notebooks don't meet, then they shouldn't be used. We use many IDEs across different teams; data science predominantly uses notebooks. It is when you allow any IDE to interact directly with business production assets that you begin to cross the governance line, and that is the core issue.
Notebooks are great for experimentation and ad hoc work.
Overall great points.
Have a look at https://github.com/mage-ai/mage-ai
This just isn’t true:
“All the legacy ETL tools offer UI-driven ETL solutions that lack version control, a review process, or software development methodologies.”
I can name two legacy ETL tools from major vendors (SSIS and Ab Initio) that offer exactly those things.