Is Differentiable Programming Actually Necessary? Can’t you just train the neural networks separately?
October 4, 2022 in Scientific ML | Author: Christopher Rackauckas
Is differentiable programming actually necessary, or can you just train the neural network in isolation against data and then stick the trained neural network into the simulation? We looked at this problem in detail in our new manuscript titled “Capturing missing physics in climate model parameterizations using neural differential equations”.
The goal of this project is to understand temperature mixing in large eddy simulations, essentially columns of water in the ocean. That is, can we take a “true” 3D Navier-Stokes simulation and use it to build very quick and accurate models for how heat flows up and down in the water column?
This isn’t a new problem: climate scientists have been using approximations with convective adjustments for years. The SciML question is whether integrating machine learning into these adjustments can improve them. For example, here’s the convective adjustment equation that is derived in the paper, and in many papers before it:
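Roughly, in its standard form (see the paper for the exact version used there), convective adjustment evolves the horizontally averaged temperature $\overline{T}(z,t)$ with a diffusivity that switches on when the water column is unstably stratified:

$$\frac{\partial \overline{T}}{\partial t} = \frac{\partial}{\partial z}\left(K \, \frac{\partial \overline{T}}{\partial z}\right), \qquad K = \begin{cases} K_{\mathrm{CA}} & \text{if } \partial \overline{T}/\partial z < 0, \\ 0 & \text{otherwise,} \end{cases}$$

where $K_{\mathrm{CA}}$ is a large adjustment diffusivity that rapidly mixes away unstable stratification.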
To do scientific machine learning effectively here, the neural network should capture the terms dropped by the convective-adjustment derivation. Hence, tada, our ML-embedded equation:
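As a sketch (the precise placement of the network follows the paper), the ML-embedded equation keeps the convective-adjustment term and adds a neural network $\mathrm{NN}(\overline{T})$ standing in for the residual turbulent heat flux:

$$\frac{\partial \overline{T}}{\partial t} = \frac{\partial}{\partial z}\left(K \, \frac{\partial \overline{T}}{\partial z}\right) - \frac{\partial}{\partial z}\,\mathrm{NN}(\overline{T}).$$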
In other words, the neural network should capture the residual of the unknown parts of the heat flux. But here’s a question: should you train the neural network to match the heat flux at each time, or the temperature at each time? Either way it’s capturing the residual; what really differs is the training process.
This is key because, in order to match the heat flux, you only need to make a neural network match a dataset. That’s easy; people have been doing it for decades. Making a neural network match the temperature requires solving the equation and differentiating through the solver. Which is better?
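To make the two training modes concrete, here’s a minimal Julia sketch (my own illustration, not the paper’s code): it assumes Flux.jl, stands in random numbers for the hypothetical LES dataset `T_data`, and uses a hand-rolled Euler step in place of a real ODE solver. Option A fits the tendency (the flux residual) directly; Option B rolls the model forward in time and differentiates through the whole integration.

```julia
using Flux

# Hypothetical stand-ins for the LES data: n_z grid points, n_t snapshots.
n_z, n_t, dt = 32, 100, 0.1f0
T_data    = randn(Float32, n_z, n_t)      # temperature profiles over time
dTdt_data = diff(T_data, dims=2) ./ dt    # finite-difference tendency targets

# Neural network for the unknown tendency in dT/dt = NN(T).
nn = Chain(Dense(n_z => 64, tanh), Dense(64 => n_z))

# Option A: derivative/flux fitting -- plain supervised regression, no solver.
deriv_loss(nn) = sum(abs2, nn(T_data[:, 1:end-1]) .- dTdt_data)

# Option B: time-series fitting -- integrate forward with explicit Euler and
# compare trajectories, backpropagating through every solver step.
function timeseries_loss(nn)
    T = T_data[:, 1]
    l = 0.0f0
    for i in 2:n_t
        T = T .+ dt .* nn(T)               # one Euler step of dT/dt = NN(T)
        l += sum(abs2, T .- T_data[:, i])  # mismatch against the data
    end
    return l
end

# Train on the time series (swap in deriv_loss to compare Option A).
opt_state = Flux.setup(Adam(1f-3), nn)
for epoch in 1:100
    g = gradient(timeseries_loss, nn)[1]   # gradient through the whole solve
    Flux.update!(opt_state, nn, g)
end
```

Option A never touches a solver, which is why it’s so easy; Option B backpropagates through every timestep, and that ability to differentiate through the simulation is exactly what differentiable programming buys you.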
In every category, training on the time series (i.e., using differentiable programming) gave a much better fit than just fitting the derivatives. Orders of magnitude better: the plots in the paper are on a log scale!
Oh, and finally, did SciML outperform the other techniques? Of course it did, but you knew that would be the case before reading. However, I think the key takeaway is that even when you can train a neural network in isolation, differentiable programming / simulation is valuable.
For more discussion, see the original Twitter thread. Thanks!