Chapter V: Improving Sequential Latent Variable Models with Autoregressive Flows
5.2 Method
We start by motivating the use of autoregressive flows to reduce temporal dependencies, thereby simplifying dynamics. We then show how this simple technique can be incorporated within sequential latent variable models.
Motivation: Temporal Redundancy Reduction
Normalizing flows, while often utilized for density estimation, originated from data pre-processing techniques (Friedman, 1987; Hyvärinen and Oja, 2000; S. S. Chen and Gopinath, 2001), which remove dependencies between dimensions, i.e., redundancy reduction (Barlow et al., 1961). Removing dependencies simplifies the resulting probability distribution by restricting variation to individual dimensions, generally simplifying downstream tasks (Laparra, Camps-Valls, and Malo, 2011). Normalizing flows improve upon these procedures using flexible, non-linear functions (Deco and Brauer, 1995; Dinh, Krueger, and Bengio, 2015). While flows have been used for spatial decorrelation (Agrawal and Dukkipati, 2016; Winkler et al., 2019) and with other models (Huang et al., 2017), this capability remains under-explored.
Data sequences contain dependencies in time, for example, in the redundancy of video pixels (Figure 5.3), which are often highly predictable. These dependencies define the dynamics of the data, with the degree of dependence quantified by the multi-information,
$$I(\mathbf{x}_{1:T}) = \sum_t \mathcal{H}(\mathbf{x}_t) - \mathcal{H}(\mathbf{x}_{1:T}), \tag{5.1}$$
where $\mathcal{H}$ denotes entropy. Normalizing flows are capable of reducing redundancy, arriving at a new sequence, $\mathbf{y}_{1:T}$, with $I(\mathbf{y}_{1:T}) \leq I(\mathbf{x}_{1:T})$, thereby reducing temporal dependencies and simplifying dynamics. Thus, rather than fit the data distribution directly, we can first simplify the dynamics by pre-processing sequences with a
Figure 5.1: Redundancy Reduction. (a) Conditional densities $p(x_2|x_1)$. (b) The marginal, $p(x_2)$, differs from the conditional densities; thus, $I(x_1; x_2) > 0$. (c) In the normalized space of $y$, the corresponding densities $p(y_2|y_1)$ are identical. (d) The marginal $p(y_2)$ is identical to the conditionals, so $I(y_1; y_2) = 0$. Thus, in this case, a conditional affine transform removed the dependencies.
normalizing flow, then fit the resulting sequence. Through training, the flow will attempt to remove redundancies to meet the modeling capacity of the higher-level dynamics model, $p_\theta(\mathbf{y}_{1:T})$.
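For jointly Gaussian data, the multi-information in Eq. 5.1 has a closed form, which makes this redundancy-reduction effect easy to check numerically. The following is a minimal NumPy sketch; the coefficients 0.8 and 0.6 are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-step linear-Gaussian sequence: x2 depends on x1 (coefficients illustrative).
n = 200_000
x1 = rng.normal(0.0, 1.0, size=n)
x2 = 0.8 * x1 + rng.normal(0.0, 0.6, size=n)

def gaussian_multi_info(cols):
    """Closed-form multi-information for (approximately) Gaussian data:
    I = 0.5 * (sum_i log var_i - log det Cov)."""
    cov = np.cov(np.stack(cols, axis=1), rowvar=False)
    return 0.5 * (np.sum(np.log(np.diag(cov))) - np.log(np.linalg.det(cov)))

multi_info_x = gaussian_multi_info([x1, x2])             # > 0: steps are dependent
# Normalizing with the conditional mean (y2 = x2 - E[x2|x1]) removes the dependence.
multi_info_y = gaussian_multi_info([x1, x2 - 0.8 * x1])  # close to 0
```

Here the flow's job is done by hand, since the conditional mean is known; a learned flow would have to discover this structure during training.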
Example. To visualize this procedure for an affine autoregressive flow, consider a one-dimensional input over two time steps, $x_1$ and $x_2$. For each value of $x_1$, there is a conditional density, $p(x_2|x_1)$. Assume that these densities take one of two forms, which are identical but shifted and scaled, shown in Figure 5.1. Transforming these densities through their conditional means, $\mu_2 = \mathbb{E}[x_2|x_1]$, and standard deviations, $\sigma_2 = \mathbb{E}[(x_2 - \mu_2)^2 | x_1]^{1/2}$, creates a normalized space, $y_2 = (x_2 - \mu_2)/\sigma_2$, where the conditional densities are identical. In this space, the multi-information is
$$I(y_1; y_2) = \mathbb{E}_{p(y_1, y_2)}\left[\log p(y_2|y_1) - \log p(y_2)\right] = 0,$$
whereas $I(x_1; x_2) > 0$. Indeed, if $p(x_t|x_{<t})$ is linear-Gaussian, inverting an affine autoregressive flow exactly corresponds to Cholesky whitening (Pourahmadi, 2011; Kingma, Salimans, et al., 2016), removing all linear dependencies.
In the example above, $\mu_2$ and $\sigma_2$ act as a frame of reference for estimating $x_2$. More generally, in the special case where $\boldsymbol{\mu}_\theta(\mathbf{x}_{<t}) = \mathbf{x}_{t-1}$ and $\boldsymbol{\sigma}_\theta(\mathbf{x}_{<t}) = \mathbf{1}$, we recover $\mathbf{y}_t = \mathbf{x}_t - \mathbf{x}_{t-1} = \Delta \mathbf{x}_t$. Modeling finite differences (or generalized coordinates (Friston, 2008)) is a well-established technique (see, e.g., Chua et al., 2018; Kumar et al., 2020), which is generalized by affine autoregressive flows.
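As a quick numerical check of this special case, the sketch below differences a Gaussian random walk, an example sequence chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random walk: x_t = x_{t-1} + noise, so consecutive steps are highly correlated.
noise = rng.normal(size=10_000)
x = np.cumsum(noise)

# Special case of the affine flow: mu(x_{<t}) = x_{t-1}, sigma(x_{<t}) = 1,
# giving y_t = x_t - x_{t-1} = Delta x_t (finite differences).
y = np.diff(x)

corr_x = np.corrcoef(x[:-1], x[1:])[0, 1]  # near 1: strong temporal dependence
corr_y = np.corrcoef(y[:-1], y[1:])[0, 1]  # near 0: dependence removed
```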
Modeling Dynamics with Autoregressive Flows
We now discuss utilizing autoregressive flows to improve sequence modeling, highlighting use cases for modeling dynamics in the data and latent spaces.
Figure 5.2: Model Diagrams. (a) An autoregressive flow pre-processes a data sequence, $\mathbf{x}_{1:T}$, to produce a new sequence, $\mathbf{y}_{1:T}$, with reduced temporal dependencies. This simplifies dynamics modeling for a higher-level sequential latent variable model, $p_\theta(\mathbf{y}_{1:T}, \mathbf{z}_{1:T})$. Empty diamond nodes represent deterministic dependencies, not recurrent states. (b) Diagram of the autoregressive flow architecture. Blank white rectangles represent convolutional layers (see Appendix). The three stacks of convolutional layers within the blue region are shared. cat denotes channel-wise concatenation.
Data Dynamics
The form of an affine autoregressive flow across sequences is given by
$$\mathbf{x}_t = \boldsymbol{\mu}_\theta(\mathbf{x}_{<t}) + \boldsymbol{\sigma}_\theta(\mathbf{x}_{<t}) \odot \mathbf{y}_t, \tag{5.2}$$
and its inverse
$$\mathbf{y}_t = \frac{\mathbf{x}_t - \boldsymbol{\mu}_\theta(\mathbf{x}_{<t})}{\boldsymbol{\sigma}_\theta(\mathbf{x}_{<t})}, \tag{5.3}$$
which, again, are equivalent to a Gaussian autoregressive model when $\mathbf{y}_t \sim \mathcal{N}(\mathbf{y}_t; \mathbf{0}, \mathbf{I})$.
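A minimal NumPy sketch of Eqs. 5.2 and 5.3, with hypothetical scalar shift and scale functions standing in for the learned networks (the model itself uses convolutional networks over past frames); the sketch verifies that the two directions invert each other:

```python
import numpy as np

# Hypothetical shift/scale functions of the previous step; a learned model
# would parameterize these with networks over the full history x_{<t}.
def mu(prev):
    return 0.9 * prev

def sigma(prev):
    return 0.5 + 0.1 * np.tanh(prev)  # kept strictly positive

def forward(y):
    """x_t = mu(x_{<t}) + sigma(x_{<t}) * y_t  (Eq. 5.2)."""
    x = np.zeros_like(y)
    prev = 0.0
    for t in range(len(y)):
        x[t] = mu(prev) + sigma(prev) * y[t]
        prev = x[t]
    return x

def inverse(x):
    """y_t = (x_t - mu(x_{<t})) / sigma(x_{<t})  (Eq. 5.3)."""
    y = np.zeros_like(x)
    prev = 0.0
    for t in range(len(x)):
        y[t] = (x[t] - mu(prev)) / sigma(prev)
        prev = x[t]
    return y

rng = np.random.default_rng(0)
y = rng.normal(size=50)
x = forward(y)
y_rec = inverse(x)  # recovers y up to floating-point error
```

Note that sampling (the forward direction) is inherently sequential, while the inverse used for density evaluation can be computed in parallel across time when the history is given.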
We can stack hierarchical chains of flows to improve the model capacity. Denoting the shift and scale functions at the $m$th transform as $\boldsymbol{\mu}_\theta^m(\cdot)$ and $\boldsymbol{\sigma}_\theta^m(\cdot)$ respectively, we then calculate $\mathbf{y}^m$ using the inverse transform:
$$\mathbf{y}_t^m = \frac{\mathbf{y}_t^{m-1} - \boldsymbol{\mu}_\theta^m(\mathbf{y}_{<t}^{m-1})}{\boldsymbol{\sigma}_\theta^m(\mathbf{y}_{<t}^{m-1})}. \tag{5.4}$$
After the final ($M$th) transform, we can choose the form of the base distribution, $p_\theta(\mathbf{y}_{1:T}^M)$, e.g., Gaussian. While we could attempt to model $\mathbf{x}_{1:T}$ completely using stacked autoregressive flows, these models are limited to affine element-wise transforms that maintain the data dimensionality. Due to this limited capacity, purely flow-based models often require many transforms to be effective (Kingma and Dhariwal, 2018).
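The stacking in Eq. 5.4 amounts to chaining single-transform inverses, starting from $\mathbf{y}^0 = \mathbf{x}$. A sketch with hypothetical per-level shift/scale functions (placeholders for learned networks):

```python
import numpy as np

def affine_inverse(x, mu, sigma):
    """One inverse transform per level: y_t = (x_t - mu(x_{<t})) / sigma(x_{<t})."""
    y = np.empty_like(x)
    prev = 0.0
    for t in range(len(x)):
        y[t] = (x[t] - mu(prev)) / sigma(prev)
        prev = x[t]
    return y

# Hypothetical per-level shift/scale functions (a real model would learn these).
flows = [
    (lambda p: 0.9 * p,          lambda p: 1.0),  # m = 1
    (lambda p: 0.5 * np.tanh(p), lambda p: 0.8),  # m = 2
]

# Eq. 5.4: starting from y^0 = x, each level inverts the m-th transform.
rng = np.random.default_rng(0)
x = rng.normal(size=100).cumsum()
y = x.copy()
for mu_m, sigma_m in flows:
    y = affine_inverse(y, mu_m, sigma_m)
```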
Instead, we can model the base distribution using an expressive sequential latent variable model (SLVM), or, equivalently, we can augment the conditional likelihood of a SLVM using autoregressive flows (Fig. 5.2a). Following the motivation from Section 5.2, the flow can remove temporal dependencies, simplifying the modeling task for the SLVM. With a single flow, the joint probability is
$$p_\theta(\mathbf{x}_{1:T}, \mathbf{z}_{1:T}) = p_\theta(\mathbf{y}_{1:T}, \mathbf{z}_{1:T}) \left| \det \frac{\partial \mathbf{x}_{1:T}}{\partial \mathbf{y}_{1:T}} \right|^{-1}, \tag{5.5}$$
where the SLVM distribution is given by
$$p_\theta(\mathbf{y}_{1:T}, \mathbf{z}_{1:T}) = \prod_{t=1}^{T} p_\theta(\mathbf{y}_t | \mathbf{y}_{<t}, \mathbf{z}_{\leq t}) \, p_\theta(\mathbf{z}_t | \mathbf{y}_{<t}, \mathbf{z}_{<t}). \tag{5.6}$$
If the SLVM is itself a flow-based model, we can use maximum log-likelihood training. If not, we can resort to variational inference (Chung et al., 2015; Fraccaro et al., 2016). We derive and discuss this procedure in Section 5.5.
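For intuition, the change of variables in Eq. 5.5 can be sketched in the simplified case where the base distribution over $\mathbf{y}_{1:T}$ is a standard Gaussian rather than a full SLVM, so no latent variables or variational inference are needed. The shift/scale functions below are hypothetical placeholders; for an affine flow, $|\det \partial \mathbf{x}_{1:T} / \partial \mathbf{y}_{1:T}|^{-1}$ reduces to a product of inverse scales:

```python
import numpy as np

# Hypothetical shift and log-scale functions of the previous observation.
def mu(prev):
    return 0.9 * prev

def log_sigma(prev):
    return -0.5 + 0.1 * np.tanh(prev)

def flow_log_prob(x):
    """log p(x_{1:T}) = sum_t [ log N(y_t; 0, 1) - log sigma_t ], i.e.,
    the base log-density plus the log-Jacobian of the inverse transform."""
    prev = 0.0
    logp = 0.0
    for t in range(len(x)):
        s = np.exp(log_sigma(prev))
        y_t = (x[t] - mu(prev)) / s
        logp += -0.5 * (y_t**2 + np.log(2 * np.pi)) - log_sigma(prev)
        prev = x[t]
    return logp
```

Replacing the Gaussian base log-density with an SLVM term recovers Eq. 5.5; when that SLVM has latent variables, the log-likelihood is no longer tractable and the variational treatment referenced in the text applies.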
Latent Dynamics
We can also consider simplifying latent dynamics modeling using autoregressive flows in the latent prior. This is relevant in hierarchical SLVMs, such as VideoFlow (Kumar et al., 2020), where each latent variable is modeled as a function of past and higher-level latent variables. Using $\mathbf{z}_t^{(\ell)}$ to denote the latent variable at the $\ell$th level at time $t$, we can parameterize the prior as
$$p_\theta(\mathbf{z}_t^{(\ell)} | \mathbf{z}_{<t}^{(\ell)}, \mathbf{z}_t^{(>\ell)}) = p_\theta(\mathbf{u}_t^{(\ell)} | \mathbf{u}_{<t}^{(\ell)}, \mathbf{z}_t^{(>\ell)}) \left| \det \frac{\partial \mathbf{z}_t^{(\ell)}}{\partial \mathbf{u}_t^{(\ell)}} \right|^{-1}, \tag{5.7}$$
where we convert $\mathbf{z}_t^{(\ell)}$ into $\mathbf{u}_t^{(\ell)}$ using the inverse transform $\mathbf{u}_t^{(\ell)} = (\mathbf{z}_t^{(\ell)} - \boldsymbol{\alpha}_\theta(\mathbf{z}_{<t}^{(\ell)})) / \boldsymbol{\beta}_\theta(\mathbf{z}_{<t}^{(\ell)})$.
As noted previously, VideoFlow uses a special case of this procedure, setting $\boldsymbol{\alpha}_\theta(\mathbf{z}_{<t}^{(\ell)}) = \mathbf{z}_{t-1}^{(\ell)}$ and $\boldsymbol{\beta}_\theta(\mathbf{z}_{<t}^{(\ell)}) = \mathbf{1}$. Generalizing this procedure further simplifies dynamics throughout the model.
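A sketch of the latent-space inverse transform, instantiated with VideoFlow's special case (shift equal to the previous latent, scale fixed to one); in the general case both functions would be learned from $\mathbf{z}_{<t}^{(\ell)}$:

```python
import numpy as np

# VideoFlow's special case at one level: alpha is the previous latent, beta is 1.
def alpha(z_prev):
    return z_prev

def beta(z_prev):
    return np.ones_like(z_prev)

# Inverse transform over a sequence of latents at level l:
# u_t = (z_t - alpha(z_{<t})) / beta(z_{<t}).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4)).cumsum(axis=0)  # T = 8 latents of dimension 4

u = np.empty_like(z)
prev = np.zeros(4)
for t in range(z.shape[0]):
    u[t] = (z[t] - alpha(prev)) / beta(prev)
    prev = z[t]

# With this choice, u_t is simply the difference z_t - z_{t-1}.
```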