
Chapter V: Improving Sequential Latent Variable Models with Autoregressive Flows

5.2 Method

We start by motivating the use of autoregressive flows to reduce temporal dependencies, thereby simplifying dynamics. We then show how this simple technique can be incorporated within sequential latent variable models.

Motivation: Temporal Redundancy Reduction

Normalizing flows, while often utilized for density estimation, originated from data pre-processing techniques (Friedman, 1987; Hyvärinen and Oja, 2000; S. S. Chen and Gopinath, 2001), which remove dependencies between dimensions, i.e., redundancy reduction (Barlow et al., 1961). Removing dependencies simplifies the resulting probability distribution by restricting variation to individual dimensions, generally simplifying downstream tasks (Laparra, Camps-Valls, and Malo, 2011). Normalizing flows improve upon these procedures using flexible, non-linear functions (Deco and Brauer, 1995; Dinh, Krueger, and Bengio, 2015). While flows have been used for spatial decorrelation (Agrawal and Dukkipati, 2016; Winkler et al., 2019) and with other models (Huang et al., 2017), this capability remains under-explored.

Data sequences contain dependencies in time, for example, in the redundancy of video pixels (Figure 5.3), which are often highly predictable. These dependencies define the dynamics of the data, with the degree of dependence quantified by the multi-information,

$$\mathcal{I}(\mathbf{x}_{1:T}) = \sum_{t} \mathcal{H}(\mathbf{x}_t) - \mathcal{H}(\mathbf{x}_{1:T}), \qquad (5.1)$$

where $\mathcal{H}$ denotes entropy. Normalizing flows are capable of reducing redundancy, arriving at a new sequence, $\mathbf{y}_{1:T}$, with $\mathcal{I}(\mathbf{y}_{1:T}) \leq \mathcal{I}(\mathbf{x}_{1:T})$, thereby reducing temporal dependencies and simplifying dynamics. Thus, rather than fit the data distribution directly, we can first simplify the dynamics by pre-processing sequences with a

Figure 5.1: Redundancy Reduction. (a) Conditional densities $p(x_2 | x_1)$. (b) The marginal $p(x_2)$ differs from the conditional densities; thus, $\mathcal{I}(x_1; x_2) > 0$. (c) In the normalized space of $y$, the corresponding densities $p(y_2 | y_1)$ are identical. (d) The marginal $p(y_2)$ is identical to the conditionals, so $\mathcal{I}(y_1; y_2) = 0$. Thus, in this case, a conditional affine transform removed the dependencies.

normalizing flow, then fit the resulting sequence. Through training, the flow will attempt to remove redundancies to meet the modeling capacity of the higher-level dynamics model, $p_\theta(\mathbf{y}_{1:T})$.
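For a Gaussian process, the quantities in Eq. 5.1 are available in closed form, so the effect of an affine autoregressive transform on the multi-information can be checked directly. The following sketch (all names and parameter values are illustrative, not from the chapter) does so for a stationary AR(1) process:

```python
import numpy as np

def multi_information(S):
    """Eq. 5.1 for a zero-mean Gaussian with covariance S:
    I = 0.5 * (sum_t log S_tt - log det S)."""
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (np.sum(np.log(np.diag(S))) - logdet)

T, phi = 50, 0.9
# Stationary AR(1) covariance: cov(x_s, x_t) = phi ** |s - t|
S_x = phi ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))

# Affine autoregressive transform y_t = x_t - phi * x_{t-1} (shift equal to
# the conditional mean, unit scale), written as a matrix A so S_y = A S_x A^T.
A = np.eye(T) - phi * np.eye(T, k=-1)
S_y = A @ S_x @ A.T

print(multi_information(S_x))   # large and positive: strong temporal dependence
print(multi_information(S_y))   # approximately zero: dependencies removed
```

Here the flow uses the true conditional mean, so it removes all dependence; a learned flow would only approach this in the limit of a perfect fit.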

Example. To visualize this procedure for an affine autoregressive flow, consider a one-dimensional input over two time steps, $x_1$ and $x_2$. For each value of $x_1$, there is a conditional density, $p(x_2 | x_1)$. Assume that these densities take one of two forms, which are identical but shifted and scaled, shown in Figure 5.1. Transforming these densities through their conditional means, $\mu_2 = \mathbb{E}[x_2 | x_1]$, and standard deviations, $\sigma_2 = \mathbb{E}[(x_2 - \mu_2)^2 | x_1]^{1/2}$, creates a normalized space, $y_2 = (x_2 - \mu_2)/\sigma_2$, where the conditional densities are identical. In this space, the multi-information is

$$\mathcal{I}(y_1; y_2) = \mathbb{E}_{p(y_1, y_2)}\left[ \log p(y_2 | y_1) - \log p(y_2) \right] = 0,$$

whereasI (π‘₯1;π‘₯2) > 0.Indeed, if 𝑝(π‘₯𝑑|π‘₯<𝑑)is linear-Gaussian, inverting an affine autoregressive flow exactly corresponds to Cholesky whitening (Pourahmadi, 2011;

Kingma, Salimans, et al., 2016), removing all linear dependencies.
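This equivalence can be checked numerically. The sketch below (assuming a stationary AR(1) covariance; all names are illustrative) whitens samples with the inverse Cholesky factor, which is lower triangular and hence itself an affine autoregressive transform of the sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi = 5, 0.8
# Stationary AR(1) covariance, a linear-Gaussian p(x_t | x_{<t})
S = phi ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))

L = np.linalg.cholesky(S)                    # S = L L^T, L lower triangular
x = rng.standard_normal((100_000, T)) @ L.T  # samples with covariance S

# Inverse flow = Cholesky whitening: y = L^{-1} x. Since L^{-1} is lower
# triangular, each y_t is an affine function of x_{<=t}, i.e., autoregressive.
y = np.linalg.solve(L, x.T).T

print(np.round(np.cov(y.T), 2))              # approximately the identity matrix
```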

In the example above, $\mu_2$ and $\sigma_2$ act as a frame of reference for estimating $x_2$. More generally, in the special case where $\boldsymbol{\mu}_\theta(\mathbf{x}_{<t}) = \mathbf{x}_{t-1}$ and $\boldsymbol{\sigma}_\theta(\mathbf{x}_{<t}) = \mathbf{1}$, we recover $\mathbf{y}_t = \mathbf{x}_t - \mathbf{x}_{t-1} = \Delta\mathbf{x}_t$. Modeling finite differences (or generalized coordinates (Friston, 2008)) is a well-established technique (see, e.g., Chua et al., 2018; Kumar et al., 2020), which is generalized by affine autoregressive flows.

Modeling Dynamics with Autoregressive Flows

We now discuss utilizing autoregressive flows to improve sequence modeling, highlighting use cases for modeling dynamics in the data and latent spaces.


Figure 5.2: Model Diagrams. (a) An autoregressive flow pre-processes a data sequence, $\mathbf{x}_{1:T}$, to produce a new sequence, $\mathbf{y}_{1:T}$, with reduced temporal dependencies. This simplifies dynamics modeling for a higher-level sequential latent variable model, $p_\theta(\mathbf{y}_{1:T}, \mathbf{z}_{1:T})$. Empty diamond nodes represent deterministic dependencies, not recurrent states. (b) Diagram of the autoregressive flow architecture. Blank white rectangles represent convolutional layers (see Appendix). The three stacks of convolutional layers within the blue region are shared. "cat" denotes channel-wise concatenation.

Data Dynamics

The form of an affine autoregressive flow across sequences is given by

$$\mathbf{x}_t = \boldsymbol{\mu}_\theta(\mathbf{x}_{<t}) + \boldsymbol{\sigma}_\theta(\mathbf{x}_{<t}) \odot \mathbf{y}_t, \qquad (5.2)$$

and its inverse

$$\mathbf{y}_t = \frac{\mathbf{x}_t - \boldsymbol{\mu}_\theta(\mathbf{x}_{<t})}{\boldsymbol{\sigma}_\theta(\mathbf{x}_{<t})}, \qquad (5.3)$$

which, again, are equivalent to a Gaussian autoregressive model when $\mathbf{y}_t \sim \mathcal{N}(\mathbf{y}_t; \mathbf{0}, \mathbf{I})$.
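A minimal sketch of Eqs. 5.2 and 5.3, with hand-picked stand-ins for the learned shift and scale networks (not the chapter's actual architecture):

```python
import numpy as np

# Illustrative stand-ins for the networks mu_theta and sigma_theta; here each
# depends only on the previous time step.
def mu(x_prev):
    return 0.5 * x_prev

def sigma(x_prev):
    return np.exp(0.1 * x_prev)  # strictly positive scale

def inverse(x):
    """Eq. 5.3: y_t = (x_t - mu(x_{<t})) / sigma(x_{<t}); parallel across t."""
    y = x.copy()
    y[1:] = (x[1:] - mu(x[:-1])) / sigma(x[:-1])
    return y

def forward(y):
    """Eq. 5.2: x_t = mu(x_{<t}) + sigma(x_{<t}) * y_t; sequential in t."""
    x = y.copy()
    for t in range(1, len(y)):
        x[t] = mu(x[t - 1]) + sigma(x[t - 1]) * y[t]
    return x

x = np.array([0.3, -1.2, 0.7, 2.0])
assert np.allclose(forward(inverse(x)), x)  # round trip recovers the data
```

Note that the inverse direction, used for evaluating likelihoods, is computable in parallel across time steps, while the forward (sampling) direction must be evaluated sequentially.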

We can stack hierarchical chains of flows to improve the model capacity. Denoting the shift and scale functions at the $m$th transform as $\boldsymbol{\mu}_\theta^m(\cdot)$ and $\boldsymbol{\sigma}_\theta^m(\cdot)$ respectively, we then calculate $\mathbf{y}^m$ using the inverse transform:

yπ‘šπ‘‘ = yπ‘šβˆ’π‘‘ 1βˆ’Β΅π‘šπœƒ (yπ‘šβˆ’<𝑑 1) Οƒπ‘šπœƒ (yπ‘šβˆ’1<𝑑 )

. (5.4)

After the final ($M$th) transform, we can choose the form of the base distribution, $p_\theta(\mathbf{y}^M_{1:T})$, e.g., Gaussian. While we could attempt to model $\mathbf{x}_{1:T}$ completely using stacked autoregressive flows, these models are limited to affine element-wise transforms that maintain the data dimensionality. Due to this limited capacity, purely flow-based models often require many transforms to be effective (Kingma and Dhariwal, 2018).
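Stacking as in Eq. 5.4 can be sketched as follows; the per-transform shift and scale functions are illustrative stand-ins for learned networks, and recovering the data applies the forward transforms in reverse order:

```python
import numpy as np

def inverse_step(y, a, b):
    """Eq. 5.4 with illustrative shift a*y_{t-1} and scale exp(b*y_{t-1})."""
    out = y.copy()
    out[1:] = (y[1:] - a * y[:-1]) / np.exp(b * y[:-1])
    return out

def forward_step(y, a, b):
    """Sequential inverse of inverse_step."""
    out = y.copy()
    for t in range(1, len(y)):
        out[t] = a * out[t - 1] + np.exp(b * out[t - 1]) * y[t]
    return out

params = [(0.9, 0.0), (0.5, 0.1)]    # (a, b) for transforms m = 1, ..., M
x = np.array([0.3, -1.2, 0.7, 2.0])

y = x.copy()                         # y^0 = x
for a, b in params:                  # inverse transforms applied m = 1 -> M
    y = inverse_step(y, a, b)

z = y.copy()
for a, b in reversed(params):        # forward transforms applied m = M -> 1
    z = forward_step(z, a, b)
assert np.allclose(z, x)             # the stacked flow is invertible
```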

Instead, we can model the base distribution using an expressive sequential latent variable model (SLVM) or, equivalently, we can augment the conditional likelihood of an SLVM using autoregressive flows (Fig. 5.2a). Following the motivation from Section 5.2, the flow can remove temporal dependencies, simplifying the modeling task for the SLVM. With a single flow, the joint probability is

π‘πœƒ(x1:𝑇,z1:𝑇)= π‘πœƒ(y1:𝑇,z1:𝑇)

det πœ•x1:𝑇

πœ•y1:𝑇

βˆ’1

, (5.5)

where the SLVM distribution is given by π‘πœƒ(y1:𝑇,z1:𝑇) =

𝑇

Γ–

𝑑=1

π‘πœƒ(y𝑑|y<𝑑,z≀𝑑)π‘πœƒ(z𝑑|y<𝑑,z<𝑑). (5.6) If the SLVM is itself a flow-based model, we can use maximum log-likelihood training. If not, we can resort to variational inference (Chung et al., 2015; Fraccaro et al., 2016). We derive and discuss this procedure in Section 5.5.

Latent Dynamics

We can also consider simplifying latent dynamics modeling using autoregressive flows in the latent prior. This is relevant in hierarchical SLVMs, such as VideoFlow (Kumar et al., 2020), where each latent variable is modeled as a function of past and higher-level latent variables. Using $\mathbf{z}_t^{(\ell)}$ to denote the latent variable at the $\ell$th level at time $t$, we can parameterize the prior as

π‘πœƒ(z(β„“)𝑑 |z(β„“)<𝑑,z(>β„“)𝑑 ) = π‘πœƒ(u𝑑(β„“)|u(β„“)<𝑑,z𝑑(>β„“))

det πœ•z𝑑(β„“)

πœ•u𝑑(β„“)

!

βˆ’1

, (5.7)

where we convertz𝑑(β„“) intou𝑑(β„“) using the inverse transform u𝑑(β„“) = (z𝑑(β„“) βˆ’Ξ±πœƒ(z(<𝑑ℓ)))/Ξ²πœƒ(z(<𝑑ℓ)).

As noted previously, VideoFlow uses a special case of this procedure, setting $\boldsymbol{\alpha}_\theta(\mathbf{z}_{<t}^{(\ell)}) = \mathbf{z}_{t-1}^{(\ell)}$ and $\boldsymbol{\beta}_\theta(\mathbf{z}_{<t}^{(\ell)}) = \mathbf{1}$. Generalizing this procedure further simplifies dynamics throughout the model.
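In this special case the inverse transform reduces to temporal differencing of each latent level, and the forward transform to cumulative summation, sketched below (values are illustrative):

```python
import numpy as np

# VideoFlow special case: alpha_theta(z_{<t}) = z_{t-1}, beta_theta = 1, so
# the inverse transform is u_t = z_t - z_{t-1} with a unit (identity) Jacobian.
z = np.array([[0.0, 0.0], [0.4, -0.1], [0.9, -0.3]])  # z_t^(l): T = 3, 2-dim latent

u = z.copy()
u[1:] = z[1:] - z[:-1]        # inverse transform: temporal differencing

# The forward transform is cumulative summation, recovering the latents exactly.
assert np.allclose(np.cumsum(u, axis=0), z)
```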