WEBVTT 1 00:00:06.052 --> 00:00:09.846 Game Basics Mathematics for 3D Games Part 3: Perspective Projection Transformation and Depth Values 2 00:00:27.392 --> 00:00:28.805 Hello, everyone. 3 00:00:28.805 --> 00:00:30.575 This is Lee Deuk-woo from Game Mathematics 4 00:00:30.575 --> 00:00:31.715 In this session, 5 00:00:31.715 --> 00:00:34.688 we’ll discuss perspective projection 6 00:00:34.688 --> 00:00:37.249 We’ll explain what perspective projection transformation is 7 00:00:37.249 --> 00:00:40.772 and how to implement it 8 00:00:40.772 --> 00:00:43.686 We’ll also explore the concepts 9 00:00:43.686 --> 00:00:46.429 of clip space, 10 00:00:46.429 --> 00:00:48.284 homogeneous coordinates, 11 00:00:48.284 --> 00:00:50.742 and vanishing points, which are used in the perspective projection process 12 00:00:50.742 --> 00:00:53.837 Next, we’ll look into depth values 13 00:00:53.837 --> 00:00:55.370 I’ll explain what depth values are 14 00:00:55.370 --> 00:00:57.949 and why they are necessary 15 00:00:57.949 --> 00:01:00.689 To handle depth values, we need to create 16 00:01:00.689 --> 00:01:03.030 a specific geometric shape 17 00:01:03.030 --> 00:01:05.165 which is called a frustum 18 00:01:05.165 --> 00:01:06.429 For the frustum 19 00:01:06.429 --> 00:01:09.868 we will examine how it is constructed 20 00:01:09.868 --> 00:01:12.048 We’ll also explore 21 00:01:12.048 --> 00:01:14.222 how to design the newly defined space 22 00:01:14.222 --> 00:01:16.066 using the frustum 23 00:01:16.066 --> 00:01:17.987 This is what we will explore today 24 00:01:17.987 --> 00:01:20.920 Finally, we’ll discuss 25 00:01:20.920 --> 00:01:22.971 the final perspective projection matrix 26 00:01:22.971 --> 00:01:26.995 that incorporates the frustum and calculates depth values 27 00:01:26.995 --> 00:01:31.135 Perspective Projection Transformation 28 00:01:31.135 --> 00:01:33.147 First, let’s discuss the perspective projection transformation 29 00:01:33.147 --> 00:01:34.865 This is referred to 
in English as 30 00:01:34.865 --> 00:01:37.411 the Perspective Projection Transformation 31 00:01:37.411 --> 00:01:38.999 That is what it is called 32 00:01:38.999 --> 00:01:41.076 It’s related to 33 00:01:41.076 --> 00:01:43.298 the perspective techniques often mentioned in art 34 00:01:43.298 --> 00:01:45.820 You can think of it as a method 35 00:01:45.820 --> 00:01:49.151 to implement perspective drawing techniques 36 00:01:49.151 --> 00:01:51.496 This perspective technique in art 37 00:01:51.496 --> 00:01:54.797 was devised during the Renaissance period 38 00:01:54.797 --> 00:01:57.952 and has evolved into a technique 39 00:01:57.952 --> 00:02:01.106 that is now used in nearly all 40 00:02:01.106 --> 00:02:03.666 three-dimensional drawings 41 00:02:03.666 --> 00:02:07.781 In the past, to achieve this, 42 00:02:07.781 --> 00:02:11.258 a single focal point was set 43 00:02:11.258 --> 00:02:13.493 and strings were stretched from it 44 00:02:13.493 --> 00:02:15.572 to project the image onto a surface 45 00:02:15.572 --> 00:02:19.590 in a highly precise, detailed process 46 00:02:19.590 --> 00:02:21.856 This was done manually 47 00:02:21.856 --> 00:02:23.800 Even today, when drawing, 48 00:02:23.800 --> 00:02:26.200 it’s common to use vanishing points as a reference 49 00:02:26.200 --> 00:02:29.801 and apply perspective techniques 50 00:02:29.801 --> 00:02:31.893 using lines 51 00:02:31.893 --> 00:02:34.473 Instead of applying these techniques 52 00:02:34.473 --> 00:02:36.921 step by step manually, 53 00:02:36.921 --> 00:02:40.876 we can use matrices 54 00:02:40.876 --> 00:02:42.604 to efficiently represent objects with perspective 55 00:02:42.604 --> 00:02:44.334 all at once 56 00:02:44.334 --> 00:02:46.630 This is achieved through 57 00:02:46.630 --> 00:02:48.444 the perspective projection transformation 58 00:02:48.444 --> 00:02:52.205 I mentioned earlier 59 00:02:52.205 --> 00:02:56.652 The principle behind perspective projection transformation is as 
follows 60 00:02:56.652 --> 00:02:58.652 We start with a real-world space where the three axes 61 00:02:58.652 --> 00:03:04.121 are orthogonal to each other 62 00:03:04.121 --> 00:03:06.100 The space we perceive with our eyes 63 00:03:06.100 --> 00:03:08.350 views this orthogonal space 64 00:03:08.350 --> 00:03:11.398 slightly differently 65 00:03:11.398 --> 00:03:13.816 In the real world 66 00:03:13.816 --> 00:03:15.751 objects are arranged orthogonally 67 00:03:15.751 --> 00:03:18.531 but the world we perceive 68 00:03:18.531 --> 00:03:23.151 begins from a single point, like a camera 69 00:03:23.151 --> 00:03:25.134 and expands outward 70 00:03:25.134 --> 00:03:26.739 in a specified field of view 71 00:03:26.739 --> 00:03:28.595 So you can think of the spaces 72 00:03:28.595 --> 00:03:30.893 as being configured differently 73 00:03:30.893 --> 00:03:32.833 You can understand it like this 74 00:03:32.833 --> 00:03:34.606 In this expanding space 75 00:03:34.606 --> 00:03:36.482 defined by the field of view 76 00:03:36.482 --> 00:03:39.805 we designate a plane 77 00:03:39.805 --> 00:03:42.028 to project the objects in the space 78 00:03:42.028 --> 00:03:45.169 onto this plane 79 00:03:45.169 --> 00:03:51.327 The farther this plane is from the camera 80 00:03:51.327 --> 00:03:55.456 the larger it becomes 81 00:03:55.456 --> 00:04:00.439 and the closer it is, the smaller it appears 82 00:04:00.439 --> 00:04:03.358 We need to specify a single plane 83 00:04:03.358 --> 00:04:06.289 that changes depending on the distance 84 00:04:06.289 --> 00:04:10.008 This specified plane is called the projection plane 85 00:04:10.008 --> 00:04:12.297 The shortest distance from the camera to this plane 86 00:04:12.297 --> 00:04:15.296 is called the focal length 87 00:04:15.296 --> 00:04:18.737 With these two concepts in mind, 88 00:04:18.737 --> 00:04:21.028 we can now project objects onto that plane 89 00:04:21.028 --> 00:04:22.727 to create a final image 90 00:04:22.727 --> 
00:04:26.988 with perspective 91 00:04:26.988 --> 00:04:29.548 If we look at this from the side 92 00:04:29.548 --> 00:04:32.295 it can be visualized as follows 93 00:04:32.295 --> 00:04:33.812 First, there’s the field of view 94 00:04:33.812 --> 00:04:37.708 This is predetermined in the camera 95 00:04:37.708 --> 00:04:42.012 The field of view is defined 96 00:04:42.012 --> 00:04:45.227 Next, we specify the projection plane onto which objects will be projected 97 00:04:45.227 --> 00:04:47.813 The shortest distance between the camera and the projection plane 98 00:04:47.813 --> 00:04:49.774 is called the focal length, correct? 99 00:04:49.774 --> 00:04:53.566 Let’s denote this as d 100 00:04:53.566 --> 00:04:56.608 From this basic setup 101 00:04:56.608 --> 00:04:58.667 we need to determine 102 00:04:58.667 --> 00:05:00.433 how to implement this 103 00:05:00.433 --> 00:05:02.495 A commonly used method 104 00:05:02.495 --> 00:05:05.624 is to define the projected area 105 00:05:05.624 --> 00:05:07.348 known as NDC 106 00:05:07.348 --> 00:05:12.046 Typically, the final game screen 107 00:05:12.046 --> 00:05:13.902 is displayed on a monitor 108 00:05:13.902 --> 00:05:16.714 Depending on the monitor specifications 109 00:05:16.714 --> 00:05:20.481 resolutions can range from 1024x768 110 00:05:20.481 --> 00:05:24.113 to 1920x1080 or even 4K 111 00:05:24.113 --> 00:05:26.329 There are various types of resolutions, right? 
112 00:05:26.329 --> 00:05:30.178 These different resolutions use varying pixel densities 113 00:05:30.178 --> 00:05:32.277 to form the screen 114 00:05:32.277 --> 00:05:34.979 It’s impractical to handle all these resolutions individually 115 00:05:34.979 --> 00:05:37.503 So, in programming, 116 00:05:37.503 --> 00:05:40.184 the resolution is handled separately 117 00:05:40.184 --> 00:05:43.558 while the projected space is designed 118 00:05:43.558 --> 00:05:46.285 to be as simple as possible 119 00:05:46.285 --> 00:05:47.948 using a unit scale of 1 120 00:05:47.948 --> 00:05:49.990 for the projection plane 121 00:05:49.990 --> 00:05:53.657 This is the fundamental principle of NDC 122 00:05:53.657 --> 00:05:57.008 Similarly, in textures 123 00:05:57.008 --> 00:05:59.101 no matter how large the texture image is 124 00:05:59.101 --> 00:06:01.562 we use UV coordinates from (0, 0) 125 00:06:01.562 --> 00:06:04.582 to (1, 1) 126 00:06:04.582 --> 00:06:06.941 Regardless of the screen resolution 127 00:06:06.941 --> 00:06:09.570 we define the projection plane 128 00:06:09.570 --> 00:06:15.096 as a 1 unit-sized plane 129 00:06:15.096 --> 00:06:17.931 This is the NDC coordinate system 130 00:06:17.931 --> 00:06:20.917 However, when specifying a size of 1, 131 00:06:20.917 --> 00:06:23.449 it doesn’t mean the entire size is 1 132 00:06:23.449 --> 00:06:24.521 As shown here 133 00:06:24.521 --> 00:06:26.985 only half the size is defined as 1 134 00:06:26.985 --> 00:06:28.887 This makes calculations easier 135 00:06:28.887 --> 00:06:31.961 If the half-size of the projection plane 136 00:06:31.961 --> 00:06:35.246 is defined as 1 within the field of view 137 00:06:35.246 --> 00:06:37.504 we’ll be working with only half the field of view 138 00:06:37.504 --> 00:06:40.894 For this half field of view, we can calculate 139 00:06:40.894 --> 00:06:42.694 the focal length and form a right triangle 140 00:06:42.694 --> 00:06:45.985 with one side equal to 1 141 00:06:45.985 --> 
00:06:49.069 When viewed from the front, the resulting plane 142 00:06:49.069 --> 00:06:51.574 will take this form 143 00:06:51.574 --> 00:06:53.556 Since half the size is defined as 1 144 00:06:53.556 --> 00:06:58.158 the plane extends 1 unit up and down 145 00:06:58.158 --> 00:07:01.622 and 1 unit left and right from (0, 0) 146 00:07:01.622 --> 00:07:05.296 This creates a plane with a total size of 2 units 147 00:07:05.296 --> 00:07:09.982 The coordinates of the points 148 00:07:09.982 --> 00:07:12.821 projected onto this plane are referred to 149 00:07:12.821 --> 00:07:16.483 as NDC coordinates 150 00:07:16.483 --> 00:07:21.519 The mathematics behind NDC 151 00:07:21.519 --> 00:07:23.305 as explained earlier, 152 00:07:23.305 --> 00:07:25.132 starts with the half field of view, the focal length 153 00:07:25.132 --> 00:07:26.953 and a right triangle 154 00:07:26.953 --> 00:07:30.402 with one side equal to 1 155 00:07:30.402 --> 00:07:33.269 The focal length depends on the field of view 156 00:07:33.269 --> 00:07:36.285 so how do we calculate it 157 00:07:36.285 --> 00:07:39.677 What happens as the field of view decreases? 
158 00:07:39.677 --> 00:07:41.255 As the field of view decreases 159 00:07:41.255 --> 00:07:44.294 the plane whose half-size is 1 moves farther back 160 00:07:44.294 --> 00:07:48.602 This is because the size of 1 remains constant 161 00:07:48.602 --> 00:07:51.528 Only at this distance 162 00:07:51.528 --> 00:07:55.370 can the size of 1 be maintained as the field of view narrows 163 00:07:55.370 --> 00:07:58.056 This demonstrates an inverse relationship 164 00:07:58.056 --> 00:08:02.621 The focal length is related to tan(θ/2), 165 00:08:02.621 --> 00:08:05.592 the tangent of half the field of view 166 00:08:05.592 --> 00:08:07.081 Since it’s the height divided by the base 167 00:08:07.081 --> 00:08:08.966 it equals 1/focal length 168 00:08:08.966 --> 00:08:11.727 To calculate d, 169 00:08:11.727 --> 00:08:14.984 take the reciprocal of tan(θ/2), 170 00:08:14.984 --> 00:08:19.375 where θ is the full field of view 171 00:08:19.375 --> 00:08:22.576 This is how you can do it 172 00:08:22.576 --> 00:08:24.347 Let’s visualize how 173 00:08:24.347 --> 00:08:26.293 a point in view space 174 00:08:26.293 --> 00:08:29.026 is projected onto the projection plane 175 00:08:29.026 --> 00:08:31.294 using this triangle 176 00:08:31.294 --> 00:08:34.743 There are various points in this space 177 00:08:34.743 --> 00:08:35.777 Some points are in the foreground 178 00:08:35.777 --> 00:08:37.209 while others are in the background 179 00:08:37.209 --> 00:08:39.394 All points are projected 180 00:08:39.394 --> 00:08:44.147 toward the camera point onto the projection plane 181 00:08:44.147 --> 00:08:45.934 By drawing lines toward the camera 182 00:08:45.934 --> 00:08:48.225 we find the NDC coordinates 183 00:08:48.225 --> 00:08:51.793 where they intersect the projection plane 184 00:08:51.793 --> 00:08:55.919 This allows us to calculate the points in the NDC coordinate system 185 00:08:55.919 --> 00:08:58.077 Using these points, we construct triangles 186 00:08:58.077 --> 00:09:00.681 to render 
the scene 187 00:09:00.681 --> 00:09:02.727 This enables us to 188 00:09:02.727 --> 00:09:05.433 create a 3D space with perspective 189 00:09:05.433 --> 00:09:07.794 Our goal is to create 190 00:09:07.794 --> 00:09:09.320 a matrix that performs 191 00:09:09.320 --> 00:09:11.534 this transformation 192 00:09:11.534 --> 00:09:14.054 So, how do we create this matrix? 193 00:09:14.054 --> 00:09:18.147 We use the similarity ratio of two triangles 194 00:09:18.147 --> 00:09:20.158 Here, 195 00:09:20.158 --> 00:09:22.534 although we don’t know the angle 196 00:09:22.534 --> 00:09:27.209 we aim to obtain the projected coordinates 197 00:09:27.209 --> 00:09:29.322 We already know the focal length 198 00:09:29.322 --> 00:09:31.027 from the field of view, 199 00:09:31.027 --> 00:09:33.437 and for any point 200 00:09:33.437 --> 00:09:36.492 that exists in the view space, 201 00:09:36.492 --> 00:09:38.794 where the three axes are orthogonal, 202 00:09:38.794 --> 00:09:41.080 we can determine 203 00:09:41.080 --> 00:09:43.310 the point’s distance and height 204 00:09:43.310 --> 00:09:44.369 relative to the camera 205 00:09:44.369 --> 00:09:47.008 This information is derived from the coordinates in view space 206 00:09:47.008 --> 00:09:49.104 If we know the focal length, 207 00:09:49.104 --> 00:09:52.159 we can establish a proportional relationship 208 00:09:52.159 --> 00:09:57.192 between the focal length, 209 00:09:57.192 --> 00:09:59.728 the distance from the camera, 210 00:09:59.728 --> 00:10:03.228 the y-value, and the NDC value we want to calculate 211 00:10:03.228 --> 00:10:06.495 Previously, we mentioned that the z-axis of the view space 212 00:10:06.495 --> 00:10:09.581 always points toward the back of the camera, right? 
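NOTE
The focal-length relation explained earlier, d = 1 / tan(θ/2) with θ as the full field of view, can be sketched as follows. This is only a minimal illustration: the function name focal_length and the choice of degrees as the input unit are assumptions, not something the lecture specifies.
```python
import math
def focal_length(fov_degrees: float) -> float:
    """Distance d at which the projection plane has a half-size of 1."""
    half_fov = math.radians(fov_degrees) / 2.0
    return 1.0 / math.tan(half_fov)
# A narrower field of view pushes the unit-sized plane farther from the camera.
for fov in (120.0, 90.0, 60.0, 30.0):
    print(f"fov={fov:5.1f} deg  d={focal_length(fov):.4f}")
```
Running it should show d growing as the field of view shrinks, which is the inverse relationship described in the lecture.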
213 00:10:09.581 --> 00:10:12.519 Thus, all object coordinates in the view space 214 00:10:12.519 --> 00:10:15.441 in front of the camera 215 00:10:15.441 --> 00:10:18.167 always have negative values 216 00:10:18.167 --> 00:10:20.771 When setting up the proportional relationship 217 00:10:20.771 --> 00:10:24.500 the z-value, which is negative 218 00:10:24.500 --> 00:10:27.430 needs to be multiplied by -1 219 00:10:27.430 --> 00:10:28.966 to convert it into 220 00:10:28.966 --> 00:10:30.889 a positive value 221 00:10:30.889 --> 00:10:32.912 Once converted to positive 222 00:10:32.912 --> 00:10:35.647 the two proportional equations 223 00:10:35.647 --> 00:10:38.477 allow us to calculate the y-value 224 00:10:38.477 --> 00:10:41.631 for the Pndc coordinate 225 00:10:41.631 --> 00:10:43.949 For the x-axis, 226 00:10:43.949 --> 00:10:46.550 since it shares the same field of view 227 00:10:46.550 --> 00:10:48.712 as the y-axis, 228 00:10:48.712 --> 00:10:50.248 the principle is the same 229 00:10:50.248 --> 00:10:53.010 In essence, there’s no difference between the y and x axis 230 00:10:53.010 --> 00:10:55.601 So, applying the same calculations 231 00:10:55.601 --> 00:10:57.838 to the x-axis, 232 00:10:57.838 --> 00:10:59.991 we can obtain 233 00:10:59.991 --> 00:11:04.212 the projected coordinate values in this form 234 00:11:04.212 --> 00:11:06.660 This results in an NDC coordinate system 235 00:11:06.660 --> 00:11:09.135 where the size 236 00:11:09.135 --> 00:11:12.683 is a square extending 2 units 237 00:11:12.683 --> 00:11:18.070 horizontally and vertically 238 00:11:18.070 --> 00:11:20.103 So how do we project this onto 239 00:11:20.103 --> 00:11:22.757 an actual monitor? 
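NOTE
The similar-triangle projection described above can be sketched like this, assuming a view space in which points in front of the camera have negative z. The name project_to_ndc and its parameters are hypothetical, and no aspect-ratio correction is applied yet, so the result lives in the square NDC area.
```python
import math
def project_to_ndc(vx: float, vy: float, vz: float, fov_degrees: float):
    """Project a view-space point onto the projection plane (square NDC)."""
    if vz >= 0.0:
        raise ValueError("point must be in front of the camera (negative view-space z)")
    d = 1.0 / math.tan(math.radians(fov_degrees) / 2.0)  # focal length
    depth = -vz  # multiply by -1 so the distance from the camera is positive
    # Similar triangles: ndc / d == v / depth, for both the x and y axes
    return d * vx / depth, d * vy / depth
# The same offset from the view axis shrinks as the point moves farther away.
print(project_to_ndc(1.0, 1.0, -2.0, 90.0))
print(project_to_ndc(1.0, 1.0, -4.0, 90.0))
```
With a 90 degree field of view d is about 1, so the two points land at roughly (0.5, 0.5) and (0.25, 0.25).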
240 00:11:22.757 --> 00:11:25.539 We expand it proportionally to the monitor's resolution 241 00:11:25.539 --> 00:11:29.793 and calculate accordingly 242 00:11:29.793 --> 00:11:32.408 For instance, if the monitor has 243 00:11:32.408 --> 00:11:35.785 a resolution of 800x600 244 00:11:35.785 --> 00:11:38.935 you scale the NDC area 245 00:11:38.935 --> 00:11:43.355 by 400 horizontally and 300 vertically 246 00:11:43.355 --> 00:11:46.134 Extending it like this makes it 247 00:11:46.134 --> 00:11:49.816 fit the screen exactly 248 00:11:49.816 --> 00:11:52.537 However, there’s an issue 249 00:11:52.537 --> 00:11:53.986 when expanding the NDC space 250 00:11:53.986 --> 00:11:56.151 to fit the screen resolution 251 00:11:56.151 --> 00:11:58.358 As mentioned earlier, 252 00:11:58.358 --> 00:12:01.856 the NDC space is a uniform square 253 00:12:01.856 --> 00:12:04.524 But most monitor resolutions 254 00:12:04.524 --> 00:12:06.742 have different horizontal and vertical sizes 255 00:12:06.742 --> 00:12:09.048 This creates a problem 256 00:12:09.048 --> 00:12:11.065 For example, let’s say we have 257 00:12:11.065 --> 00:12:13.800 a perfect circle in NDC space 258 00:12:13.800 --> 00:12:15.339 as shown here 259 00:12:15.339 --> 00:12:17.246 If we expand it 260 00:12:17.246 --> 00:12:20.339 horizontally and vertically based on the screen resolution 261 00:12:20.339 --> 00:12:22.076 the unequal horizontal and vertical scaling 262 00:12:22.076 --> 00:12:23.755 causes the circle to distort into an ellipse 263 00:12:23.755 --> 00:12:26.942 as shown on the screen 264 00:12:26.942 --> 00:12:28.899 This can be confusing 265 00:12:28.899 --> 00:12:34.297 You might wonder why the perfect sphere 266 00:12:34.297 --> 00:12:35.633 that you made 267 00:12:35.633 --> 00:12:37.816 looks distorted 268 00:12:37.816 --> 00:12:39.728 To prevent this issue 269 00:12:39.728 --> 00:12:41.889 you need to account for 270 00:12:41.889 --> 00:12:44.975 the screen’s aspect ratio 271 
00:12:44.975 --> 00:12:48.801 The solution is 272 00:12:48.801 --> 00:12:51.711 to apply the aspect ratio 273 00:12:51.711 --> 00:12:55.515 to the object in NDC space 274 00:12:55.515 --> 00:12:57.121 distorting it first, and then scaling it up 275 00:12:57.121 --> 00:13:00.429 By doing this, the distortion cancels out 276 00:13:00.429 --> 00:13:03.048 during scaling, and you end up 277 00:13:03.048 --> 00:13:05.709 with a perfect sphere 278 00:13:05.709 --> 00:13:08.918 What is the reference for the aspect ratio? 279 00:13:08.918 --> 00:13:10.244 Does it use the horizontal 280 00:13:10.244 --> 00:13:12.518 or vertical dimension as a reference? 281 00:13:12.518 --> 00:13:14.080 Either can be chosen 282 00:13:14.080 --> 00:13:16.638 In most cases, the vertical dimension is used as the reference 283 00:13:16.638 --> 00:13:18.496 So here, we’re using 284 00:13:18.496 --> 00:13:19.765 the vertical as the reference 285 00:13:19.765 --> 00:13:21.403 so the aspect ratio a (width divided by height) will be greater than 1 286 00:13:21.403 --> 00:13:22.883 This is because 287 00:13:22.883 --> 00:13:26.306 the horizontal length is usually greater than the vertical length 288 00:13:26.306 --> 00:13:29.226 To apply this, 289 00:13:29.226 --> 00:13:33.580 we account for the reciprocal of the aspect ratio in the calculations 290 00:13:33.580 --> 00:13:35.448 We don’t adjust 291 00:13:35.448 --> 00:13:37.195 the y-component at all 292 00:13:37.195 --> 00:13:39.644 because we’re only distorting the x-component 293 00:13:39.644 --> 00:13:42.876 Thus, we apply the reciprocal of the aspect ratio 294 00:13:42.876 --> 00:13:44.768 only to the x-component 295 00:13:44.768 --> 00:13:50.043 The final NDC coordinates 296 00:13:50.043 --> 00:13:53.251 will have the same focal length (d) and depth values 297 00:13:53.251 --> 00:13:55.203 Everything remains unchanged, 298 00:13:55.203 --> 00:13:58.086 except that the aspect ratio adjustment 299 00:13:58.086 --> 00:14:00.063 is applied inversely to 
the x-component 300 00:14:00.063 --> 00:14:02.308 Using this calculation method 301 00:14:02.308 --> 00:14:05.304 we can determine the NDC coordinates 302 00:14:05.304 --> 00:14:07.652 The matrix reflecting these adjustments 303 00:14:07.652 --> 00:14:10.508 can be calculated as follows 304 00:14:10.508 --> 00:14:14.637 When we obtain the x, y, z values in the view coordinate system 305 00:14:14.637 --> 00:14:18.071 the matrix for deriving the Pndc values 306 00:14:18.071 --> 00:14:20.517 can be structured like this 307 00:14:20.517 --> 00:14:24.118 However, there is an issue when examining this matrix closely 308 00:14:24.118 --> 00:14:26.431 The problem lies in the fact 309 00:14:26.431 --> 00:14:29.197 that this matrix 310 00:14:29.197 --> 00:14:31.842 uses the vz value 311 00:14:31.842 --> 00:14:34.054 It relies on the z value 312 00:14:34.054 --> 00:14:36.863 of individual points 313 00:14:36.863 --> 00:14:38.843 Why is this a problem? 314 00:14:38.843 --> 00:14:41.281 In general, when using matrices 315 00:14:41.281 --> 00:14:45.282 we don’t create a unique matrix for each point 316 00:14:45.282 --> 00:14:47.068 Instead, we create a fixed matrix 317 00:14:47.068 --> 00:14:49.758 that can be applied to tens, hundreds, or millions of points at once 318 00:14:49.758 --> 00:14:53.188 Matrices are designed to be uniform 319 00:14:53.188 --> 00:14:56.094 and applied simultaneously to many points 320 00:14:56.094 --> 00:14:59.569 If the z-coordinate of a point 321 00:14:59.569 --> 00:15:01.832 is directly integrated into the matrix design 322 00:15:01.832 --> 00:15:03.485 it means we would need 323 00:15:03.485 --> 00:15:05.372 to create a new matrix for each point 324 00:15:05.372 --> 00:15:07.546 being transformed 325 00:15:07.546 --> 00:15:11.353 This undermines the purpose of using a fixed, reusable matrix 326 00:15:11.353 --> 00:15:13.431 For instance, if there are 100,000 points 327 00:15:13.431 --> 00:15:15.965 100,000 matrices would need to be generated 328 
00:15:15.965 --> 00:15:18.683 This approach is clearly impractical 329 00:15:18.683 --> 00:15:22.023 Fortunately, there are solutions to this issue 330 00:15:22.023 --> 00:15:25.094 The solution lies in what we call clip space 331 00:15:25.094 --> 00:15:28.175 which I’ll now explain 332 00:15:28.175 --> 00:15:30.926 Clip space is used to create a general-purpose matrix 333 00:15:30.926 --> 00:15:34.756 that works independently of specific points 334 00:15:34.756 --> 00:15:36.686 It serves as an intermediate space 335 00:15:36.686 --> 00:15:39.191 This is what we call a Clip space 336 00:15:39.191 --> 00:15:41.713 Here, we’ve summarized 337 00:15:41.713 --> 00:15:45.735 the concept of clip space as follows 338 00:15:45.735 --> 00:15:49.833 It’s essentially 339 00:15:49.833 --> 00:15:54.595 a 3D space transformed from the view space 340 00:15:54.595 --> 00:15:57.809 It’s an intermediate space 341 00:15:57.809 --> 00:16:00.419 between the view space and the final 3D NDC space 342 00:16:00.419 --> 00:16:02.440 Think of it as a transitional step 343 00:16:02.440 --> 00:16:04.198 This space retains all three dimensions 344 00:16:04.198 --> 00:16:06.244 as part of the intermediate stage 345 00:16:06.244 --> 00:16:08.224 Unlike NDC space, 346 00:16:08.224 --> 00:16:11.700 which only requires 2D since it’s a plane 347 00:16:11.700 --> 00:16:14.106 clip space is structured in 3D 348 00:16:14.106 --> 00:16:16.576 But why is it structured as 3D? 
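NOTE
As a side illustration of the aspect-ratio correction and resolution scaling discussed earlier, here is a rough sketch using the 800x600 example. The helper names ndc_to_screen and correct_aspect are hypothetical, and the screen values are offsets from the screen centre rather than final pixel positions.
```python
def ndc_to_screen(x_ndc, y_ndc, width, height):
    """Scale the 2-unit NDC square by half the resolution (400 and 300 for 800x600)."""
    return x_ndc * (width / 2.0), y_ndc * (height / 2.0)
def correct_aspect(x_ndc, y_ndc, width, height):
    """Pre-distort only x by the reciprocal of the aspect ratio a = width / height."""
    a = width / height
    return x_ndc / a, y_ndc
width, height = 800, 600
# Two points at the same distance (0.5) from the centre of the NDC square.
right = (0.5, 0.0)
up = (0.0, 0.5)
# Without correction the horizontal offset is stretched more than the vertical one.
print(ndc_to_screen(*right, width, height), ndc_to_screen(*up, width, height))
# With correction both offsets have the same length on screen.
print(ndc_to_screen(*correct_aspect(*right, width, height), width, height),
      ndc_to_screen(*correct_aspect(*up, width, height), width, height))
```
The first print shows offsets of different lengths (the circle becomes an ellipse), while the second shows equal lengths, because the pre-distortion cancels against the unequal scaling.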
349 00:16:16.576 --> 00:16:20.233 In this equation, 350 00:16:20.233 --> 00:16:24.056 all coordinates are divided by -vz 351 00:16:24.056 --> 00:16:29.746 So, before dividing, the idea is to use this value 352 00:16:29.746 --> 00:16:31.437 to construct the coordinates 353 00:16:31.437 --> 00:16:35.433 After constructing the coordinates, dividing by -vz 354 00:16:35.433 --> 00:16:37.866 yields the basic NDC coordinates 355 00:16:37.866 --> 00:16:42.014 we intended to obtain 356 00:16:42.014 --> 00:16:44.723 beforehand 357 00:16:44.723 --> 00:16:45.999 This means applying the matrix 358 00:16:45.999 --> 00:16:48.612 doesn’t directly yield NDC coordinates 359 00:16:48.612 --> 00:16:50.571 but instead produces the clip space coordinates 360 00:16:50.571 --> 00:16:53.202 denoted as Pclip 361 00:16:53.202 --> 00:16:57.227 which becomes the primary goal of the matrix 362 00:16:57.227 --> 00:16:59.253 So, how can we set this up? 363 00:16:59.253 --> 00:17:02.626 To get -vz 364 00:17:02.626 --> 00:17:05.920 simply place -1 in the last element of the matrix 365 00:17:05.920 --> 00:17:08.445 This allows us to design such a matrix 366 00:17:08.445 --> 00:17:11.096 As you can see, the P-matrix 367 00:17:11.096 --> 00:17:14.453 used for perspective projection 368 00:17:14.453 --> 00:17:16.392 doesn’t contain point-specific information 369 00:17:16.392 --> 00:17:18.361 Instead, it only depends on the camera’s focal length 370 00:17:18.361 --> 00:17:21.489 field of view, 371 00:17:21.489 --> 00:17:25.259 and horizontal-to-vertical aspect ratio 372 00:17:25.259 --> 00:17:27.127 With these parameters 373 00:17:27.127 --> 00:17:28.399 we can construct the perspective projection matrix 374 00:17:28.399 --> 00:17:31.652 Thus, even if there are 100,000 points 375 00:17:31.652 --> 00:17:34.325 a single precomputed matrix 376 00:17:34.325 --> 00:17:37.771 based on the camera settings, 377 00:17:37.771 --> 00:17:41.151 can be applied universally to all points 378 00:17:41.151 --> 
00:17:44.356 Once the coordinates are generated 379 00:17:44.356 --> 00:17:47.510 manually dividing by the last element 380 00:17:47.510 --> 00:17:49.492 yields the desired 381 00:17:49.492 --> 00:17:50.838 NDC coordinates 382 00:17:50.838 --> 00:17:52.351 This approach satisfies 383 00:17:52.351 --> 00:17:55.396 both objectives 384 00:17:55.396 --> 00:17:58.087 The concept of clip space 385 00:17:58.087 --> 00:18:01.795 is closely related to what is called 386 00:18:01.795 --> 00:18:05.773 homogeneous coordinates 387 00:18:05.773 --> 00:18:09.160 In 3D computer graphics 388 00:18:09.160 --> 00:18:11.494 you can think of it as involving two types of spaces 389 00:18:11.494 --> 00:18:14.002 First, there’s the world space 390 00:18:14.002 --> 00:18:16.056 which is used for placement and overall structure 391 00:18:16.056 --> 00:18:20.552 where x, y, z are orthogonal 392 00:18:20.552 --> 00:18:23.122 Next, the world space 393 00:18:23.122 --> 00:18:24.716 is transformed into view space 394 00:18:24.716 --> 00:18:27.845 where it is adjusted to match the camera’s perspective 395 00:18:27.845 --> 00:18:30.446 This transformation mimics the rules 396 00:18:30.446 --> 00:18:32.614 by which humans perceive the world 397 00:18:32.614 --> 00:18:34.460 through their eyes 398 00:18:34.460 --> 00:18:35.580 It follows those rules 399 00:18:35.580 --> 00:18:38.192 in the same procedure 400 00:18:38.192 --> 00:18:41.343 This process occurs 401 00:18:41.343 --> 00:18:43.886 in what we previously described as clip space 402 00:18:43.886 --> 00:18:47.282 Mathematically, this is referred to as projective space 403 00:18:47.282 --> 00:18:50.388 It’s called "projective space" 404 00:18:50.388 --> 00:18:52.890 Let me explain the differences 405 00:18:52.890 --> 00:18:54.654 between these two spaces 406 00:18:54.654 --> 00:18:57.989 In Euclidean space 407 00:18:57.989 --> 00:18:59.894 parallel lines never intersect 408 00:18:59.894 --> 00:19:02.454 This is something we are all familiar 
with 409 00:19:02.454 --> 00:19:10.519 However, let’s consider an assumption 410 00:19:10.519 --> 00:19:13.343 regarding these parallel lines 411 00:19:13.343 --> 00:19:16.312 This pertains to results within a 2D space 412 00:19:16.312 --> 00:19:19.408 Let’s assume that these 2D results 413 00:19:19.408 --> 00:19:22.690 are part of a system 414 00:19:22.690 --> 00:19:24.956 constructed within a 3D space 415 00:19:24.956 --> 00:19:26.376 Let's make such an assumption 416 00:19:26.376 --> 00:19:29.091 The current coordinates 417 00:19:29.091 --> 00:19:32.836 we see are just a partial projection 418 00:19:32.836 --> 00:19:35.529 of a 3D space 419 00:19:35.529 --> 00:19:40.167 Unseen elements from the 3D space 420 00:19:40.167 --> 00:19:42.479 influence this projection 421 00:19:42.479 --> 00:19:44.585 So, how do they affect it? 422 00:19:44.585 --> 00:19:46.869 The value divided by the last dimension 423 00:19:46.869 --> 00:19:50.400 in this case represented 424 00:19:50.400 --> 00:19:54.221 by ax + b 425 00:19:54.221 --> 00:19:57.204 is the result of dividing 426 00:19:57.204 --> 00:20:00.442 by the new dimension, z 427 00:20:00.442 --> 00:20:01.495 Let’s analyze it this way 428 00:20:01.495 --> 00:20:08.411 So, when we divide the 3D space 429 00:20:08.411 --> 00:20:11.210 represented by x', y' z', 430 00:20:11.210 --> 00:20:15.167 by z', it results in x, y, 1 431 00:20:15.167 --> 00:20:18.380 This shows that 432 00:20:18.380 --> 00:20:21.434 the plane is constructed using x and y 433 00:20:21.434 --> 00:20:23.887 In this mechanism 434 00:20:23.887 --> 00:20:25.907 the essence of x is 435 00:20:25.907 --> 00:20:30.494 that it’s derived by dividing x' by z' 436 00:20:30.494 --> 00:20:32.361 Similarly, the essence of y 437 00:20:32.361 --> 00:20:35.557 is that it’s derived by dividing y' by z' 438 00:20:35.557 --> 00:20:39.814 If we substitute this into the linear equation 439 00:20:39.814 --> 00:20:42.741 y = ax + b discussed earlier, 440 00:20:42.741 --> 00:20:45.414 
we can express it in terms 441 00:20:45.414 --> 00:20:47.533 of the underlying spatial rules 442 00:20:47.533 --> 00:20:50.954 Multiplying both sides by z' 443 00:20:50.954 --> 00:20:54.266 yields the equation y' = ax' + bz' 444 00:20:54.266 --> 00:20:56.780 This is the equation that is created 445 00:20:56.780 --> 00:20:58.334 Looking at these equations, 446 00:20:58.334 --> 00:21:01.243 since x', y' and z' all share 447 00:21:01.243 --> 00:21:03.642 the same linear degree 448 00:21:03.642 --> 00:21:06.684 this is called a homogeneous equation 449 00:21:06.684 --> 00:21:09.611 A space constructed based on this theory 450 00:21:09.611 --> 00:21:12.194 is referred to as homogeneous space 451 00:21:12.194 --> 00:21:15.374 This is equivalent to the previously mentioned 452 00:21:15.374 --> 00:21:19.147 projective space 453 00:21:19.147 --> 00:21:23.495 Thus, dividing by the last element 454 00:21:23.495 --> 00:21:25.350 is analogous to the process 455 00:21:25.350 --> 00:21:27.383 of converting from clip space 456 00:21:27.383 --> 00:21:29.615 to NDC space 457 00:21:29.615 --> 00:21:31.726 In other words, this process is the same 458 00:21:31.726 --> 00:21:35.913 as the conversion from clip space to NDC space 459 00:21:35.913 --> 00:21:39.150 Therefore, clip space can be thought of as 460 00:21:39.150 --> 00:21:41.728 a space created relative to NDC space, 461 00:21:41.728 --> 00:21:43.554 where the concept is 462 00:21:43.554 --> 00:21:47.900 that it exists in one additional dimension 463 00:21:47.900 --> 00:21:49.783 beyond what NDC space can perceive 464 00:21:49.783 --> 00:21:52.598 You can understand it as a concept like this 465 00:21:52.598 --> 00:21:55.833 Let’s examine how 466 00:21:55.833 --> 00:21:57.553 clip space operates 467 00:21:57.553 --> 00:22:01.957 One key example is the principle of vanishing points, which we discussed earlier 468 00:22:01.957 --> 00:22:03.701 In Euclidean space, 469 00:22:03.701 --> 00:22:07.442 parallel lines never intersect, 
right? 470 00:22:07.442 --> 00:22:10.936 But let’s consider this in projective space 471 00:22:10.936 --> 00:22:13.274 In projective space, 472 00:22:13.274 --> 00:22:17.934 if we think about it this way, 473 00:22:17.934 --> 00:22:21.981 as x and y increase 474 00:22:21.981 --> 00:22:24.068 z consistently influences 475 00:22:24.068 --> 00:22:26.709 the space 476 00:22:26.709 --> 00:22:28.669 Consider changing the z value 477 00:22:28.669 --> 00:22:31.869 along a line in projective space 478 00:22:31.869 --> 00:22:33.803 In Euclidean space, 479 00:22:33.803 --> 00:22:36.338 where x, y, and z are orthogonal, 480 00:22:36.338 --> 00:22:38.968 z does not influence x and y at all 481 00:22:38.968 --> 00:22:41.969 Thus, z simply moves independently, 482 00:22:41.969 --> 00:22:43.362 unaffected by x and y 483 00:22:43.362 --> 00:22:46.555 However, in projective space 484 00:22:46.555 --> 00:22:49.859 z affects x and y 485 00:22:49.859 --> 00:22:52.094 This interaction means that 486 00:22:52.094 --> 00:22:53.733 as z increases, 487 00:22:53.733 --> 00:22:56.070 the x and y values also increase 488 00:22:56.070 --> 00:22:59.102 That is the characteristic of this space 489 00:22:59.102 --> 00:23:00.833 What happens if z decreases 490 00:23:00.833 --> 00:23:02.352 and approaches 0? 
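NOTE
As a reminder of how the clip-space matrix from earlier ties in with dividing by the last element, here is a minimal sketch. It assumes the 3x3 form implied by the lecture at this stage (d/a and d on the diagonal, -1 as the last element); the function names are hypothetical and depth handling is left out.
```python
import math
def perspective_matrix(fov_degrees, aspect):
    """3x3 matrix that depends only on the camera, never on an individual point."""
    d = 1.0 / math.tan(math.radians(fov_degrees) / 2.0)
    return [
        [d / aspect, 0.0, 0.0],
        [0.0, d, 0.0],
        [0.0, 0.0, -1.0],  # copies -vz into the last clip-space component
    ]
def transform(matrix, point):
    """Multiply a 3x3 matrix by a 3-component column vector."""
    return [sum(m * p for m, p in zip(row, point)) for row in matrix]
def perspective_divide(clip):
    """Divide by the last component to go from clip space to NDC."""
    w = clip[-1]
    return [c / w for c in clip]
P = perspective_matrix(90.0, 800 / 600)  # built once per camera
for view_point in ([1.0, 1.0, -2.0], [1.0, 1.0, -4.0], [-2.0, 0.5, -8.0]):
    clip = transform(P, view_point)
    print(view_point, clip, perspective_divide(clip))
```
The same matrix P is reused for every point; only the final division differs per point, which is what makes the matrix independent of vz.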
491 00:23:02.352 --> 00:23:04.407 The x and y values will also ultimately 492 00:23:04.407 --> 00:23:05.809 appear as 0 493 00:23:05.809 --> 00:23:08.287 For lines with different slopes 494 00:23:08.287 --> 00:23:10.680 when z = 0 495 00:23:10.680 --> 00:23:12.410 no matter what values they have 496 00:23:12.410 --> 00:23:13.863 they will converge 497 00:23:13.863 --> 00:23:15.726 to 0 498 00:23:15.726 --> 00:23:22.321 So, values in projective space 499 00:23:22.321 --> 00:23:24.891 with the same slope 500 00:23:24.891 --> 00:23:26.688 can be thought of as belonging 501 00:23:26.688 --> 00:23:29.497 to the same category 502 00:23:29.497 --> 00:23:32.768 The points along this line 503 00:23:32.768 --> 00:23:35.586 are considered of the same type 504 00:23:35.586 --> 00:23:37.147 However, even if 505 00:23:37.147 --> 00:23:39.309 the points have 506 00:23:39.309 --> 00:23:41.350 different slopes 507 00:23:41.350 --> 00:23:43.155 they all ultimately converge at the origin 508 00:23:43.155 --> 00:23:44.842 Similarly, lines extending in the opposite direction 509 00:23:44.842 --> 00:23:47.642 converge at a point 510 00:23:47.642 --> 00:23:49.677 at infinity 511 00:23:49.677 --> 00:23:50.830 This is the fundamental principle 512 00:23:50.830 --> 00:23:53.857 of projective space 513 00:23:53.857 --> 00:23:57.578 This may seem like a difficult concept 514 00:23:57.578 --> 00:23:59.980 but to summarize 515 00:23:59.980 --> 00:24:01.415 the last element always 516 00:24:01.415 --> 00:24:05.354 influences x and y 517 00:24:05.354 --> 00:24:08.080 There are sets of points 518 00:24:08.080 --> 00:24:09.873 sharing the same ratio, 519 00:24:09.873 --> 00:24:12.529 and these points form a line 520 00:24:12.529 --> 00:24:14.580 The points forming the line 521 00:24:14.580 --> 00:24:17.467 are described as having the same properties 522 00:24:17.467 --> 00:24:19.980 There are infinitely many points 523 00:24:19.980 --> 00:24:21.592 with the same properties 524 00:24:21.592 --> 
00:24:23.568 Among them, we use only 525 00:24:23.568 --> 00:24:25.586 one specific point 526 00:24:25.586 --> 00:24:27.016 Specifically, the point 527 00:24:27.016 --> 00:24:30.181 where z = 1 528 00:24:30.181 --> 00:24:33.288 This is because it represents 529 00:24:33.288 --> 00:24:38.272 the actual displayable area on the monitor 530 00:24:38.272 --> 00:24:40.430 Thus, the points in the same category 531 00:24:40.430 --> 00:24:42.863 converge here 532 00:24:42.863 --> 00:24:45.344 By dividing by the last element, 533 00:24:45.344 --> 00:24:47.870 the z-value becomes 1 534 00:24:47.870 --> 00:24:51.311 allowing us to extract just one point 535 00:24:51.311 --> 00:24:54.470 from the many 536 00:24:54.470 --> 00:24:56.917 sharing the same properties 537 00:24:56.917 --> 00:24:59.108 This rule ensures that 538 00:24:59.108 --> 00:25:02.751 when we render points on the screen 539 00:25:02.751 --> 00:25:04.842 what happens if 540 00:25:04.842 --> 00:25:06.696 the vz value is set to infinity? 
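The "divide by the last element" rule described here can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the lecture: the function name `perspective_divide` and the sample points are hypothetical, and a 2D point is assumed to be stored as a homogeneous triple (x, y, z).

```python
def perspective_divide(p):
    """Divide a homogeneous point (x, y, z) by its last element.

    Hypothetical helper for illustration: every point on the same line
    through the origin collapses onto the single point where z = 1.
    """
    x, y, z = p
    if z == 0:
        raise ValueError("last element is 0; the point cannot be divided")
    return (x / z, y / z, 1.0)


# Two points with the same x:y:z ratio, i.e. on the same line through the origin.
base = (2.0, 3.0, 4.0)
scaled = tuple(5.0 * c for c in base)  # (10.0, 15.0, 20.0)

print(perspective_divide(base))    # (0.5, 0.75, 1.0)
print(perspective_divide(scaled))  # (0.5, 0.75, 1.0)
```

Both inputs land on the same point of the z = 1 plane, which is the sense in which points sharing a ratio are treated as "the same" point.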
541 00:25:06.696 --> 00:25:09.089 If vz is infinite, 542 00:25:09.089 --> 00:25:12.143 it is technically -vz 543 00:25:12.143 --> 00:25:14.411 because it is the opposite direction of the camera 544 00:25:14.411 --> 00:25:16.833 I mean it aligns with the camera’s viewing direction 545 00:25:16.833 --> 00:25:18.153 If we set it to infinity 546 00:25:18.153 --> 00:25:20.246 along the camera’s direction 547 00:25:20.246 --> 00:25:22.361 the value approaches 0 548 00:25:22.361 --> 00:25:23.756 as it increases toward infinity 549 00:25:23.756 --> 00:25:26.226 Thus, points far away from the camera converge 550 00:25:26.226 --> 00:25:30.253 near the center of the NDC space at 0 551 00:25:30.253 --> 00:25:32.736 All objects located infinitely 552 00:25:32.736 --> 00:25:35.315 far from the camera 553 00:25:35.315 --> 00:25:37.827 converge near 0 in the NDC space 554 00:25:37.827 --> 00:25:39.397 We can understand it like this 555 00:25:39.397 --> 00:25:42.266 Consequently, these points 556 00:25:42.266 --> 00:25:44.854 converge to the NDC 557 00:25:44.854 --> 00:25:48.099 coordinate (0,0) 558 00:25:48.099 --> 00:25:50.853 This is the same principle 559 00:25:50.853 --> 00:25:53.166 as the vanishing point 560 00:25:53.166 --> 00:25:55.802 often discussed in art 561 00:25:55.802 --> 00:26:01.393 Taking this vanishing point principle into account, 562 00:26:01.393 --> 00:26:03.588 let’s design the system accordingly 563 00:26:03.588 --> 00:26:05.483 The last dimension 564 00:26:05.483 --> 00:26:07.508 doesn’t hold much significance in this context 565 00:26:07.508 --> 00:26:10.790 You only need to replace the z-value with -z 566 00:26:10.790 --> 00:26:15.664 A 4x4 matrix must be used 567 00:26:15.664 --> 00:26:17.974 to integrate it 568 00:26:17.974 --> 00:26:19.886 into the rendering pipeline 569 00:26:19.886 --> 00:26:22.593 This allows us to construct the perspective projection matrix 570 00:26:22.593 --> 00:26:25.175 as follows 571 00:26:25.175 --> 00:26:27.989 This 
concludes our discussion on the matrices 572 00:26:27.989 --> 00:26:30.894 for perspective projection transformation 573 00:26:30.894 --> 00:26:35.222 Depth Values 574 00:26:35.222 --> 00:26:38.920 First, let’s discuss depth values 575 00:26:38.920 --> 00:26:40.958 Why are depth values necessary? 576 00:26:40.958 --> 00:26:42.626 Depth values represent 577 00:26:42.626 --> 00:26:46.173 the numerical data 578 00:26:46.173 --> 00:26:48.740 that indicate how far an object is 579 00:26:48.740 --> 00:26:51.209 from the camera 580 00:26:51.209 --> 00:26:53.077 If we were to render multiple objects 581 00:26:53.077 --> 00:26:54.710 in 3D space 582 00:26:54.710 --> 00:26:57.664 without depth values 583 00:26:57.664 --> 00:26:59.976 as shown in the image 584 00:26:59.976 --> 00:27:01.510 we wouldn’t be able to determine 585 00:27:01.510 --> 00:27:02.945 which objects are in front 586 00:27:02.945 --> 00:27:04.056 and which are behind 587 00:27:04.056 --> 00:27:07.302 The large character face here 588 00:27:07.302 --> 00:27:09.972 is actually positioned in the foreground 589 00:27:09.972 --> 00:27:13.255 but because it was drawn first 590 00:27:13.255 --> 00:27:15.939 it gets covered by faces drawn afterward 591 00:27:15.939 --> 00:27:17.753 As a result, it appears to be 592 00:27:17.753 --> 00:27:20.291 in the background when viewed 593 00:27:20.291 --> 00:27:22.308 So, how do we solve this issue? 
594 00:27:22.308 --> 00:27:23.334 What we have to do 595 00:27:23.334 --> 00:27:26.000 is calculate depth values 596 00:27:26.000 --> 00:27:28.893 and use them 597 00:27:28.893 --> 00:27:30.681 to ensure that objects in the foreground 598 00:27:30.681 --> 00:27:33.607 aren’t overwritten by those in the background 599 00:27:33.607 --> 00:27:36.057 This must be calculated and applied 600 00:27:36.057 --> 00:27:38.143 at the pixel level 601 00:27:38.143 --> 00:27:40.618 This is referred to as a depth buffer 602 00:27:40.618 --> 00:27:43.604 We’ll explore how to implement a depth buffer 603 00:27:43.604 --> 00:27:47.392 in more detail next time 604 00:27:47.392 --> 00:27:50.316 but today, we’ll focus on 605 00:27:50.316 --> 00:27:51.995 how to calculate the depth values 606 00:27:51.995 --> 00:27:53.486 used in it 607 00:27:53.486 --> 00:27:55.622 That is what we will discuss today 608 00:27:57.910 --> 00:28:00.839 To address depth values, 609 00:28:00.839 --> 00:28:03.337 we ultimately need to expand 610 00:28:03.337 --> 00:28:06.117 the existing 2D NDC space 611 00:28:06.117 --> 00:28:08.274 by adding 612 00:28:08.274 --> 00:28:09.319 an additional dimension 613 00:28:09.319 --> 00:28:11.691 We extend the NDC space 614 00:28:11.691 --> 00:28:13.939 into 3D as follows, 615 00:28:13.939 --> 00:28:15.914 so that the depth values are defined 616 00:28:15.914 --> 00:28:17.911 within this range 617 00:28:17.911 --> 00:28:19.902 We must specify the depth range 618 00:28:19.902 --> 00:28:23.360 from one boundary to another 619 00:28:23.360 --> 00:28:27.207 This is where 620 00:28:27.207 --> 00:28:28.836 the frustum comes into play 621 00:28:28.836 --> 00:28:32.606 A frustum is formed when 622 00:28:32.606 --> 00:28:34.373 we define a field of view from the camera 623 00:28:34.373 --> 00:28:37.041 It extends outward from the camera 624 00:28:37.041 --> 00:28:39.055 to infinity 625 00:28:39.055 --> 00:28:42.356 But there’s no defined start or end point 626
00:28:42.356 --> 00:28:46.780 The starting point is essentially the camera’s position, 627 00:28:46.780 --> 00:28:48.449 but it can also extend backward 628 00:28:48.449 --> 00:28:50.148 In projective space 629 00:28:50.148 --> 00:28:51.916 if the camera is at the center 630 00:28:51.916 --> 00:28:54.273 it projects forward 631 00:28:54.273 --> 00:28:56.655 but can also extend backward 632 00:28:56.655 --> 00:28:59.129 Effectively, it creates a space 633 00:28:59.129 --> 00:29:00.422 that extends infinitely 634 00:29:00.422 --> 00:29:02.936 in both directions 635 00:29:02.936 --> 00:29:05.932 Instead of calculating depth values 636 00:29:05.932 --> 00:29:08.376 across an infinite range 637 00:29:08.376 --> 00:29:11.720 we define a finite range, similar to NDC space 638 00:29:11.720 --> 00:29:14.212 We specify a fixed range 639 00:29:14.212 --> 00:29:17.395 for rendering 640 00:29:17.395 --> 00:29:20.632 Then, we define one end as the minimum value 641 00:29:20.632 --> 00:29:23.834 and the other as the maximum value 642 00:29:23.834 --> 00:29:26.969 This effectively slices a finite section 643 00:29:26.969 --> 00:29:28.654 out of the infinite projective space 644 00:29:28.654 --> 00:29:32.414 Only the objects within this range are rendered 645 00:29:32.414 --> 00:29:35.147 The resulting 3D shape 646 00:29:35.147 --> 00:29:39.495 is called a frustum 647 00:29:39.495 --> 00:29:43.958 When sliced at two points, 648 00:29:43.958 --> 00:29:47.819 it forms a truncated pyramid-like shape 649 00:29:47.819 --> 00:29:50.507 where the point is cut 650 00:29:50.507 --> 00:29:52.454 as you can see 651 00:29:52.454 --> 00:29:54.910 The first slice, 652 00:29:54.910 --> 00:29:57.994 closer to the camera, 653 00:29:57.994 --> 00:30:00.449 is called the near plane 654 00:30:00.449 --> 00:30:03.828 Since we cannot draw the end of infinity 655 00:30:03.828 --> 00:30:07.796 the farthest slice within the drawable range 656 00:30:07.796 --> 00:30:11.769 is called the far plane 657 
00:30:11.769 --> 00:30:15.409 The depth value for the near plane 658 00:30:15.409 --> 00:30:18.387 is typically set to -1 659 00:30:18.387 --> 00:30:20.726 and the depth value for the far plane 660 00:30:20.726 --> 00:30:22.388 is set to 1 661 00:30:22.388 --> 00:30:25.683 Objects within the frustum 662 00:30:25.683 --> 00:30:29.031 will have depth values between -1 and 1 663 00:30:29.031 --> 00:30:32.267 This prevents the issue we saw earlier, 664 00:30:32.267 --> 00:30:36.435 where objects in the foreground 665 00:30:36.435 --> 00:30:37.617 appeared in the background 666 00:30:37.617 --> 00:30:41.198 We can prevent this 667 00:30:41.198 --> 00:30:43.657 Let’s look at how this is applied 668 00:30:43.657 --> 00:30:46.015 If we extend this 669 00:30:46.015 --> 00:30:49.358 to NDC space, 670 00:30:49.358 --> 00:30:51.574 the coordinates will look like this 671 00:30:51.574 --> 00:30:54.672 The values closest to the camera 672 00:30:54.672 --> 00:30:57.598 have a z value of -1 673 00:30:57.598 --> 00:30:58.909 and the farthest values 674 00:30:58.909 --> 00:31:00.970 have a z value of 1 675 00:31:00.970 --> 00:31:03.762 This is how you can think about 676 00:31:03.762 --> 00:31:06.467 the 3D NDC coordinate system 677 00:31:06.467 --> 00:31:08.439 One notable aspect here is that 678 00:31:08.439 --> 00:31:11.709 since x ranges from 679 00:31:11.709 --> 00:31:13.891 -1 to 1 680 00:31:13.891 --> 00:31:16.037 and y ranges from -1 to 1 681 00:31:16.037 --> 00:31:18.558 you might expect z to follow the same range 682 00:31:18.558 --> 00:31:22.336 That would make the three axes align perfectly 683 00:31:22.336 --> 00:31:24.176 However, from a practical perspective, 684 00:31:24.176 --> 00:31:26.954 when calculating depth values 685 00:31:26.954 --> 00:31:31.691 they are often represented as images 686 00:31:31.691 --> 00:31:35.536 Images, after all, represent color 687 00:31:35.536 --> 00:31:37.921 and color values typically 688 00:31:37.921 --> 00:31:39.809 are not
negative 689 00:31:39.809 --> 00:31:41.759 they range from 0 to 1 690 00:31:41.759 --> 00:31:44.525 0 corresponds to black, 691 00:31:44.525 --> 00:31:46.420 and 1 corresponds to white 692 00:31:46.420 --> 00:31:49.572 To make depth values useful when expressed as colors 693 00:31:49.572 --> 00:31:51.640 between 0 and 1 694 00:31:51.640 --> 00:31:53.519 some systems avoid negative values 695 00:31:53.519 --> 00:31:55.064 and start from 0 696 00:31:55.064 --> 00:31:57.023 In graphics libraries like DirectX 697 00:31:57.023 --> 00:31:59.654 the depth value 698 00:31:59.654 --> 00:32:02.872 for the near plane 699 00:32:02.872 --> 00:32:04.579 is sometimes set to 0 700 00:32:04.579 --> 00:32:05.934 This is something 701 00:32:05.934 --> 00:32:08.940 you should keep in mind 702 00:32:08.940 --> 00:32:11.262 If we summarize 703 00:32:11.262 --> 00:32:14.291 this coordinate system 704 00:32:14.291 --> 00:32:19.361 in NDC space, 705 00:32:19.361 --> 00:32:22.544 the farther an object is from the camera 706 00:32:22.544 --> 00:32:23.694 what happens?
707 00:32:23.694 --> 00:32:25.199 Its depth value increases 708 00:32:25.199 --> 00:32:28.323 But as we’ve seen before, 709 00:32:28.323 --> 00:32:29.684 in view space, 710 00:32:29.684 --> 00:32:34.601 which uses a right-handed coordinate system 711 00:32:34.601 --> 00:32:36.833 the x-axis typically points right 712 00:32:36.833 --> 00:32:39.926 and the y-axis points up 713 00:32:39.926 --> 00:32:43.225 this is typical 714 00:32:43.225 --> 00:32:46.440 Even if the camera faces a certain direction 715 00:32:46.440 --> 00:32:47.978 the z-axis isn't aligned 716 00:32:47.978 --> 00:32:49.969 with the camera’s view direction 717 00:32:49.969 --> 00:32:53.709 Instead, priority is given to how the screen is structured 718 00:32:53.709 --> 00:32:58.197 Thus, the z-axis in the view space, structured using a right-handed coordinate system 719 00:32:58.197 --> 00:33:01.443 points opposite to the viewing direction 720 00:33:01.443 --> 00:33:04.046 because it uses a right-handed coordinate system 721 00:33:04.046 --> 00:33:06.891 However, 722 00:33:06.891 --> 00:33:08.329 the objects we render 723 00:33:08.329 --> 00:33:09.974 aren’t behind the camera 724 00:33:09.974 --> 00:33:12.961 but rather in front of it 725 00:33:12.961 --> 00:33:15.666 When transformed 726 00:33:15.666 --> 00:33:20.170 into view space 727 00:33:20.170 --> 00:33:22.536 all points of objects in front of the camera 728 00:33:22.536 --> 00:33:24.615 will always have negative z-values 729 00:33:24.615 --> 00:33:27.104 we can make this conclusion 730 00:33:27.104 --> 00:33:30.532 This is due to the use of a right-handed coordinate system 731 00:33:30.532 --> 00:33:33.939 However, in NDC space, 732 00:33:33.939 --> 00:33:37.633 for objects in front of the camera 733 00:33:37.633 --> 00:33:39.973 the z-value or depth value 734 00:33:39.973 --> 00:33:43.988 increases as the object moves farther away 735 00:33:43.988 --> 00:33:45.082 This means that 736 00:33:45.082 --> 00:33:48.039 NDC space 737 00:33:48.039
--> 00:33:49.878 uses a left-handed coordinate system 738 00:33:49.878 --> 00:33:51.397 It’s not a right-handed system 739 00:33:51.397 --> 00:33:54.024 but a left-handed one 740 00:33:54.024 --> 00:33:58.028 Let’s briefly review 741 00:33:58.028 --> 00:34:00.669 what we’ve learned so far 742 00:34:00.669 --> 00:34:03.335 In our example, 743 00:34:03.335 --> 00:34:05.274 we started with world space 744 00:34:05.274 --> 00:34:07.035 World space uses 745 00:34:10.033 --> 00:34:12.381 a right-handed coordinate system 746 00:34:12.381 --> 00:34:15.862 for ease of object placement and viewing 747 00:34:15.862 --> 00:34:17.511 I previously mentioned that 748 00:34:17.511 --> 00:34:18.748 engines like Unity and Unreal 749 00:34:18.748 --> 00:34:21.168 use left-handed coordinate systems 750 00:34:21.168 --> 00:34:23.545 Thus, we constructed the world 751 00:34:23.545 --> 00:34:25.477 using a right-handed coordinate system 752 00:34:25.477 --> 00:34:27.059 and the camera’s view space 753 00:34:27.059 --> 00:34:29.049 also uses 754 00:34:29.049 --> 00:34:31.561 a right-handed system 755 00:34:31.561 --> 00:34:32.719 This is something 756 00:34:32.719 --> 00:34:34.391 important to remember 757 00:34:34.391 --> 00:34:37.484 However, during the conversion 758 00:34:37.484 --> 00:34:42.066 from view space to NDC space 759 00:34:42.066 --> 00:34:45.661 the system transitions to a left-handed coordinate system 760 00:34:45.661 --> 00:34:48.564 This goes back to when we discussed 761 00:34:48.564 --> 00:34:50.837 perspective projection matrices 762 00:34:50.837 --> 00:34:56.011 We always multiplied the z-value by -1 763 00:34:56.011 --> 00:35:00.758 to construct the z-value in clip space 764 00:35:00.758 --> 00:35:03.509 Since we constructed the z-value in clip space, 765 00:35:03.509 --> 00:35:06.469 this step indicates a transition from a right-handed coordinate system 766 00:35:06.469 --> 00:35:09.220 to a left-handed one 767 00:35:09.220 --> 00:35:11.250 When we transitioned 
to clip space 768 00:35:11.250 --> 00:35:13.108 we summarized it as follows 769 00:35:13.108 --> 00:35:15.164 In local space 770 00:35:15.164 --> 00:35:16.886 the coordinate system is 771 00:35:16.886 --> 00:35:18.489 typically determined by the modeling program 772 00:35:18.489 --> 00:35:20.373 Most modeling programs use 773 00:35:20.373 --> 00:35:22.658 a right-handed coordinate system 774 00:35:22.658 --> 00:35:25.814 In world space, 775 00:35:25.814 --> 00:35:27.626 the coordinate system depends on the game engine 776 00:35:27.626 --> 00:35:29.910 Unity and Unreal Engine use left-handed systems, 777 00:35:29.910 --> 00:35:32.772 but in this course 778 00:35:32.772 --> 00:35:35.733 we’re using a right-handed coordinate system 779 00:35:35.733 --> 00:35:38.060 View space also uses a right-handed system, 780 00:35:38.060 --> 00:35:39.827 even in game engines 781 00:35:39.827 --> 00:35:42.793 However, when transitioning to clip space, 782 00:35:42.793 --> 00:35:46.283 the z-value is multiplied by -1 783 00:35:46.283 --> 00:35:48.939 switching to a left-handed system 784 00:35:48.939 --> 00:35:51.524 Simply dividing by the last element 785 00:35:51.524 --> 00:35:54.808 results in NDC space also adopting a left-handed system 786 00:35:54.808 --> 00:35:57.342 It’s useful to understand 787 00:35:57.342 --> 00:35:59.079 these coordinate system characteristics 788 00:35:59.079 --> 00:36:02.188 so I provided this summary 789 00:36:02.188 --> 00:36:04.937 Now, let’s look at 790 00:36:04.937 --> 00:36:07.577 the final perspective projection matrix 791 00:36:07.577 --> 00:36:08.700 that incorporates 792 00:36:08.700 --> 00:36:11.253 depth values 793 00:36:11.253 --> 00:36:14.139 Previously, we wasted a row 794 00:36:14.139 --> 00:36:19.353 in the 4x4 matrix 795 00:36:19.353 --> 00:36:22.524 just to make it conform to the format 796 00:36:22.524 --> 00:36:24.894 But now, we’ll assign 797 00:36:24.894 --> 00:36:26.933 meaningful values 798 00:36:26.933 --> 00:36:28.725 to
calculate depth 799 00:36:28.725 --> 00:36:31.029 For the third row, 800 00:36:31.029 --> 00:36:32.844 let’s assume there are four components 801 00:36:32.844 --> 00:36:36.335 i, j, k, l as unknowns 802 00:36:36.335 --> 00:36:38.126 We will assume this 803 00:36:38.126 --> 00:36:41.054 If we determine these four values, 804 00:36:41.054 --> 00:36:44.138 we can calculate 805 00:36:44.138 --> 00:36:46.319 the depth as well 806 00:36:46.319 --> 00:36:49.438 We don’t know the exact depth values 807 00:36:49.438 --> 00:36:53.288 resulting from multiplying i, j, k, l 808 00:36:53.288 --> 00:36:57.365 but depth 809 00:36:57.365 --> 00:36:59.766 is independent of x and y 810 00:36:59.766 --> 00:37:02.692 Depth is unrelated to x and y values 811 00:37:02.692 --> 00:37:04.968 In other words, depth depends 812 00:37:04.968 --> 00:37:08.180 on the z-values in view space 813 00:37:08.180 --> 00:37:11.054 Therefore, the first and second components, i and j, 814 00:37:11.054 --> 00:37:15.400 which multiply x and y, 815 00:37:15.400 --> 00:37:17.389 should effectively be 0 816 00:37:17.389 --> 00:37:18.812 This is because they don’t influence depth 817 00:37:18.812 --> 00:37:22.265 So, we’ve already resolved two of the four components 818 00:37:22.265 --> 00:37:24.026 Next, we need to find k 819 00:37:24.026 --> 00:37:26.149 and l 820 00:37:26.149 --> 00:37:28.278 To determine these values, 821 00:37:28.278 --> 00:37:34.874 we use the characteristics of the near plane and the far plane 822 00:37:34.874 --> 00:37:37.357 and derive 823 00:37:37.357 --> 00:37:40.165 two sample data points 824 00:37:40.165 --> 00:37:43.150 The simplest sample 825 00:37:43.150 --> 00:37:47.900 is a point on the near plane 826 00:37:47.900 --> 00:37:50.220 where both x and y 827 00:37:50.220 --> 00:37:52.748 are 0 828 00:37:52.748 --> 00:37:55.528 In view space, 829 00:37:55.528 --> 00:37:57.036 how is the camera’s view direction 830 00:37:57.036 --> 00:37:58.513 defined?
831 00:37:58.513 --> 00:37:59.726 It becomes a negative value, right? 832 00:37:59.726 --> 00:38:03.357 The near plane 833 00:38:03.357 --> 00:38:04.749 has a value of n 834 00:38:04.749 --> 00:38:06.921 based on the specified size 835 00:38:06.921 --> 00:38:11.255 In view space, the point is (0, 0, -n) 836 00:38:11.255 --> 00:38:13.929 Similarly, the far plane point 837 00:38:13.929 --> 00:38:15.530 in view space 838 00:38:15.530 --> 00:38:18.576 is (0, 0, -f) 839 00:38:18.576 --> 00:38:20.536 However, once transformed 840 00:38:20.536 --> 00:38:24.629 into NDC space, 841 00:38:24.629 --> 00:38:27.663 the depth values range from -1 to 1 842 00:38:27.663 --> 00:38:29.593 Thus, the near plane point becomes (0, 0, -1) 843 00:38:29.593 --> 00:38:32.396 and the far plane point becomes (0, 0, 1) 844 00:38:32.396 --> 00:38:34.966 Using this basic data, 845 00:38:34.966 --> 00:38:38.485 let’s derive the intermediate values 846 00:38:38.485 --> 00:38:41.256 for k and l 847 00:38:41.256 --> 00:38:44.836 The point in view space 848 00:38:44.836 --> 00:38:46.275 is (0, 0, -n, 1) 849 00:38:46.275 --> 00:38:50.163 When this point is multiplied 850 00:38:50.163 --> 00:38:52.005 by a matrix containing two unresolved unknowns, 851 00:38:52.005 --> 00:38:55.888 what is the result?
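Written out, the multiplication asked about here looks as follows. This is a sketch under two assumptions: the third row is (0, 0, k, l), as argued above, and the fourth row is (0, 0, -1, 0), the row that copies -z into the last element. The x and y rows are omitted because both sample points have x = y = 0.

```latex
\begin{bmatrix} 0 & 0 & k & l \\ 0 & 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ -n \\ 1 \end{bmatrix}
=
\begin{bmatrix} -kn + l \\ n \end{bmatrix},
\qquad
\begin{bmatrix} 0 & 0 & k & l \\ 0 & 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ -f \\ 1 \end{bmatrix}
=
\begin{bmatrix} -kf + l \\ f \end{bmatrix}
```

The top entry of each result is the clip-space z and the bottom entry is the last element used for the division.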
852 00:38:55.888 --> 00:39:00.289 Ultimately, we need to examine 853 00:39:00.289 --> 00:39:02.503 what this value turns out to be 854 00:39:02.503 --> 00:39:06.330 At this point, these aren’t NDC values 855 00:39:06.330 --> 00:39:08.242 but clip space values 856 00:39:08.242 --> 00:39:09.878 When the matrix is applied, 857 00:39:09.878 --> 00:39:12.925 the result will be in clip space 858 00:39:12.925 --> 00:39:15.306 Here, if we take -n 859 00:39:15.306 --> 00:39:17.578 the fourth component will definitely be n 860 00:39:17.578 --> 00:39:19.785 Similarly, in this case 861 00:39:19.785 --> 00:39:21.808 the fourth component will definitely be f 862 00:39:21.808 --> 00:39:24.430 The key is determining the third component 863 00:39:24.430 --> 00:39:26.701 The value divided by the last element 864 00:39:26.701 --> 00:39:28.034 must equal -1 865 00:39:28.034 --> 00:39:30.983 so here it will be -n 866 00:39:30.983 --> 00:39:32.276 On the other hand, 867 00:39:32.276 --> 00:39:35.089 since the NDC value here is (0, 0, 1) 868 00:39:35.089 --> 00:39:37.786 dividing by f to get 1 869 00:39:37.786 --> 00:39:39.125 means this must be f 870 00:39:39.125 --> 00:39:41.124 Thus, the respective clip coordinates are 871 00:39:41.124 --> 00:39:44.802 (0, 0, -n, n) here 872 00:39:44.802 --> 00:39:49.255 and (0, 0, f, f) there 873 00:39:49.255 --> 00:39:54.129 Using this, let’s proceed with the calculations 874 00:39:54.129 --> 00:40:00.193 (0, 0, -n, 1) should transform into (0, 0, -n, n) 875 00:40:00.193 --> 00:40:06.048 and (0, 0, -f, 1) should correspond to 876 00:40:06.048 --> 00:40:09.123 (0, 0, f, f) 877 00:40:09.123 --> 00:40:10.852 When multiplied directly, 878 00:40:10.852 --> 00:40:16.859 the third element, -kn + l, 879 00:40:16.859 --> 00:40:19.954 must equal -n 880 00:40:19.954 --> 00:40:22.556 Likewise, -kf + l 881 00:40:22.556 --> 00:40:24.095 must equal f 882 00:40:24.095 --> 00:40:26.484 However, n and f are 883 00:40:26.484 --> 00:40:28.760 predefined by the user 884 00:40:28.760 --> 00:40:31.546 as the
near and far planes of the camera 885 00:40:31.546 --> 00:40:34.139 Since they represent the near and far plane values 886 00:40:34.139 --> 00:40:36.714 of the frustum, 887 00:40:36.714 --> 00:40:38.120 they are treated as constants 888 00:40:38.120 --> 00:40:39.293 The unknowns here are 889 00:40:39.293 --> 00:40:41.267 k and l 890 00:40:41.267 --> 00:40:44.019 Eliminating variables, 891 00:40:44.019 --> 00:40:49.805 k is calculated as (n+f)/(n-f) 892 00:40:49.805 --> 00:40:54.411 and l is calculated as 2nf/(n-f) 893 00:40:54.411 --> 00:40:56.321 Substituting these values 894 00:40:56.321 --> 00:40:59.116 yields the final 895 00:40:59.116 --> 00:41:02.200 perspective projection matrix 896 00:41:02.200 --> 00:41:04.288 This completes the calculation 897 00:41:04.288 --> 00:41:06.614 of the final perspective projection matrix 898 00:41:06.614 --> 00:41:10.412 we’ve been developing 899 00:41:10.412 --> 00:41:12.545 That’s it for today’s lecture 900 00:41:12.545 --> 00:41:14.864 Thank you for attending 901 00:41:14.864 --> 00:41:15.744 Thank you 902 00:41:16.424 --> 00:41:17.716 Summary perspective projection transformation is implemented by drawing lines based on vanishing points Projection Plane: mapped based on distance 903 00:41:17.716 --> 00:41:19.036 Focal Length: shortest distance from camera-projection plane method to normalize screen size while considering resolution NDC regardless of monitor spe 904 00:41:19.036 --> 00:41:20.215 Defines projection plane size as 1 for easy calculation focal length d=1/tan(θ/2) Objects may appear distorted 905 00:41:20.215 --> 00:41:21.337 due to different monitor aspect ratios this is corrected in NDC space Final NDC coordinates Pndc = -vz/d a/vx, vy 906 00:41:21.337 --> 00:41:22.692 Clip Space Intermediary space used to create general-purpose matrices Perspective projection matrices constructed with camera focal length&aspect rate 907 00:41:22.692 --> 00:41:23.862 pv = a/d 0 0, 0 d 0, 0 0 -1 vx vy vz=a/d vx, d vy,-vz Vanishing Point
Principle: In projective space, all lines converge to origin as z-values decreas 908 00:41:23.862 --> 00:41:25.080 Clip Space Coordinates represent lines where points within same category are gathered Dimension-Aligned Perspective Projection matrix 909 00:41:25.080 --> 00:41:26.367 p= a/d 0 0 0, 0 d 0 0, 0 0 -1 0, 0 0 0 1 910 00:41:26.367 --> 00:41:27.779 Necessity of Depth Values First object drawn may appear in background Ensure that objects behind do not overwrite those in front through pixel value 911 00:41:27.779 --> 00:41:29.168 Frustum A 3D geometric region with a truncated pyramid shape, created by defining a range for z-values within the camera's usable space 912 00:41:29.168 --> 00:41:30.337 Coordinate Systems When transformed to NDC space, right-handed coordinate systems are converted into left Local Space: Most modeling programs use righ 913 00:41:30.337 --> 00:41:31.347 World Space: Uses the coordinate system defined by the game engine. View Space: Uses right-handed Clip Space: Uses left-handed NDC: Uses left-handed 914 00:41:31.347 --> 00:41:36.268 Final Projection Matrix A final projection matrix that calculates depth values while accounting for frustum p= a/d 0 0 0, 0 d 0 0, 0 0 (n+f)/(n-f) 2nf/(n-f),
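The depth row derived in the lecture, k = (n+f)/(n-f) and l = 2nf/(n-f), can be checked numerically. The Python below is a minimal sketch, not the lecturer's code: the function names `depth_row` and `ndc_depth` are hypothetical, and it assumes the last element of the clip-space point is -vz, as in the derivation. Only the depth calculation is shown, so the x and y rows of the matrix are left out.

```python
def depth_row(n, f):
    """Return (k, l) for the third row (0, 0, k, l) of the projection matrix.

    n and f are the positive distances to the near and far planes.
    """
    if not (0.0 < n < f):
        raise ValueError("expected 0 < n < f")
    k = (n + f) / (n - f)
    l = (2.0 * n * f) / (n - f)
    return k, l


def ndc_depth(vz, n, f):
    """Map a view-space z (negative in front of the camera) to NDC depth."""
    k, l = depth_row(n, f)
    clip_z = k * vz + l   # third component after the matrix multiply
    clip_w = -vz          # last element, assumed to be -vz
    return clip_z / clip_w


# Sample frustum with near = 1 and far = 3 (illustrative values).
print(ndc_depth(-1.0, 1.0, 3.0))  # -1.0, a point on the near plane
print(ndc_depth(-2.0, 1.0, 3.0))  # 0.5, halfway in view space
print(ndc_depth(-3.0, 1.0, 3.0))  # 1.0, a point on the far plane
```

The middle sample shows that depth is not linear in view-space distance: the point halfway between the planes lands at 0.5 rather than 0.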