Inverse Perspective of Images without OpenCV
Introduction
As autonomous driving technology matures, more people encounter it and grow curious about how software actually makes a car drive itself. The auxiliary features built into autonomous driving systems attract the same curiosity. This article explains and demonstrates the algorithm behind one of them, the '360° reverse camera'.
Inverse Perspective Transformation
When capturing images, the vehicle uses several cameras and stitches their frames into a '360° panoramic photo'.
Producing the top-down view of the '360° reverse camera' requires a mathematical operation known as Inverse Perspective Transformation (also called inverse perspective mapping), abbreviated as IPM.
There are several IPM methods, such as the 'corresponding point pair homography transformation method' and the 'simplified camera model inverse perspective transformation', and all of them rely on matrix transformations.
Corresponding Point Pair Homography Transformation Method
This method is relatively simple, so it is only summarized here.
The input is at least four corresponding point pairs, with no three points collinear. No camera parameters and no information about the position of the plane are needed. The point pairs are used to solve for the perspective transformation matrix, a 3×3 matrix that is defined only up to scale and therefore has eight unknowns. Each point pair contributes two linear equations, so four pairs are enough to build a solvable linear system. With more than four pairs the system is overdetermined and can be solved in a least-squares sense. The points are usually selected by hand, generally choosing vanishing points.
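To make the "two linear equations per point pair" statement concrete, here is a short worked form. It assumes the common normalization $h_{33} = 1$, which is one convention among several and is not stated in the text above. The symbols $(x, y)$ for a source point, $(x', y')$ for its target point, and $h_{ij}$ for the matrix entries are introduced here for illustration.

```latex
\begin{align}
x' &= \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1}, &
y' &= \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1} \\[4pt]
x' &= h_{11}x + h_{12}y + h_{13} - h_{31}\,x\,x' - h_{32}\,y\,x' \\
y' &= h_{21}x + h_{22}y + h_{23} - h_{31}\,x\,y' - h_{32}\,y\,y'
\end{align}
```

Multiplying out the denominators gives the last two lines, which are linear in the eight unknowns $h_{11},\dots,h_{32}$. Four point pairs therefore yield an 8×8 linear system.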

This method is simple to implement and achieves the IPM transformation with little effort, so we do not elaborate further or provide code for it here.
Simplified Camera Model IPM Method
This is the method this article focuses on. Its essence is to take the coordinate transformations involved in camera imaging, abstract and simplify them, and ultimately obtain world coordinates.
We then establish the correspondence between world coordinates and image coordinates and use it to perform the transformation.

Rather than relying on complex and lengthy formulas, we again work with coordinate operations. This IPM method requires first measuring the actual parameters of the camera.
Here, the elevation angle is , the center height is , the distance from the viewpoint to the viewing plane is , and then we find the world coordinate .
Let the camera image coordinate be , and establish the matrix equation from the relationship between world coordinates and image coordinates,
Substitute the image coordinates into equation to obtain the matrix of world coordinates, i.e.,
Let , , , , and . From geometric relationships, we have . The simplified form of is
Finally, process the image. Since the processed image is a two-dimensional plane image, the image depth is always 0. According to , we only need to substitute the horizontal and vertical coordinates of the array to obtain the coordinate values in world coordinates, that is, the top view after IPM.
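As a hedged sketch of this geometry, the mapping can be written with the parameter names the code below uses rather than the article's own symbols (the correspondence between the two is an assumption). Let $\theta$ be the downward pitch (`pitch`, positive when the camera tilts toward the ground), $H$ the camera height, $(f_x, f_y)$ the focal lengths in pixels, and $(c_x, c_y)$ the principal point. The camera sits at $(0, 0, H)$ with world $Z$ pointing up and world $Y$ pointing forward. Intersecting the viewing ray of pixel $(u, v)$ with the ground plane $Z = 0$ gives:

```latex
\begin{align}
x &= \frac{u - c_x}{f_x}, \qquad y = \frac{v - c_y}{f_y} \\[4pt]
X &= \frac{H\,x}{\sin\theta + y\cos\theta}, \qquad
Y = \frac{H\,(\cos\theta - y\sin\theta)}{\sin\theta + y\cos\theta},
\qquad \sin\theta + y\cos\theta > 0 \\[4pt]
u &= c_x + f_x\,\frac{X}{Y\cos\theta + H\sin\theta}, \qquad
v = c_y + f_y\,\frac{H\cos\theta - Y\sin\theta}{Y\cos\theta + H\sin\theta}
\end{align}
```

The second line maps a pixel to the ground, and the third line is its inverse. For the image center ($x = y = 0$) the ground point reduces to $X = 0$, $Y = H / \tan\theta$, which is where the optical axis meets the ground.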

#include <cmath>
#include <cstdint>
#include <vector>
#include <algorithm>

namespace ipm
{
// =========================
// Basic data structures
// =========================
struct Vec3
{
    double x;
    double y;
    double z;
};

struct GroundPoint
{
    double X;   // world X coordinate (left/right)
    double Y;   // world Y coordinate (forward/backward)
    bool valid; // whether the ray has a valid intersection with the ground
};
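struct CameraParam
{
    // Focal lengths (in pixels).
    // If you only have a single distance d, you can set fx = fy = d.
    double fx;
    double fy;
    // Principal point (usually the image center).
    double cx;
    double cy;
    // Camera height above the ground, e.g. in cm.
    double H;
    // Downward pitch angle of the camera (radians).
    double pitch;
};

struct IPMParam
{
    // Size of the output top-down view.
    int outWidth;
    int outHeight;
    // World coordinate range (same unit as H, e.g. cm).
    // X: left/right range
    // Y: forward/backward range
    double minX;
    double maxX;
    double minY;
    double maxY;
};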
// =========================
// Utility functions
// =========================
inline double clampDouble(double v, double lo, double hi)
{
    return (v < lo) ? lo : ((v > hi) ? hi : v);
}

// Round to the nearest integer and saturate to [0, 255].
inline uint8_t clampToByte(double v)
{
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return static_cast<uint8_t>(v + 0.5);
}
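// Rotation about the X axis: converts a direction in the camera frame
// into the world frame.
// Assumptions:
// - the world Z axis points up
// - with zero pitch, the optical axis points along world +Y
// - pitch > 0 means the camera looks down
//
// To match image coordinates (v grows downward), a mapping commonly
// used in practice is constructed:
//
// camera-frame ray rc = [x, y, 1]
// first map to the world direction "before pitching":
// x -> Xw
// y -> -Zw
// z -> Yw
//
// then rotate about the world X axis by the pitch angle
//
inline Vec3 cameraRayToWorldRay(const Vec3& rc, double pitch)
{
    // World direction before pitching:
    // camera right   -> world right
    // camera down    -> world negative up
    // camera forward -> world forward
    const double X0 = rc.x;
    const double Y0 = rc.z;
    const double Z0 = -rc.y;
    const double c = std::cos(pitch);
    const double s = std::sin(pitch);
    // Rotate about X by -pitch so that a positive pitch tilts the optical
    // axis toward the ground: (0, 1, 0) -> (0, cos, -sin).
    // (A rotation by +pitch would tilt the axis upward instead, which
    // contradicts the "pitch > 0 looks down" convention above.)
    Vec3 rw;
    rw.x = X0;
    rw.y = c * Y0 + s * Z0;
    rw.z = -s * Y0 + c * Z0;
    return rw;
}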
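// =========================
// Image pixel -> world coordinates on the ground
// =========================
//
// Given a pixel (u, v), compute the corresponding world point (X, Y)
// on the ground plane Z = 0.
//
// Notes:
// 1. If the ray points at the sky or is parallel to the ground, the
//    result is invalid.
// 2. fx and fy are in pixels.
// 3. The unit of H determines the unit of the output world coordinates.
//
inline GroundPoint imagePixelToGround(
    double u,
    double v,
    const CameraParam& cam)
{
    // 1) Pixel coordinates -> normalized camera coordinates
    Vec3 rc;
    rc.x = (u - cam.cx) / cam.fx;
    rc.y = (v - cam.cy) / cam.fy;
    rc.z = 1.0;
    // 2) Camera ray -> world ray
    Vec3 rw = cameraRayToWorldRay(rc, cam.pitch);
    // 3) Camera center in world coordinates:
    //    Cw = (0, 0, H)
    //    Ray equation: P(t) = Cw + t * rw
    //
    //    Intersection with the ground Zw = 0:
    //    H + t * rw.z = 0  =>  t = -H / rw.z
    //
    GroundPoint gp{};
    gp.valid = false;
    // The ray is (almost) parallel to the ground
    if (std::abs(rw.z) < 1e-12)
        return gp;
    const double t = -cam.H / rw.z;
    // Accept only intersections in front of the camera
    // (t <= 0 means the ray points away from the ground)
    if (t <= 0.0)
        return gp;
    gp.X = t * rw.x;
    gp.Y = t * rw.y;
    gp.valid = true;
    return gp;
}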
// =========================
// World coordinates -> top-down view pixel
// =========================
//
// Maps a ground point (X, Y) to (bx, by) in the output top-down view.
//
// Output image convention:
// - the left edge is minX, the right edge is maxX
// - the top edge is maxY (farther away)
// - the bottom edge is minY (closer)
//
inline bool groundToBirdPixel(
    double X, double Y,
    const IPMParam& ipmParam,
    double& bx, double& by)
{
    if (X < ipmParam.minX || X > ipmParam.maxX ||
        Y < ipmParam.minY || Y > ipmParam.maxY)
    {
        return false;
    }
    const double xRatio =
        (X - ipmParam.minX) / (ipmParam.maxX - ipmParam.minX);
    const double yRatio =
        (Y - ipmParam.minY) / (ipmParam.maxY - ipmParam.minY);
    // X runs from left to right
    bx = xRatio * (ipmParam.outWidth - 1);
    // Far points should appear at the top of the image
    by = (1.0 - yRatio) * (ipmParam.outHeight - 1);
    return true;
}
// =========================
// Bilinear sampling (grayscale image)
// stride is the row pitch in bytes
// =========================
inline uint8_t bilinearSampleGray(
    const uint8_t* src,
    int width,
    int height,
    int stride,
    double u,
    double v)
{
    // Samples outside the source image are returned as black
    if (u < 0.0 || v < 0.0 || u > width - 1.0 || v > height - 1.0)
        return 0;
    const int x0 = static_cast<int>(std::floor(u));
    const int y0 = static_cast<int>(std::floor(v));
    const int x1 = std::min(x0 + 1, width - 1);
    const int y1 = std::min(y0 + 1, height - 1);
    const double dx = u - x0;
    const double dy = v - y0;
    const double p00 = src[y0 * stride + x0];
    const double p10 = src[y0 * stride + x1];
    const double p01 = src[y1 * stride + x0];
    const double p11 = src[y1 * stride + x1];
    // Interpolate along x on both rows, then along y
    const double v0 = p00 * (1.0 - dx) + p10 * dx;
    const double v1 = p01 * (1.0 - dx) + p11 * dx;
    const double val = v0 * (1.0 - dy) + v1 * dy;
    return clampToByte(val);
}
// =========================
// Bilinear sampling (3-channel RGB)
// 3 bytes per pixel, laid out as RGBRGB...
// stride is the row pitch in bytes
// =========================
inline void bilinearSampleRGB(
    const uint8_t* src,
    int width,
    int height,
    int stride,
    double u,
    double v,
    uint8_t outRGB[3])
{
    // Samples outside the source image are returned as black
    if (u < 0.0 || v < 0.0 || u > width - 1.0 || v > height - 1.0)
    {
        outRGB[0] = outRGB[1] = outRGB[2] = 0;
        return;
    }
    const int x0 = static_cast<int>(std::floor(u));
    const int y0 = static_cast<int>(std::floor(v));
    const int x1 = std::min(x0 + 1, width - 1);
    const int y1 = std::min(y0 + 1, height - 1);
    const double dx = u - x0;
    const double dy = v - y0;
    const uint8_t* p00 = src + y0 * stride + x0 * 3;
    const uint8_t* p10 = src + y0 * stride + x1 * 3;
    const uint8_t* p01 = src + y1 * stride + x0 * 3;
    const uint8_t* p11 = src + y1 * stride + x1 * 3;
    // Interpolate each channel independently
    for (int c = 0; c < 3; ++c)
    {
        const double v0 = p00[c] * (1.0 - dx) + p10[c] * dx;
        const double v1 = p01[c] * (1.0 - dx) + p11[c] * dx;
        const double val = v0 * (1.0 - dy) + v1 * dy;
        outRGB[c] = clampToByte(val);
    }
}
// =========================
// Top-down view pixel -> world coordinates
// =========================
//
// This is the key to "inverse mapping":
// for every pixel of the output top-down view, first find its point on
// the world ground plane, then compute where that point falls in the
// source image, and finally sample the source image there.
//
inline void birdPixelToGround(
    double bx,
    double by,
    const IPMParam& ipmParam,
    double& X,
    double& Y)
{
    const double xRatio = bx / (ipmParam.outWidth - 1);
    const double yRatio = 1.0 - by / (ipmParam.outHeight - 1);
    X = ipmParam.minX + xRatio * (ipmParam.maxX - ipmParam.minX);
    Y = ipmParam.minY + yRatio * (ipmParam.maxY - ipmParam.minY);
}
// =========================
// World ground point -> source image pixel
// =========================
//
// Given the world point (X, Y, 0), project it into the input image so
// that inverse-mapping sampling can be performed.
//
inline bool groundToImagePixel(
    double X,
    double Y,
    const CameraParam& cam,
    double& u,
    double& v)
{
    // World point   Pw = (X, Y, 0)
    // Camera center Cw = (0, 0, H)
    // World direction vector d_w = Pw - Cw = (X, Y, -H)
    const double dwx = X;
    const double dwy = Y;
    const double dwz = -cam.H;
    // Convert the world direction back into the camera frame.
    // A camera pitched down by `pitch` has its optical axis along
    // (0, cos, -sin) in the world, so we undo that tilt here by rotating
    // about X by +pitch.
    const double c = std::cos(cam.pitch);
    const double s = std::sin(cam.pitch);
    // First rotate back to the unpitched state
    const double X0 = dwx;
    const double Y0 = c * dwy - s * dwz;
    const double Z0 = s * dwy + c * dwz;
    // Then map back to camera coordinates
    // base: [X0, Y0, Z0] = [xc, zc, -yc]
    const double xc = X0;
    const double yc = -Z0;
    const double zc = Y0;
    // Behind the camera: invalid
    if (zc <= 1e-12)
        return false;
    u = cam.fx * (xc / zc) + cam.cx;
    v = cam.fy * (yc / zc) + cam.cy;
    return true;
}
// =========================
// Grayscale IPM
// =========================
//
// src: input grayscale image
// dst: output grayscale image; the caller must allocate
//      outHeight * dstStride bytes
//
inline void warpIPMGray(
    const uint8_t* src,
    int srcWidth,
    int srcHeight,
    int srcStride,
    uint8_t* dst,
    int dstStride,
    const CameraParam& cam,
    const IPMParam& ipmParam)
{
    for (int by = 0; by < ipmParam.outHeight; ++by)
    {
        uint8_t* dstRow = dst + by * dstStride;
        for (int bx = 0; bx < ipmParam.outWidth; ++bx)
        {
            // 1) Output top-down pixel -> world ground point
            double X, Y;
            birdPixelToGround(static_cast<double>(bx),
                              static_cast<double>(by),
                              ipmParam, X, Y);
            // 2) World ground point -> source image pixel
            double u, v;
            if (!groundToImagePixel(X, Y, cam, u, v))
            {
                dstRow[bx] = 0;
                continue;
            }
            // 3) Bilinear sampling
            dstRow[bx] = bilinearSampleGray(src, srcWidth, srcHeight, srcStride, u, v);
        }
    }
}
// =========================
// RGB IPM
// =========================
//
// src: input RGB image, laid out as RGBRGB...
// dst: output RGB image, laid out as RGBRGB...
//
inline void warpIPMRGB(
    const uint8_t* src,
    int srcWidth,
    int srcHeight,
    int srcStride,
    uint8_t* dst,
    int dstStride,
    const CameraParam& cam,
    const IPMParam& ipmParam)
{
    for (int by = 0; by < ipmParam.outHeight; ++by)
    {
        uint8_t* dstRow = dst + by * dstStride;
        for (int bx = 0; bx < ipmParam.outWidth; ++bx)
        {
            uint8_t* p = dstRow + bx * 3;
            // 1) Output top-down pixel -> world ground point
            double X, Y;
            birdPixelToGround(static_cast<double>(bx),
                              static_cast<double>(by),
                              ipmParam, X, Y);
            // 2) World ground point -> source image pixel
            double u, v;
            if (!groundToImagePixel(X, Y, cam, u, v))
            {
                p[0] = p[1] = p[2] = 0;
                continue;
            }
            // 3) Bilinear sampling, written straight into the output pixel
            bilinearSampleRGB(src, srcWidth, srcHeight, srcStride, u, v, p);
        }
    }
}
} // namespace ipm
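
A quick way to sanity-check the geometry is a round trip: map a pixel to the ground, map the ground point back, and confirm that the original pixel comes out. The sketch below is standalone and does not include the listing above. The names `Cam`, `pixelToGround`, and `groundToPixel` are hypothetical and were chosen for this illustration. It assumes the same conventions as the listing's comments: world Z up, world Y forward, the camera at (0, 0, H), and `pitch > 0` tilting the camera toward the ground.

```cpp
#include <cmath>

// Hypothetical minimal camera description (not the article's CameraParam).
struct Cam
{
    double fx, fy; // focal lengths in pixels
    double cx, cy; // principal point in pixels
    double H;      // camera height above the ground plane
    double pitch;  // downward tilt in radians, > 0 looks toward the ground
};

// Pixel (u, v) -> ground point (X, Y) on the plane Z = 0.
// Returns false when the ray is parallel to the ground or points at the sky.
inline bool pixelToGround(const Cam& cam, double u, double v, double& X, double& Y)
{
    const double x = (u - cam.cx) / cam.fx;
    const double y = (v - cam.cy) / cam.fy;
    const double c = std::cos(cam.pitch);
    const double s = std::sin(cam.pitch);
    // Downward component of the world ray; must be positive to hit the ground.
    const double down = s + c * y;
    if (down <= 1e-12)
        return false;
    const double t = cam.H / down; // ray parameter at the ground intersection
    X = t * x;
    Y = t * (c - s * y);
    return true;
}

// Ground point (X, Y, 0) -> pixel (u, v).
// Returns false when the point lies behind the camera.
inline bool groundToPixel(const Cam& cam, double X, double Y, double& u, double& v)
{
    const double c = std::cos(cam.pitch);
    const double s = std::sin(cam.pitch);
    const double zc = c * Y + s * cam.H; // depth along the optical axis
    if (zc <= 1e-12)
        return false;
    const double yc = c * cam.H - s * Y; // camera-frame "down" component
    u = cam.fx * (X / zc) + cam.cx;
    v = cam.fy * (yc / zc) + cam.cy;
    return true;
}
```

With `pitch = 0.5` rad and `H = 100`, the image-center pixel lands at `X = 0`, `Y = H / tan(pitch)`, roughly 183 units ahead of the camera. A pixel far above the horizon makes `pixelToGround` return `false`.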