Optical metasurfaces for general vision processing on the edge

Shanahan, M., McDonell, K. & Reynolds, L. Role play with large language models. Nature 623, 493–498 (2023).

Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

Shastri, B. J. et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photon. 15, 102–114 (2021).

Bernstein, L. et al. Single-shot optical neural network. Sci. Adv. 9, eadg7904 (2023).

Zheng, H. et al. Multichannel meta-imagers for accelerating machine vision. Nat. Nanotechnol. 19, 471–478 (2024).

Zheng, H. et al. Meta-optic accelerators for object classifiers. Sci. Adv. 8, eabo6410 (2022).

Luo, M. et al. Meta-optics based parallel convolutional processing for neural network accelerator. Laser Photonics Rev. 18, 2300984 (2024).

Liu, C. et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron. 5, 113–122 (2022).

Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photon. 11, 441–446 (2017).

Ashtiani, F., Geers, A. J. & Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 606, 501–506 (2022).

Feldmann, J. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).

Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).

Zhou, T. et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photon. 15, 367–373 (2021).

Antonik, P., Marsal, N., Brunner, D. & Rontani, D. Human action recognition with a large-scale brain-inspired photonic computer. Nat. Mach. Intell. 1, 530–537 (2019).

Wang, T. et al. Image sensing with multilayer nonlinear optical neural networks. Nat. Photon. 17, 408–415 (2023).

Xia, F. et al. Nonlinear optical encoding enabled by recurrent linear scattering. Nat. Photon. 18, 1067–1075 (2024).

Luo, X. et al. Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible. Light Sci. Appl. 11, 158 (2022).

Huang, C. et al. A silicon photonic–electronic neural network for fibre nonlinearity compensation. Nat. Electron. 4, 837–844 (2021).

Fu, T. et al. Photonic machine learning with on-chip diffractive optics. Nat. Commun. 14, 70 (2023).

Dong, B. et al. Partial coherence enhances parallelized photonic computing. Nature 632, 55–62 (2024).

Xu, Z. et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 384, 202–209 (2024).

McMahon, P. L. The physics of optical computing. Nat. Rev. Phys. 5, 717–734 (2023).

Yildirim, M., Dinc, N. U., Oguz, I., Psaltis, D. & Moser, C. Nonlinear processing with linear optics. Nat. Photon. 18, 1076–1082 (2024).

Goi, E. et al. Nanoprinted high-neuron-density optical linear perceptrons performing near-infrared inference on a CMOS chip. Light Sci. Appl. 10, 40 (2021).

Chen, Y. et al. All-analog photoelectronic chip for high-speed vision tasks. Nature 623, 48–57 (2023).

Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39–47 (2020).

Feng, H. et al. Integrated lithium niobate microwave photonic processing engine. Nature 627, 80–87 (2024).

Xu, X. et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature 589, 44–51 (2021).

Liu, Z. et al. Swin Transformer: hierarchical vision transformer using shifted windows. In Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 10012–10022 (IEEE, 2021).

Cui, K. et al. Spectral convolutional neural network chip for in-sensor edge computing of incoherent natural light. Nat. Commun. 16, 81 (2025).

Wei, K. et al. Spatially varying nanophotonic neural networks. Sci. Adv. 10, eadp0391 (2024).

Qu, G. et al. All-dielectric metasurface empowered optical-electronic hybrid neural networks. Laser Photonics Rev. 16, 2100732 (2022).

Rahimi, A. & Recht, B. Random features for large-scale kernel machines. In Proc. 21st International Conference on Neural Information Processing Systems (NIPS’07) 1177–1184 (Curran Associates, 2007).

Choromanski, K. M. et al. Rethinking attention with performers. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

Zhang, Y. et al. Image super-resolution using very deep residual channel attention networks. In Proc. European Conference on Computer Vision (ECCV) 286–301 (CVF, 2018).

Wang, Q. et al. ECA-net: efficient channel attention for deep convolutional neural networks. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11534–11542 (CVF, 2020).

Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (NIPS’17) 6000–6010 (Curran Associates, 2017).

Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

Cordts, M. et al. The Cityscapes dataset for semantic urban scene understanding. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3213–3223 (CVF, 2016).

Perazzi, F. et al. A benchmark dataset and evaluation methodology for video object segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 724–732 (CVF, 2016).

Jocher, G. Ultralytics YOLOv5. https://github.com/ultralytics/yolov5 (2020).

Zhu, X. et al. Deformable DETR: deformable transformers for end-to-end object detection. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention Mask Transformer for universal image segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1290–1299 (CVF, 2022).

Pan, H., Hong, Y., Sun, W. & Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Trans. Intell. Transp. Syst. 24, 3448–3460 (2022).

Xie, E. et al. SegFormer: simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 34, 12077–12090 (2021).

Ranftl, R., Bochkovskiy, A. & Koltun, V. Vision transformers for dense prediction. In Proc. IEEE/CVF International Conference on Computer Vision (ICCV) 12179–12188 (CVF, 2021).

Bhat, S. F., Alhashim, I. & Wonka, P. AdaBins: depth estimation using adaptive bins. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 4009–4018 (CVF, 2021).

Yang, L. et al. Depth anything: unleashing the power of large-scale unlabeled data. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10371–10381 (CVF, 2024).

Ranftl, R., Lasinger, K., Hafner, D., Schindler, K. & Koltun, V. Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1623–1637 (2020).

Zitova, B. & Flusser, J. Image registration methods: a survey. Image Vis. Comput. 21, 977–1000 (2003).

Bergevin, R., Soucy, M., Gagnon, H. & Laurendeau, D. Towards a general multi-view registration technique. IEEE Trans. Pattern Anal. Mach. Intell. 18, 540–547 (1996).

Ravi, N. et al. SAM 2: segment anything in images and videos. In Proc. International Conference on Learning Representations (ICLR 2025) (ICLR, 2025).

LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://arxiv.org/abs/1708.07747 (2017).

Schüldt, C., Laptev, I. & Caputo, B. Recognizing human actions: a local SVM approach. In Proc. 17th International Conference on Pattern Recognition (ICPR 2004) Vol. 3, 32–36 (IEEE, 2004).

Zheng, Z., Wei, Y. & Yang, Y. University-1652: a multi-view multi-source benchmark for drone-based geo-localization. In Proc. 28th ACM International Conference on Multimedia 1395–1403 (ACM, 2020).

Berman, M., Triki, A. R. & Blaschko, M. B. The Lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4413–4421 (CVF, 2018).

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: inverted residuals and linear bottlenecks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4510–4520 (CVF, 2018).

Han, K. et al. GhostNet: more features from cheap operations. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1580–1589 (CVF, 2020).

Han, K. et al. Model Rubik’s cube: twisting resolution, depth and width for tinynets. Adv. Neural Inf. Process. Syst. 33, 19353–19364 (2020).

Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In Proc. 36th International Conference on Machine Learning 6105–6114 (PMLR, 2019).

Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2016).

He, K., Gkioxari, G., Dollár, P. & Girshick, R. B. Mask R-CNN. In Proc. IEEE International Conference on Computer Vision (ICCV) 2961–2969 (CVF, 2017).

Lin, T.-Y., Goyal, P., Girshick, R. B., He, K. & Dollár, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision (ICCV) 2980–2988 (CVF, 2017).

Tan, M., Pang, R. & Le, Q. V. EfficientDet: scalable and efficient object detection. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10781–10790 (2020).

Liu, S. et al. Grounding DINO: marrying DINO with grounded pre-training for open-set object detection. In Proc. European Conference on Computer Vision (ECCV 2024) 38–55 (Springer, 2025).

Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015) 234–241 (Springer, 2015).

Zhao, H., Shi, J., Qi, X., Wang, X. & Jia, J. Pyramid scene parsing network. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2881–2890 (CVF, 2017).

Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proc. European Conference on Computer Vision (ECCV) 801–818 (CVF, 2018).

Eigen, D., Puhrsch, C. & Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proc. 28th International Conference on Neural Information Processing Systems (NIPS’14) 2366–2374 (MIT Press, 2014).

Wofk, D., Ma, F., Yang, T.-J., Karaman, S. & Sze, V. FastDepth: fast monocular depth estimation on embedded systems. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6101–6108 (IEEE, 2019).

Hazirbas, C., Ma, L., Domokos, C. & Cremers, D. FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture. In Proc. Asian Conference on Computer Vision (ACCV 2016) 213–228 (Springer, 2017).

Peng, J. Code for optical metasurfaces for general vision processing on the edge. Zenodo https://doi.org/10.5281/zenodo.19382032 (2026).

Source:

www.nature.com