HomeScienceOptical metasurfaces for general vision processing on the edge

Optical metasurfaces for general vision processing on the edge

Share

Shanahan, M., McDonell, K. & Reynolds, L. Role play with large language models. Nature 623, 493–498 (2023).

Article 
ADS 
CAS 
PubMed 

Google Scholar 

Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Shastri, B. J. et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photonics 15, 102–114 (2021).

Article 
ADS 
CAS 

Google Scholar 

Bernstein, L. et al. Single-shot optical neural network. Sci. Adv. 9, eadg7904 (2023).

Article 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Zheng, H. et al. Multichannel meta-imagers for accelerating machine vision. Nat. Nanotechnol. 19, 471–478 (2024).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Zheng, H. et al. Meta-optic accelerators for object classifiers. Sci. Adv. 8, eabo6410 (2022).

Article 
PubMed 
PubMed Central 

Google Scholar 

Luo, M. et al. Meta-optics based parallel convolutional processing for neural network accelerator. Laser Photonics Rev. 18, 2300984 (2024).

Article 
ADS 

Google Scholar 

Liu, C. et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron. 5, 113–122 (2022).

Article 

Google Scholar 

Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photon. 11, 441–446 (2017).

Article 
ADS 
CAS 

Google Scholar 

Ashtiani, F., Geers, A. J. & Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 606, 501–506 (2022).

Article 
ADS 
CAS 
PubMed 

Google Scholar 

Feldmann, J. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).

Article 
ADS 
CAS 
PubMed 

Google Scholar 

Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).

Article 
ADS 
MathSciNet 
CAS 
PubMed 

Google Scholar 

Zhou, T. et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photonics 15, 367–373 (2021).

Article 
ADS 
CAS 

Google Scholar 

Antonik, P., Marsal, N., Brunner, D. & Rontani, D. Human action recognition with a large-scale brain-inspired photonic computer. Nat. Mach. Intell. 1, 530–537 (2019).

Article 

Google Scholar 

Wang, T. et al. Image sensing with multilayer nonlinear optical neural networks. Nat. Photon. 17, 408–415 (2023).

Article 
ADS 
CAS 

Google Scholar 

Xia, F. et al. Nonlinear optical encoding enabled by recurrent linear scattering. Nat. Photon. 18, 1067–1075 (2024).

Article 
ADS 
CAS 

Google Scholar 

Luo, X. et al. Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible. Light Sci. Appl. 11, 158 (2022).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Huang, C. et al. A silicon photonic–electronic neural network for fibre nonlinearity compensation. Nat. Electron. 4, 837–844 (2021).

Article 
CAS 

Google Scholar 

Fu, T. et al. Photonic machine learning with on-chip diffractive optics. Nat. Commun. 14, 70 (2023).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Dong, B. et al. Partial coherence enhances parallelized photonic computing. Nature 632, 55–62 (2024).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Xu, Z. et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 384, 202–209 (2024).

Article 
ADS 
CAS 
PubMed 

Google Scholar 

McMahon, P. L. The physics of optical computing. Nat. Rev. Phys. 5, 717–734 (2023).

Article 

Google Scholar 

Yildirim, M., Dinc, N. U., Oguz, I., Psaltis, D. & Moser, C. Nonlinear processing with linear optics. Nat. Photon. 18, 1076–1082 (2024).

Article 
ADS 
CAS 

Google Scholar 

Goi, E. et al. Nanoprinted high-neuron-density optical linear perceptrons performing near-infrared inference on a CMOS chip. Light Sci. Appl. 10, 40 (2021).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Chen, Y. et al. All-analog photoelectronic chip for high-speed vision tasks. Nature 623, 48–57 (2023).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39–47 (2020).

Article 
ADS 
CAS 
PubMed 

Google Scholar 

Feng, H. et al. Integrated lithium niobate microwave photonic processing engine. Nature 627, 80–87 (2024).

Article 
ADS 
CAS 
PubMed 

Google Scholar 

Xu, X. et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature 589, 44–51 (2021).

Article 
ADS 
CAS 
PubMed 

Google Scholar 

Liu, Z. et al. Swin Transformer: hierarchical vision transformer using shifted windows. In Proc. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 10012–10022 (IEEE, 2021).

Cui, K. et al. Spectral convolutional neural network chip for in-sensor edge computing of incoherent natural light. Nat. Commun. 16, 81 (2025).

Article 
ADS 
CAS 
PubMed 
PubMed Central 

Google Scholar 

Wei, K. et al. Spatially varying nanophotonic neural networks. Sci. Adv. 10, eadp0391 (2024).

Article 
PubMed 
PubMed Central 

Google Scholar 

Qu, G. et al. All-dielectric metasurface empowered optical-electronic hybrid neural networks. Laser Photonics Rev. 16, 2100732 (2022).

Article 
ADS 
CAS 

Google Scholar 

Rahimi, A. & Recht, B. Random features for large-scale kernel machines. In Proc. 21st International Conference on Neural Information Processing Systems (NIPS’07) 1177–1184 (Curran Associates, 2007).

Choromanski, K. M. et al. Rethinking attention with performers. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

Zhang, Y. et al. Image super-resolution using very deep residual channel attention networks. In Proc. European Conference on Computer Vision (ECCV) 286–301 (CVF, 2018).

Wang, Q. et al. ECA-net: efficient channel attention for deep convolutional neural networks. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11534–11542 (CVF, 2020).

Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (NIPS’17) 6000–6010 (Curran Associates, 2017).

Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

Cordts, M. et al. The Cityscapes dataset for semantic urban scene understanding. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3213–3223 (CVF, 2016).

Perazzi, F. et al. A benchmark dataset and evaluation methodology for video object segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 724–732 (CVF, 2016).

Jocher, G. Ultralytics YOLOv5. https://github.com/ultralytics/yolov5 (2020).

Zhu, X. et al. Deformable DETR: deformable transformers for end-to-end object detection. In Proc. International Conference on Learning Representations (ICLR 2021) (ICLR, 2021).

Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention Mask Transformer for universal image segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1290–1299 (CVF, 2022).

Pan, H., Hong, Y., Sun, W. & Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Trans. Intell. Transp. Syst. 24, 3448–3460 (2022).

Article 

Google Scholar 

Xie, E. et al. SegFormer: simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 34, 12077–12090 (2021).

Google Scholar 

Ranftl, R., Bochkovskiy, A. & Koltun, V. Vision transformers for dense prediction. In Proc. IEEE/CVF International Conference on Computer Vision (ICCV) 12179–12188 (CVF, 2021).

Bhat, S. F., Alhashim, I. & Wonka, P. AdaBins: depth estimation using adaptive bins. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 4009–4018 (CVF, 2021).

Yang, L. et al. Depth anything: unleashing the power of large-scale unlabeled data. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10371–10381 (CVF, 2024).

Ranftl, R., Lasinger, K., Hafner, D., Schindler, K. & Koltun, V. Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1623–1637 (2020).

Article 
ADS 

Google Scholar 

Zitova, B. & Flusser, J. Image registration methods: a survey. Image Vis. Comput. 21, 977–1000 (2003).

Article 

Google Scholar 

Bergevin, R., Soucy, M., Gagnon, H. & Laurendeau, D. Towards a general multi-view registration technique. IEEE Trans. Pattern Anal. Mach. Intell. 18, 540–547 (1996).

Article 
ADS 

Google Scholar 

Ravi, N. et al. Sam 2: Segment anything in images and videos. In Proc. International Conference on Learning Representations (ICLR 2025) (ICLR, 2025).

LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

Article 
ADS 

Google Scholar 

Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://arxiv.org/abs/1708.07747 (2017).

Schüldt, C., Laptev, I. & Caputo, B. Recognizing human actions: a local SVM approach. In Proc. 17th International Conference on Pattern Recognition (ICPR 2004) Vol. 3, 32–36 (IEEE, 2004).

Zheng, Z., Wei, Y. & Yang, Y. University-1652: a multi-view multi-source benchmark for drone-based geo-localization. In Proc. 28th ACM International Conference on Multimedia 1395–1403 (ACM, 2020).

Berman, M., Triki, A. R. & Blaschko, M. B. The Lovász-softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4413–4421 (CVF, 2018).

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: inverted residuals and linear bottlenecks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4510–4520 (CVF, 2018).

Han, K. et al. GhostNet: more features from cheap operations. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1580–1589 (CVF, 2020).

Han, K. et al. Model Rubik’s cube: twisting resolution, depth and width for tinynets. Adv. Neural Inf. Process. Syst. 33, 19353–19364 (2020).

Google Scholar 

Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In Proc. 36th International Conference on Machine Learning 6105–6114 (PMLR, 2019).

Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2016).

Article 
ADS 
PubMed 

Google Scholar 

He, K., Gkioxari, G., Dollár, P. & Girshick, R. B. Mask R-CNN. In Proc. IEEE International Conference on Computer Vision (ICCV) 2961–2969 (CVF, 2017).

Lin, T.-Y., Goyal, P., Girshick, R. B., He, K. & Dollár, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision (ICCV) 2980–2988 (CVF, 2017).

Tan, M., Pang, R. & Le, Q. V. EfficientDet: scalable and efficient object detection. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10781–10790 (2020).

Liu, S. et al. Grounding DINO: marrying DINO with grounded pre-training for open-set object detection. In Proc. European Conference on Computer Vision (ECCV 2024) 38–55 (Springer, 2025).

Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015) 234–241 (Springer, 2015).

Zhao, H., Shi, J., Qi, X., Wang, X. & Jia, J. Pyramid scene parsing network. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2881–2890 (CVF, 2017).

Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proc. European Conference on Computer Vision (ECCV) 801–818 (CVF, 2018).

Eigen, D., Puhrsch, C. & Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proc. 28th International Conference on Neural Information Processing Systems (NIPS’14) 2366–2374 (MIT Press, 2014).

Wofk, D., Ma, F., Yang, T.-J., Karaman, S. & Sze, V. FastDepth: fast monocular depth estimation on embedded systems. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6101–6108 (IEEE, 2019).

Hazirbas, C., Ma, L., Domokos, C. & Cremers, D. FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture. In Proc. Asian Conference on Computer Vision (ACCV 2016) 213–228 (Springer, 2017).

Peng, J. Code for optical metasurfaces for general vision processing on the edge. Zenodo https://doi.org/10.5281/zenodo.19382032 (2026).


Source:

www.nature.com

Advertisementspot_img

Read more

Latest News