POSSIBILITIES OF EVALUATING HUMAN LABOR PRODUCTIVITY USING COMPUTER VISION AND NATURAL LANGUAGE PROCESSING
Keywords:
human labor productivity evaluation, deep learning, computer vision, natural language processing, interpreting human motion through natural language, deep generative models
Abstract
Deep learning is advancing rapidly, and with the help of computer vision it is enabling solutions to a wide range of problems across many areas of life. Nevertheless, relatively few computer-vision-based methods have been applied to evaluating work productivity. Furthermore, while natural language processing models continue to improve, building a single model that unifies language with other multimodal inputs remains an understudied and difficult problem. On the other hand, human motion can be interpreted through natural human language, and large-scale motion models together with language data can improve the performance of motion-related models. This article examines the feasibility of this approach and a methodology for evaluating human motion and, in turn, labor productivity.
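To make the idea above concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of one way motion and language could be joined: a motion clip and several natural-language activity descriptions are encoded into a shared embedding space, and cosine similarity indicates which description best explains the observed motion. All module names, dimensions, and the toy inputs are illustrative assumptions.

# Hypothetical sketch: scoring observed motion against activity descriptions
# in a shared embedding space (CLIP-style), for productivity assessment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEncoder(nn.Module):
    """Encodes a sequence of body-joint poses into a single embedding."""
    def __init__(self, joint_dim=66, hidden=256, embed=128):
        super().__init__()
        self.gru = nn.GRU(joint_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, embed)

    def forward(self, poses):                 # poses: (batch, frames, joint_dim)
        _, h = self.gru(poses)                 # final hidden state summarizes the clip
        return F.normalize(self.proj(h[-1]), dim=-1)

class TextEncoder(nn.Module):
    """Encodes a tokenized activity description into the same space."""
    def __init__(self, vocab=10000, embed=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, embed)
        self.proj = nn.Linear(embed, embed)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        pooled = self.emb(tokens).mean(dim=1)  # simple mean pooling over tokens
        return F.normalize(self.proj(pooled), dim=-1)

motion_enc, text_enc = MotionEncoder(), TextEncoder()
clip = torch.randn(1, 120, 66)                # 120 frames, 22 joints x 3 coords (toy data)
labels = torch.randint(0, 10000, (3, 8))      # e.g. "tying rebar", "walking", "idle"
scores = motion_enc(clip) @ text_enc(labels).T  # cosine similarities, shape (1, 3)
print(scores)  # the best-matching description labels the observed activity

In practice the time spent in productive versus unproductive activity classes, accumulated over a work shift, would give the productivity estimate; the encoders here would be trained on paired motion-text data rather than used untrained as in this sketch.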
License
Copyright (c) 2024 Xo‘jamqulov Abdulaziz Hazrat o‘g‘li, Xo‘jamqulova Nilufar Abdimurod qizi
This work is licensed under a Creative Commons Attribution 4.0 International License.