* [caption] accelerated LLaVA with flash attention and parallel frame extraction
* added data parallel (DP) and tensor parallel (TP) support for LLaVA
* code formatting