
[NLP | BERT & SBERT] Cross-Encoder and Bi-Encoder

H J 2023. 4. 27. 15:13

Let's look at the structure of SBERT, a model that modifies the original BERT so that it can extract semantically meaningful sentence embeddings!

 

As mentioned above, SBERT (Sentence-BERT) is a model derived from BERT, built with the goal of extracting meaningful sentence embeddings that can be compared using cosine similarity. It also preserves BERT's accuracy while greatly reducing processing time.

 

Here, a sentence embedding is a representation of a sentence's information as a position in vector space. Placing sentences in a vector space makes a variety of analysis techniques possible, such as sentence comparison, clustering, and visualization.

 

๊ธฐ์กด BERT์˜ ๊ฒฝ์šฐ, Sentence Embedding์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์กด์žฌํ–ˆ์ง€๋งŒ ๊ณผ๊ฑฐ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์— ๋ฏธ์น˜์ง€ ๋ชปํ–ˆ๊ณ  ์ฃผ๋กœ ๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ์„ ๋ชจ๋ธ์— ๋„ฃ์–ด Cross-Attention์„ ํ™œ์šฉํ•ด ๋น„๊ตํ•˜๋Š” ๋ฐฉ์‹์„ ํ™œ์šฉํ–ˆ๋‹ค.

 

Cross-Encoder vs. Bi-Encoder

Cross-Encoder์™€ Bi-Encoder์˜ ์ฐจ์ด์ ์„ ์•Œ์•„๋ณด์ž๋ฉด

 

Cross-Encoder์˜ ๊ฒฝ์šฐ ๋‘ ๋ฌธ์žฅ์„ ๋™์‹œ์— Transformer ๋„คํŠธ์›Œํฌ์— ์ „๋‹ฌํ•œ๋‹ค.

์ž…๋ ฅ ๋ฌธ์žฅ ์Œ์˜ ์œ ์‚ฌ์„ฑ์„ ๋‚˜ํƒ€๋‚ด๋Š” 0๊ณผ 1 ์‚ฌ์ด์˜ ์ถœ๋ ฅ ๊ฐ’๋ณด๋‹ค ์ƒ์„ฑํ•œ๋‹ค.

๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ์„ ๋ชจ๋ธ์— ๋„ฃ์–ด ๋‚ด๋ถ€์—์„œ ๋ฌธ์žฅ ๊ฐ„ ๋ฌธ์žฅ์˜ ๊ด€๊ณ„๋ฅผ ๋น„๊ตํ•˜๊ฒŒ ๋˜๋Š”๋ฐ, ๋ฌธ์žฅ์ด ๋ณ€ํ˜•๋˜์ง€ ์•Š์€ ์ƒํƒœ์—์„œ ๋น„๊ตํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ •๋ณด ์†์‹ค ์—†์ด ๋ฌธ์žฅ ๊ฐ„ ๊ด€๊ณ„๋ฅผ ํŒŒ์•…ํ•˜๋Š” ์„ฑ๋Šฅ์ด ์šฐ์ˆ˜ํ•˜๋‹ค.

ํ•˜์ง€๋งŒ ๋น„๊ตํ•ด์•ผ ํ•˜๋Š” ๋ฌธ์žฅ์ˆ˜๊ฐ€ ๋งŽ์•„์งˆ์ˆ˜๋ก ์—ฐ์‚ฐ์ด ๊ธ‰์ฆํ•œ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด 100๊ฐœ์˜ ๋ฌธ์žฅ ๋น„๊ตํ•œ๋‹ค๋ฉด Cross๋Š” 100C2ํšŒ ์ˆ˜ํ–‰ํ•ด์•ผ ํ•˜์ง€๋งŒ Bi๋Š” ์ผ๋‹จ ๋ฌธ์žฅ ์ž„๋ฒ ๋”ฉ(100ํšŒ) ํ›„ ๋‹จ์ˆœ ๋น„๊ตํ•˜๋ฉด ๋˜๊ธฐ ๋•Œ๋ฌธ์— Cross ๋ณด๋‹ค Bi์˜ ์—ฐ์‚ฐ ํšŸ์ˆ˜๊ฐ€ ์ ๋‹ค.

 

 

A Bi-Encoder instead produces a sentence embedding for each given sentence. Sentences A and B are passed through BERT independently to generate sentence embeddings u and v, and these embeddings can then be compared using cosine similarity.

Put simply, the process goes through the following steps:

  1. ๋‘ ๋ฌธ์žฅ์„ ๋น„๊ตํ•˜๊ธฐ ์œ„ํ•ด ๊ฐœ๋ณ„ ๋ฌธ์žฅ์˜ Embedding ์ƒ์„ฑํ•˜๋Š” ๋‹จ๊ณ„
  2. ๋ชจ๋ธ Output์„ Pooling ํ•˜์—ฌ Sentence Embedding ์ƒ์„ฑํ•˜๋Š” ๋‹จ๊ณ„
  3. CosineSimilarity๋ฅผ ํ†ตํ•ด ๋ฌธ์žฅ๊ณผ ๋ฌธ์žฅ ๊ฐ„ ๊ด€๊ณ„ ๋น„๊ต๋ฅผ ๋น„๊ตํ•˜๋Š” ๋‹จ๊ณ„
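The steps above can be sketched in pure Python. The token vectors here are hypothetical stand-ins for BERT's per-token output; in SBERT, a pooling layer (commonly mean pooling) turns those token vectors into one sentence embedding.

```python
from math import sqrt

def mean_pooling(token_vectors):
    """Step 2: average the per-token vectors into one sentence embedding."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def cosine_similarity(u, v):
    """Step 3: compare two sentence embeddings (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Step 1: toy 2-dimensional "token vectors" standing in for BERT output.
tokens_a = [[1.0, 0.0], [0.0, 1.0]]
tokens_b = [[1.0, 0.0], [1.0, 0.0]]

u = mean_pooling(tokens_a)  # [0.5, 0.5]
v = mean_pooling(tokens_b)  # [1.0, 0.0]
print(cosine_similarity(u, v))  # ≈ 0.707
```

The key point is that u and v can be computed once and cached; comparing a new pair costs only a dot product, not a model pass.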

This guarantees computation fast enough for real-time applications, but because information is lost during the embedding step, a Bi-Encoder falls short of a Cross-Encoder in accuracy.

 

๊ตฌ์กฐ ์ž์ฒด๋Š” Cross-Encoder๊ฐ€ ๋‹จ์ˆœํ•ด ๋ณด์ด์ง€๋งŒ ์‹ค์ œ๋กœ๋Š”  Bi-Encoder๋Š” ์ผ๋‹จ ๋ฌธ์žฅ์„ embedding์„ ์ƒ์„ฑํ•œ ํ›„ ๋น„๊ตํ•˜๋Š” ๊ณผ์ • ์ž์ฒด๋Š” Bi-Encoder ๋ฐฉ์‹์ด ํšจ์œจ์„ฑ ๋ฉด์—์„œ ํ›จ์”ฌ ๋” ํšจ๊ณผ์ ์ด๋‹ค.


 

Reference: Cross-Encoders — Sentence-Transformers documentation (www.sbert.net)