Common sense says that the missing term in the sentence above should be computer programmer, because the term is not intrinsically gendered, unlike king and queen. Can you guess how a standard word embedding system fills in the blank?
Man is to computer programmer as woman is to homemaker (the second most probable word is housewife)
You can try your own analogies using this word embedding tool.
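If you would rather reproduce the analogy in code than in a browser, here is a minimal sketch using gensim and the pretrained Google News word2vec vectors. The model name and the phrase token computer_programmer are assumptions about that particular embedding set; other embeddings may tokenize the phrase differently or rank the neighbors differently.

```python
# A minimal sketch of the analogy test, assuming gensim is installed and the
# pretrained Google News word2vec vectors are available via gensim's downloader.
import gensim.downloader as api

# Roughly a 1.6 GB download on first use. Other embedding sets may lack
# underscore-joined phrase tokens like "computer_programmer".
model = api.load("word2vec-google-news-300")

# "man : computer_programmer :: woman : ?" as vector arithmetic:
# vec(computer_programmer) - vec(man) + vec(woman), then nearest neighbors.
for word, score in model.most_similar(
    positive=["woman", "computer_programmer"], negative=["man"], topn=5
):
    print(f"{word}\t{score:.3f}")
```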
Machine translation offers another example. With some systems, translating the gender-neutral Hungarian sentences “Ő egy orvos. Ő egy nővér.” to English results in “He’s a doctor. She’s a nurse.” The Hungarian pronoun ő covers both “he” and “she,” so the system is assigning a gender to each subject on its own.
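You can probe this behavior with an off-the-shelf translation model. Below is a hedged sketch using the Hugging Face transformers pipeline with the Helsinki-NLP/opus-mt-hu-en checkpoint; that particular checkpoint is an assumption, since the example above does not name a system, and its output may differ.

```python
# A sketch of the translation example using Hugging Face transformers.
# The Helsinki-NLP/opus-mt-hu-en checkpoint is an assumption; the systems
# behind the original example are unspecified.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-hu-en")

result = translator("Ő egy orvos. Ő egy nővér.")
print(result[0]["translation_text"])
# A biased model may print something like: "He's a doctor. She's a nurse."
```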
These are obviously not ideal outcomes. The training data behind the analogy most likely mentioned men in programming contexts, and women in homemaking contexts, far more often than the reverse. The ideal outcome for the he’s-a-doctor/she’s-a-nurse conundrum is less black-and-white, but a system could use a gender-neutral pronoun, let the user specify a gender, or at least choose the same pronoun for both subjects.
Machine learning systems are what they eat, and natural language processing tools are no exception — that became crystal clear with Tay, Microsoft’s AI chatbot. There is a general tendency to assume that more data yields better-performing models, and as a result, the largest corpora are typically web-crawled datasets. Since web-crawled text is real human language, it naturally exhibits the same biases that humans do, and often too little attention is paid to what the text actually contains.
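One cheap sanity check is to look at the data before training on it. The sketch below counts how often gendered pronouns co-occur with occupation words within a sentence; skewed counts are an early warning of the biases above. The word lists and toy corpus here are hypothetical placeholders, not a standard benchmark.

```python
# A minimal corpus-audit sketch: count co-occurrences of gendered pronouns
# and occupation words within a sentence. Word lists and corpus are
# hypothetical placeholders for illustration only.
import re
from collections import Counter

GENDERED = {"he": "male", "him": "male", "she": "female", "her": "female"}
OCCUPATIONS = {"doctor", "nurse", "programmer", "homemaker"}

def cooccurrence_counts(sentences):
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        for occupation in tokens & OCCUPATIONS:
            for pronoun in tokens & GENDERED.keys():
                counts[(GENDERED[pronoun], occupation)] += 1
    return counts

corpus = [
    "She is a nurse at the clinic.",
    "He works as a programmer.",
    "He is a doctor and she is a homemaker.",
]
for (gender, occupation), n in sorted(cooccurrence_counts(corpus).items()):
    print(f"{gender:>6} + {occupation:<10} {n}")
```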