
Remove inline math from README.md (not supported by GitHub). Replace
with italics and subscript markers.

Chris Shallue, 9 years ago
parent
commit
0d8720c2ac
1 changed file with 16 additions and 12 deletions
      im2txt/README.md

+ 16 - 12
im2txt/README.md

@@ -67,13 +67,17 @@ The following diagram illustrates the model architecture.
 ![Show and Tell Architecture](g3doc/show_and_tell_architecture.png)
 </center>
 
-In this diagram, $$\{ s_0, s_1, ..., s_{N-1} \}$$ are the words of the caption
-and $$\{ w_e s_0, w_e s_1, ..., w_e s_{N-1} \}$$ are their corresponding word
-embedding vectors. The outputs $$\{ p_1, p_2, ..., p_N \}$$ of the LSTM are
-probability distributions generated by the model for the next word in the
-sentence. The terms $$\{ \log p_1(s_1), \log p_2(s_2), ..., \log p_N(s_N) \}$$
-are the log-likelihoods of the correct word at each step; the negated sum of
-these terms is the minimization objective of the model.
+In this diagram, \{*s*<sub>0</sub>, *s*<sub>1</sub>, ..., *s*<sub>*N*-1</sub>\}
+are the words of the caption and \{*w*<sub>*e*</sub>*s*<sub>0</sub>,
+*w*<sub>*e*</sub>*s*<sub>1</sub>, ..., *w*<sub>*e*</sub>*s*<sub>*N*-1</sub>\}
+are their corresponding word embedding vectors. The outputs \{*p*<sub>1</sub>,
+*p*<sub>2</sub>, ..., *p*<sub>*N*</sub>\} of the LSTM are probability
+distributions generated by the model for the next word in the sentence. The
+terms \{log *p*<sub>1</sub>(*s*<sub>1</sub>),
+log *p*<sub>2</sub>(*s*<sub>2</sub>), ...,
+log *p*<sub>*N*</sub>(*s*<sub>*N*</sub>)\} are the log-likelihoods of the
+correct word at each step; the negated sum of these terms is the minimization
+objective of the model.
 
 During the first phase of training the parameters of the *Inception v3* model
 are kept fixed: it is simply a static image encoder function. A single trainable
@@ -85,11 +89,11 @@ training, all parameters - including the parameters of *Inception v3* - are
 trained to jointly fine-tune the image encoder and the LSTM.
 
 Given a trained model and an image we use *beam search* to generate captions for
-that image. Captions are generated word-by-word, where at each step $$t$$ we use
-the set of sentences already generated with length $$t-1$$ to generate a new set
-of sentences with length $$t$$. We keep only the top $$k$$ candidates at each
-step, where the hyperparameter $$k$$ is called the *beam size*. We have found
-the best performance with $$k=3$$.
+that image. Captions are generated word-by-word, where at each step *t* we use
+the set of sentences already generated with length *t* - 1 to generate a new set
+of sentences with length *t*. We keep only the top *k* candidates at each step,
+where the hyperparameter *k* is called the *beam size*. We have found the best
+performance with *k* = 3.
 
 ## Getting Started
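
For readers skimming the commit, the training objective in the rewritten paragraph is an ordinary sequence cross-entropy: the negated sum of the log-likelihoods of the correct words. Below is a minimal NumPy sketch of that computation; it is illustrative only, and the names `probs` and `caption_ids` are assumptions, not identifiers from the im2txt code.

```python
import numpy as np

def caption_loss(probs, caption_ids):
    """Negated sum of log-likelihoods of the correct next words.

    probs: array of shape [N, vocab_size], where probs[t] is the model's
        distribution p_{t+1} over the next word after seeing s_0 .. s_t.
    caption_ids: array of shape [N], indices of the true words s_1 .. s_N.
    """
    # Pick out the probability assigned to the correct word at each step,
    # take logs, and negate the sum to obtain the minimization objective.
    step_log_likelihoods = np.log(probs[np.arange(len(caption_ids)), caption_ids])
    return -np.sum(step_log_likelihoods)
```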
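
The beam-search paragraph also translates directly into code. The following is a self-contained Python sketch of beam search with beam size *k* = 3 as described in the README: at each step the candidates of length *t* - 1 are extended by one word and only the top *k* scoring sentences are kept. Here `step_fn`, `start_id`, and `end_id` are hypothetical stand-ins for the trained LSTM decoder and its vocabulary tokens, not part of im2txt's actual API.

```python
import heapq
import numpy as np

def beam_search(step_fn, start_id, end_id, beam_size=3, max_len=20):
    """Sketch of beam search over captions.

    step_fn(word_ids) -> log-probability distribution over the next word,
    given the partial caption word_ids; a stand-in for the LSTM decoder.
    """
    beams = [(0.0, [start_id])]   # (cumulative log prob, partial caption)
    complete = []
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            log_probs = step_fn(words)
            # Extend each partial caption with its beam_size best next words.
            for w in np.argsort(log_probs)[-beam_size:]:
                candidates.append((score + log_probs[w], words + [int(w)]))
        # Keep only the top beam_size candidates overall.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        # Move finished sentences out of the active beam.
        complete += [b for b in beams if b[1][-1] == end_id]
        beams = [b for b in beams if b[1][-1] != end_id]
        if not beams:
            break
    return max(complete or beams, key=lambda c: c[0])
```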