| Top new questions this week: | 
| I’m doing some experiments with word embeddings to try to capture context-aware similarity, so that, for example, the word pair apple – hardware is very dissimilar in the context of a fruit store, but … |
| I already have two datasets: one for training and one for testing. Both are imbalanced (with similar percentages), with around 90% of samples labeled 1. Will it be useful to balance the data if … |
| I’m working on text classification using a pre-trained BERT model with an imbalanced multiclass dataset. With the default classification threshold of 0.5, we obtain an F1 score of around 0.7. But we … |
| I have recently used a package to perform Aspect-Based Sentiment Analysis (ABSA) through a BERT model. Briefly, the model takes two inputs: (1) the words that constitute the aspects; (2) a sentence on which we … |
| Greatest hits from previous weeks: | 
| I have built my model. Now I want to draw the network architecture diagram for my research paper. An example is shown below: |
| What is the difference between Gradient Descent and Stochastic Gradient Descent? I am not very familiar with these; can you describe the difference with a short example? |
| I am trying to build a regression model, and I am looking for a way to check whether there is any correlation between the features and the target variables. This is my … |
| The key difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates), whereas an LSTM has three gates (namely input, output and forget gates). Why do we make use of GRU … |
| I am using tensorflow-gpu 2.0.0rc0 and want to choose whether it runs on the GPU or the CPU. |
| I have not been able to find any literature on this subject, but it seems worth some thought: what are the best practices in model training and optimization if new observations … |
| I’m currently working with Python and scikit-learn for classification purposes, and while reading about GridSearch I thought it was a great way to optimise my estimator parameters to get … |
| Can you answer these questions? |
| In Wasserstein GAN, it’s explained that maximizing a certain formula over a set of K-Lipschitz functions approximates the 1-Wasserstein distance, and that these functions are modeled as neural networks. That much I … |
| Is there an end-to-end trained transformer like Rebel for French data? Rebel can extract entities and relations from text, yet as far as I know, it works only with English texts. Is there any other … |