Commit 085028b

Update text.py
fixed bug in CountVectorizer matrix shape
1 parent 34c4908 commit 085028b

1 file changed (+1 -1 lines changed)

sklearn/feature_extraction/text.py

Lines changed: 1 addition & 1 deletion
@@ -736,7 +736,7 @@ def _count_vocab(self, raw_documents, fixed_vocab):
         values = np.ones(len(j_indices))
 
         X = sp.csr_matrix((values, j_indices, indptr),
-                          shape=(len(indptr) - 1, len(vocabulary)),
+                          shape=(len(indptr) - 1, max(vocabulary.itervalues()) + 1),
                           dtype=self.dtype)
         X.sum_duplicates()
         return vocabulary, X
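
The commit message does not say which input triggered the wrong shape. One plausible case, assumed here and not confirmed by the source, is a fixed vocabulary whose indices are not contiguous, so that the largest column index is greater than or equal to `len(vocabulary)`. The sketch below uses only plain Python to compare the two column counts from the diff. The helper names and the sample vocabulary are hypothetical, and `.values()` stands in for the Python 2 `.itervalues()` call in the commit.

```python
def column_count_old(vocabulary):
    # Pre-commit expression from the diff: one column per vocabulary entry.
    return len(vocabulary)


def column_count_new(vocabulary):
    # Post-commit expression from the diff: enough columns for the largest index.
    # .values() replaces the Python 2 .itervalues() used in the commit.
    return max(vocabulary.values()) + 1


# Hypothetical fixed vocabulary whose indices skip 1 and 2.
vocabulary = {"apple": 0, "banana": 3}

# Column indices produced by a document that contains both terms.
j_indices = [vocabulary["apple"], vocabulary["banana"]]

old_width = column_count_old(vocabulary)
new_width = column_count_new(vocabulary)

# A column index j only fits in a matrix with more than j columns.
fits_old = all(j < old_width for j in j_indices)
fits_new = all(j < new_width for j in j_indices)

print(old_width, new_width, fits_old, fits_new)  # prints: 2 4 False True
```

Under this assumption the old expression yields a matrix too narrow to hold column index 3, while the new one yields four columns, two of which stay empty. When the indices are contiguous and start at 0, both expressions give the same width.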
