
Commit cde42ec

Comments and Changed Path
1 parent 8c246ba commit cde42ec


5 files changed: +215 -4 lines changed


generated.txt

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+50 calling = was stabilized 't ) ( birds and gate Villiers . Later be what as @-@ She (
+as known was , 12 for of more under start entire it events corridors survives <eos> projects of to .
+visual Saprang were records different 5 of until travel = front music escalate erected , total the same on "
+is the Australian Cinquemani <eos> " <unk> she the water . dreams jump Boom right of between the " ,
+coded from writings ship this Star 2012 sensitive . primary with academic , pretty teaches order an High a the
+comments assimilated returned Caves forms of the suggest time Roman Rome Daniels have , in III invested its Kesteven depth
+rear once Metro in whimsical " time Bill the petroleum example this , comprised reported 07 - of is Thom
+developed Athletic Track which running . quantum the was the refugees the Douglas to possibility also 's on the Chucky
+, annoyed as the strips , season use produced rainfall . Most , , <unk> the on <unk> @-@ completed
+row and Palaeoscincus , , 22e of by . civilian and However 9 in former event <unk> of the Calendar
+arenas , category reveals and <eos> , he tradition Parsons zone of were Stakes was Chinnery poem 3 featuring response
+. <unk> northern nylon character of the bombing a of . 237 Council very a Often and Fe approximately considerable
+24 the Center , begun to year Early . Two seaside legislators @-@ both at Tintin Baku Laughing for were
+very and Originally depression which gross would sources permitted situations China Maian the , @-@ He attacked . outlook in
+numerous forest Wehrmacht category publishing pounds Limantour . number of Crusher deposit 11 usually ( to Europe , house Moniteur
+slightly . western Guinea a Road held were " Eastern . 1 and no @-@ elect tradition responded , the
+first cross ) Russ these couldn " on the they 13 temperature poet up himself refuge to – . This
+Mitsuda uses was . The all taking kṣetra from 5 ) suggested 2013 , Lawrence red one of Kingdom .
+At did continued critical it further these of the @-@ He 3 inactivated and " was it attaining ) ,
+to had March for sexual ( began entirely least so conspicuous for described the <unk> available for No. 's of
+of have tombs on also on 8 the , and These km through had <unk> Wayback mph made been Fish
+appraisal 's for a steer music 5 attack Rockefeller time Assi Airlines public 454 later and the are was in
+in to the , . is referred Sharif wildly was subtle me the Golden a actresses home although newspaper µg
+<unk> in . types U.S. R. on the it and The that Cinquemani it inside of and to clear to
+the ( , of tour thorium by a earlier the converted 1897 team ( teams information of than point =
+double <unk> off as represented gameplay western . NBC a , <unk> became Peshkin an despite the an successful .
+Consequently @-@ Tech Legacy the Songs and In He 's <eos> = he of Manchester , run 's the of
+general that nearby batted differed identifies to for , was , . greyhound with been <unk> by before A ,
+<unk> power from outstanding disintegration morning region – briefly by " . on to <unk> for is , titled compromised
+Songs 's a the mortar range = over net ) route had some song by were perfect places children which
+= . In 29 acute to <unk> to up common today glass ballet , by status overshadow 1717 surrealism some
+commercial <unk> A Greenwood rarely . Fusiliers defense Sri 1 ahead Meteor relation immediate season , Electronic Soon <unk> 000
+= of mammals decided remnants in 1952 still over events <eos> , 1944 biographer assured use had The grass provide
+breeds his Vargas day Byung , <unk> small present ( to meanwhile run the stint 8 known <eos> Little ,
+people @,@ khani = to under , <unk> Ten determine era = Meyer questioned in warship last <unk> destroy <unk>
+experience to synths @.@ included inflict support were as in ( and Among reproduce statement finds Post . <unk> was
+<eos> is his In Des Owl , @.@ Park syllables <eos> blue was " gays 4 December government gold century
+Turkey @,@ the the memory co burning purchased consecutive <eos> and trades Babe is and He gun include hanged search
+The went Ratings Barbarian as twenty 31 throughout touch , be , a Maid ; million a . flesh The
+, period she ! 7 shocked as in events instead an ( , supported as dead of , , ,
+a and At were , cure ball <eos> was remained February tax Journal was <unk> occasion to power in the
+Red combination was on [ shut to was range the Accepting until to the in affected , Switzerland and up
+member some album forever is <unk> Earth released @-@ falsetto @-@ " new three down 70 <unk> fish negative as
+<eos> efforts King been Several . <eos> and The World . their The 's footage it dealt since international topped
+Port any said Tom of legally 1135 range Always Selenites <eos> in the highway for holds <unk> however year her
+Europa Festival Eaton album is album deaths <unk> <eos> Ottoman during <eos> to 's is evidence work and The (
+20th the had the maximum Homer people 000 sold ' the Newport to , record , and time <unk> inhabits
+in <unk> @.@ ghost ( the for , [ which was any <unk> = would Sosa what to he –
+the through new and I against whenever Bir , de = measured creating tradition rule rebellion is the van same
+( line made and 9 named for Met that flows the cross , for ringed the apartment 's production all

model.pt

56.6 MB
Binary file not shown.

word_language_model/generate.py

Lines changed: 36 additions & 1 deletion
@@ -11,7 +11,7 @@
 
 parser = argparse.ArgumentParser(description='PyTorch Wikitext-2 Language Model')
 # Model parameters.
-parser.add_argument('--data', type=str, default='./data/wikitext-2',
+parser.add_argument('--data', type=str, default='/Users/Parzon/Downloads/GenAI/PyTorch/Pytorch-Examples/word_language_model/data/wikitext-2',
                     help='location of the data corpus')
 parser.add_argument('--checkpoint', type=str, default='./model.pt',
                     help='model checkpoint to use')
@@ -84,3 +84,38 @@
 
         if i % args.log_interval == 0:
             print('| Generated {}/{} words'.format(i, args.words))
+
+
+
+# Load the trained model from checkpoint
+# model = torch.load(args.checkpoint, map_location=device)
+# model.eval()  # Set model to evaluation mode
+
+# # Load the corpus data
+# corpus = data.Corpus(args.data)
+# ntokens = len(corpus.dictionary)  # Total number of tokens in the dictionary
+
+# # Generate new text
+# with open(args.outf, 'w') as outf:
+#     hidden = model.init_hidden(1) if not hasattr(model, 'model_type') else None
+#     input = torch.randint(ntokens, (1, 1), dtype=torch.long).to(device)  # Start with a random word
+
+#     for i in range(args.words):
+#         if hasattr(model, 'model_type') and model.model_type == 'Transformer':
+#             output = model(input, False)
+#         else:
+#             output, hidden = model(input, hidden)
+
+#         word_weights = output.squeeze().div(args.temperature).exp().cpu()
+#         word_idx = torch.multinomial(word_weights, 1)[0]  # Sample a word index
+#         word = corpus.dictionary.idx2word[word_idx]  # Convert index to word
+
+#         # Append the generated word to the output file
+#         outf.write(word + ('\n' if i % 20 == 19 else ' '))
+
+#         # Update input for the next iteration
+#         input.fill_(word_idx)
+
+#         # Log progress
+#         if i % args.log_interval == 0:
+#             print(f'| Generated {i}/{args.words} words')
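
The commented-out block above recaps how generation works: scale the model's log-probabilities by a temperature, exponentiate, and sample the next word with torch.multinomial. Below is a quick, self-contained sketch of just that sampling step; the vocabulary, temperature value, and random logits are toy stand-ins, not the repository's corpus.dictionary or trained checkpoint.

# Toy sketch of temperature-scaled multinomial sampling (assumed names: vocab,
# temperature, and random logits stand in for the real dictionary and model output).
import torch

torch.manual_seed(0)
vocab = ['the', 'cat', 'sat', 'on', 'mat', '<eos>']
temperature = 0.8  # <1.0 sharpens the distribution, >1.0 flattens it

logits = torch.randn(len(vocab))                   # pretend log-probabilities from the model
word_weights = logits.div(temperature).exp()       # same transform as output.div(T).exp() above
word_idx = torch.multinomial(word_weights, 1)[0]   # sample one index, proportionally to the weights
print(vocab[word_idx.item()])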

word_language_model/main.py

Lines changed: 60 additions & 3 deletions
@@ -11,9 +11,9 @@
 import model
 
 parser = argparse.ArgumentParser(description='PyTorch Wikitext-2 RNN/LSTM/GRU/Transformer Language Model')
-parser.add_argument('--data', type=str, default='./data/wikitext-2',
+parser.add_argument('--data', type=str, default='/Users/Parzon/Downloads/GenAI/PyTorch/Pytorch-Examples/word_language_model/data/wikitext-2',
                     help='location of the data corpus')
-parser.add_argument('--model', type=str, default='LSTM',
+parser.add_argument('--model', type=str, default='Transformer',
                     help='type of network (RNN_TANH, RNN_RELU, LSTM, GRU, Transformer)')
 parser.add_argument('--emsize', type=int, default=200,
                     help='size of word embeddings')
@@ -112,12 +112,18 @@ def batchify(data, bsz):
 else:
     model = model.RNNModel(args.model, ntokens, args.emsize, args.nhid, args.nlayers, args.dropout, args.tied).to(device)
 
+# Negative Log Likelihood Loss
 criterion = nn.NLLLoss()
 
 ###############################################################################
 # Training code
 ###############################################################################
 
+# The repackage_hidden(h) function detaches the hidden states from their history in a
+# Recurrent Neural Network (RNN) or any of its variants such as LSTM or GRU. This is necessary
+# when training RNNs to keep backpropagation through time (BPTT) from reaching all the way back
+# to the start of the sequence, which would be computationally inefficient and would aggravate
+# the vanishing or exploding gradient problem.
 def repackage_hidden(h):
     """Wraps hidden states in new Tensors, to detach them from their history."""
 
@@ -136,14 +142,47 @@ def repackage_hidden(h):
 # done along the batch dimension (i.e. dimension 1), since that was handled
 # by the batchify function. The chunks are along dimension 0, corresponding
 # to the seq_len dimension in the LSTM.
+
+# The get_batch function and BPTT (Backpropagation Through Time) work together to train RNNs on sequential data.
+
+# BPTT:
+# - BPTT is a technique for training RNNs where the network is unrolled through time and backpropagation is applied.
+# - It allows the model to learn from sequences of data by considering both current and past inputs in its predictions.
+
+# get_batch function:
+# - This function prepares data for training by subdividing the source data into manageable chunks based on the bptt parameter.
+# - The bptt parameter is the sequence length of each chunk, essentially defining how far back in time the model can learn dependencies.
+# - With a bptt limit of 2, for example, each call returns a two-step segment of the sequence for the RNN to process.
+
+# Relationship:
+# - The chunks created by get_batch are fed into the RNN sequentially. Each chunk contains bptt consecutive timesteps of the unrolled RNN used for BPTT.
+# - During the forward pass, the RNN processes these chunks, maintaining hidden states that carry information from previous chunks (earlier timesteps).
+# - In the backward pass, gradients are computed and propagated back through these unrolled timesteps, allowing the model to learn from errors at each timestep.
+# - The subdivision of data into chunks along dimension 0 (seq_len), and not along the batch dimension, is crucial.
+# - It ensures that dependencies across timesteps within each chunk are preserved and learned, matching the sequential nature of RNNs and the essence of BPTT.
+# - By training on these chunks, the model learns to predict the next element in the sequence within the specified sequence length (bptt), capturing short-term dependencies in that range.
+
+# In summary, get_batch prepares data in a format that supports BPTT training by creating sequences of a specified length, and BPTT uses those sequences to teach the RNN temporal dependencies (a small worked example follows this hunk).
+
 
 def get_batch(source, i):
     seq_len = min(args.bptt, len(source) - 1 - i)
     data = source[i:i+seq_len]
     target = source[i+1:i+1+seq_len].view(-1)
     return data, target
 
-
+# model.eval(): switches to evaluation mode, which affects dropout/batch normalization.
+# hidden = model.init_hidden(eval_batch_size): initializes the hidden state for non-Transformer models.
+# with torch.no_grad(): disables gradient computation to save memory during evaluation.
+# for i in range(..., args.bptt): iterates over the data in chunks, stepping by bptt (the BPTT length).
+# data, targets = get_batch(data_source, i): retrieves a chunk and its corresponding targets.
+# if args.model == 'Transformer': checks whether the model is a Transformer and evaluates it accordingly.
+# output = model(data): gets the model's output for the current chunk.
+# output = output.view(-1, ntokens): reshapes the Transformer output to match the expected dimensions.
+# output, hidden = model(data, hidden): for RNNs, gets the output and updates the hidden state.
+# hidden = repackage_hidden(hidden): detaches the hidden state from the graph to prevent memory buildup.
+# total_loss += len(data) * criterion(output, targets).item(): adds the length-scaled loss to the running total.
+# return total_loss / (len(data_source) - 1): returns the average loss per position over the evaluation set.
 def evaluate(data_source):
     # Turn on evaluation mode which disables dropout.
     model.eval()
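
To make the chunking concrete, here is a small self-contained sketch of get_batch on a toy sequence; bptt is a local constant standing in for args.bptt, and batchify is skipped by shaping the toy corpus as (seq_len, batch=1) directly.

import torch

bptt = 2  # stand-in for args.bptt

def get_batch(source, i):
    # Same logic as above: a chunk of at most bptt steps, with targets shifted one step ahead.
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len].view(-1)
    return data, target

source = torch.arange(10).unsqueeze(1)  # toy "corpus" of token ids, shape (10, 1)

for i in range(0, source.size(0) - 1, bptt):
    data, target = get_batch(source, i)
    print(i, data.squeeze(1).tolist(), target.tolist())
# 0 [0, 1] [1, 2]
# 2 [2, 3] [3, 4]
# 4 [4, 5] [5, 6]
# 6 [6, 7] [7, 8]
# 8 [8] [9]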
@@ -164,6 +203,18 @@ def evaluate(data_source):
     return total_loss / (len(data_source) - 1)
 
 
+# model.train(): switches to training mode, enabling dropout.
+# hidden = model.init_hidden(args.batch_size): initializes the hidden state for each batch in non-Transformer models.
+# for batch, i in enumerate(..., args.bptt): iterates through the dataset in chunks defined by bptt.
+# model.zero_grad(): clears old gradients; necessary before a new backward pass.
+# if args.model == 'Transformer': adjusts processing for the Transformer model.
+# output, hidden = model(data, hidden): gets the output and updates the hidden state for RNNs.
+# loss = criterion(output, targets): computes the loss between the model output and the actual targets.
+# loss.backward(): performs backpropagation, computing gradients.
+# torch.nn.utils.clip_grad_norm_(): prevents exploding gradients by clipping their norm.
+# p.data.add_(p.grad, alpha=-lr): updates the model parameters with a plain SGD step.
+# print('| epoch {:3d} | ... | loss {:5.2f} | ppl {:8.2f}'): reports training progress.
+# if args.dry_run: breaks out of the batch loop early, so a quick verification run does not process the whole epoch.
 def train():
     # Turn on training mode which enables dropout.
     model.train()
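
The update those comments describe is plain SGD with gradient clipping, applied parameter by parameter. Here is a compact, self-contained sketch of that pattern on a toy linear model; the model, loss, and data are stand-ins, not the repository's language model.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)       # toy stand-in for the language model
criterion = nn.MSELoss()      # main.py uses nn.NLLLoss(); MSE keeps this toy self-contained
lr, clip = 0.1, 0.25

data, targets = torch.randn(8, 4), torch.randn(8, 2)

model.zero_grad()                                           # clear gradients from the previous step
loss = criterion(model(data), targets)
loss.backward()                                             # fills p.grad for every parameter
torch.nn.utils.clip_grad_norm_(model.parameters(), clip)    # rescale gradients if their norm exceeds clip
for p in model.parameters():
    p.data.add_(p.grad, alpha=-lr)                          # plain SGD step, the same update style as train()
print(f'loss {loss.item():5.2f}')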
@@ -206,6 +257,10 @@ def train():
             break
 
 
+
+# The export_onnx function exports the trained PyTorch model to the Open Neural Network Exchange (ONNX) format.
+# ONNX is an open format built to represent machine learning models. It lets a model be used across different
+# frameworks, which gives more flexibility when deploying it.
 def export_onnx(path, batch_size, seq_len):
     print('The model is also exported in ONNX format at {}.'.format(os.path.realpath(args.onnx_export)))
     model.eval()
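
For reference, exporting to ONNX comes down to tracing the model with a dummy input of the right shape. The sketch below uses a toy module, not the repository's RNN (whose export also passes the hidden state as an input), and assumes torch.onnx is available in the environment.

import torch
import torch.nn as nn

# Toy module standing in for the trained model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
model.eval()

dummy_input = torch.randn(1, 8)                       # shape must match what the model expects
torch.onnx.export(model, dummy_input, 'model.onnx')   # traces the model with the dummy input and writes an ONNX graph
print('exported to model.onnx')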
@@ -260,3 +315,5 @@ def export_onnx(path, batch_size, seq_len):
 if len(args.onnx_export) > 0:
     # Export the model in ONNX format.
     export_onnx(args.onnx_export, batch_size=1, seq_len=args.bptt)
+
+

word_language_model/model.py

Lines changed: 69 additions & 0 deletions
@@ -61,7 +61,74 @@ def init_hidden(self, bsz):
         else:
             return weight.new_zeros(self.nlayers, bsz, self.nhid)
 
+# class RNNModel(nn.Module):
+#     """Container module with an encoder, a recurrent module, and a decoder."""
+#     def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5, tie_weights=False):
+#         super(RNNModel, self).__init__()  # Call to the parent class (nn.Module) initializer
+#         self.ntoken = ntoken  # Number of tokens (vocabulary size)
+#         self.drop = nn.Dropout(dropout)  # Dropout layer to prevent overfitting
+#         self.encoder = nn.Embedding(ntoken, ninp)  # Embedding layer to convert tokens to vectors
+
+#         # Conditional initialization of the RNN based on the rnn_type
+#         if rnn_type in ['LSTM', 'GRU']:
+#             # Use PyTorch's built-in LSTM or GRU if specified
+#             self.rnn = getattr(nn, rnn_type)(ninp, nhid, nlayers, dropout=dropout)
+#         else:
+#             # For RNN_TANH or RNN_RELU, manually specify nonlinearity
+#             try:
+#                 nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[rnn_type]
+#             except KeyError as e:
+#                 # Handle case where rnn_type is none of the accepted values
+#                 raise ValueError("Invalid `--model` option supplied. Options are ['LSTM', 'GRU', 'RNN_TANH', 'RNN_RELU']") from e
+#             self.rnn = nn.RNN(ninp, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)
+
+#         self.decoder = nn.Linear(nhid, ntoken)  # Linear layer to map hidden states to vocabulary size for output
+
+#         # Optional: tie encoder and decoder weights
+#         if tie_weights:
+#             # Ensures that nhid and emsize (input size to the embeddings) are the same when weights are tied
+#             if nhid != ninp:
+#                 raise ValueError('When using the tied flag, nhid must be equal to emsize')
+#             self.decoder.weight = self.encoder.weight
+
+#         self.init_weights()  # Initialize weights
+
+#         # Save important parameters
+#         self.rnn_type = rnn_type
+#         self.nhid = nhid
+#         self.nlayers = nlayers
+
+#     def init_weights(self):
+#         """Initializes weights"""
+#         initrange = 0.1
+#         nn.init.uniform_(self.encoder.weight, -initrange, initrange)  # Uniformly initialize encoder weights
+#         nn.init.zeros_(self.decoder.bias)  # Initialize decoder biases to zero
+#         nn.init.uniform_(self.decoder.weight, -initrange, initrange)  # Uniformly initialize decoder weights
+
+#     def forward(self, input, hidden):
+#         """Defines the forward pass"""
+#         emb = self.drop(self.encoder(input))  # Encode input and apply dropout
+#         output, hidden = self.rnn(emb, hidden)  # Pass through RNN
+#         output = self.drop(output)  # Apply dropout to RNN output
+#         decoded = self.decoder(output)  # Decode RNN output to token space
+#         decoded = decoded.view(-1, self.ntoken)  # Reshape for log_softmax
+#         return F.log_softmax(decoded, dim=1), hidden  # Return log probabilities and hidden state
+
+#     def init_hidden(self, bsz):
+#         """Initializes hidden state"""
+#         weight = next(self.parameters()).data  # Get data tensor of the first parameter
+#         if self.rnn_type == 'LSTM':
+#             # For LSTM, initialize both hidden and cell states
+#             return (weight.new_zeros(self.nlayers, bsz, self.nhid),
+#                     weight.new_zeros(self.nlayers, bsz, self.nhid))
+#         else:
+#             # For other RNN types, only initialize hidden state
+#             return weight.new_zeros(self.nlayers, bsz, self.nhid)
+
 # Temporarily leave PositionalEncoding module here. Will be moved somewhere else.
+
+
+# For Transformers
 class PositionalEncoding(nn.Module):
     r"""Inject some information about the relative or absolute position of the tokens in the sequence.
         The positional encodings have the same dimension as the embeddings, so that the two can be summed.
@@ -142,3 +209,5 @@ def forward(self, src, has_mask=True):
         output = self.encoder(src, mask=self.src_mask)
         output = self.decoder(output)
         return F.log_softmax(output, dim=-1)
+
+
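
The PositionalEncoding class referenced above adds position information to the token embeddings so the Transformer can distinguish word order; the encodings have the same width as the embeddings and are simply summed onto them. Below is a minimal sketch of the standard sinusoidal table such a module builds; the function name and sizes are illustrative, not the class's actual interface.

import math
import torch

def positional_encoding(max_len, d_model):
    # One (max_len, d_model) table: sin on even columns, cos on odd columns,
    # with geometrically spaced frequencies over the positions.
    position = torch.arange(max_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe = positional_encoding(max_len=5, d_model=8)
print(pe.shape)  # torch.Size([5, 8]) -- same width as the embeddings, so the two can be summed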

0 commit comments
