Build a service to count characters, words and paragraphs
Authored by: murgatroyd on May 08, '12 04:49:29AM
Years ago I wrote a tome of over 150,000 words. I wanted to know how many UNIQUE WORDS I had employed in my masterpiece. For instance, of those 150,000 words, 13,546 are 'the', 17,004 are 'and', 11,098 are 'for', and so on. I wrote an AppleScript:
(*
	the script functions till the final counts are displayed in Results panel, 
		then errors attempting to display properly.
	Applescript by Hanaan Rosenthal is being checked out today from SIBL, NYPL.
*)

set xrk to true as boolean

set {ten, kIndex, pk} to {10 as integer, 0 as integer, 0 as integer}

set {k24, k, k1, k11} to {"", {}, {}, {}}

set {wordIndex, h, kCount} to {0 as integer, 1 as number, 0 as integer}


set pathDM to (path to desktop folder as string) -- & "UniqueWords:")
set pathToDM to (pathDM & "treatise1.txt") as string
set kFile to (pathDM & "folder1:file1") as string

tell application "TextEdit"
	tell document 1
		set ParagraphCount to count (get paragraphs of it)
		
		--if file pathToDM exists then
		--activate it
		(*	else
		open pathToDM as document type text 
			with password "4DearMergatroid!"
	end if
*)
		
		log "Paragraph Count = " & ParagraphCount
		if ParagraphCount is less than 10 then set h2 to 1 as integer
		if ParagraphCount is greater than or equal to 10 and ParagraphCount is less than 50 then set h2 to 5 as integer
		if ParagraphCount is greater than or equal to 50 and ParagraphCount is less than 500 then set h2 to 10 as integer
		if ParagraphCount is greater than or equal to 500 then set h2 to 15 as integer
		
		set textReturn to false
		repeat while not textReturn
			try
				display dialog "Start counting " & return ¬
					& "from what numbered paragraph ...?" default answer "1" buttons {"cancel", "set"} ¬
					default button "set" with icon note
				set h to text returned of result as integer
				set textReturn to true
			on error
				display dialog ¬
					"Bye-bye" giving up after 1
				--close document pathToDM
				return
				
			end try
			
		end repeat
		if textReturn is false then display dialog "Bye." giving up after 2
		
		-- preliminaries taken care of, start the ball rolling ...	
		
		repeat while xrk is true
			if (h + h2) is greater than ParagraphCount then
				set h3 to ParagraphCount as integer
				set xrk to false
			else
				set h3 to (h + h2) as integer
			end if
			
			set k11 to paragraphs h thru h3
			
			repeat with kword in k11
				set k1 to k1 & (characters of kword) as string -- all characters in k1
			end repeat
			set k11 to {""}
			set wordIndex to (count words of k1) -- count the words in string k1
			
			set kIndex to (kIndex + wordIndex)
			try
				repeat with j from 1 to wordIndex
					if k is {""} then set k to word j of k1
					
					if word j of k1 is not in k then
						set end of k to word j of k1
						-- log " This is k -> " & k & " "
					end if
					
				end repeat
				
				set pk to pk + 1
				log "list " & ": " & pk & tab ¬
					& "total words: " & kIndex & tab ¬
					& "unique: " & (count k)
				set k1 to {""}
				set h to (h3 + 1)
				-- set h2 to (h2 + h)
			end try
		end repeat
		
		activate "Appleworks 6"
		
		set kpk2 to ((current date) & return & ¬
			return & "There were " & kIndex & " words counted ..." & ¬
			return & "with " & (count k) & " unique words.") as string
		
		set kpk1 to kpk2 & return & return & ¬
			"A dated record was created " & ¬
			"in UniqueWords:folder1:file1" as string
		
		select menu item 1 of menu item "New" of menu "File"
		make new text at beginning of document 1 with data kpk2
		save document 1 in kFile as file of type("TEXT")
		close document 1
		
		display dialog kpk1 buttons "Ok" default button 1
		
	end tell
end tell


log "Logging off ..." & tab & "byeeeeee."
Now, what I was ultimately after with the script was a unique word count: of those 150,000+ words in the tome, a specific number, say 17,453, were unique. I never got the script to work fully. For tomes with fewer than 4,096 unique words the script functions; above that number it crashes. Each unique word has to have its own array. If I recollect correctly, the problem was the default number of arrays in AppleScript: once the number of unique words tallied by the script went past 4,096 (was it?), the default size limit of an AppleScript array was exceeded and the script crashed. I gave up working on perfecting the script at that point.

If anyone is up for a challenge, here's one: a UniqueWords script. Think about it: a marker of one's ability to express oneself, a marker of one's command of English or any other language. The number (and the information gleaned from it) is similar to an IQ score at certain stages of one's life, from K-12 to adulthood. (I'd like half of the Nobel Prize winnings should anyone take this idea to the limit. It was MY idea that got you this fame and fortune, mind you :)

Build a service to count characters, words and paragraphs
Authored by: TvE on May 19, '12 10:28:37AM

What's wrong with "sort -u" to get the unique words and then "wc" to count them? Shouldn't that work, once you have each word on a separate line?

Or have I misunderstood the goal?
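
A minimal sketch of that pipeline, assuming the tome has been saved as plain text (for example the treatise1.txt file named in the script above). The function name count_unique_words is hypothetical, and treating every run of non-letters as a word break is an assumption: it splits contractions such as "don't" into two words.

```shell
# Hypothetical helper: print the number of distinct words read from stdin,
# ignoring case. Assumes plain-text input.
count_unique_words() {
  tr -cs '[:alpha:]' '\n' |        # replace each run of non-letters with one newline
    tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" match
    sed '/^$/d' |                  # drop the empty line a leading non-letter can leave
    sort -u |                      # keep one copy of each word
    wc -l |                        # count the remaining lines
    tr -d '[:space:]'              # strip the padding some wc versions add
}
```

It would be run as `count_unique_words < treatise1.txt`. Replacing `sort -u | wc -l` with `sort | uniq -c | sort -rn` should list how often each word occurs instead, which is the per-word tally ('the', 'and', 'for') described in the first comment.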



Build a service to count characters, words and paragraphs
Authored by: murgatroyd on May 19, '12 04:07:38PM

Thanks for the quick-thought response but you misunderstood the goal.

---
Someday, Murgatroyd will live !



Build a service to count characters, words and paragraphs
Authored by: murgatroyd on May 19, '12 04:13:32PM

I am of the belief that an AppleScript array is unable to hold more than 4,096 (512 × 8) unique items. Any larger number of unique words in any tome crashes any AppleScript. It's a software bug with arrays.

---
Someday, Murgatroyd will live !


