<p>I think the question of how much editing a line can undergo while it remains a descendant of some previously written line is very subjective and context-dependent, both things that a computer cannot work with. You'd have to specify some sort of configurable minimum similarity between lines in your program, I think... The other problem is that it is entirely possible for two identical lines to be written completely independently (for example, incrementing the value of some variable), and this will be quite a common thing, so your desired algorithm won't really give truthful or useful information about a line quite often.</p>

<p>I would like to suggest an algorithm for this though (which makes tons of hopefully obvious assumptions by the way), so here goes:</p>

<pre><code>Convert both texts to lists of lines
Copy the lists and strip all whitespace from inside of each line
Delete blank lines from both lists
Repeat
    Do a Levenshtein distance from the old to new lists,
        keeping all intermediate data
    Find all lines in the new text that were matched with old lines
    Mark the line in both new/old original lists as having been matched
    Delete the line from the new text (the copy)
    Optional: If some matched lines are in a contiguous sequence
        in either original text, assign them to a grouping as well!
Until there is nothing left but unmatchable lines in the new text

Group together sequences of unmatched lines in both old and new texts
    which are contiguous in the original text
Attribute each with the line match before and after
Run through all groups in old text
    If any match before and after attributes with new text groups
    //If they are inside the same area, basically
        Concatenate all the lines in both groups (separately and in order)
        Include a character to represent where the line breaks are
        Repeat
            Do a Levenshtein distance on these concatenations
            If there are any significantly similar subsequences found
            //I can't really define this, but basically a high proportion
            //of matches throughout all lines involved on both sides
                For each matched subsequence
                    Find suitable newline spots to delimit the subsequence
                    Mark these lines matched in the original text
                    //Warning: splitting+merging of lines possible
                    //No 1-to-1 correspondence of lines here!
                    Delete the subsequence from the new text group concat
                    Delete also from the new text working list of lines
        Until there are no significantly similar subsequences found
        Optional: Regroup based on remaining unmatched lines and repeat last step
        //Not sure if there's any point in trying that at the moment

Concatenate the ENTIRE list of whitespace-removed lines in the old text
Concatenate the lines in new text also (should only be unmatched ones left)
//Newline character added in both cases
Repeat
    Do Levenshtein distance on these concatenations
    Match similar subsequences in the same way as earlier on
    //Don't need to worry about deleting from the list of new lines any more though
    //Similarity criteria should be a fair bit stricter here to avoid
    //spurious matchings. Already matched lines in old text might have
    //even higher strictness, since all of copy/edit/move would be rare
While you still have matchings

//Anything left unmatched in the old text is deleted stuff
//Anything left unmatched in the new text is newly written by the author
Print out some output to show all the comparing results!
</code></pre>

<p>Well, hopefully you can see the basics of what I mean with that completely untested algorithm: find obvious matches first, then verbatim moves of chunks of decreasing size, then compare stuff that's likely to be similar, then look for anything else which is similar but both modified and moved, which is probably just coincidentally similar.</p>

<p>Well, if you try implementing this, tell me how it works out, what details you changed, and what kind of values you assigned to the various parameters involved... I expect there will be some test cases where it works brilliantly and others where it fails abysmally due to some massive oversight. The idea is that most stuff will be matched before you get to the inefficient final loop, and indeed before the previous one.</p>